An Improved DFA for Fast Regular Expression Matching

By: 
Domenico Ficara, Stefano Giordano, Gregorio Procissi, Fabio Vitucci, Gianni Antichi, and Andrea Di Pietro
Appears in: 
CCR October 2008

Modern network devices need to perform deep packet inspection at high speed for security and application-specific services. Finite Automata (FAs) are used to implement regular expressions matching, but they require a large amount of memory. Many recent works have proposed improvements to address this issue. This paper presents a new representation for deterministic nite automata (orthogonal to previous solutions), called Delta Finite Automata (dFA), which considerably reduces states and transitions and requires a transition per character only, thus allowing fast matching. Moreover, a new state encoding scheme is proposed and the comprehensive algorithm is tested for use in the packet classification area.

Public Review By: 
Chadi Barakat

Traffic classification and application identification are important functions in network administration. Traditionally, this was always done by using the port number in the transport header, but with the emerging of new applications using non standard port numbers, there is a need for another more general method. And even for legacy applications, there is a need to control the traffic to check whether it contains any harmful code as a virus or a worm. The most common approach to counter these two problems is to rely on deep packet inspection. Each undesirable (or even desirable) event is characterized by some series of characters, then the payload of packets is browsed for these specific strings seen as fingerprints of events. This method is largely used by today intrusion detection systems. The main problem with it is in the cost of browsing the memory to interpret the content of the payload and to check whether it contains some specific fingerprints or not. This is made more complex by the large number of fingerprints that exist.
To reduce the memory usage and memory access rate, it is often proposed to structure the memory in a tree form where each character in the payload triggers a transition on the tree. The technique doing that, called DFA, is known to improve memory storage and memory speed. The authors in this paper pushes this memory structuring issue to another limit by adding more levels of aggregation of states and transitions so that the memory is not accessed for each character. A local fast memory is used as a cache to store neighboring states. This new idea has been welcomed by all reviewers and recognized to be original and important in this context. The validation part has been also appreciated even though some factors were not considered as the CPU cost and the parallel memory access that one can use. It is clear that the idea will profit from a validation in a more general setting. For now, the paper has the merit of presenting a novel idea in this area with a good coverage of the state of the art.