Scalable Lookahead Regular Expression Detection System for Deep Packet Inspection

Masanori Bando, Associate Member, IEEE, N. Sertac Artan, and H. Jonathan Chao, Fellow, IEEE

Abstract—Regular expressions (RegExes) are widely used, yet their inherent complexity often limits the total number of RegExes that can be detected using a single chip at a reasonable throughput. This limit on the number of RegExes impairs the scalability of today's RegEx detection systems. The scalability of existing schemes is generally limited by the traditional detection paradigm based on per-character-state processing and state transition detection. The main focus of existing schemes is on optimizing the number of states and the required transitions, but not on optimizing the suboptimal character-based detection method. Furthermore, the potential benefits of allowing out-of-sequence detection, instead of detecting components of a RegEx in their order of appearance, have not been explored. Lastly, the existing schemes do not provide ways to adapt to evolving RegExes. In this paper, we propose Lookahead Finite Automata (LaFA) to perform scalable RegEx detection. LaFA requires less memory due to three contributions: 1) providing specialized and optimized detection modules to increase resource utilization; 2) systematically reordering the RegEx detection sequence to reduce the number of concurrent operations; and 3) sharing states among automata for different RegExes to reduce resource requirements. We demonstrate that LaFA requires an order of magnitude less memory than today's state-of-the-art RegEx detection systems. Using LaFA, a single commodity field-programmable gate array (FPGA) chip can accommodate up to 25 k RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimate that a 34-Gb/s throughput can be achieved.

Index Terms—Deep packet inspection, DPI, LaFA, Lookahead Finite Automata, network intrusion detection and prevention system, NIDPS, regular expressions.

Manuscript received February 25, 2010; revised January 08, 2011 and July 22, 2011; accepted August 11, 2011; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor T. Wolf. Date of publication January 18, 2012; date of current version June 12, 2012. This work is an extended version of papers presented at the ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), Princeton, NJ, October 19-20, 2009, and the Reconfigurable Architecture Workshop (RAW), Atlanta, GA, April 19-20. The authors are with the Department of Electrical and Computer Engineering, Polytechnic Institute of New York University, Brooklyn, NY, USA (e-mail: mbando01@students.poly.edu; sartan@poly.edu; chao@poly.edu). Color versions of one or more of the figures in this paper are available online.

I. INTRODUCTION

Regular expressions (RegExes) are used to flexibly represent complex string patterns in many applications ranging from network intrusion detection and prevention systems (NIDPSs) [1], [2] to compilers [3] and DNA multiple sequence alignment [4], [5]. In particular, the NIDPSs Bro [1] and Snort [2] and the Linux Application Level Packet Classifier (L7 filter) [6] use RegExes to represent attack signatures or packet classifiers. A RegEx detection system for NIDPS on high-speed networks should satisfy the following two requirements: scalability and high throughput.
A scalable detection system can accommodate more policies (rules), which increases the flexibility of the NIDPS; it also needs to support ever-increasing traffic demand. The number of RegExes in the Snort signature set increased from 1131 RegExes in 2006 to 2290 in 2009, and the typical RegEx length keeps growing [2]. Recently, major Internet carriers and vendors succeeded in experimenting with 100-Gb/s equipment. In September 2008, Verizon successfully performed a 100-Gb/s transmission over more than 646 mi [7]. Cisco's global IP traffic forecast report states that global IP traffic will double nearly every two years through the end of 2012, and thus annual global IP traffic will exceed half a zettabyte in 2013 [8]. To the best of our knowledge, no current state-of-the-art regular expression detection system is capable of supporting such a large RegEx set, and even larger RegEx sets are expected in the future along with higher speed demands. The proposed scheme, Lookahead Finite Automata (LaFA) [9], can support the current and expected future RegEx sets with small memory requirements. Detection at line speed is another important requirement of NIDPS. Our compact data structure allows using very high-speed on-chip memory even for large RegEx sets, leading to the higher throughput required for today's and tomorrow's line speeds.

Finite automata (FA) are the de facto tools to address the RegEx detection problem. For RegEx detection on an input, the FA starts at an initial state. Then, for each character in the input, the FA makes a transition to the next state, which is determined by the previous state and the current input character. If the resulting state is unique, the FA is called a deterministic finite automaton (DFA); otherwise, it is called a nondeterministic finite automaton (NFA) [10]. In the tradeoff between performance and resource usage, NFA and DFA represent two extreme cases. DFA has constant time complexity since, by definition, it guarantees only one state transition per character. The constant time complexity per character allows DFA to achieve high-speed RegEx detection, which makes DFA the preferred approach. This is especially so for software solutions, since DFA can be executed serially and quickly on commodity CPUs. NFA, on the other hand, requires massive parallelism, making it harder to implement in software. NFA allows multiple simultaneous state transitions, leading to a higher time complexity. The price paid for the high speed of DFA is its prohibitively large memory requirement. Recent applications, most notably deep packet inspection for NIDPS, have emerged that require RegEx detection scalable to large sets of complex RegExes. This need for scalability makes the DFA impractical for such applications due to its large memory requirement. Since NFA is not fast, RegEx detection at high speeds scalable to large sets of complex RegExes is still an open issue.
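The contrast between the two automata can be made concrete with a tiny simulation. The following Python fragment is an illustration only (not the paper's implementation): it advances an NFA one input character at a time and tracks the set of simultaneously active states. A DFA would keep exactly one active state per step, whereas the NFA's active set grows when patterns overlap, which is exactly the parallelism burden discussed above.

```python
# Minimal sketch (illustrative only): per-character NFA simulation.
# Transitions map (state, predicate) -> set of next states; keeping the whole
# set of active states per character is what makes NFA hard to run serially.

def step(active, char, transitions):
    nxt = set()
    for state in active:
        for (src, predicate), dests in transitions.items():
            if src == state and predicate(char):
                nxt |= dests
    return nxt

# Toy NFA for the two patterns "ab" and "ac", which share the prefix "a".
transitions = {
    (0, lambda c: c == "a"): {1, 2},        # after 'a', both branches are active
    (1, lambda c: c == "b"): {"match ab"},
    (2, lambda c: c == "c"): {"match ac"},
}

active = {0}
for ch in "ab":
    active = step(active, ch, transitions) | {0}   # state 0 restarts at every position
    print(ch, "->", sorted(map(str, active)))
```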

Additionally, as attacks get more sophisticated, structures such as back-references, lookahead assertions, lookback assertions, and conditional subpatterns are being added to the RegExes. Traditional approaches do not have mechanisms to seamlessly integrate such extensions to RegExes.

We make three observations in this paper that are common to most of today's state-of-the-art RegEx detection systems and that, we believe, limit the scalability of these systems. First of all, RegExes consist of a variety of different components, such as character classes or repetitions [11]. Due to this variety, it is hard to identify a method that is efficient for concurrently detecting all of these different components of a RegEx. However, in most cases today, these heterogeneous components are detected by a state machine that consists of homogeneous states. This leads to inefficiency, and as a result, the scalability of such systems is poor. This situation worsens as RegExes evolve with new extensions, while the detection continues to be homogeneous. Second, the order of components in a RegEx is preserved in the state machine detecting this RegEx. However, it may be beneficial to change the order of the detection of components in a RegEx. For instance, it is easy to detect exact strings (simple strings), and they are expected to appear less frequently, whereas other components (for instance, character classes) may appear more frequently. By reordering the detection of the components, the tradeoff between detection complexity and appearance frequency can be exploited for better scalability. Third, most RegExes share similar components. In the traditional FA approaches, a small state machine is used to detect a component in a RegEx. This state machine is duplicated since the similar component may appear multiple times in different RegExes. Furthermore, most of the time, the RegExes sharing this component cannot appear at the same time in the input. As a result, the repetition of the same state machine for different RegExes introduces redundancy and limits the scalability of the RegEx detection system.

Based on the above three observations, we introduce LaFA, a novel detection method that resolves the scalability issues of the current RegEx detection paradigm. The scalable and compact LaFA data structure requires a small memory footprint and makes it feasible to implement large RegEx sets using only very fast, small on-chip memory.

The rest of this paper is organized as follows. An overview of LaFA is given in Section II. Section III describes the LaFA data structure, and Section IV describes the LaFA architecture. Section V is a discussion on various optimizations. The hardware implementation of LaFA is described in Section VI. Section VII evaluates the performance of our architecture and presents simulation results. Section VIII discusses related work. We conclude the paper in Section IX.

II. LaFA

Fig. 1. Example illustrating the transformation from a RegEx set into the corresponding LaFA. (a) RegEx set. (b) NFA corresponding to the set. (c) Separation of simple strings. (d) Reordering of the detection sequence. (e) Sharing of complex detection modules, which leads to the corresponding LaFA. (f) LaFA representation of the RegExes.

LaFA is a finite automaton optimized for scalable RegEx detection.
LaFA is used for representing a set of RegExes, which can also be called a RegEx database. For instance, this set of RegExes can be a set of Snort signatures, Bro signatures, or Linux Layer 7 rules. An associated LaFA RegEx detection system can be queried with an input, such as a network packet. The system will return a match along with a list of matching RegExes if the input includes one or more of the RegExes in the set; otherwise, a no-match is returned. In this section, we illustrate the important features of LaFA with the help of a warmup example.

A. Warmup Example

Let us consider a RegEx set with three RegExes, identified as RegEx1, RegEx2, and RegEx3, as shown in Fig. 1(a). The NFA representation of this set is shown in Fig. 1(b).

1) Separation of Simple String Matching: A simple string is a fixed sequence of characters, such as the two fixed substrings in RegEx1. In practice, it can be, for instance, a word from the dictionary or an excerpt from an executable file. Consider the state diagram of the NFA in Fig. 1(b). A separate state is defined for each character in the simple strings. Although a per-character state transition looks simple, it may require many state transitions per character when multiple RegExes are to be matched against the RegEx database. Consider an example input. The first state of RegEx1 is matched first, followed by states of both RegEx1 and RegEx2. Then, as the next input character arrives, states of all FAs are matched, and all three FAs concurrently transition to their next states. The number of concurrent active states for a database of hundreds of RegExes would be prohibitively high, translating into high resource requirements and low throughput.

Note that the detection of simple strings, namely exact string matching, is a well-studied problem with many optimized solutions [12]-[18]. In particular, there are high-throughput hardware solutions with very small memory footprints [16]-[18]. In this paper, we argue that neither NFA nor DFA is the most efficient method to detect these simple strings. We propose to merge the states of each simple string portion of the automata into a single super state. These simple strings can then be detected using high-speed exact string-matching methods. The resulting FA, as shown in Fig. 1(c), has fewer states than the original FA. By generating one detection event per simple string, rather than making many state transitions for each input character as in an NFA or DFA, LaFA significantly reduces the number of operations, as Section VII shows.
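To make the super-state idea concrete, the sketch below (illustrative Python, not the paper's detector; the pattern set and IDs are made up) scans an input with an exact multi-pattern matcher and emits one (string ID, end position) event per detected simple string, instead of one state transition per input character.

```python
# Illustrative sketch: a simple-string detector that emits one event per match.
# In hardware, LaFA would use a high-throughput exact string matcher; a naive
# scan is enough here to show the event interface (string_id, end_position).

SIMPLE_STRINGS = {"abc": "S1", "cd": "S2", "ef": "S3"}   # hypothetical IDs

def detect_simple_strings(packet: str):
    events = []
    for pos in range(len(packet)):
        for pattern, sid in SIMPLE_STRINGS.items():
            if packet.startswith(pattern, pos):
                # One event per detected simple string ("super state"),
                # not one state transition per input character.
                events.append((sid, pos + len(pattern) - 1))
    return events

print(detect_simple_strings("xxabcqqcd"))   # [('S1', 4), ('S2', 8)]
```

These detection events are what the correlation block consumes later on, as described in Section IV.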

2) Reordering the Detection Sequence: By definition, NFA and DFA are constructed based on the order of components in the RegEx. Therefore, reordering the sequence of detection is difficult. In Fig. 1(c), each FA consists of three components: 1) a simple string; 2) a character class; and 3) another simple string. Consider matching the example input string against the states in Fig. 1(c). For this particular input, after matching the earlier states, all three FAs go concurrently to their respective super states for the character-class matching. As the input continues, however, it matches only the first RegEx, RegEx1. Note that, in fact, there is no possible input that can match more than one RegEx for this example RegEx set. We observe that the FAs for RegEx2 and RegEx3 made the unnecessary detection just to abandon the search at the end, when their final simple strings are not found in the input. In LaFA, we propose to reorder the detection sequence as shown in Fig. 1(d). Here, the third component in each RegEx, the simple string, is swapped with the second component, the character class. Therefore, for RegEx1, only if both of its simple strings are detected in order (not necessarily consecutively) is the character-class detection performed. In this way, for our example input, exactly one detection is performed rather than three concurrent detections, as in the case before reordering.

In general, we can classify components in RegExes based on the information revealed about a RegEx if a certain component of that RegEx is found. Consider RegEx1 as an example again. Detecting a specific string in the input reveals more information than does detecting a character from a range. Intuitively, one expects that more restrictive matching components, such as exact strings, occur much less frequently in inputs than less restrictive components, such as character classes. In this paper, we call components that reveal more information deep components, and components that do not reveal much information shallow components. There are deep components that are more complex to detect than simple strings. For example, a string repetition element is deep, but requires more complex operations than exact string matching. One would prefer to detect simple components before the more complex ones so that the probability of requiring evaluation of the latter is reduced. Therefore, we further classify components according to how complex it is to detect them.

Fig. 2. Matrix of RegEx components based on their detection complexity and depth (information revealed about the RegEx).

Fig. 2 shows a matrix of the depth and the detection complexity of components, with examples. LaFA detects simple and deep components ahead of complex and/or shallow ones. Overall detection complexity is reduced by using this method. We call this core notion of LaFA "Lookahead" in the rest of this paper. Details of the Lookahead operation are presented in Section IV-B.
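The lookahead reordering can be viewed as a sort of a RegEx's components by detection complexity and depth. The fragment below is a schematic Python sketch with assumed labels and scores, not the paper's algorithm: components keep their original positions (needed later for timing verification), but the detection order favors simple, deep components over complex or shallow ones.

```python
# Schematic sketch of lookahead reordering (labels and scores are assumptions).
# Each component keeps its original position so timing can still be verified,
# but the *detection* order favors cheap-to-detect, high-information components.

COMPLEXITY = {"simple_string": 0, "character_class": 1, "repetition": 2}
DEPTH = {"deep": 0, "shallow": 1}   # deep = reveals more information

def detection_order(components):
    """components: list of (position, kind, depth) tuples for one RegEx."""
    return sorted(components,
                  key=lambda c: (COMPLEXITY[c[1]], DEPTH[c[2]]))

regex1 = [(0, "simple_string", "deep"),
          (1, "character_class", "shallow"),
          (2, "simple_string", "deep")]

# Simple strings at positions 0 and 2 are detected first; the shallow
# character class at position 1 is only verified (looked up) afterwards.
print(detection_order(regex1))
```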
3) Event-Based Approach: A typical detection operation involving NFA or DFA requires at least one state transition for each input character. As a result, the handling of state transitions consumes a prominent portion of the operation costs. By using our proposed super states, we can reduce the number of state transitions, on average, to less than one per input character using a highly optimized detector explained in Section IV. Every time a component such as a simple string is detected, a detection event is generated. These events are then correlated to see if they correspond to one or more RegExes in the RegEx set. The details of this important operation are covered in Section IV.

4) Variable String Detection and Sharing: As can be seen in the previous steps, detection of some components (in our example, the character-class component) needs to be performed only once at any given time. Instead of duplicating those components in multiple RegExes, we can save resources and memory by sharing those rarely used components. The resulting LaFA representation is shown in Fig. 1(e). Furthermore, we also propose special modules that are specifically designed to efficiently detect certain types of variable strings, replacing their corresponding states. Details of such modules are given in Section IV-B.

5) LaFA Component-Based Representations: Each unique component is assigned an identification code called a string ID. The ID consists of two parts. The first part identifies the component type as either a simple string or a variable string. The notation is presented in Fig. 1(f). Simple string components are each assigned unique string IDs, and the variable string component is assigned its own string ID as well. Note that we use the same ID for a component everywhere it appears.

B. Summary

In this section, we have illustrated the LaFA core ideas, which facilitate the reduction of memory requirements and detection complexity, using simple examples. RegExes are first represented in NFA format. Then, the states of simple strings are identified and merged into super states. The components are reordered according to their depth and complexity. Finally, the states of components that can potentially be shared among RegExes are merged. In Section III, we present the details of the LaFA data structure and the associated scalable RegEx detection architecture.
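A small sketch of the component-ID assignment described in item 5), with hypothetical "S"/"V" prefixes and made-up component texts (the paper's exact notation appears only in Fig. 1(f)): identical components receive the same ID no matter how many RegExes contain them, which is what enables the sharing step of the summary above.

```python
# Hypothetical sketch of component-ID assignment with sharing.
# Identical components (e.g., the same character class in several RegExes)
# map to a single ID, so only one detector instance is needed for them.

def assign_ids(regex_components):
    """regex_components: {regex_name: [(kind, text), ...]} after partitioning."""
    table, counters = {}, {"simple": 0, "variable": 0}
    for components in regex_components.values():
        for kind, text in components:
            if (kind, text) not in table:
                counters[kind] += 1
                prefix = "S" if kind == "simple" else "V"   # assumed notation
                table[(kind, text)] = f"{prefix}{counters[kind]}"
    return table

rules = {
    "RegEx1": [("simple", "abc"), ("variable", "[a-z]"), ("simple", "cd")],
    "RegEx2": [("simple", "abd"), ("variable", "[a-z]"), ("simple", "ef")],
}
print(assign_ids(rules))
# {('simple','abc'):'S1', ('variable','[a-z]'):'V1', ('simple','cd'):'S2',
#  ('simple','abd'):'S3', ('simple','ef'):'S4'}
```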

III. LaFA DATA STRUCTURE

Fig. 3. Conversion process flow from the original RegEx set to the LaFA data structure.

A. Overview of LaFA Data Structure Generation

Like other regular expression detection architectures, LaFA has its own data structure. LaFA does not require an intermediate step that constructs a traditional FA such as an NFA or DFA, so even large RegEx sets can be converted into the LaFA data structure quickly using an automated program. The flow of conversion from an original RegEx set to the final LaFA data structure is illustrated in Fig. 3 and summarized below.

I) Preparation: To achieve maximum performance (detection speed), LaFA unrolls alternation operations, which create branches. Unrolling is widely used by compilers to improve performance; RegEx unrolling is much simpler than the unrolling in compilers.

II) Analyzer 1: We analyze the unrolled RegExes, as seen in Fig. 1. In this procedure, we identify components and analyze each RegEx. From this analysis, we also identify sharable components and assign them to the responsible detection modules. The syntax of the RegExes is also verified in this procedure. The detailed classification and functionality of each module are described later in this section.

III) Analyzer 2: Basic RegEx analysis is performed in Analyzer 1; deeper analysis is performed in Analyzer 2. We discuss the details of this analyzer in Section V.

The remaining two procedures, IV) and V), are performed for hardware implementation and are discussed in Sections V and VI. After these procedures, LaFA is ready to detect malicious traffic on hardware [a field-programmable gate array (FPGA)]. In this section, we describe the first two procedures, which create the fundamental LaFA data structure. The remaining procedures are optimization- and hardware-implementation-related conversions, so we describe them in the following sections.

B. Preparation

When a RegEx contains an alternation operator, it will be translated into a branch in an FA. We replace this type of RegEx with two RegExes, where each of the new RegExes contains one branch of the alternation. We call this operation unrolling due to its resemblance to the loop unrolling operation in compilers. Although the number of RegExes increases after unrolling, the resulting FA introduces new venues for optimization. In fact, it is possible to simplify the detection operations, reduce system complexity, and reduce resource usage by unrolling. In general, if a RegEx contains several alternations, each with several alternatives, the RegEx expands into one RegEx per combination of alternatives. Yet, in Section VII, we show that the unrolling operation does not cause large overhead.
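The unrolling step can be illustrated with a short sketch. The function below (illustrative Python over a simplified alternation syntax; nested groups are not handled, and the example patterns are made up) replaces each alternation group with one RegEx per alternative until no groups remain.

```python
import re

# Illustrative sketch of alternation unrolling on a simplified syntax:
# groups like "(abc|abd)" are expanded into one RegEx per alternative.
# Nested groups are not handled; this only mirrors the idea of unrolling.

GROUP = re.compile(r"\(([^()|]+(?:\|[^()|]+)+)\)")

def unroll(regex):
    pending, done = [regex], []
    while pending:
        r = pending.pop()
        m = GROUP.search(r)
        if not m:
            done.append(r)
            continue
        for alternative in m.group(1).split("|"):
            pending.append(r[:m.start()] + alternative + r[m.end():])
    return done

print(sorted(unroll("(abc|abd)[a-z]+(cd|ef)")))
# ['abc[a-z]+cd', 'abc[a-z]+ef', 'abd[a-z]+cd', 'abd[a-z]+ef']
```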
C. Analyzer 1

Here, we present the formal construction of the LaFA data structure. Given a set of RegExes, the LaFA data structure is constructed in three steps: 1) RegEx partitioning; 2) component reordering; and 3) node structure construction.

The first step of LaFA data structure construction is to partition each RegEx into simple strings and variable strings. This partitioning facilitates the detection of simple strings and variable strings. The variable strings are further classified as explained in Section IV-B.

The second step in data structure construction is component reordering. Component reordering is introduced in Section II-A.2. We discuss it in further detail here as we introduce a new term, node, shown as square boxes in Fig. 4.

Fig. 4. Component reordering.

In the traditional sequence shown in Fig. 4(a), components are activated one by one following the sequence of the RegEx. In contrast, the LaFA sequence shown in Fig. 4(b) skips (looks ahead over) the variable string and activates the following simple string first. LaFA verifies variable strings using special modules, for which specific information related to individual components is also needed. The location that stores this component verification information (the square boxes in the figure) is called a node.

The final step in the LaFA data structure construction is to identify the contents of the node. As in any other state machine, the LaFA state machine has basic information in every state. As discussed, LaFA states are associated with nodes, and these nodes contain the verification information specific to each component. The node also contains the current state, which indicates whether the node is active or inactive. Conventional FAs, such as NFA and DFA, change states constantly with every character received, hence tracking timing is not an issue. In the LaFA state machine, however, the input (event) is generated by a detection block, so detection timing needs to be verified. Because of this characteristic, we store timing information in the node. A component can have variable length as well, and for this reason a node keeps two pieces of timing information: a minimum time and a maximum time. A simple string must be detected within this time range. To summarize the node information, each node has the following: 1) a pointer to the next state; 2) the status of the current state; and 3) detection timing values. The detailed node structure is described in Section VI.
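A node can be sketched as a small record holding exactly the three items just listed, plus a helper for the timing check. The Python types below are for illustration only; the actual 96-bit hardware layout is given in Section VI.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative node record; the real bit-level layout appears in Section VI.

@dataclass
class Node:
    next_node: Optional[int]    # pointer to the node of the next component
    active: bool = False        # status of the current state
    min_time: int = 0           # earliest allowed distance from the previous detection
    max_time: int = 0           # latest allowed distance from the previous detection

    def timing_ok(self, prev_detection_time: int, now: int) -> bool:
        """A simple string must be detected within [min_time, max_time]."""
        distance = now - prev_detection_time
        return self.min_time <= distance <= self.max_time

node = Node(next_node=7, active=True, min_time=1, max_time=5)
print(node.timing_ok(prev_detection_time=3, now=7))   # True: distance 4 is within [1, 5]
```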

In the following sections, we show why this information is needed and how it is used.

IV. LaFA ARCHITECTURE

Fig. 5. LaFA architecture.

Fig. 5 is a simplified block diagram of the LaFA architecture. The LaFA architecture consists of two blocks, the detection block and the correlation block. The detection block is responsible for detecting the components in the input string. The simple string detector is a highly optimized exact string matching module used to detect simple strings. The variable string detector consists of the highly optimized variable string detection modules proposed in this paper. Recall that in Section II, we mentioned a block that tracks states. The correlation block is responsible for this task. The correlation block inspects the sequence of components in the input string (i.e., order and timing) to determine whether the input matches any of the RegExes in its RegEx database.

The detectors in the detection block communicate their findings to the correlation block by sending detection events. A detector in the detection block generates a detection event when it detects a component in the input. Each detection event consists of a unique ID for the detected component and its location in the input packet. Extra information, such as the length of the detected string, may accompany the event if necessary. There can be two sources of an event: 1) a string event is generated when a simple string is detected; 2) an in-line event is generated by the in-line lookup modules. The in-line event is described in detail in Section IV-B. The generated events prompt a detection sequence in the correlation block to proceed to the next detection state. In the rest of this section, the details of the individual blocks in the LaFA architecture and their interaction are presented. LaFA is implemented in hardware, and its performance is shown in Section VII.

A. Correlation Block

Practical RegEx sets include hundreds of RegExes that need to be programmed in the correlation block. Fig. 6 illustrates how multiple RegExes are organized in a correlation block, along with node activation and timing verification operations.

Fig. 6. Example of simple string-based detection operations and timing verifications. The subfigures are sorted in chronological order. (a) At time 0. (b) At time 3. (c) At time 9. (d) After the variable string detection module finishes processing.

The correlation block is organized into columns of nodes, where each node is similar to a state in a traditional FA, yet the node includes more information than a state. A node in LaFA is also likely to correspond to multiple states in a traditional FA. Each column corresponds to one component, and each column has one node for each RegEx in which this component appears. Once the correlation block receives a detection event for a particular component, all the nodes in the corresponding column are checked to see whether the detection of each RegEx is in progress and which component is expected next. The proposed Lookahead detection flow must satisfy certain timing constraints. Only when the detection of contents with the proper timing is confirmed is the next node activated. To determine this, each node has a static pointer to its next node. In the example case, RegEx1 and RegEx4 contain the same component, a simple string, so two nodes are examined once that string is detected. When the component is detected and its node is active, as indicated by a set bit in the node, a node transition occurs to the next node, corresponding to the next component in the given RegEx.
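The column organization can be sketched as follows (illustrative Python; the node fields, dictionaries, and IDs are assumptions rather than the paper's data layout). A detection event selects one column, every node in that column is checked for active status and timing, and on success either the next node is activated or, for a last component, a RegEx match is reported.

```python
# Illustrative sketch of the correlation block's column lookup.
# columns[string_id] holds one node per RegEx that contains this component.

def on_event(string_id, position, columns, nodes, last_seen):
    matches = []
    for node_id in columns.get(string_id, []):
        node = nodes[node_id]
        if not node["active"]:
            continue                       # event ignored for inactive nodes
        distance = position - last_seen.get(node["regex"], 0)
        if not (node["min_time"] <= distance <= node["max_time"]):
            continue                       # timing verification failed
        node["active"] = False
        last_seen[node["regex"]] = position
        nxt = node["next"]
        if nxt is None:
            matches.append(node["regex"])  # last component: report a RegEx match
        else:
            nodes[nxt]["active"] = True    # activate the next component's node
    return matches
```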
Correlation Detection Operation and Timing: Consider a sequence of input characters and their arrival times together with two example RegExes, RegEx1 and RegEx4. We add RegEx4 to our original example to demonstrate a more general case. The unit of time is defined as the interval required to receive one character.

Initially, the active RegEx list is as shown in Fig. 6(a). In the figure, nodes are represented as boxes with 0 or 1: 0 symbolizes that the node is inactive, while 1 is the symbol of an active node. A received event is processed only when the target state is active; otherwise (the inactive case), the event is ignored. The first component of every RegEx is always active to start the state transition; otherwise, a state transition would never occur. For instance, the same simple string is the first component in both RegEx1 and RegEx4. Therefore, both are always active to start the detection of these RegExes once that string is detected. Upon detecting the first simple string, at time 3, the simple string detector generates an event and passes it to the correlation block.

When the event is received, based on the RegEx information stored in the node, the next simple string of RegEx1 and of RegEx4 is activated by setting the bits corresponding to RegEx1 and RegEx4 to active (1), as shown in Fig. 6(b). Next, at time 9, the following simple string is detected, and its detection time is verified based on the distance between the two simple strings. If the timing verification is successful, then, as Fig. 6(c) shows, a query is sent to the variable string detection module to verify the variable RegEx component. The detection module takes a fixed processing time; after it finishes processing and successfully verifies the component, a RegEx match is generated, as Fig. 6(d) shows.

B. Detection Block

The detection block consists of two main types of modules based on the detection approach in the LaFA architecture: buffered lookup modules and in-line lookup modules. This section describes each module in detail. Readers may refer to [19] for further details of the architectures and design considerations. As mentioned previously, the RegEx set is partitioned into simple strings and variable strings. Simple strings are detected using simple string detectors. Because of the wide variety of variable strings, we further classify them into eight subtypes, as shown in Table I. Each subtype has a qualifier symbol next to the letter V, as shown in the first column of Table I, and an example of each type is provided in the last column of the table. Each component is also assigned a unique ID, whereas components that are exactly the same are assigned the same ID even if they belong to different RegExes. Next, we discuss how these variable string subtypes can be detected.

TABLE I. SUBTYPES OF VARIABLE STRINGS.

Buffered Lookup Modules: The buffered lookup modules are the core detection modules in the LaFA architecture. The lookahead operation is realized by these modules. These modules use history stored in buffers to verify past activity. Next, we explain the various detection modules based on the buffered lookup approach.

1) Timestamps Lookup Module (TLM): The TLM stores the incoming characters with respect to their time of arrival. This module can detect nonrepetition types of variable strings, such as VA, VB, and VC. The TLM incorporates an input buffer, which stores characters recently received from a packet in chronological order. Using this buffer, the TLM can answer queries such as "Does the character at time t belong to a (negated) character class C?". Let us consider the detection process of RegEx1 as an example. For the detection of RegEx1, a lowercase alphabetical character (i.e., from "a" to "z") must be detected between the two simple strings. Let us assume for the time being that the detection sequence and detection timing are already verified by the correlation block. The last portion that needs to be verified is whether the input character between the two simple strings (i.e., the input character at time 4) was a lowercase alphabetical character or not. In the example, since a lowercase letter appears in the input at time 4, RegEx1 is detected. In a formal notation, the TLM contents form an ordered set of the buffered characters, whose size is the length of the character buffer. In the example, the variable string is located at time 4, and receiving a matching character at time 4 is what must be verified. The query is a statement asking whether the buffered character at the given time belongs to the (negated) character class, and it is checked for validity.
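A TLM can be sketched as a bounded buffer of recently received characters indexed by arrival time, answering exactly the query just described. The Python fragment below is a behavioral illustration under assumed sizes, not the hardware design in [19].

```python
from collections import deque

# Behavioral sketch of a Timestamps Lookup Module (TLM): a bounded buffer of
# the most recent characters, indexed by arrival time, answering queries like
# "does the character received at time t belong to (negated) class C?"

class TLM:
    def __init__(self, buffer_len=64):                 # buffer length is an assumption
        self.chars = deque(maxlen=buffer_len)
        self.now = -1                                  # time of the last received character

    def receive(self, ch):
        self.chars.append(ch)
        self.now += 1

    def query(self, t, char_class, negated=False):
        """Was the character at time t in char_class (or outside it, if negated)?"""
        age = self.now - t
        if age < 0 or age >= len(self.chars):
            return False                               # outside the buffered history
        ch = self.chars[len(self.chars) - 1 - age]
        inside = char_class(ch)
        return (not inside) if negated else inside

tlm = TLM()
for ch in "ab5cd":
    tlm.receive(ch)
print(tlm.query(2, str.islower))        # False: the character at time 2 is '5'
print(tlm.query(3, str.islower))        # True: the character at time 3 is 'c'
```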
2) Character Lookup Module (CLM): The CLM is responsible for the detection of VH-type variable strings. This is one type of variable string that drags traditional FAs down to impractical architectures. Let us follow an example to see the detection procedure of the CLM using RegEx5. We assume the detection order and timing are already verified at the correlation block. The CLM has multiple memory locations to store detection timestamps for each character. For example, the first character is detected at time 1, so timestamp 1 is stored for that character; timestamp 2 is stored for the second character, and so on. To match RegEx5, a particular character should not appear from time 4 to time 7. Let us call this time period the inviolable period. To verify this, the CLM uses the previously stored timestamps. In the example, the forbidden character was detected at time 7, and that time is stored in the character's timestamp memory in the CLM. From this information, we can conclude that the example input does not match RegEx5, because the character was detected during the inviolable period (at time 7).

In a formal notation over the ASCII values, the CLM keeps an unordered set of timestamps for each of the 256 possible character values; some of these sets can be empty. Each element stores a timestamp (e.g., a stored value of 6 shows that the corresponding character appeared at time 6). The query is a statement asking whether any stored timestamp for a character falls within a given time range, and it is checked for validity.

For some RegExes, more than one timestamp needs to be stored per ASCII character. To clarify this need, consider an example in which a character appears four times, at times 4, 5, 6, and 7. The CLM stores the timestamps; however, if only one timestamp entry is assigned, timestamps 4-6 are overwritten and only timestamp 7 is stored. In this situation, the RegEx element cannot be verified correctly, so at least two timestamp entries must be reserved in the CLM for this example RegEx. (In this particular case, it is not strictly necessary to store all four timestamps, because one appearance of the character is enough to invalidate the negated component.) The simulation results in Section VII show that the CLM only needs to store a small number of timestamps for each of the 256 ASCII characters, regardless of the input traffic.
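The CLM behavior can be sketched with per-character timestamp lists and an inviolable-period query. Again, this is an illustrative Python fragment with an assumed small number of entries per character, not the paper's memory design.

```python
# Behavioral sketch of a Character Lookup Module (CLM): for each of the 256
# ASCII values, keep the last few detection timestamps, and answer queries
# such as "did character c appear anywhere inside the inviolable period?"

MAX_ENTRIES = 2                      # assumed; Section VII sizes this empirically

class CLM:
    def __init__(self):
        self.timestamps = [[] for _ in range(256)]

    def receive(self, ch, t):
        entries = self.timestamps[ord(ch)]
        entries.append(t)
        if len(entries) > MAX_ENTRIES:
            entries.pop(0)           # the oldest timestamp is overwritten

    def appeared_between(self, ch, t_min, t_max):
        return any(t_min <= t <= t_max for t in self.timestamps[ord(ch)])

clm = CLM()
for t, ch in enumerate("abcdxyzx", start=1):
    clm.receive(ch, t)
# RegEx5-style check: 'x' must NOT appear during the inviolable period [4, 7].
print("violates negation:", clm.appeared_between("x", 4, 7))   # True ('x' at time 5)
```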

In-Line Lookup Modules:

3) Repetition Detection Module (RDM): The RDM is responsible for detecting repetitions that are not detected by the CLM (variable strings of type VD, VE, VF, and VG). More formally, the RDM detects repetition components by accepting consecutive repetitions of a base, from a minimum to a maximum number of times, in the input. Here, the base can be a single character, a character class, or a simple string, and the repetition bounds give the minimum and maximum numbers of repetitions. The RDM is the only in-line detection module. The RDM consists of several identical submodules, and each submodule can detect character classes and negated character classes with repetitions. These submodules operate in an on-demand manner. Assume a RegEx detection process is in progress and the following component to be inspected is a character class or negated character class repetition. The correlation block sends a request to the RDM for the detection of this component. The request consists of the base ID and the minimum and maximum repetition boundaries, where the base is represented by a pattern ID. The RDM assigns one of the available submodules to detect this component. The assigned submodule then inspects the corresponding input characters to see whether they all belong to the base range and whether the number of repetitions is between the minimum and maximum repetition boundaries. Once the number of repetitions reaches the minimum repetition value, the next simple string is activated. The next simple string is deactivated when the number of repetitions exceeds the maximum repetition value.

Let us consider an example of detecting RegEx7 against the character input "abcxxxxop". At time 3, the simple string "abc" is detected. The following repetition component is immediately programmed into an RDM submodule, which starts checking the input characters. From time 4 to time 6, the module consecutively receives three characters that match the base. The minimum repetition value is satisfied, so the submodule sends an activate signal to the next component. Now the simple string "op" is ready to receive an event. However, this is still not the end: the submodule continues checking the input characters. The following characters may continue to match the base until the maximum repetition value is reached; if there is no match in this interval, the submodule deactivates the next component, in the example case the simple string "op".
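The on-demand repetition check can be sketched as a small counter object: programmed with a base predicate and min/max bounds when the correlation block requests it, fed input characters in line, and reporting when the next component should be activated or deactivated. This is illustrative Python only; the bounds and the base below are made up.

```python
# Illustrative sketch of an RDM submodule: counts consecutive characters that
# match the base and activates/deactivates the next component accordingly.

class RDMSubmodule:
    def __init__(self, base, min_rep, max_rep):
        self.base, self.min_rep, self.max_rep = base, min_rep, max_rep
        self.count = 0
        self.next_active = False

    def feed(self, ch):
        if self.base(ch) and self.count < self.max_rep:
            self.count += 1
            if self.count >= self.min_rep:
                self.next_active = True        # minimum reached: activate the next component
        else:
            self.next_active = False           # sequence broken (or maximum exceeded)
        return self.next_active

# Hypothetical repetition component x{3,5} inside "abcxxxxop", programmed after "abc".
sub = RDMSubmodule(base=lambda c: c == "x", min_rep=3, max_rep=5)
for ch in "xxxxo":
    print(ch, sub.feed(ch))
# x False, x False, x True, x True, o False
```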
Thus, in programming phase, the popular bases are programmed in the FRMs. The base is programmed in memory, so updating the base does not require logic modification. Each FRM has a counter that counts the number of consecutive input characters that match the base. When the sequence breaks, the count and timestamp are stored in a history memory inside of the FRM. Thus, for the query operation, we can find repetition behavior in a particular time range from the number of repetitions with the timestamp. 5) Short Simple String Detection Module (SDM): LaFA classifies simple strings even further; Regular Simple String (simply called simple string) and Short Simple String. As discussed in Section II-A.2, short strings are classified as shallow components. Shorter strings naturally are more likely to generate more events as described in the following section. To reduce occurrence of events generated by the simple string, LaFA uses another buffered lookup type of module called the SDM. To maximize the advantage of the event-based detection approach, it is important to select an appropriate signature size (the number of characters in a simple string signature). Imagine that the signature table in the simple string detector contains many one-character signatures. A simple string detector may generate an event for every single character input. (This is the reason that the traditional RegEx detection architectures suffer.) Assume all characters arrive with the same probability, and the signature table has kinds of signatures. The probability to generate an event by receiving one character is. The probability will be reduced exponentially,, as the number of characters increases. We classify strings with at least characters as a simple string, and less than that is classified as a short simple string. C. Detection of RegEx Extensions In addition to the basic RegEx components, LaFA can also support the following RegEx extensions: 1) back-reference; 2) lookahead assertions; 3) lookback assertions; and 4) conditional subpatterns. 1) Back-Reference: Back-reference is used to refer to a previously detected RegEx portion at a farther point in the RegEx. A backslash followed by a number (e.g., ). For example RegEx or matches to the simple string higher or highest and lower or lowest, but not higher or lowest or lower or highest. PCRE also supports Python syntax for back-references, where a RegEx portion can be referred by name. The previous example can be rewritten in the Python syntax as or, where HL is the name, which is used for the back-reference. By unrolling this RegEx into two RegExes as higher or highest and lower or lowest, such simple back-references that are referring to simple strings can be handled by LaFA easily. Now, let us consider a more sophisticated example for thedetectionoftheregex PinNo is equal to, where the back-reference is to a character class repetition. This RegEx matches to Pin No 1234

C. Detection of RegEx Extensions

In addition to the basic RegEx components, LaFA can also support the following RegEx extensions: 1) back-references; 2) lookahead assertions; 3) lookback assertions; and 4) conditional subpatterns.

1) Back-Reference: A back-reference is used to refer to a previously detected RegEx portion at a later point in the RegEx, and is written as a backslash followed by a number (e.g., \1). For example, a RegEx with a back-reference can match the simple strings "higher or highest" and "lower or lowest", but not "higher or lowest" or "lower or highest". PCRE also supports the Python syntax for back-references, where a RegEx portion can be referred to by name. The previous example can be rewritten in the Python syntax using a named group (e.g., HL) that is then used for the back-reference. By unrolling this RegEx into two RegExes, one matching "higher or highest" and the other "lower or lowest", such simple back-references that refer to simple strings can be handled by LaFA easily.

Now, let us consider a more sophisticated example, the detection of a RegEx of the form "Pin No <four digits> is equal to <the same four digits>", where the back-reference refers to a character class repetition. This RegEx matches "Pin No 1234 is equal to 1234", but it does not match "Pin No 1234 is equal to 2345" or "Pin No 1234 is equal to 23". These more sophisticated back-references are also easy to detect using LaFA's modular structure, following these steps: 1) the simple string "Pin No" is detected; 2) the number of consecutive digits is counted in the RDM until the count reaches 4, or a nonnumeric character arrives, causing a no-match; 3) if the count in the RDM reaches 4, the next simple string "is equal to" is activated; 4) at the same time, the RDM reports the detected repetition and stores this information in a memory we call the back-reference memory, located in the correlation block, whose memory footprint is negligible (assuming the RDM handles up to 128 characters, even reserving 32 back-reference memories, each of 128 B, requires only a 4-kB memory overhead per track); 5) the simple string "is equal to" is detected; 6) a repetition detection request is submitted to the RDM; and 7) finally, if a sufficient repetition (four digits) is detected, the detected content is verified by comparing it against the previously detected repetition stored in the back-reference memory.

2) Lookahead Assertions: There are two options in the lookahead assertion: 1) the positive assertion "(?= )" and 2) the negative assertion "(?! )". An example of the positive assertion is a RegEx that means a 10-letter word with no "x" character in it, followed by "x". Similarly, the RegEx "abc(?!xyz)" is an example of a negative assertion; it matches any "abc" that is not followed by "xyz". Lookahead assertions also support RegExes such as "(?!abc)xyz", which matches any "xyz" preceded by something other than "abc". Next, we show how LaFA can be used to detect these RegExes. Detecting RegExes like the positive-assertion example with LaFA is relatively simple: the variable string is programmed into the RDM, and the character filter of the RDM is set to "x" (the RDM character filter is described in [19]). Thus, receiving the input character "x" violates the component in the RDM detection. LaFA can also detect RegExes like "(?!abc)xyz" very easily: when LaFA detects the simple string "xyz", the detection time of "abc" is checked, and only if "abc" was not detected right before "xyz", or was not detected at all, does LaFA report a match for this RegEx. Finally, for detecting RegExes like "abc(?!xyz)", the CLM can be used. The CLM can tell the detection time of a character or simple string; if the simple string "xyz" is not detected right after the simple string "abc", LaFA reports a match for this RegEx.

3) Lookback Assertions: Lookback assertions are similar to lookahead assertions and can be detected by LaFA in a similar manner.

4) Conditional Subpatterns: The syntax of conditional subpatterns is best shown with an example. Consider matching ID numbers with two possible formats: one format starts with two alphabetical characters followed by an eight-digit number, and the second format contains only numbers. The condition of the RegEx is satisfied if at least one letter is found in the text. If a letter is found, the text is matched against the yes-pattern; otherwise, the text is matched against the no-pattern. LaFA also supports conditional patterns. In fact, the notation of conditional subpatterns is also used to improve the detection performance of traditional detection methods. Since LaFA can already perform such detection in deterministic time, this type of RegEx can be simply unrolled without any significant performance penalty.
Fig. 7. (a) Logical view of the node memory showing two RegExes, RegEx1 and RegEx4. Detection of the shared simple string causes the activation of both next nodes simultaneously. (b) Physical view of the node memory.

V. DISCUSSIONS

A. Multiple Detection Tracks

Up to now, we have shown the LaFA data structure and architecture with one set of detection modules (a detection block and a correlation block), which is called a track in the LaFA architecture. For some large RegEx sets, having more than one track allows constant detection time regardless of the input traffic. As we see in Fig. 7(a), two simultaneous activation processes occur for RegEx1 and RegEx4 upon a detection event of their shared simple string. This can slow down the detection speed if those operations are performed sequentially. LaFA avoids this drawback by spreading the two RegExes, RegEx1 and RegEx4, over two tracks and performing the detection operations in parallel, as shown in Fig. 7(b). As a result, regardless of the input traffic, LaFA will always need at most one operation per track, and all tracks work in parallel. In Section VII, we show that the number of required tracks is limited to nine even for large RegEx sets. This is a huge saving with respect to an NFA implementation, which requires a parallelism of up to 409 transitions per input character for the same RegEx sets.

Fig. 7(b) also illustrates how LaFA stores component information in a node memory. The nodes of consecutive components of the same RegEx are stored at consecutive locations in the node memory, since they will be verified successively, to achieve higher access speed. Simple strings that are independent from each other and from any other component can be stored anywhere in the memory. We store such independent nodes in a way that helps balance the memory sizes. The locations of simple strings are resolved by pointers.
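One way to read the track constraint is as a graph-coloring-style placement: RegExes whose nodes would respond to the same simple-string event should not share a track, so that any single event triggers at most one correlation operation per track. The greedy assignment below is only an illustration of that constraint, not the paper's placement algorithm, and the component IDs are made up.

```python
# Illustrative greedy track assignment: RegExes that react to the same
# simple-string event are placed on different tracks, so one detection event
# causes at most one correlation operation per track (tracks run in parallel).

def assign_tracks(regexes):
    """regexes: {name: set of simple-string IDs}. Returns {name: track index}."""
    tracks = []                      # tracks[i] = set of string IDs used on track i
    placement = {}
    for name, strings in regexes.items():
        for i, used in enumerate(tracks):
            if not (used & strings):             # no shared simple string on this track
                used |= strings
                placement[name] = i
                break
        else:
            tracks.append(set(strings))          # open a new track
            placement[name] = len(tracks) - 1
    return placement

rules = {"RegEx1": {"S1", "S2"},
         "RegEx4": {"S1", "S5"},                 # shares S1 with RegEx1
         "RegEx9": {"S7", "S8"}}                 # shares nothing with RegEx4
print(assign_tracks(rules))   # {'RegEx1': 0, 'RegEx4': 1, 'RegEx9': 1}
```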

B. Detection Dependency

Fig. 8. Timing diagram of the LaFA state machine and modules.

Fig. 8 shows the timing diagram of the LaFA pipeline. The LaFA pipeline has four stages, namely: 1) BR: bitmap read; 2) NR: node read; 3) EX: execute (timing check); and 4) BW: bitmap write. The BR and NR stages fetch information from the active RegEx bitmap and the node information memories. Based on this information, the node verification is executed in the EX stage. Finally, in the BW stage, any updates to the bitmap are applied. More details of the operation can be found in Section VI. In the LaFA pipeline, all detection modules take the same fixed processing time (i.e., all modules require the same number of clock cycles to complete their tasks). In certain cases, the detection of a simple string can be completed before the preceding component is processed and gets a chance to activate it. In such a case, when the simple string is detected, its node has not been activated yet, and the proper detection of the corresponding RegEx is disrupted. For instance, consider the example in Fig. 8 (note that the string IDs used in this example are not related to the string IDs that appeared before). The detection of simple strings is handled by the simple string detector, and each successful detection of a simple string is followed by the corresponding correlation operations. To guarantee timely and orderly detection of the simple strings and variable strings, the detection of any variable string should be completed before the end of the next adjacent simple string. For instance, in this example, each variable string should be handled before the end of the following simple string: the detection states of the three variable strings should be read from the modules, and the state machine should be updated accordingly, before the next content is handled. This timing condition can be expressed as a constraint between the length of the next simple string and the processing latency of the variable string modules. Even if a RegEx does not satisfy this condition, LaFA can still detect it by rewriting the RegEx. Remember that simple strings shorter than five characters are handled by the short simple string detector. The worst case is that the next simple string is the shortest possible simple string (five characters) and the preceding component is a repetition type of component with a minimum repetition of zero. Under this condition, we rewrite the RegEx into four equivalent RegExes. This rewriting operation does not change the total memory requirement considerably.

C. Block Detection Optimization

Fig. 9. Block detection optimization on three RegExes (RegEx8-RegEx10). (a) Original RegExes. (b) RegExes in LaFA notation. (c) RegExes after the block detection optimization.

In this section, we propose an optimized detection method named block detection that can significantly reduce the NFA bottleneck of multiple concurrent operations and also reduces the requirements of the node information memories. Fig. 9(a) shows three RegExes, and Fig. 9(b) shows the same RegExes in LaFA notation. All the RegExes in this example start with the same simple string followed by the same variable string. Detection of that first simple string activates three following simple strings, one for each RegEx, and causes multiple concurrent activation operations. The three RegExes share the same two components back to back. Let us call this pair of components a block, and reorder the detection sequence such that the three distinguishing simple strings are detected first. Only if one of them is detected do we look up whether the block was detected. This can be implemented as follows. The detection time of the block is stored in a block detection module, which has the same architecture as the CLM. When one of the following simple strings is detected, the block is verified. Detecting in this way, three concurrent operations reduce to one, and the number of tracks can be reduced by more than 60% for these RegExes. Although LaFA uses multiple parallel tracks to keep the number of concurrent operations per track at one, it is more effective to reduce the concurrent operations at the architecture level so that the number of tracks is kept to a minimum.
D. Time Event

Fig. 10. Example of the time event applied to two RegExes (RegEx10 and RegEx11). (a) Original RegExes. (b) RegExes in LaFA notation. (c) RegExes after the time event is applied.

Fig. 10 shows the LaFA representation of two RegExes, RegEx10 and RegEx11. All the variable strings are of the buffered lookup type. RegEx10 is a normal case, and its verification operations can be performed after detecting its last simple string. RegEx11, on the other hand, does not have any simple string following its two variable strings, so LaFA does not know the right time to verify these variable strings. To support this type of RegEx, we exploit a new type of event called a time event, symbolized in Fig. 10(c). In the example case, the two variable strings need to be verified 301 characters after the detection of the preceding simple string, an offset that comes from the sum of the lengths of the two variable strings. LaFA activates a timer to execute this type of event after detecting the preceding simple string; the timer generates a time event after the defined time. At this time, LaFA performs the lookups to verify the two variable strings. Even if a variable string is not an exact (fixed-length) repetition, LaFA can still issue a time event at the minimum repetition timing. If the intermediate variable string is not an exact repetition, we assign an in-line lookup module instead of a buffered lookup module to verify it. Thus, this type of RegEx is also supported.

E. Online RegEx Updates

An important property of LaFA is that it provides fast updates. Overall, a RegEx set can be updated in three different ways: 1) new RegExes can be added to the set; 2) RegExes can be deleted from the set; or 3) RegExes in the set can be modified. Typically, new RegExes are added to the set periodically (e.g., daily or weekly for an NIDPS, as new attacks are identified). System administrators can also add, delete, or modify RegExes on the fly for performance reasons or to mitigate an ongoing attack or anomaly. Traditionally, when a new RegEx becomes available, the RegEx detection system has to be taken offline and updated with this new RegEx. This requires either deploying another system for backup or leaving the network open to attacks during the updates. The former solution is costly, whereas the latter is not secure. Thus, fast online updates are crucial.

LaFA can be updated very fast for two reasons: 1) the entire LaFA data structure is stored in memory, thus any update to the RegEx set can be done with only memory updates and without hardware reconfiguration; 2) the detection of different RegExes is loosely coupled in the LaFA data structure, so the updates for one RegEx can be completed by updating only a fraction of the LaFA data structure. In contrast, updates to both DFA- and other NFA-based methods are slow. In DFA-based methods, the data structures are very strongly coupled, so an update to one RegEx may affect many states/transitions in the DFA. Thus, any update has the potential to change most of the data structure, taking longer to complete. In traditional NFA-based approaches, the data structures are hard-coded in the logic gates, thus any update requires a hardware reconfiguration, which also takes long, and the performance of the updated logic is not guaranteed.

In updating the RegEx set in LaFA, the first step is to update the data structures offline. Once the sections of the data structure to be updated are determined, the changes are applied to LaFA. The contents of the following four memory blocks in LaFA need to be modified to complete the desired updates: 1) the active RegEx bitmap; 2) the node information memory; 3) the timing pointer memory; and 4) the pointer memory. On rare occasions, the detection modules may also need to be updated (for instance, if a new RegEx with a completely new character class is added to the set, this range is added to the appropriate detection modules). All these updates can be done by parallel memory accesses, since physically these structures reside on independent memory blocks. Moreover, since LaFA treats each RegEx independently, these updates are local. Additionally, the simple string detector needs to be updated, which may also be done merely by memory updates (for instance, [15]-[17]). Since updates are handled locally for each RegEx and are carried out as parallel memory accesses, the update time is roughly proportional to the number of components in a RegEx, which is no more than 36 for the Snort RegEx sets. Allowing some margin for future growth, a RegEx update can be completed in a small, bounded number of clock cycles. Even for an update including 1000 RegExes, which is unlikely, the total number of clock cycles needed remains small. As shown in the performance section, LaFA can reach a 250-MHz clock speed. Thus, even if we reserve 1% of the clock cycles to perform updates, LaFA can complete an update with 1000 RegExes within a fraction of a second, while LaFA remains operational at 99% capacity. Since the update requirements are usually far fewer than 1000 RegExes at a given time, and since updates appear very infrequently (e.g., a few updates per day), LaFA can be updated online very efficiently without any significant performance penalty.
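The update-latency claim can be sanity-checked with rough arithmetic. The per-RegEx cycle budget below is a hypothetical number (the text only bounds a RegEx at 36 components and quotes a 250-MHz clock); under these assumptions, even a 1000-RegEx batch completes in tens of milliseconds while 99% of the cycles keep serving traffic.

```python
# Back-of-the-envelope check of the online-update claim (assumed numbers).
cycles_per_regex = 64          # hypothetical budget: >= 36 components plus margin
num_regexes = 1000             # an unusually large batch, per the text
clock_hz = 250e6               # prototype clock speed reported in Section VII
update_share = 0.01            # 1% of cycles reserved for updates

total_cycles = cycles_per_regex * num_regexes
seconds = total_cycles / (clock_hz * update_share)
print(f"{total_cycles} cycles ~ {seconds * 1e3:.1f} ms")   # 64000 cycles ~ 25.6 ms
```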
F. Resetting Detection State in LaFA

In order to avoid RegEx detection errors, the detection state should be handled properly as LaFA processes packets. If a prefix of a RegEx appears in a packet, the detection process for that RegEx in LaFA waits in a state for the arrival of the next simple string following that prefix. That simple string may never arrive in this packet, but it may arrive in a following packet and can then erroneously produce a match for this RegEx, even though the RegEx does not appear as a whole in either the first packet or the latter packet. For instance, let a RegEx contain three simple strings in order, and let two unrelated packets arrive at LaFA back to back, the first containing the first two simple strings in order and the second containing only the third. Upon receiving the first packet, the first simple string is detected. Since it is the first component in the RegEx, the corresponding node is active initially. Upon its detection, the node for the next component, namely the second simple string, is activated. Once the second simple string is detected in the first packet, the node of the third simple string is activated with the expectation of its detection, and the node of the second simple string is deactivated. Since the first packet does not contain the third simple string, its node stays active after the processing of the first packet is completed, and the processing of the second packet starts. Once LaFA detects the third simple string in the second packet, since its node is still active, LaFA falsely reports the detection of the RegEx in the second packet. In fact, the RegEx occurs as a whole neither in the first nor in the second packet.

To avoid such false matches, at the end of each packet, all the activated nodes should be deactivated before starting to process a new packet. However, this may be a high processing burden. For LaFA, we propose the following solution. We first divide the arriving traffic into fixed-size parts, each covering a set number of bytes; we call each part a session, and its length the session period in this paper.

Fig. 11. Example sessions.

As an example, Fig. 11 shows two sessions, Session 0 and Session 1, where each session can contain multiple packets. If a new packet arrives whose size is larger than the remaining bytes in a session, a new session is enforced. At the end of each session, we reserve a period in which all the nodes are deactivated. This way, we can prevent timestamps from two different sessions from accidentally becoming valid and falsely identifying a match. When a simple string is detected, a quick comparison between the timestamp in the node and the timestamp of the first byte of the packet reveals whether this simple string was activated in this packet or not. In the example above, even though the third simple string is detected when its node is active, this comparison reveals that it was not activated in this packet. As a result, LaFA correctly reports a no-match for the RegEx. The length of the session period can be selected based on the available memory, the line speed, and the speed of LaFA. In LaFA, 14 bits are reserved for the timestamps, so the timing of 2^14 B of network traffic can be distinguished. The largest RegEx set tested, SnortComb (see Section VII for details about this RegEx set), contains 1596 simple strings. Thus, at the end of each session, only 1596 bits need to be reset. (Note that nodes do not need to be deactivated individually: resetting an invalidate flag bit for a simple string deactivates all the nodes for that simple string.) Invalidate flags residing on a dual-port on-chip memory can be reset rapidly. For instance, for a typical memory data bus width of 32 bits per port (64 bits in total for the dual-port memory), all 1596 bits can be reset within a small number of clock cycles, which corresponds to less than 0.02% overhead and is thus negligible. The additional memory requirement (1596 bits per track to accommodate the invalidate flags) is also negligible compared to the overall memory requirement.

Fig. 12. Block diagram of entire LaFA architecture including simple string detector.

VI. LaFA HARDWARE IMPLEMENTATION

In the previous sections, we showed the operation of the two main building blocks of LaFA. In this section, we describe the hardware implementation of the LaFA building blocks and related implementation issues [19]. An overview of the entire LaFA block diagram is given in Fig. 12. LaFA consists of a number of structurally identical tracks. Before entering LaFA, input characters (8 bits per character) are processed by a simple string detector that produces a 16-bit matching result (an event) when it detects a simple string. The event is essentially a string ID (SID) corresponding to the detected string. Once these events reach LaFA, they are broadcast only to the tracks that hold this string. Note that some simple strings appear in multiple RegExes, but those RegExes are deployed in separate tracks. Since the tracks work in parallel, LaFA maintains a constant processing time regardless of the input traffic. The status of the detected string ID is checked in the active RegEx bitmap. Along with the status, the node information of the corresponding string ID is fetched from the node information memory. The address of the node memory is stored in the pointer memory. LaFA proceeds with further detection operations if, and only if, the detected simple string is active and the detection timing is correct. The timing history memory is used to retrieve the simple-string detection timing. A variable string is verified next in either a buffered lookup or an in-line lookup module after the timing and status requirements are satisfied. If the component is the last one in the RegEx's detection sequence (a component may move to the end after reordering) and all components in the RegEx have matched, then LaFA generates a RegEx match signal. The following sections describe the hardware implementation of each building block.

A. Hardware Implementation of Correlation Block

The correlation block, shown in Fig. 12, consists of two submodules: 1) the timing history memory and 2) the node information memory. These two important data structures are described as follows.

Fig. 13. Timing example of (a) RegEx1 programmed in Track 1 and (b) RegEx4 programmed in Track 2 (P: pointer).

1) Timing History Memory: The timing history memory consists of two parts, as shown in Fig. 13. The timing storage memory stores the detection times of simple strings. The timing pointer memory stores pointers into the timing storage memory. These pointers are preprogrammed for each RegEx to specify the distance between consecutive strings in the RegEx. For each simple string in a RegEx other than the first string, a pointer is programmed into the timing pointer memory to point to the timing storage memory location of the previous simple string in the RegEx. The pointers are used to correlate the detection times of two simple strings in a RegEx. Both memory blocks are indexed with string IDs.
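As an illustration of how the two memories cooperate, here is a small Python sketch under our own assumptions (dictionaries standing in for the two SID-indexed memories); it is a software model, not the RTL design.

```python
class TimingHistory:
    """Toy model of the timing storage and timing pointer memories."""

    def __init__(self):
        self.timing_storage = {}   # SID -> detection timestamp of that string
        self.timing_pointer = {}   # SID -> SID of the previous string in its RegEx

    def program(self, regex_strings):
        # regex_strings: SIDs of one RegEx's simple strings, in detection order.
        for prev_sid, sid in zip(regex_strings, regex_strings[1:]):
            self.timing_pointer[sid] = prev_sid

    def record_detection(self, sid, timestamp):
        self.timing_storage[sid] = timestamp

    def previous_timestamp(self, sid):
        # Follow the preprogrammed pointer to the previous string's timestamp,
        # which the Execute block compares against the min/max times of the node.
        prev_sid = self.timing_pointer.get(sid)
        return self.timing_storage.get(prev_sid) if prev_sid is not None else None
```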
2) Node Information Memory: The data structure of the node information memory, which consists of 96 bits, is shown in Fig. 14. The most significant three bits (bits 95–93) identify the module ID. For example, TLM, CLM, and RDM are assigned module IDs 001, 010, and 011, respectively. The 92nd bit indicates whether the pattern is negated (1) or not (0). The next 16 bits are used as a pattern ID, unique to each pattern. The bitmap pointer (16 bits) contains the address of the next component's location in the bitmap memory; for the last component of a RegEx, these bits are used to store a RegEx match ID instead. The minimum and maximum times (12 bits each) maintain the relative detection time of the pattern. The absolute times are computed from these values and the previous simple-string detection time, which is obtained from the timing history memory described above. In addition to the above node information, the repetition type of component uses three more fields: a repetition rule type indicator (1 bit), a minimum repetition value (12 bits), and a maximum repetition value (12 bits). The repetition type indicator shows whether the repetition is a frequent repetition (FRM) or not (RDM). Note that although the full 96 bits are needed only for the repetition type of rules, we reserve 96 bits for all node types in order to ensure simpler addressing in our prototype design.
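A compact way to see this layout is to pack it in software. The snippet below is a hypothetical Python packing of the fields named above into a 96-bit word; the exact bit positions of the repetition fields are our assumption, since Fig. 14 is not reproduced here.

```python
def pack_node(module_id, negated, pattern_id, bitmap_ptr,
              min_time, max_time, rep_type=0, rep_min=0, rep_max=0):
    """Pack node fields into a 96-bit integer (bit 95 = MSB).

    Field widths follow the text: 3 + 1 + 16 + 16 + 12 + 12 bits, plus
    1 + 12 + 12 bits for repetition-type nodes. Remaining bits are left 0.
    The placement of the repetition fields is an assumption.
    """
    word = 0
    word |= (module_id  & 0x7)    << 93   # bits 95-93: module ID (TLM/CLM/RDM/...)
    word |= (negated    & 0x1)    << 92   # bit 92: negated-pattern flag
    word |= (pattern_id & 0xFFFF) << 76   # 16 bits: pattern ID
    word |= (bitmap_ptr & 0xFFFF) << 60   # 16 bits: bitmap pointer / match ID
    word |= (min_time   & 0xFFF)  << 48   # 12 bits: minimum relative time
    word |= (max_time   & 0xFFF)  << 36   # 12 bits: maximum relative time
    word |= (rep_type   & 0x1)    << 35   # 1 bit: FRM vs. RDM repetition
    word |= (rep_min    & 0xFFF)  << 23   # 12 bits: minimum repetition count
    word |= (rep_max    & 0xFFF)  << 11   # 12 bits: maximum repetition count
    return word
```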

Fig. 14. Contents of the node memory in detail.

3) Correlation Operation: Using the four memories, we walk through the correlation operations by revisiting the detection of RegEx1. As shown in Fig. 13, detection timestamps 3 and 6 are stored at the locations corresponding to the first two simple strings of RegEx1, respectively, once these strings are detected. The contents of the timing pointer memory for the second string point to the location of the previous simple string in the timing storage memory. Note that RegEx1 and RegEx4 share a simple string, so these two RegExes are programmed in different tracks, as shown in Fig. 13. For RegEx1, the pointer points to the previous simple string of RegEx1; for RegEx4, the pointer likewise points to the previous simple string of RegEx4. Once the shared string is detected, the timestamp of the previous simple string can be obtained from this location. From this time information, as well as the current timestamp and the minimum and maximum times from the node information memory, the Execute block computes and verifies the detection time. The Execute block also receives the status of the simple string from the active RegEx bitmap. In light of all this information, the simple string can be verified. The remaining part of the node information is used by the detection modules.

B. Hardware Implementation of Detection Modules

Buffered Lookup Modules: All buffered lookup modules maintain up-to-date information in their buffers in order to be ready for a query. In our design, all buffers are implemented with dual-port memory to support simultaneous updates and queries. More specifically, one port (the update port) is reserved for updates, and the other port (the query port) is used for queries. The remainder of this section describes the architecture of each buffered lookup module.

1) Hardware Implementation of TLM: In each clock cycle, the received input character is stored in a block RAM called the character buffer through the update port. On-demand queries are performed through the query port. For queries of the form "Does the character received at time t belong to a (negated) character class?", the TLM receives a time value as input and returns the character received at that time as a result. The TLM consists of three stages. In Stage 1, the TLM resolves the target time (address) from the detection time of a string and the string length, subtracting the string length from the string detection time to determine the time at which the character class is expected. In Stage 2, the character at the target time is obtained. Finally, in Stage 3, the obtained character is compared to the target character class using the character class filter [19] to determine whether it belongs to the class.

Hardware Implementation of CLM: The CLM is designed to efficiently detect negated character repetition components. The CLM stores the timestamps of the detection of each ASCII character. Thus, 256 memory locations are reserved, one for each ASCII character. Two comparators are implemented: one compares the timestamps against the minimum timing, and the other against the maximum timing. The timestamp memory and the two comparators together are named the Comparison Block in the figure. For a query, we simply compare the stored timestamps against the inviolable period described in Section IV-B. If the timestamp does not overlap with the inviolable period, we can conclude that the query character was NOT detected during the target range. From this process, a negated character repetition can be verified. To prevent timestamps from being overwritten before they are needed, multiple comparison blocks are implemented.
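The CLM check can be sketched in a few lines of Python; this is an illustrative model under our own assumptions (a single timestamp per character, the inviolable period given as a closed interval), not the comparison-block hardware.

```python
class CLM:
    """Toy model of the negated-character-repetition check."""

    def __init__(self):
        self.last_seen = [None] * 256       # per-ASCII-character detection timestamp

    def observe(self, char, timestamp):
        self.last_seen[ord(char)] = timestamp

    def negated_repetition_holds(self, char, inviolable_start, inviolable_end):
        # A negated class such as [^c]{m,n} holds over the inviolable period
        # if character c was never seen inside [inviolable_start, inviolable_end].
        ts = self.last_seen[ord(char)]
        return ts is None or not (inviolable_start <= ts <= inviolable_end)
```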
In-Line Lookup Modules: The RDM is the only in-line lookup module in the LaFA architecture. Unlike buffered lookup modules, in-line lookup modules are used in an on-demand fashion, as described in Section IV.

2) Hardware Implementation of RDM: The RDM consists of character class filters and counters. A counter is incremented when an input character matches the target character class. The RDM resets the counter and frees the submodules for new queries if there is any mismatch in the input. As an in-line module, the RDM must verify the input characters starting immediately after the simple string. To match the timing gap, the input characters are delayed for a fixed number of clock cycles in registers, waiting for the necessary information such as the pattern ID and the maximum and minimum repetition values. The delayed input characters are screened using the character class filter. The match signal goes to the repetition checker, which in turn performs the timing verification process described in Section IV-B. To support multiple simultaneous repetition components, multiple repetition checkers are implemented.

Additional Buffered Lookup Modules:

3) Hardware Implementation of FRM: LaFA takes a buffered lookup approach to verify frequently appearing repetitions. Multiple repetition components sharing the same base can share one FRM, making efficient use of resources; some bases appear very frequently across various repetition types. A character class filter is used to detect a base. The consecutive repetitions are counted in the counter. The count and timestamp are stored in a history memory when the consecutive sequence is broken. In the prototype design, we keep multiple histories in one FRM. For a query operation, 24-bit comparators verify the repetitions and timing.

4) Hardware Implementation of SDM: The SDM detects short simple strings. We define a short simple string as one with fewer than five characters. The SDM architecture is very similar to that of the TLM, except that the SDM has four memory blocks to support up to four character lookups in a clock cycle, so that LaFA can perform queries to this module every clock cycle without delay. The SDM also has a mask function that allows queries for strings from one to four characters long.
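A minimal Python sketch of the RDM counting behavior follows, under our own assumptions (a character-class predicate passed in as a function, one query at a time); it models the counter logic only, not the pipelined delay registers.

```python
def rdm_verify(chars, in_class, rep_min, rep_max):
    """Count the consecutive characters immediately after the simple string
    that belong to the target character class, then check the repetition bounds.

    chars:    iterable of input characters following the simple string
    in_class: predicate returning True if a character is in the class
    """
    count = 0
    for c in chars:
        if in_class(c):
            count += 1
        else:
            break            # sequence broken; stop counting
    return rep_min <= count <= rep_max

# Example: verify a [0-9]{3,5} component right after a detected simple string.
# rdm_verify("0427x9", str.isdigit, 3, 5)  -> True (four digits before the break)
```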

VII. PERFORMANCE EVALUATION

In this section, we evaluate the performance of LaFA in terms of memory requirement and speed, and we then compare LaFA with existing schemes. An important aspect of LaFA is that its performance and resource usage depend solely on the RegEx sets and are independent of the input traffic; thus, the analysis here is valid for any traffic pattern, including the worst case. In other words, LaFA is secure against denial-of-service (DoS) attacks using crafted traffic (Section VII-E).

A. RegEx Data Sets

We evaluate LaFA on multiple diverse RegEx sets widely used in RegEx research. In particular, from the Snort rules (ver. ), we select Web-related rules that are extensive and very common in traffic over the Internet (web-client and web-php). We also select the spyware rules (spyware-put) and the Back Door rules (backdoor) as NIDPS sets. To confirm the scalability of LaFA versus RegEx set size, the SnortComb RegEx set (a combination of all the aforementioned Snort rules) is also incorporated in our evaluation. LaFA is also evaluated against sets from Bro (ver. ) and Linux Layer 7 (update date ). Finally, to present a fair comparison with other schemes, we also incorporate in our evaluation the RegEx sets (Snort24, Snort31, and Snort34) used to evaluate the scheme in [20].

B. System Design Consideration

Before presenting the performance results of LaFA, in this section we elaborate on three considerations that affect the memory requirements and speed of LaFA. Namely, we discuss the impact of the unrolling operation and of the timestamp storage on the memory requirements of LaFA, and the impact of multiple tracks on both the memory requirements and the speed of LaFA.

TABLE II. NUMBER OF REGEXES BEFORE AND AFTER UNROLLING

1) Unrolling RegEx Sets: Table II compares, in columns 1 and 2, respectively, the number of original RegExes and the number of unrolled RegExes for each RegEx set. The third column shows the expansion ratio due to unrolling, a ratio never more than 3.16 for practical sets and even below 2 for most sets.

2) Timestamp History for Simple Strings: Upon detecting a simple string, LaFA verifies its detection time against the time calculated from the information obtained from the node. The node provides the distance from the previous simple string in the RegEx, and this distance is used to calculate the expected timestamp of the current simple string. This calculated timestamp is verified against the detected timestamp of the simple string. The detection process continues as explained in Section IV-A if, and only if, the timing verification for the simple string is successful. To understand why a timestamp history is needed for a simple string, consider a RegEx in which one simple string is followed, at a specified distance, by another, and an input in which the first simple string appears twice before the second one (the concrete strings of the original example are omitted here). In this input, the first simple string is detected twice, so the timestamp entry of its first occurrence is overwritten by that of the second. The timestamp of the first occurrence can then no longer be verified once the second simple string is detected. Thus, even if the input belongs to the RegEx set, it cannot be detected successfully. To overcome this problem, we can store two timestamp histories for the simple string. Due to the lookahead operation, the maximum (worst-case) number of timestamps needed is determined by the RegEx set and is independent of the input traffic: depending on the RegEx, only so many new activations can fit between two simple strings. This property is one of the reasons why LaFA is more efficient in general. Unlike traditional schemes, which start a new detection thread every time there is a partial match (e.g., each occurrence of the repeated simple string above), LaFA considers only the matches that satisfy a certain distance constraint from a simple string (the last two occurrences in this case), eliminating the rest of the partial matches without consideration.
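The bounded timestamp history can be modeled as a small ring buffer per simple string. The following Python sketch is illustrative, assuming a fixed depth per string given by the worst-case analysis of the RegEx set; it is not the prototype's memory organization.

```python
from collections import deque

class TimestampHistory:
    """Keep the last k detection timestamps of each simple string (SID)."""

    def __init__(self, depth_per_sid):
        # depth_per_sid: SID -> worst-case number of timestamps needed,
        # determined offline from the RegEx set (independent of traffic).
        self.history = {sid: deque(maxlen=k) for sid, k in depth_per_sid.items()}

    def record(self, sid, timestamp):
        self.history[sid].append(timestamp)   # the oldest entry drops automatically

    def any_within(self, sid, lo, hi):
        # Timing verification: did any recorded occurrence fall in the
        # expected [lo, hi] window derived from the node's distance fields?
        return any(lo <= ts <= hi for ts in self.history[sid])
```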
TABLE III. TIME AND NODE MEMORY REQUIREMENTS (TS: TIMESTAMP; SS: SIMPLE STRING)

Table III gives the worst-case timestamp history storage requirements for simple strings.

3) Memory Load Balancing: A simple string may appear in multiple RegExes. The detection of a simple string that appears in several RegExes requires one activation operation per RegEx. Distributed nodes that belong to the same string need to be stored in separate tracks. As shown in Section IV-A, we implement multiple tracks to avoid contention and keep the detection time constant. Table IV shows the maximum number of concurrent operations per simple-string detection for the different RegEx sets. For example, of the 48 simple strings in the RegEx set Snort24, 44 appear only once, four appear twice, and no simple string appears more than twice. Thus, the maximum number of operations for this RegEx set is two. Even for large RegEx sets (SnortComb), no more than eight operations per detection are required. In practice, the number of concurrent operations will be smaller than this worst case because only active RegExes need to be processed. However, LaFA is designed to support the worst case. Thus, the number of tracks is determined at compile/configuration time as the highest number of concurrent operations required by the RegEx set and does not change with the input traffic. From Table IV, it can be observed that for most simple strings (more than 80%), one operation is sufficient. This result simplifies the programming of the node information and allows an even distribution of nodes among the tracks: after the components of the few high-operation strings are spread among multiple tracks, the remaining strings are free to go to almost any track. We distribute these components to different tracks such that the size of the node memory in each track is balanced. As a comparison, Table IV also shows the number of simultaneously active RegExes for NFA implementations.
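A simple offline assignment captures the idea: strings shared by several RegExes must have their nodes placed on distinct tracks, and the remaining nodes are then placed so as to balance memory. The Python sketch below is a hypothetical greedy placement, not the authors' actual configuration tool; it assumes the number of tracks is at least the maximum sharing count.

```python
def assign_tracks(regexes_per_string, num_tracks):
    """regexes_per_string: SID -> list of RegEx IDs containing that string.
    Returns a mapping (SID, RegEx ID) -> track index, balancing node counts."""
    load = [0] * num_tracks
    placement = {}
    # Place the most-shared strings first; their nodes must sit on distinct tracks.
    for sid, regexes in sorted(regexes_per_string.items(),
                               key=lambda kv: -len(kv[1])):
        used = set()
        for rid in regexes:
            # Pick the least-loaded track not already holding this string.
            track = min((t for t in range(num_tracks) if t not in used),
                        key=lambda t: load[t])
            placement[(sid, rid)] = track
            used.add(track)
            load[track] += 1
    return placement
```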

TABLE IV. NUMBER OF OPERATIONS PER SIMPLE-STRING DETECTION

It is clear that the number of simultaneous operations stays at a very low level for LaFA across all the sets, whereas in an NFA it quickly increases with the number of RegExes and easily becomes unmanageable. This is an important point contributing to the scalability of LaFA.

C. LaFA Memory Requirements

Table V summarizes the memory required by LaFA for the different RegEx sets. The clear separation of detection modules and state storage in LaFA allows us to store the required information in separate memories. The detailed memory requirement calculations for the LaFA detection modules can be found in [9]. Detection modules are replicated for each track; therefore, the number of tracks for each RegEx set is also shown in Table V. The memory figures given in the table already incorporate all the replicas of these detection modules in all of the tracks. Note that four of the RegEx sets (Snort24, Snort31, Snort34, and Linux7) do not contain CLM-type content, so no CLM memory is required for these RegEx sets. The last column in Table III lists the number of nodes for each RegEx rule set. The Nodes row in Table V shows the corresponding memory requirement for each RegEx rule set (each node has a fixed width of 96 bits). Since the simple string detector is crucial for the LaFA architecture, the estimated memory requirements of the simple string detector are also listed. This estimation is based on previous simple-string detection studies [16]–[18]. However, simple-string detection is not a main concern of this paper, and therefore we do not elaborate on it further. From the table, it is clear that the entire LaFA can be stored in the on-chip memory of current state-of-the-art FPGAs.

Fig. 15. Scalability comparisons with other schemes.

Fig. 15 compares the memory requirements with respect to RegEx set size between our scheme and four others: XFA [21], [22], HybridFA [23], SplittingFA [24], and CD2FA [25]. We can see that the memory requirement of LaFA is an order of magnitude smaller than that of the other schemes. The trend of the LaFA memory requirements across different rule-set sizes is shown as a line in Fig. 15; it shows that the memory requirements increase almost linearly with the number of rules.

D. Speed

To verify the feasibility of LaFA in terms of memory and speed, we prototyped LaFA on a PLDA XpressFX [26] board housing a Xilinx Virtex-4 FX100 FPGA, which has approximately 10 Mb of Block RAM. The logic utilization of the prototype is 3835 slices (approximately 7%), accommodating 3 K RegExes in one engine, which consists of three tracks. The prototype can reach a 250-MHz clock frequency (i.e., a 2-Gb/s throughput per engine). The resource utilization of LaFA is dominated by its memory requirement, as expected. Yet even for SnortComb, the RegEx set with the highest memory requirement (2.2 Mb from Table V), four parallel engines fit into the Virtex-4 FPGA, providing an 8-Gb/s throughput. We project that when LaFA is implemented on more recent FPGAs with higher memory capacity and logic speed, it can deliver much higher throughput. For instance, 17 engines for SnortComb can fit on a single Xilinx Virtex-6 FPGA with 38 Mb of Block RAM, which results in a throughput in excess of 34 Gb/s.
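The engine-count arithmetic above can be reproduced in a few lines. The sketch below simply rounds down the ratio of on-chip Block RAM to per-engine memory and multiplies by the per-engine throughput, mirroring the 8-Gb/s and 34-Gb/s estimates; the device parameters are those quoted in the text.

```python
def estimate_throughput(bram_mbits, engine_mem_mbits, gbps_per_engine=2.0):
    """Rough throughput estimate: as many engines as fit in Block RAM,
    each contributing one engine's worth of throughput."""
    engines = int(bram_mbits // engine_mem_mbits)
    return engines, engines * gbps_per_engine

# SnortComb at 2.2 Mb per engine:
# estimate_throughput(10, 2.2)  -> (4, 8.0)    # Virtex-4: 4 engines, 8 Gb/s
# estimate_throughput(38, 2.2)  -> (17, 34.0)  # Virtex-6: 17 engines, 34 Gb/s
```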
E. Resilience Against DoS Attacks

A security system should itself be resilient against DoS attacks, and LaFA is no exception. LaFA is designed to operate in deterministic time irrespective of the input traffic by controlling its event-based approach. The simple string detector generates at most one event per clock cycle, which in turn produces at most one transition per track. The tracks operate in parallel and can therefore perform all of these transitions in a single clock cycle. Since all the events generated in one clock cycle can be processed in one clock cycle (in a pipelined manner), events do not accumulate and thus do not need to be buffered, irrespective of the input traffic. The detection modules are queried by these events, so their loads do not accumulate either. Thus, the detection time is always deterministic regardless of the input traffic.

VIII. RELATED WORK

Exact string matching was the principal method used in early DPI systems and has been studied extensively [12]–[15], [17], [18]. To keep up with new and highly sophisticated attacks, software-based NIDPSs started using RegEx-based signatures [11] to express these attacks more flexibly.

TABLE V. TOTAL MEMORY REQUIREMENTS FOR THE 10 REGEX SETS (SSD: SIMPLE STRING DETECTOR)

The introduction of RegEx-based signatures increased the complexity of DPI, limiting the scalability of software NIDPSs [1], [2] and packet classifiers [6] at high speeds. Current state-of-the-art hardware architectures are either space-efficient (NFA) or high-speed (DFA), but not both. Pure NFA implementations are too slow in software, but hardware implementations can be feasible with a high level of parallelism [18]. The implementation in [18] uses solely logic gates. Although such schemes achieve high-speed RegEx detection, the inflexibility of implementing signatures in logic gates limits the updatability and scalability of NFA implementations. Mitra et al. [27] proposed a compiler that automatically converts PCRE opcodes into VHDL code to generate an NFA on FPGA.

The memory requirement of a DFA implementation can be very high for certain types of RegExes. Some approaches [22]–[25], [28]–[34] aim to reduce the memory requirement by reducing the number of states. These approaches replace the DFA for the problematic RegExes with an NFA or other architectures to minimize memory consumption, while using a DFA to implement the rest of the RegExes. HybridFA [23] keeps the problematic DFA states as NFAs (the other parts use a DFA). In [24], the authors propose history-based finite automata (H-FA), which remember the transition history to avoid creating unnecessary states, and history-based counting finite automata (H-cFA), which add counters to reduce the number of states. The XFA proposed in [22] formalizes and generalizes the DFA state-explosion problem and shows a reduction in the number of states. In [28], the authors introduce a DFA-based FPGA solution in which multiple microcontrollers, each independently computing highly complex DFA operations, are used to avoid DFA state explosion. Reference [29] is similar to [28] in that it compiles RegExes into simple operation codes so that multiple, specifically designed microengines (microcontrollers) work in parallel. However, even after replacing the problematic DFA states with more memory-efficient structures in the above schemes, the memory consumption of the remaining DFA is still significantly high.

Other approaches propose to reduce DFA memory by reducing the number of transitions. For instance, D2FA [30] merges multiple common transitions, called default transitions, to reduce the total number of transitions. However, this approach may require a large number of transitions in some cases, leading to an increase in the number of memory accesses per input byte. In addition, D2FA construction is complex and requires significant resources. Several researchers have followed up on the D2FA idea [25], [31]. CD2FA [25] and MergeDFA [31] resolve the shortcomings of D2FA by proposing multiple state transitions per character. MergeDFA bounds the worst-case number of transitions to a linear function of the input string length; although bounded, it still requires a relatively large number of memory accesses. CD2FA can achieve one transition per input character; however, it requires a perfect hash function to do so. Although these schemes successfully address some issues of the D2FA, they still cannot achieve satisfactory results for all three design objectives, namely flexibility for adding new signatures, efficient resource usage, and high-speed detection. In [32], the authors focus on optimization at the RegEx level before the FA is generated.
They propose rule rewriting for particular RegEx patterns that cause state explosion. They also suggest grouping (splitting) the DFA into multiple groups to reduce the number of states.

IX. CONCLUSION

In this paper, we presented LaFA, a highly scalable on-chip RegEx detection system. The scalability of existing schemes is generally limited by the traditional per-character state processing and state transition detection paradigm. The main focus of existing schemes is on optimizing the number of states and the required transitions, not on the suboptimal character-based detection method itself. Furthermore, the potential benefits of allowing out-of-sequence detection, instead of detecting the components of a RegEx in their order of appearance, had not been explored. We propose to clearly separate detection operations from state transitions, opening up opportunities to further optimize traditional FAs. LaFA employs a novel lookahead technique to reorder the sequence of pattern detections, a technique that requires fewer operations and can declare a mismatch before evaluating complex patterns. Independent, hardware-compatible detection modules are introduced to handle variable strings (i.e., complex patterns and shallow patterns). This solves the scalability problem of traditional FAs and greatly increases the memory efficiency of LaFA. Compared to today's state-of-the-art RegEx detection systems, LaFA requires an order of magnitude less memory. A single commercial FPGA chip can accommodate up to 25 K RegExes. Based on the throughput of our LaFA prototype on FPGA, we estimate that a 34-Gb/s throughput can be achieved with a Snort RegEx signature set. Additionally, the modular structure of LaFA provides a more flexible way to adapt to ongoing and future extensions of RegExes.

REFERENCES

[1] Bro intrusion detection system, 2011 [Online]. Available:
[2] Snort network intrusion detection system, Sourcefire, Columbia, MD, 2010 [Online]. Available:
[3] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, 2nd ed. Reading, MA: Addison-Wesley.
[4] A. N. Arslan, Multiple sequence alignment containing a sequence of regular expressions, in Proc. IEEE CIBCB, 2005.
[5] Y. S. Chung, W. H. Lee, C. Y. Tang, and C. L. Lu, RE-MuSiC: A tool for multiple sequence alignment with regular expression constraints, Nucl. Acids Res., no. 35, pp. W639–W644.
[6] J. Levandoski, E. Sommer, and M. Strait, Application layer packet classifier for Linux, 2009 [Online]. Available: l7-filter.sourceforge.net
[7] Verizon, Verizon and Nokia Siemens networks set new record for 100 Gbps optical transmission, Verizon, Basking Ridge, NJ, Sep. 25, 2008 [Online]. Available: press-releases/verizon/2008/verizon-and-nokia-siemens.html
[8] Networking solutions white paper, Cisco, San Jose, CA, 2008 [Online]. Available:
[9] M. Bando, N. S. Artan, and H. J. Chao, LaFA: Lookahead finite automata for scalable regular expression detection, in Proc. ACM/IEEE ANCS, 2009.
[10] N. Wirth, Compiler Construction. Reading, MA: Addison-Wesley.
[11] Perl compatible regular expressions, 2011 [Online]. Available:
[12] F. Yu, R. Katz, and T. Lakshman, Gigabit rate packet pattern-matching using TCAM, in Proc. IEEE ICNP, 2004.
[13] S. Dharmapurikar, P. Krishnamurthy, T. S. Sproull, and J. W. Lockwood, Deep packet inspection using parallel bloom filters, IEEE Micro, vol. 24, no. 1.
[14] Y. H. Cho and W. H. Mangione-Smith, Deep network packet filter design for reconfigurable devices, Trans. Embed. Comput. Syst., vol. 7, no. 2, pp. 1–26.
[15] N. S. Artan and H. J. Chao, TriBiCa: Trie bitmap content analyzer for high-speed network intrusion detection, in Proc. IEEE INFOCOM, 2007.
[16] N. S. Artan, M. Bando, and H. J. Chao, Boundary hash for memory-efficient deep packet inspection, in Proc. IEEE ICC, 2008.
[17] M. Bando, N. S. Artan, and H. J. Chao, Highly memory-efficient LogLog Hash for deep packet inspection, in Proc. IEEE GLOBECOM, 2008.
[18] I. Sourdis, D. Pnevmatikatos, and S. Vassiliadis, Scalable multigigabit pattern matching for packet inspection, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 2, Feb. 2008.
[19] M. Bando, N. S. Artan, N. Mehta, Y. Guan, and H. J. Chao, Hardware implementation for scalable lookahead regular expression detection, in Proc. RAW, 2010.
[20] M. Becchi and P. Crowley, An improved algorithm to accelerate regular expression evaluation, in Proc. ACM/IEEE ANCS, 2007.
[21] R. Smith, C. Estan, and S. Jha, XFA: Faster signature matching with extended automata, in Proc. IEEE SP, 2008.
[22] R. Smith, C. Estan, S. Jha, and S. Kong, Deflating the big bang: Fast and scalable deep packet inspection with extended finite automata, in Proc. ACM SIGCOMM, 2008.
[23] M. Becchi and P. Crowley, A hybrid finite automaton for practical deep packet inspection, in Proc. ACM CoNEXT, 2007, Article no. 1.
[24] S. Kumar, B. Chandrasekaran, J. Turner, and G. Varghese, Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia, in Proc. ACM/IEEE ANCS, 2007.
[25] S. Kumar, J. Turner, and J. Williams, Advanced algorithms for fast and scalable deep packet inspection, in Proc. ACM/IEEE ANCS, 2006.
[26] PLDA XpressFX, PLDA, Aix-en-Provence, France, 2011 [Online]. Available:
[27] A. Mitra, W. Najjar, and L. Bhuyan, Compiling PCRE to FPGA for accelerating SNORT IDS, in Proc. ACM/IEEE ANCS, 2007.
[28] Z. K. Baker, H.-J. Jung, and V. K. Prasanna, Regular expression software deceleration for intrusion detection systems, in Proc. FPL, 2006.
[29] M. Paolieri, I. Bonesana, and M. D. Santambrogio, ReCPU: A parallel and pipelined architecture for regular expression matching, in Proc. IFIP VLSI-SoC, 2007.
[30] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, and J. Turner, Algorithms to accelerate multiple regular expressions matching for deep packet inspection, in Proc. ACM SIGCOMM, 2006.
[31] M. Becchi and S. Cadambi, Memory-efficient regular expression search using state merging, in Proc. IEEE INFOCOM, 2007.
[32] F. Yu, Z. Chen, Y. Diao, T. V. Lakshman, and R. H. Katz, Fast and memory-efficient regular expression matching for deep packet inspection, in Proc. ACM/IEEE ANCS, 2006.
[33] D. Ficara, S. Giordano, G. Procissi, F. Vitucci, G. Antichi, and A. Di Pietro, An improved DFA for fast regular expression matching, Comput. Commun. Rev., vol. 38, no. 5.
[34] M. Becchi and P. Crowley, Extending finite automata to efficiently match Perl-compatible regular expressions, in Proc. ACM CoNEXT, 2008, Article no. 25.

Masanori Bando (A'10) received the B.S. and M.S. degrees in electrical engineering from Tamagawa University, Tokyo, Japan, in 1999 and 2001, respectively, and is currently pursuing the Ph.D. degree, working on high-speed network security and next-generation router architectures, at the High Speed Network Laboratory, Polytechnic Institute of New York University, Brooklyn.

N. Sertac Artan received the B.S. and M.S. degrees in electronics and communication engineering from Istanbul Technical University, Istanbul, Turkey, in 1998 and 2001, respectively, and the Ph.D. degree in electrical engineering from the Polytechnic Institute of New York University (NYU-Poly), Brooklyn. He has been an Industry Assistant Professor with the Department of Electrical and Computer Engineering, NYU-Poly, since January. Prior to his current position, he was a Research Assistant Professor in 2009 and a Postdoctoral Researcher from 2007 to 2009 with NYU-Poly. Between 1998 and 2002, he was an Application-Specific Integrated Circuit (ASIC) Design Engineer with the ITU ETA ASIC Design Center, Istanbul, Turkey, where he designed integrated circuits for commercial, academic, and military applications. His current research interests include high-speed network intrusion detection and prevention, networks-on-chip, and device and algorithm development for medical implants.

H. Jonathan Chao (S'82–M'85–SM'95–F'01) received the Ph.D. degree in electrical engineering from The Ohio State University, Columbus. He is a Department Head and a Professor of electrical and computer engineering with the Polytechnic Institute of New York University, Brooklyn, which he joined in January. During 2000 to 2001, he was Co-Founder and CTO of Coree Networks, Tinton Falls, NJ, where he led a team to implement a multiterabit multiprotocol label switching (MPLS) switch router with carrier-class reliability. From 1985 to 1992, he was a Member of Technical Staff with Telcordia, Piscataway, NJ. He holds 42 patents with 10 pending and has published over 200 journal and conference papers. He coauthored three networking books, Broadband Packet Switching Technologies (Wiley, 2001), Quality of Service Control in High-Speed Networks (Wiley, 2001), and High-Performance Switches and Routers (Wiley, 2007). He has also served as a consultant for various companies, such as Huawei, Lucent, NEC, and Telcordia. He has been doing research in the areas of network design in data centers, terabit switches/routers, network security, networks on chip, and biomedical devices. Prof. Chao is a Fellow of the IEEE for his contributions to the architecture and application of VLSI circuits in high-speed packet networks. He received the Telcordia Excellence Award in 1987.


More information

Lexical Analysis. Chapter 2

Lexical Analysis. Chapter 2 Lexical Analysis Chapter 2 1 Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples

More information

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE RAID SEMINAR REPORT 2004 Submitted on: Submitted by: 24/09/2004 Asha.P.M NO: 612 S7 ECE CONTENTS 1. Introduction 1 2. The array and RAID controller concept 2 2.1. Mirroring 3 2.2. Parity 5 2.3. Error correcting

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

GCE Computing. COMP3 Problem Solving, Programming, Operating Systems, Databases and Networking Summer Version: 1.0

GCE Computing. COMP3 Problem Solving, Programming, Operating Systems, Databases and Networking Summer Version: 1.0 GCE Computing COMP3 Problem Solving, Programming, Operating Systems, Databases and Networking 2510 Summer 2016 Version: 1.0 It was pleasing to see the quality of responses to some questions and in particular

More information

Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1]

Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1] Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1] Marc André Tanner May 30, 2014 Abstract This report contains two main sections: In section 1 the cache-oblivious computational

More information

CSE Theory of Computing Spring 2018 Project 2-Finite Automata

CSE Theory of Computing Spring 2018 Project 2-Finite Automata CSE 30151 Theory of Computing Spring 2018 Project 2-Finite Automata Version 2 Contents 1 Overview 2 1.1 Updates................................................ 2 2 Valid Options 2 2.1 Project Options............................................

More information

A Firewall Architecture to Enhance Performance of Enterprise Network

A Firewall Architecture to Enhance Performance of Enterprise Network A Firewall Architecture to Enhance Performance of Enterprise Network Hailu Tegenaw HiLCoE, Computer Science Programme, Ethiopia Commercial Bank of Ethiopia, Ethiopia hailutegenaw@yahoo.com Mesfin Kifle

More information

CS2 Language Processing note 3

CS2 Language Processing note 3 CS2 Language Processing note 3 CS2Ah 5..4 CS2 Language Processing note 3 Nondeterministic finite automata In this lecture we look at nondeterministic finite automata and prove the Conversion Theorem, which

More information

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5 CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5 CS 536 Spring 2015 1 Multi Character Lookahead We may allow finite automata to look beyond the next input character.

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Unit 1 Chapter 4 ITERATIVE ALGORITHM DESIGN ISSUES

Unit 1 Chapter 4 ITERATIVE ALGORITHM DESIGN ISSUES DESIGN AND ANALYSIS OF ALGORITHMS Unit 1 Chapter 4 ITERATIVE ALGORITHM DESIGN ISSUES http://milanvachhani.blogspot.in USE OF LOOPS As we break down algorithm into sub-algorithms, sooner or later we shall

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Single Pass Connected Components Analysis

Single Pass Connected Components Analysis D. G. Bailey, C. T. Johnston, Single Pass Connected Components Analysis, Proceedings of Image and Vision Computing New Zealand 007, pp. 8 87, Hamilton, New Zealand, December 007. Single Pass Connected

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Web Crawler Finds and downloads web pages automatically provides the collection for searching Web is huge and constantly

More information

Improving Channel Scanning Procedures for WLAN Handoffs 1

Improving Channel Scanning Procedures for WLAN Handoffs 1 Improving Channel Scanning Procedures for WLAN Handoffs 1 Shiao-Li Tsao and Ya-Lien Cheng Department of Computer Science, National Chiao Tung University sltsao@cs.nctu.edu.tw Abstract. WLAN has been widely

More information

Automatic compilation framework for Bloom filter based intrusion detection

Automatic compilation framework for Bloom filter based intrusion detection Automatic compilation framework for Bloom filter based intrusion detection Dinesh C Suresh, Zhi Guo*, Betul Buyukkurt and Walid A. Najjar Department of Computer Science and Engineering *Department of Electrical

More information

IBM 3850-Mass storage system

IBM 3850-Mass storage system BM 385-Mass storage system by CLAYTON JOHNSON BM Corporation Boulder, Colorado SUMMARY BM's 385, a hierarchical storage system, provides random access to stored data with capacity ranging from 35 X 1()9

More information

Part IV. Chapter 15 - Introduction to MIMD Architectures

Part IV. Chapter 15 - Introduction to MIMD Architectures D. Sima, T. J. Fountain, P. Kacsuk dvanced Computer rchitectures Part IV. Chapter 15 - Introduction to MIMD rchitectures Thread and process-level parallel architectures are typically realised by MIMD (Multiple

More information

Scheduling Algorithms to Minimize Session Delays

Scheduling Algorithms to Minimize Session Delays Scheduling Algorithms to Minimize Session Delays Nandita Dukkipati and David Gutierrez A Motivation I INTRODUCTION TCP flows constitute the majority of the traffic volume in the Internet today Most of

More information

IN5050: Programming heterogeneous multi-core processors Thinking Parallel

IN5050: Programming heterogeneous multi-core processors Thinking Parallel IN5050: Programming heterogeneous multi-core processors Thinking Parallel 28/8-2018 Designing and Building Parallel Programs Ian Foster s framework proposal develop intuition as to what constitutes a good

More information

Medium Access Control Protocols With Memory Jaeok Park, Member, IEEE, and Mihaela van der Schaar, Fellow, IEEE

Medium Access Control Protocols With Memory Jaeok Park, Member, IEEE, and Mihaela van der Schaar, Fellow, IEEE IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 18, NO. 6, DECEMBER 2010 1921 Medium Access Control Protocols With Memory Jaeok Park, Member, IEEE, and Mihaela van der Schaar, Fellow, IEEE Abstract Many existing

More information

2.2 Syntax Definition

2.2 Syntax Definition 42 CHAPTER 2. A SIMPLE SYNTAX-DIRECTED TRANSLATOR sequence of "three-address" instructions; a more complete example appears in Fig. 2.2. This form of intermediate code takes its name from instructions

More information

Reducing The De-linearization of Data Placement to Improve Deduplication Performance

Reducing The De-linearization of Data Placement to Improve Deduplication Performance Reducing The De-linearization of Data Placement to Improve Deduplication Performance Yujuan Tan 1, Zhichao Yan 2, Dan Feng 2, E. H.-M. Sha 1,3 1 School of Computer Science & Technology, Chongqing University

More information

Lexical Analysis. Lecture 2-4

Lexical Analysis. Lecture 2-4 Lexical Analysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 2 1 Administrivia Moving to 60 Evans on Wednesday HW1 available Pyth manual available on line.

More information

Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX

Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX ABSTRACT Symmetric multiprocessor (SMP) computers can increase performance by reducing the time required to analyze large volumes

More information