An efficient sparse matrix format for accelerating regular expression matching on field-programmable gate arrays


SECURITY AND COMMUNICATION NETWORKS. Security Comm. Networks 2015; 8:13-24. Published online 10 May 2013 in Wiley Online Library (wileyonlinelibrary.com). Special issue paper.

Lei Jiang 1,3*, Jianlong Tan 2 and Qiu Tang 2
1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
2 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
3 University of Chinese Academy of Sciences, Beijing, China

ABSTRACT

Regular expression matching is widely used in many programming languages and applications. A regular expression is typically transformed into a deterministic finite automaton (DFA) for processing. However, the DFA requires large memory resources because of the state blowup problem. Many algorithms have been proposed to compress the DFA storage, and they generally store the compressed DFA in a sparse matrix format. For field-programmable gate array (FPGA)-based implementations, operations on sparse matrices consume multiple clock cycles, reducing the flexibility and performance of applications. To accelerate regular expression matching, we present a compact sparse matrix format for storing the compressed DFA transition table on the FPGA. Taking advantage of the special properties of sparse matrices generated by DFAs, we can accomplish one access within a single clock cycle. Furthermore, we develop a regular expression matching engine on a Xilinx (Xilinx Inc., 2100 Logic Dr, San Jose, CA, USA) Virtex-6 FPGA chip using this sparse matrix format. Compared with previous solutions, this regular expression matching engine has more flexibility while keeping a high compression ratio. The results show that the engine saves 94% of memory space compared with the original DFA structure while keeping a fast matching speed. By running multiple engines in parallel, our design achieves a throughput of up to 29 Gbps. Copyright 2013 John Wiley & Sons, Ltd.

KEYWORDS
regular expression; DFA; sparse matrix; FPGA

*Correspondence
Lei Jiang, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. jianglei@ict.ac.cn

1. INTRODUCTION

Regular expressions provide a powerful and flexible method to match strings in text. Regular expression matching is widely used in many utilities and applications, such as text editors, programming languages, and network processing tools. Some languages, including Perl [1], Ruby [2], AWK [3], and Tcl [4], integrate regular expressions into the syntax of the core language itself. Other mainstream languages, including C/C++ [5,6] and Java [7], provide standard libraries for regular expression matching. Current network intrusion detection systems (NIDS), such as Bro [8] and Snort [9], widely use regular expressions to describe attack signatures because of their high expressiveness. For these network processing applications, regular expression matching is one of the biggest performance bottlenecks.

Many theories and algorithms [10-12] on regular expressions have been proposed since the 1960s. Typically, finite automata (FAs) are classified into deterministic finite automata (DFA) and nondeterministic finite automata (NFA). A DFA activates only one state transition for each input character, whereas an NFA may activate multiple transitions per character. Therefore, the DFA algorithm's searching complexity is O(1), providing a fast and stable matching speed.
For this reason, mainstream NIDS (Snort, Bro, etc.) prefer DFAs for regular expression matching. But as rule sets become increasingly complex and large, DFAs suffer from the state blowup problem. For example, the L7-filter [13] regular expression rule set consumes more than 16 GB of memory [14] when compiled by the normal DFA algorithm. It is therefore crucial to reduce the memory consumption of DFAs to satisfy complex and high-speed networking environments.

To this end, many algorithms have been proposed to eliminate the redundancies in the DFA transitions and compress the DFA storage, such as D2FA [15] and δFA [16,17]. Sparse matrices are ubiquitous in these algorithms for representing and storing the compressed DFA transition table, yielding significant savings in memory usage. Existing sparse matrix formats, such as compressed row storage [18], the jagged diagonal format [19], and the compressed diagonal storage format [20], achieve different levels of space efficiency and operation efficiency. However, for a regular expression matching application implemented in hardware, these formats decrease the performance and matching speed because each access to a sparse matrix element consumes multiple clock cycles.

In this paper, we present a novel architecture for sparse matrix storage on field-programmable gate arrays (FPGAs). Observing the sparse matrices generated from DFAs, we notice two interesting features. First, the number of rows is much larger than the number of columns; for example, the column size of the sparse matrix is always 256 for the extended ASCII alphabet. Second, most of the nonzero elements are located in specific columns. We adopt three techniques to implement our architecture: memory packing, an index table, and interleaved memories. Taking advantage of these techniques, our design accomplishes one access within a single clock cycle while keeping high space efficiency. Using the new sparse matrix architecture, we implement a regular expression matching engine inspired by the algorithm proposed by Y. Liu et al. [21], which compresses the DFA storage space by matrix decomposition. The architecture proposed in this paper has been targeted to a Xilinx Virtex-6 FPGA chip. The experimental results show that the proposed architecture achieves a throughput of nearly 30 Gbps and saves about 94% of memory space in the best case.

In summary, the main contributions of this paper are:
(1) We present a novel compact sparse matrix format that can efficiently store the compressed DFA transition table.
(2) On the basis of the new sparse matrix format, we design an FPGA-based architecture. By means of the parallelism of the FPGA, this architecture can accomplish one DFA state retrieval in a single clock cycle.
(3) We build a regular expression matching circuit based on a simple DFA compression algorithm on an FPGA board. We store the compressed transition table in our sparse matrix format, giving a flexible regular expression matching engine while saving more than 90% of memory space.

The rest of the paper is organized as follows. Section 2 presents the preliminary knowledge of our work, including regular expressions, finite automata, and sparse matrices, and then summarizes the related work in the literature. Section 3 presents our sparse matrix format and an architecture for sparse matrix storage. Section 4 introduces a regular expression matching engine using our sparse matrix architecture. Section 5 presents the experimental results. Finally, Section 6 concludes the paper.

2. RELATED WORKS

2.1. Introduction to FPGAs

The design in this paper is proposed and implemented on FPGA chips, so we first introduce some background on FPGAs. An FPGA is an integrated circuit designed to be configured by a customer or a designer after manufacturing, hence "field-programmable" [22]. Figure 1 shows the architecture of an FPGA, consisting of clustered logic blocks (CLBs), I/O cells, interconnection resources, and switch blocks.
CLBs are the basic programmable units of an FPGA, and they are connected via interconnection resources and switch blocks. After a new circuit design is burned into the FPGA, the CLBs are reconnected by configuring the interconnection resources.

Figure 1. Structure of field-programmable gate array.

Currently, graphics processing units (GPUs) [23] and multi-core central processing units [24] are two other popular parallel technologies. Compared with the FPGA, the GPU is more powerful but requires data transfers between main memory and the GPU, increasing latency. The multi-core central processing unit is easier to program and compile for, but its degree of parallelism and performance are much lower than those of the FPGA. In this paper, we take the FPGA as our implementation platform for better performance.

2.2. Regular expression grammar

Regular expressions are used for searching a text for strings containing a particular pattern. A regular expression is a pattern consisting of ASCII characters and some metacharacters. Unlike plain string patterns, regular expression patterns describe characteristics using metacharacters, so that a regular expression can describe a set of strings without enumerating them explicitly. Table I lists the common grammar of regular expressions. For example, consider the regular expression (a|b).*cd. This pattern matches any string that starts with ASCII character a or b, is followed by arbitrary characters, and ends with the ASCII letters cd.
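As a quick illustration, the following Python sketch (using the standard re module) checks a few made-up strings against the example pattern described above; the test strings are illustrative and do not come from the paper.

```python
import re

# The pattern from the example: starts with 'a' or 'b', then arbitrary
# characters, and ends with the letters "cd".
pattern = re.compile(r"(a|b).*cd")

print(bool(pattern.fullmatch("axyzcd")))   # True: starts with 'a', ends with "cd"
print(bool(pattern.fullmatch("bcd")))      # True: '.*' may match the empty string
print(bool(pattern.fullmatch("xacd")))     # False: does not start with 'a' or 'b'
```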

Table I. Grammar of regular expressions.
Metacharacter   Meaning                               Description
.               A single character wildcard           Matches any character
|               OR relationship                       Separates alternate possibilities
?               A quantifier denoting zero or one     Matches the preceding pattern element zero or one time
*               A quantifier denoting zero or more    Matches the preceding pattern element zero or more times
+               A quantifier denoting one or more     Matches the preceding pattern element one or more times
{M,N}           Repeat from M to N times              Denotes the minimum M and the maximum N match count
[]              A class of characters                 Denotes a set of possible character matches; [abc] denotes a letter a, b, or c

2.3. Finite automata

Finite automata are a natural formalism for regular expressions: every language defined by a regular expression is also defined by an FA, and it is a well-established fact that each regular expression can be transformed into an FA [25]. Two kinds of FA are mostly used in regular expression matching, DFA and NFA. In this section, we mainly discuss the DFA solution.

A DFA is commonly denoted as a 5-tuple (Q, Σ, δ, q0, Fin), where Q is a finite set of states, Σ is a finite set of input symbols (usually the ASCII alphabet of 256 characters), δ is a transition function, q0 is the start state, and Fin is a set of accepting states. The transition function δ describes how one DFA state transfers to another. Figure 2 shows the DFA corresponding to the regular expression (a|b).*cd. It is worth noting that in Figure 2 we omit many transitions: when the input character is not labeled in the figure, the next state is state 1. Obviously, there is a large number of transitions in the DFA. In theory, when a regular expression of length n is converted into a DFA, it may generate O(2^n) states, which means that in the worst case we may need on the order of 256 × 2^n transition entries to store the DFA, unacceptable for modern computer systems. This space problem is critical, especially when compiling multiple regular expressions into one composite DFA.

Figure 2. The DFA corresponding to the regular expression (a|b).*cd.
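To make the table-driven DFA lookup concrete, the Python sketch below hand-builds a small transition table for (a|b).*cd over the toy alphabet {a, b, c, d} and consumes one character per table lookup. The state names and the restriction to four symbols are illustrative choices; they do not correspond to the states of Figure 2.

```python
# A table-driven DFA recognizer for (a|b).*cd over the toy alphabet {a,b,c,d}.
DEAD = "dead"
ACCEPT = {"q3"}
TRANS = {
    "q0":  {"a": "q1", "b": "q1", "c": DEAD, "d": DEAD},   # expect leading a or b
    "q1":  {"a": "q1", "b": "q1", "c": "q2", "d": "q1"},   # inside ".*"
    "q2":  {"a": "q1", "b": "q1", "c": "q2", "d": "q3"},   # suffix "c" seen
    "q3":  {"a": "q1", "b": "q1", "c": "q2", "d": "q1"},   # suffix "cd" seen (accepting)
    DEAD:  {"a": DEAD, "b": DEAD, "c": DEAD, "d": DEAD},
}

def dfa_match(text):
    state = "q0"
    for ch in text:                 # exactly one table lookup per input character
        state = TRANS[state][ch]
    return state in ACCEPT

print(dfa_match("acd"), dfa_match("bddcd"), dfa_match("cda"))  # True True False
```

Each input character triggers exactly one lookup in the transition table, which is the O(1) per-character behaviour that makes DFAs attractive for NIDS; the price is the size of that table.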

2.4. Realization of NFA

State-of-the-art NIDS prefer regular expressions to describe attack signatures and process packets because of their powerful expressiveness. Traditionally, DFAs and NFAs are used to implement regular expression matching. The space complexity of an NFA is O(n) and its searching complexity is O(n^2), whereas a DFA's storage complexity is O(2^n) in the worst case and its searching complexity is O(1), where n is the length of the regular expression. Floyd and Ullman showed that an NFA regular expression circuit can be implemented efficiently using a programmable logic array architecture [26]. Sidhu et al. [27] and Clark et al. [28] showed that the NFA is an efficient method in terms of processing speed and area efficiency for implementing regular expressions on FPGAs. The conversion from an NFA to Sidhu's circuit is shown in Figure 3.

Figure 3. Sidhu's conversion algorithm: (a) single character, (b) union of N1 and N2, (c) concatenation of N1 and N2, and (d) repetition of zero or more (R*).

In Sidhu's circuit, each NFA state is implemented by a cascade of logic cells (LCs) of the FPGA. Thus, the consumed LC resources are proportional to the number of states. Meanwhile, the clock frequency of the designed circuit becomes lower, decreasing the performance of regular expression matching. For this reason, our matching circuit is based on the DFA method, and we mainly discuss DFA algorithms in the following subsection.

2.5. DFA algorithms

Compared with NFA-based algorithms, DFA-based algorithms usually consume large amounts of memory but provide high matching speed. For NIDS, the DFA algorithm is more appealing because of its deterministic behavior and high throughput. To overcome the limitations of the DFA, many algorithms have been proposed to compress the memory space and improve the performance of regular expression matching.

Fang Yu et al. [29] study and try to resolve the state blowup problem of DFAs. They find that memory requirements using traditional methods are prohibitively high for typical patterns used in real-world packet payload scanning applications. They then propose regular expression rewrite techniques to reduce memory usage and a grouping scheme that divides regular expression rule sets into several groups. However, their rule rewriting depends on the rule sets: new attack signatures may invalidate the rewritten DFAs, in which case new signature structures have to be studied.

Kumar et al. [15] observed that two states (S1 and S2) often have many identical next-state transitions (T) for a subset of input characters. Based on this observation, they proposed a new algorithm called D2FA to compress the transition table. D2FA eliminates S1's transitions (T) by introducing a default transition from S1 to S2. The experimental results show that a D2FA reduces transitions by more than 95% compared with the original DFA. However, D2FA's transition mechanism may look up memory multiple times per input character, leading to a higher memory bandwidth requirement.

On the basis of the observation that most adjacent states share a large part of identical transitions, Ficara et al. [16] present a new representation for the DFA, called delta finite automata (δFA). They record the transition set of the current state in a local memory and only store the differences between the current state and the next state. In this way, δFA achieves a very good compression effect. In addition, this algorithm requires only one state transition per character (keeping the characteristic of standard DFAs), thus allowing a fast string matching speed.

Qi et al. [30] proposed a new compression algorithm named FEACAN for the DFA transition table. FEACAN introduces a two-dimensional compression algorithm, utilizing the intrastate redundancy and interstate redundancy of the DFA to reduce memory consumption. The authors use a bitmap for the intrastate compression and a two-stage grouping algorithm for the interstate compression. In addition, other techniques are adopted to further improve the performance of FEACAN, such as input interleaving and a dual-lookup pipeline. Experimental results showed that FEACAN can achieve a throughput of 40 Gbps on FPGA chips. However, this architecture consumes four clock cycles to process one character, and this weakness limits the deployment of FEACAN.

T. Liu et al. [14] introduce a new compression algorithm that reduces the memory usage of DFAs stably without a significant impact on matching speed. They observe the distribution of transitions inside each state and find that more than 90% of transitions in DFAs go to the initial state or its near neighbors, which are called magic states by Becchi in [31].
Based on this observation, they divide all the transitions among three different matrices and compress these matrices. Experimental results show that this algorithm saves 95% of memory space with only a 40% loss of matching speed compared with the original DFA.

Y. Liu et al. [21] presented a new DFA matrix compression algorithm named column row decomposition (CRD). This algorithm decomposes the DFA transition table into a column vector, a row vector, and a sparse matrix to reduce the storage space as much as possible. Experiments on typical rule sets show that the proposed method significantly reduces memory usage while still running at a fast searching speed.

The algorithms mentioned above focus on eliminating the redundancies of the DFA transition table and generally store the compressed transition table in a sparse matrix format. One widely used sparse matrix format is compressed row storage (CRS) [18]. The sparse matrix is decomposed into three vectors: a row vector VR, a column vector VC, and a values vector VV. The VV vector stores the values of the nonzero elements of the sparse matrix. The column vector VC contains the original column positions of the corresponding elements in VV. The row vector VR contains, for each row, the position in VC of the row's first nonzero element. Another scheme for storing the sparse matrix is block compressed row storage (BCRS) [32], which divides a sparse matrix into multiple blocks. Similar to the CRS format, three arrays are required for BCRS: a rectangular array that stores the nonzero blocks in (block) row-wise fashion, an integer array that stores the actual column indices of the nonzero blocks in the sparse matrix, and a pointer array whose entries point to the beginning of each block row. The storage savings of BCRS and CRS can be significant for general sparse matrices, but they do not take into account the particular structure of the sparse matrices generated by DFA compression algorithms.

2.6. DFA compression and sparse matrix

Many works have been presented to solve the space problem of the DFA by compressing the DFA transition table [14-16,21,30]. The key point of these compression techniques is to eliminate the redundancies of the DFA transition table, and the sparse matrix is a very useful tool for storing the compressed transition table. In the following paragraphs, we explain the technical details.

Figure 4. Example of a deterministic finite automata compression algorithm.
Figure 5. A typical sparse matrix.

The state transition table of a DFA can be considered as an m × n matrix A, where m is the number of states and n is the alphabet size, so the matrix contains m × n elements. Each element A[i, j] gives the state reached from state i through the input character labeled j. DFA compression algorithms focus on eliminating the redundancies of the transition matrix and store the compressed transition table in a sparse matrix format. Figure 4 is an example of a DFA compression algorithm. The left part of Figure 4 depicts the transition matrix corresponding to a DFA on the alphabet {a, b, c, d} that recognizes the regular expressions (a+), (b+c), and (cd+), and the right part is the sparse matrix after eliminating identical transitions. Obviously, the key problem for the compression algorithm is the storage and access of the sparse matrix. However, previous sparse matrix formats usually take a software-based approach without considering the hardware situation. In the following section, we develop a compact sparse matrix format with highly efficient access, based on the parallelism of the FPGA.

3. SPARSE MATRIX FORMAT DESIGN

In this section, we present a novel sparse matrix format for efficient storage on the FPGA using three techniques: memory packing, an index table, and interleaved memories. We then illustrate the technical details.

3.1. Sparse matrix format

The storage of the sparse matrix is the key problem for the whole matching circuit. A sparse matrix contains very few nonzero elements, usually less than 1%. Obviously, there is no need to store a large number of zeros, and many methods have been presented that exploit the sparse structure of the matrix. For example, a typical sparse matrix R is shown in the left part of Figure 5. The sparse matrix R has seven nonzero elements, whereas the total number of elements is 30, so there is a lot of redundancy. If the zeroes of the sparse matrix are eliminated, we can save a lot of memory space. In fact, many sparse matrix storage schemes exist, such as CRS [18], the linked list [33], and the jagged diagonal format [32]. An example of the CRS format is shown in Figure 6, where we continue to take the matrix in Figure 5 as the example. The matrix is split into three vectors: a row vector Vec_row, a column vector Vec_col, and a values vector Vec_val. The vector Vec_val contains all the values of the nonzero elements of the matrix. The vector Vec_col, of length equal to Vec_val, contains the original column positions of the elements in Vec_val. Each element of Vec_row points to the first element of the corresponding row in Vec_val and Vec_col. This method is rather straightforward. Another efficient way to store the sparse matrix is the block compressed row storage (BCRS) format [32]. The main idea of this method is to split the matrix into several blocks. Each block stores its nonzero matrix elements in a single vector, including their values and position information, and a CRS scheme is then performed on the block arrays. However, both schemes take multiple queries to look up one element [34], which is inefficient for hardware-based DFA storage and access.

Figure 6. Compressed row storage format.
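For concreteness, the Python sketch below encodes a small sparse matrix in the Vec_val / Vec_col / Vec_row layout just described and shows why reading a single element requires scanning a row slice, i.e., several dependent memory accesses. The matrix values are made up for illustration; Figure 5 itself is not reproduced here.

```python
# Compressed row storage (CRS) of a small 5x6 sparse matrix with 7 nonzeros.
M = [
    [0, 0, 7, 0, 0, 0],
    [0, 3, 0, 0, 4, 1],
    [0, 0, 0, 0, 0, 0],
    [5, 0, 0, 2, 0, 0],
    [0, 0, 0, 0, 9, 0],
]

Vec_val, Vec_col, Vec_row = [], [], [0]
for row in M:
    for j, v in enumerate(row):
        if v != 0:
            Vec_val.append(v)          # values of the nonzero elements
            Vec_col.append(j)          # their original column positions
    Vec_row.append(len(Vec_val))       # start of the next row in Vec_val/Vec_col

def crs_get(i, j):
    # Reading element (i, j) scans the slice belonging to row i: several
    # dependent memory accesses, which is what the format of Section 3 avoids.
    for k in range(Vec_row[i], Vec_row[i + 1]):
        if Vec_col[k] == j:
            return Vec_val[k]
    return 0

assert crs_get(3, 3) == 2 and crs_get(2, 5) == 0
```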
3.2. Sparse matrix storage and access

In this section, we use a memory packing scheme to compress the sparse matrix: we join all the nonzero elements together and ignore all the zeroes in the matrix, as shown in the right part of Figure 7. We exploit three techniques in our sparse matrix architecture: memory packing, an index table, and interleaved memories.

Figure 7. The compact sparse matrix storage structure.
Figure 8. Index table for sparse matrix R.
Figure 9. Interleaved memories for sparse matrix storage.

To explain our method more clearly, we use an example to illustrate these techniques. The memory packing technique is shown in the right part of Figure 7. This scheme drops all of the zeros of the sparse matrix, achieving a considerable compression ratio, but it introduces another problem: we lose the position information of the matrix elements. To address this issue, we develop a novel access scheme that accomplishes the access operation in a single clock cycle, which is very important for improving the throughput of FPGA-based regular expression matching engines.

The first technique is an index table, which indicates the start position of the matrix elements of each row. The embedded memory of current FPGA chips can read multiple bits (which we call one word in this paper) in one clock cycle, and the word width (how many bits a word contains) can be configured manually. Utilizing this feature, we can choose how many matrix elements one word contains according to the specific conditions. The right part of Figure 8 shows the matrix R of Figure 6 packed into a memory layout with two matrix elements per word. The left part of Figure 8 is the structure of the index table: ROWID is the row number of the sparse matrix, INDEX is the index of the first word to read in the memory, OFFSET is the location of the row's first element within that word, and #NZ is the number of nonzero elements in the row. Once the index table entry is retrieved by ROWID, the word at INDEX is read, and #NZ consecutive elements are taken starting at position OFFSET. For example, when a query with ROWID = 1 arrives, the index table entry gives INDEX = 0, OFFSET = 1, and #NZ = 2, so word 0 is read and the two consecutive elements starting at position 1 are returned as the query result.

Note that when ROWID = 4, the nonzero elements of the row are distributed over two different words. In this case, we would have to read two consecutive words from memory, consuming two clock cycles and significantly decreasing throughput. Using the parallel processing ability of the FPGA, we adopt interleaved memories to deal with this problem. Modern FPGAs provide hundreds of on-chip memory banks, which can be read or written concurrently [35]. On the basis of this feature, we store the sparse matrix in multiple on-chip memory banks. If the nonzero elements to be read span M words, we distribute these elements over M parallel memory banks. M can be calculated as M = ⌈NZ_max / w⌉, where w is the number of matrix elements one word contains and NZ_max is the largest number of nonzero elements in any row of the sparse matrix. The interleaved memory technique is shown in Figure 9: we split the original memory into two memory banks, where one bank stores the odd-numbered words and the other stores the even-numbered words. By this means, we are able to read any two consecutive words in a single read cycle. For example, suppose an access needs to read an element of S3 that spans two consecutive words, word 1 and word 2. Using the interleaved memories, word 1 and word 2 can be read from the two different memory banks simultaneously, consuming only one clock cycle.
Compared with the original memory layout, this architecture saves one clock cycle, roughly doubling the performance. Because the memory architecture is modified, we also adjust the index table design to accommodate the new structure. When executing an access operation, the bank in which the starting word is located must be known in advance, so we add a one-bit flag to each index table entry to record this information. Figure 10 shows the modified index table structure. Bank0 and Bank1 in Figure 10 correspond to the two memory banks in Figure 9, respectively. The FLAG field indicates in which bank the starting word is located: FLAG = 0 means that the starting word is in Bank0, whereas FLAG = 1 means that it is in Bank1. Assume that a query with ROWID = 1 arrives to retrieve the corresponding sparse matrix elements. By inquiring index table entry 1, we find FLAG = 0, INDEX = 0, OFFSET = 1, and #NZ = 2. Then, according to FLAG and INDEX, we go to Bank0 to read

the starting word 0. Because OFFSET + #NZ is greater than w (recall that w is the number of matrix elements one word contains), the sparse matrix elements of row 1 span two different memory banks. We therefore also read word 0 of Bank1 to get the remaining elements and combine the two words into one wide word of width 2w. From the wide word, we take out #NZ elements starting at position OFFSET. This process is shown in Figure 11. By this means, we manage to access the sparse matrix within one clock cycle.

Figure 10. Modified index table structure.
Figure 11. Combining two words from multiple memory banks into a wide word.

3.3. Other details

In our prototype implementation, we use two memory banks. This is because in our experiments NZ_max (recall that NZ_max is the largest number of nonzero elements in one row) is 73, and two banks are enough to hold 73 nonzero elements. If the rule set is more complex, NZ_max is likely to become larger, and two memory banks may not be enough to hold all the elements. This problem can be solved by adding more memory banks to our design. Because NZ_max can never exceed 256 (the size of the extended ASCII alphabet), the calculation shows that four memory banks are sufficient for any regular expression rule set.

Most DFA-based matching algorithms can be abstracted to the model in Figure 12. The DFA engine looks up the transition table to get the next state and then feeds the output state back as the input state for the next clock cycle. This process introduces a feedback loop into the circuit. Because of this feedback, if the DFA engine needs n clock cycles to process one input character, the engine must halt for n - 1 cycles to wait for the next state. Our architecture avoids this pipeline stall by processing one input character within one clock cycle.

Figure 12. Model of deterministic finite automata (DFA) engine.
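The Python sketch below models the complete lookup path described in this section: nonzero (column, value) pairs are packed into fixed-width words, the words are interleaved over two banks, and the index table entry (FLAG, INDEX, OFFSET, #NZ) selects the one or two words that are read in parallel and sliced into the row's nonzeros. The word width, bank count, and example data are toy values chosen for illustration; the real design uses wider words stored in on-chip block RAM.

```python
W = 2  # matrix elements per memory word (toy value; the hardware word is wider)

def pack_rows(rows):
    """Pack each row's nonzero (col, val) pairs into one flat element stream,
    split the stream into W-element words, and interleave the words over two
    banks (Bank0 = even-numbered words, Bank1 = odd-numbered words). Also
    build the index table entry (FLAG, INDEX, OFFSET, #NZ) for every row."""
    elements, index_table = [], []
    for nz in rows:                              # nz = list of (col, val) pairs
        word, off = divmod(len(elements), W)     # global word number, offset inside it
        index_table.append({"FLAG": word % 2, "INDEX": word // 2,
                            "OFFSET": off, "NZ": len(nz)})
        elements.extend(nz)
    words = [elements[i:i + W] + [None] * (W - len(elements[i:i + W]))
             for i in range(0, len(elements), W)]
    return index_table, words[0::2], words[1::2]   # index table, Bank0, Bank1

def read_row(row_id, index_table, bank0, bank1):
    """One 'cycle': read a word from each bank in parallel, concatenate them
    into a wide word of width 2W, and slice out #NZ elements at OFFSET."""
    e = index_table[row_id]
    if e["FLAG"] == 0:        # starting word in Bank0, successor word in Bank1
        wide = bank0[e["INDEX"]] + (bank1[e["INDEX"]] if e["INDEX"] < len(bank1) else [None] * W)
    else:                     # starting word in Bank1, successor word in Bank0
        wide = bank1[e["INDEX"]] + (bank0[e["INDEX"] + 1] if e["INDEX"] + 1 < len(bank0) else [None] * W)
    return wide[e["OFFSET"]: e["OFFSET"] + e["NZ"]]

rows = [[(0, 3)], [(1, 5), (3, 7)], [(2, 4)], [(0, 1)], [(1, 2), (2, 6)]]
table, bank0, bank1 = pack_rows(rows)
print(read_row(1, table, bank0, bank1))   # [(1, 5), (3, 7)]: spans two words, one read
```

In hardware, the two bank reads, the concatenation into the 2w-wide word, and the shift by OFFSET are all combinational within the same clock cycle, which is what gives the single-cycle access.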
4. HARDWARE IMPLEMENTATION FOR REGULAR EXPRESSION MATCHING

In the previous section, we proposed a sparse matrix architecture and introduced its technical details. In this section, we implement a regular expression matching engine using this architecture. Inspired by the DFA decomposition algorithm proposed by Y. Liu [21], we develop a regular expression matching engine that adopts this architecture on FPGA chips, and we describe its architecture below.

4.1. Main idea of the DFA decomposition algorithm

Y. Liu presented a software-based DFA matrix compression algorithm named column row decomposition (CRD). As shown in Figure 13, the basic idea of CRD is to decompose the DFA transition table (matrix A) into a column vector X of size m, a row vector Y of size n, and a sparse matrix R (which can be stored in little space), so as to reduce the storage space. When accessing a matrix element, A[i, j] is calculated as A[i, j] = X[i] + Y[j] + R[i, j], where A[i, j] is the element of matrix A, X[i] is the ith element of vector X, Y[j] is the jth element of vector Y, and R[i, j] is the corresponding element of sparse matrix R. This DFA compression algorithm is simple and well suited to hardware implementation. In [21], Liu stores the nonzero elements of each row in a sorted array and accesses an element by binary search. This method is practical for software on general-purpose processors, but it is inappropriate for hardware implementation.

Figure 13. Decomposition of the deterministic finite automata transition matrix.
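As a small numerical illustration of the decomposition A[i, j] = X[i] + Y[j] + R[i, j], the Python sketch below builds X, Y, and R for a made-up 4 x 4 transition table. The construction used here (each column's most frequent value goes into Y, X is fixed at zero) is only one simple way to obtain such a decomposition, not necessarily the CRD procedure of [21]; it merely shows that R becomes sparse when the columns of A are nearly constant.

```python
# A hypothetical 4-state x 4-symbol transition table (values made up; this is
# not the matrix of Figure 13).
A = [[1, 1, 2, 1],
     [1, 1, 2, 1],
     [1, 1, 2, 3],
     [1, 1, 2, 1]]

m, n = len(A), len(A[0])
X = [0] * m                                         # trivial choice of the column vector
columns = ([A[i][j] for i in range(m)] for j in range(n))
Y = [max(set(col), key=col.count) for col in columns]   # most frequent value per column
R = [[A[i][j] - X[i] - Y[j] for j in range(n)] for i in range(m)]   # residuals

# The dense table is recovered element by element; only the nonzero residuals
# of R need to be stored explicitly.
assert all(A[i][j] == X[i] + Y[j] + R[i][j] for i in range(m) for j in range(n))
print(sum(v != 0 for row in R for v in row), "nonzero elements in R out of", m * n)
```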

4.2. Regular expression matching architecture

The overall structure of our regular expression matching engine is shown in Figure 14. The component in the top dashed box is the regular expression compiler, and the component in the bottom dashed box is the matching circuit. After compiling the regular expression rules into a DFA, the compiler transforms the DFA transition matrix into a column vector X, a row vector Y, and a sparse matrix R, and then writes the result to the embedded memory of the FPGA chip. The matching process is as follows: the current state pointer and the input character are combined to determine whether a nonzero element of the sparse matrix is hit. If it is hit, the next state equals X[i] + Y[j] + R[i, j]; otherwise, the next state equals X[i] + Y[j].

Figure 14. Overall structure of the regular expression matching engine.

Figure 15 depicts the architecture of the matching circuit in detail. The architecture consists of a module that looks up vector X (we call it C_VECTOR, because this lookup is indexed by the input character), a module that looks up vector Y (we call it S_VECTOR, because this lookup is indexed by the current state), and a module that looks up the sparse matrix R. EVEN_CM and ODD_CM correspond to Bank0 and Bank1 in Figure 10, respectively; as discussed previously, the nonzero elements are stored in these two memory banks. Assuming the current state is s, when a new character c arrives we obtain the values X[c] and Y[s] by querying C_VECTOR and S_VECTOR, respectively. By querying the index table and the memory banks, we obtain all the nonzero elements located in the sth row of the sparse matrix R. However, to find the exact element we need additional information to determine whether the input character hits the sparse matrix and at which location to take the correct element value. We therefore redesign the form of the elements stored in the memories by replacing <R[i, j]> with <R[i, j], y>, where R[i, j] is the element value of the sparse matrix R and y is the corresponding column index. The input character is compared with y to determine whether the input character c hits the sparse matrix element. As shown in Figure 15, the hit wire connects to the SEL port of the multiplexer (Mux). If the sparse matrix is hit, the corresponding sparse matrix element value R[i, j] is available simultaneously, and the next state equals X[c] + Y[s] + R[i, j]. If the sparse matrix is not hit, the next state equals X[c] + Y[s].

Figure 15. Detailed structure of the regular expression matching circuit.
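A behavioural Python sketch of the per-character datapath of Figure 15 is given below (a software model, not the Verilog): C_VECTOR is indexed by the input character, S_VECTOR by the current state, and each row of R is held as the <value, column> pairs delivered by the memory banks; comparing the stored column index with the input character plays the role of the hit signal that drives the multiplexer.

```python
def next_state(state, char, C_VECTOR, S_VECTOR, R_rows):
    """Compute the next DFA state for one input character.
    R_rows[state] holds the (column, value) pairs of that state's row, as
    returned by the index table and the two memory banks."""
    base = C_VECTOR[char] + S_VECTOR[state]
    for col, val in R_rows[state]:        # compare stored column index with the input
        if col == char:                   # hit: the comparator asserts the Mux SEL signal
            return base + val             # X[c] + Y[s] + R element
    return base                           # miss: X[c] + Y[s]

def match(text, start_state, C_VECTOR, S_VECTOR, R_rows, accepting):
    state = start_state
    for ch in text:                       # one lookup, hence one clock cycle, per character
        state = next_state(state, ord(ch), C_VECTOR, S_VECTOR, R_rows)
    return state in accepting
```

In the circuit, the comparisons against all nonzeros of the row happen in parallel rather than in a loop, so the hit decision and the three-way sum fit in the same clock cycle as the memory reads.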

5. PERFORMANCE EVALUATION

We evaluate our regular expression matching engine from two aspects: storage efficiency and throughput. First, we study the memory usage and compression ratio using some real-life regular expression rule sets. Then, we implement a prototype design on a Xilinx Virtex-6 FPGA chip and investigate the matching speed and memory consumption. Finally, we compare and analyze the experimental results against several previous works.

5.1. Testbench

We select four sets of regular expression rules from Snort and Bro: snort24.re (with 24 rules from Snort), snort31.re (with 31 rules from Snort), snort34.re (with 34 rules from Snort), and bro217.re (with 217 rules from Bro). We implement the prototype on a Xilinx Virtex-6 FPGA chip (XC6VSX475T, with 7640 Kb of distributed RAM in addition to its logic cells and block RAM).

5.2. Memory usage evaluation

In our architecture, the index table is stored in distributed memory (implemented using LUTs and REGs of the FPGA), whereas C_VECTOR, S_VECTOR, EVEN_CM, and ODD_CM are stored in embedded memory, so the minimum memory consumption S_min can be calculated as

S_min = N_c ⌈lg N_c⌉ + N_s ⌈lg N_s⌉ + NZ (⌈lg N_c⌉ + ⌈lg N_s⌉),

where N_c is the size of the input character set, N_s is the number of states, and NZ is the number of nonzero elements in the sparse matrix. If not compressed, a baseline DFA engine has to store the whole transition table, so the compression ratio r can be calculated as

r = 1 - S_min / (N_c N_s ⌈lg N_s⌉).

Table II. Memory usage results (rule sets: bro217, snort24, snort31, snort34; rows: No. of states; No. of NZ; Baseline DFA memory size (MB); Memory size of our work (MB); Comp. ratio (%)). DFA, deterministic finite automata.
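The following Python lines evaluate these formulas for hypothetical parameters (256 input characters, 3000 states, 20000 nonzeros); these are illustrative numbers, not the per-rule-set values reported in Table II, and r is taken as the fraction of memory saved relative to the uncompressed table.

```python
from math import ceil, log2

# Hypothetical parameters, chosen only to exercise the formulas above.
Nc, Ns, NZ = 256, 3000, 20000

S_min = Nc * ceil(log2(Nc)) + Ns * ceil(log2(Ns)) + NZ * (ceil(log2(Nc)) + ceil(log2(Ns)))
S_dfa = Nc * Ns * ceil(log2(Ns))               # baseline: full transition table
r = 1 - S_min / S_dfa                          # fraction of memory saved

print(f"S_min = {S_min} bits, baseline = {S_dfa} bits, compression ratio = {r:.1%}")
```

With these numbers the compressed storage comes to roughly 5% of the baseline, i.e., a compression ratio of about 95%, in the same range as the measurements reported below.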

The experimental results for memory usage are listed in Table II. From Table II, we conclude that the compression ratio of our architecture is more than 90% on average, and the compression ratio on bro217 is the best, about 94%. Another observation is that the compression ratio is related not only to the number of states but also to the number of nonzero elements of the sparse matrix: although snort24 has far fewer states than snort34, its larger number of nonzero elements still leads to a much lower compression ratio.

5.3. Speed evaluation

We write our regular expression matching engine in Verilog and implement the prototype on one Virtex-6 FPGA chip, using two memory banks to store the sparse matrix elements. We simulate the design with the ModelSim simulator and synthesize it with the ISE/Synplify tool chain. Because our design processes one input character per clock cycle, we can derive the throughput exactly from the clock frequency. In addition, we also list the LUT and register consumption because these factors also affect the frequency to some extent. Table III lists the throughput results of our experiments.

Table III. Frequency and throughput results (rule sets: bro217, snort24, snort31, snort34; rows: No. of states; LUT; REG; fmax (MHz); Throughput (Mbps)).

In theory, more DFA states and more nonzero elements increase the LUT resource consumption and consequently decrease the clock frequency. However, the experimental results show that when the number of states increases from 5389 for snort31 to the larger count of snort34, the maximum frequency decreases only slightly. This result shows that our architecture keeps a steady throughput under different rule sets, which is of great importance for NIDS. We achieve scalable performance by running multiple regular expression matching engines in parallel on the FPGA. The number of parallel engines is mainly determined by the size of the on-chip memories of the FPGA chip. More experimental results are shown in Table V.

5.4. Comparison with other implementations

We compare the compression ratio and throughput of our design with other implementations. Table IV lists the compression ratios and clock cycles per input character of the different methods.

Table IV. Compression ratio compared with other implementations.
Method        Compression ratio   Clock cycles per input
DFA           0                   1
DPICO [35]    65%                 1
CPDFA [36]    90%                 >= 2
FEACAN [30]   90%                 4
Our work      90%                 1
DFA, deterministic finite automata.

In particular, we compare the throughput of our design with FEACAN [30] using the same rule sets and the same FPGA chip. The results are shown in Table V.

Table V. Throughput compared with FEACAN (rows: bro217 FEACAN, bro217 our work, snort24 FEACAN, snort24 our work, snort31 our work, snort34 our work; columns: Method, No. of engines, fmax (MHz), Throughput (Gbps), Throughput per clock (Gbps)).

From Table V, our clock frequency and throughput are lower than FEACAN's, but FEACAN consumes four clock cycles to process one input character, so its throughput per clock is four times lower. Because the design based on our sparse matrix format processes one input character in a single clock cycle, the throughput of our implementation per clock cycle is higher than that of FEACAN. If new techniques such as the dual-port SRAM presented in [30] were used, the speed could in theory be doubled, which means we would obtain a throughput of about 57 Gbps on the bro217 and snort31 rule sets.
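The relation between clock frequency, cycles per character, and throughput used in this comparison can be written down directly; the sketch below evaluates it for placeholder frequencies and engine counts (not the measured values of Table V), assuming one 8-bit character is consumed per lookup.

```python
# Back-of-the-envelope throughput model: one 8-bit character per lookup,
# divided by the number of clock cycles each lookup takes.
def throughput_gbps(f_mhz, cycles_per_char, engines=1):
    return f_mhz * 1e6 * 8 / cycles_per_char * engines / 1e9

print(throughput_gbps(200, 1))              # single engine, 1 cycle/char  -> 1.6 Gbps
print(throughput_gbps(200, 4))              # single engine, 4 cycles/char -> 0.4 Gbps
print(throughput_gbps(200, 1, engines=16))  # 16 parallel engines          -> 25.6 Gbps
```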
6. CONCLUSION AND FUTURE WORK

In this paper, we focus on solving the storage problem of the compressed DFA transition matrix on FPGAs. We present a new architecture for sparse matrix storage and access. Our architecture takes advantage of the special properties of sparse matrices generated by DFAs, significantly improving the flexibility and efficiency of FPGA-based applications. In this architecture, we adopt several techniques to reduce the memory space, including memory packing, an index table, and interleaved memory banks. We then build a regular expression matching engine on one Xilinx Virtex-6 FPGA chip and evaluate it with four groups of real-life regular expression rule sets. The results show that our design saves 90% of memory space on average and 94% in the best case. We then run multiple engines in parallel on the FPGA and

achieve a throughput of 7 Gbps using the snort24 rule set and 29 Gbps using the snort31 rule set. Because we can accomplish one lookup per clock cycle, our regular expression matching engine has more flexibility than previous solutions while keeping a high compression ratio. It should be emphasized that the sparse matrix storage architecture can be used with various DFA compression algorithms; for simplicity, we implemented it in this paper with the DFA transition matrix decomposition algorithm. The experimental results prove the feasibility and efficiency of our method. In the future, we will apply our design to multiple DFA compression algorithms.

ACKNOWLEDGEMENTS

This work has been partially funded by the National High-Tech Research and Development Plan (863) of China, under grants 2011AA and 012AA012502; the National Natural Science Foundation of China (NSFC); and the Special Pilot Research of the Chinese Academy of Sciences under grant XDA.

REFERENCES

1. Wall L, et al. The Perl programming language.
2. Flanagan D, Matsumoto Y. The Ruby Programming Language. O'Reilly Media: Sebastopol, California.
3. Robbins AD. The GNU Awk user's guide.
4. Ousterhout JK, Jones K. Tcl and the Tk Toolkit. Addison-Wesley: Boston, Massachusetts.
5. Kernighan BW, Ritchie D, Lippman SB, Lajoie J. C Programming Language. 2009, in press.
6. Stroustrup B. The C++ Programming Language, Vol. 3. Addison-Wesley: Boston, Massachusetts.
7. Gosling J, Joy B, Steele G, Bracha G. The Java (TM) Language Specification. Addison-Wesley Professional: Boston, Massachusetts.
8. Paxson V. Bro: a system for detecting network intruders in real-time. Computer Networks 1999; 31(23-24).
9. Roesch M, et al. Snort: lightweight intrusion detection for networks. Proceedings of the 13th USENIX Conference on System Administration, Seattle, Washington, 1999.
10. Baeza-Yates RA, Gonnet GH. Fast text searching for regular expressions or automaton searching on tries. Journal of the ACM (JACM) 1996; 43(6).
11. Myers G. A four Russians algorithm for regular expression pattern matching. Journal of the ACM (JACM) 1992; 39(2).
12. Thompson K. Programming techniques: regular expression search algorithm. Communications of the ACM 1968; 11(6).
13. Levandoski J, Sommer E, Strait M, et al. Application layer packet classifier for Linux.
14. Liu T, Yang Y, Liu Y, Sun Y, Guo L. An efficient regular expressions compression algorithm from a new perspective. 2011 Proceedings IEEE INFOCOM, IEEE, Shanghai, 2011.
15. Kumar S, Dharmapurikar S, Yu F, Crowley P, Turner J. Algorithms to accelerate multiple regular expressions matching for deep packet inspection. ACM SIGCOMM Computer Communication Review 2006; 36(4).
16. Ficara D, Giordano S, Procissi G, Vitucci F, Antichi G, Di Pietro A. An improved DFA for fast regular expression matching. ACM SIGCOMM Computer Communication Review 2008; 38(5).
17. Ficara D, Di Pietro A, Giordano S, Procissi G, Vitucci F, Antichi G. Differential encoding of DFAs for fast regular expression matching. IEEE/ACM Transactions on Networking (TON) 2011; 19(3).
18. Bai Z. Templates for the Solution of Algebraic Eigenvalue Problems, Vol. 11. Society for Industrial Mathematics: Philadelphia, PA.
19. Saad Y. Krylov subspace methods on supercomputers. SIAM Journal on Scientific and Statistical Computing 1989; 10(6).
20. Dongarra J. Sparse matrix storage formats. In Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, 2000; 11.
21. Liu Y, Guo L, Liu P, Tan J. Compressing regular expressions' DFA table by matrix decomposition. Implementation and Application of Automata 2011; 6482.
22. Wikipedia. Field-programmable gate array. Wikipedia, the free encyclopedia, wikipedia.org/wiki/Field-programmable_gate_array.
23. Nickolls J, Buck I, Garland M, Skadron K. Scalable parallel programming with CUDA. Queue 2008; 6(2).
24. Wikipedia. Multi-core processor. Wikipedia, the free encyclopedia, wikipedia.org/wiki/Multi-core_processor.
25. Brüggemann-Klein A. Regular expressions into finite automata. Theoretical Computer Science 1993; 120(2).
26. Floyd RW, Ullman JD. The compilation of regular expressions into integrated circuits. 21st Annual Symposium on Foundations of Computer Science, Syracuse, New York, 1980.
27. Sidhu R, Prasanna VK. Fast regular expression matching using FPGAs. IEEE Symposium on Field-Programmable Custom Computing Machines. IEEE: Rohnert Park, California, 2001.
28. Clark CR, Schimmel DE. Efficient reconfigurable logic circuits for matching complex network intrusion detection patterns. Proceedings of the 13th International Conference on Field Programmable Logic and Applications. IEEE: Lisbon, Portugal, 2003.
29. Yu F, Chen Z, Diao Y, Lakshman TV, Katz RH. Fast and memory-efficient regular expression matching for deep packet inspection. ANCS 2006, ACM/IEEE Symposium on Architecture for Networking and Communications Systems. IEEE: San Jose, California, 2006.
30. Qi Y, Wang K, Fong J, Xue Y, Li J, Jiang W, Prasanna V. FEACAN: front-end acceleration for content-aware network processing. 2011 Proceedings IEEE INFOCOM. IEEE: Shanghai, 2011.
31. Becchi M, Crowley P. An improved algorithm to accelerate regular expression evaluation. Proceedings of the 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems. ACM: Orlando, Florida, 2007.
32. Vassiliadis S, Cotofana S, Stathis P. Block based compression storage expected performance. Kluwer International Series in Engineering and Computer Science 2002; 657.
33. Hu J, Wang W. Algorithm research for vector linked-list sparse matrix multiplication. 2010 Asia-Pacific Conference on Wearable Computing Systems (APWCS). IEEE: Kaohsiung, Taiwan, 2010.
34. Smailbegovic F, Gaydadjiev GN, Vassiliadis S. Sparse matrix storage format. Proceedings of the 16th Annual Workshop on Circuits, Systems and Signal Processing, Veldhoven, The Netherlands, 2005.
35. Hayes CL, Luo Y. DPICO: a high speed deep packet inspection engine using compact finite automata. Proceedings of the 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems. ACM: Orlando, Florida, 2007.
36. Lin W, Tang Y, Liu B, Pao D, Wang X. Compact DFA structure for multiple regular expressions matching. ICC '09, IEEE International Conference on Communications. IEEE: Dresden, Germany, 2009.


More information

Hardware-accelerated regular expression matching with overlap handling on IBM PowerEN processor

Hardware-accelerated regular expression matching with overlap handling on IBM PowerEN processor Kubilay Atasu IBM Research Zurich 23 May 2013 Hardware-accelerated regular expression matching with overlap handling on IBM PowerEN processor Kubilay Atasu, Florian Doerfler, Jan van Lunteren, and Christoph

More information

Tree-Based Minimization of TCAM Entries for Packet Classification

Tree-Based Minimization of TCAM Entries for Packet Classification Tree-Based Minimization of TCAM Entries for Packet Classification YanSunandMinSikKim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington 99164-2752, U.S.A.

More information

FPGA Matrix Multiplier

FPGA Matrix Multiplier FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri

More information

Implementation of Lexical Analysis. Lecture 4

Implementation of Lexical Analysis. Lecture 4 Implementation of Lexical Analysis Lecture 4 1 Tips on Building Large Systems KISS (Keep It Simple, Stupid!) Don t optimize prematurely Design systems that can be tested It is easier to modify a working

More information

Analysis of Basic Data Reordering Techniques

Analysis of Basic Data Reordering Techniques Analysis of Basic Data Reordering Techniques Tan Apaydin 1, Ali Şaman Tosun 2, and Hakan Ferhatosmanoglu 1 1 The Ohio State University, Computer Science and Engineering apaydin,hakan@cse.ohio-state.edu

More information

ReCPU: a Parallel and Pipelined Architecture for Regular Expression Matching

ReCPU: a Parallel and Pipelined Architecture for Regular Expression Matching ReCPU: a Parallel and Pipelined Architecture for Regular Expression Matching Marco Paolieri, Ivano Bonesana ALaRI, Faculty of Informatics University of Lugano, Lugano, Switzerland {paolierm, bonesani}@alari.ch

More information

Implementation of Lifting-Based Two Dimensional Discrete Wavelet Transform on FPGA Using Pipeline Architecture

Implementation of Lifting-Based Two Dimensional Discrete Wavelet Transform on FPGA Using Pipeline Architecture International Journal of Computer Trends and Technology (IJCTT) volume 5 number 5 Nov 2013 Implementation of Lifting-Based Two Dimensional Discrete Wavelet Transform on FPGA Using Pipeline Architecture

More information

VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH THE EFFICIENT MULTIPLICATIVE INVERSE UNIT

VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH THE EFFICIENT MULTIPLICATIVE INVERSE UNIT VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH THE EFFICIENT MULTIPLICATIVE INVERSE UNIT K.Sandyarani 1 and P. Nirmal Kumar 2 1 Research Scholar, Department of ECE, Sathyabama

More information

Low Complexity Opportunistic Decoder for Network Coding

Low Complexity Opportunistic Decoder for Network Coding Low Complexity Opportunistic Decoder for Network Coding Bei Yin, Michael Wu, Guohui Wang, and Joseph R. Cavallaro ECE Department, Rice University, 6100 Main St., Houston, TX 77005 Email: {by2, mbw2, wgh,

More information

Performance Evaluation and Improvement of Algorithmic Approaches for Packet Classification

Performance Evaluation and Improvement of Algorithmic Approaches for Packet Classification Performance Evaluation and Improvement of Algorithmic Approaches for Packet Classification Yaxuan Qi, Jun Li Research Institute of Information Technology (RIIT) Tsinghua University, Beijing, China, 100084

More information

EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL INTERLEAVER FOR LONG TERM EVOLUTION SYSTEMS

EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL INTERLEAVER FOR LONG TERM EVOLUTION SYSTEMS Rev. Roum. Sci. Techn. Électrotechn. et Énerg. Vol. 61, 1, pp. 53 57, Bucarest, 016 Électronique et transmission de l information EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL

More information

Towards Performance Modeling of 3D Memory Integrated FPGA Architectures

Towards Performance Modeling of 3D Memory Integrated FPGA Architectures Towards Performance Modeling of 3D Memory Integrated FPGA Architectures Shreyas G. Singapura, Anand Panangadan and Viktor K. Prasanna University of Southern California, Los Angeles CA 90089, USA, {singapur,

More information

FPGA Provides Speedy Data Compression for Hyperspectral Imagery

FPGA Provides Speedy Data Compression for Hyperspectral Imagery FPGA Provides Speedy Data Compression for Hyperspectral Imagery Engineers implement the Fast Lossless compression algorithm on a Virtex-5 FPGA; this implementation provides the ability to keep up with

More information

A Hardware Structure for FAST Protocol Decoding Adapting to 40Gbps Bandwidth Lei-Lei YU 1,a, Yu-Zhuo FU 2,b,* and Ting LIU 3,c

A Hardware Structure for FAST Protocol Decoding Adapting to 40Gbps Bandwidth Lei-Lei YU 1,a, Yu-Zhuo FU 2,b,* and Ting LIU 3,c 2017 3rd International Conference on Computer Science and Mechanical Automation (CSMA 2017) ISBN: 978-1-60595-506-3 A Hardware Structure for FAST Protocol Decoding Adapting to 40Gbps Bandwidth Lei-Lei

More information

Pipelined Parallel AC-based Approach for Multi-String Matching

Pipelined Parallel AC-based Approach for Multi-String Matching 2008 14th IEEE International Conference on Parallel and Distributed Systems Pipelined Parallel AC-based Approach for Multi-String Matching Wei Lin 1, 2, Bin Liu 1 1 Department of Computer Science and Technology,

More information

Design of a Near-Minimal Dynamic Perfect Hash Function on Embedded Device

Design of a Near-Minimal Dynamic Perfect Hash Function on Embedded Device Design of a Near-Minimal Dynamic Perfect Hash Function on Embedded Device Derek Pao, Xing Wang and Ziyan Lu Department of Electronic Engineering, City University of Hong Kong, HONG KONG E-mail: d.pao@cityu.edu.hk,

More information

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System HU WEI CHEN TIANZHOU SHI QINGSONG JIANG NING College of Computer Science Zhejiang University College of Computer Science

More information

CS415 Compilers. Lexical Analysis

CS415 Compilers. Lexical Analysis CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework

More information

Improving Signature Matching using Binary Decision Diagrams

Improving Signature Matching using Binary Decision Diagrams Improving Signature Matching using Binary Decision Diagrams Liu Yang, Rezwana Karim, Vinod Ganapathy Rutgers University Randy Smith Sandia National Labs Signature matching in IDS Find instances of network

More information

Multi-core Implementation of Decomposition-based Packet Classification Algorithms 1

Multi-core Implementation of Decomposition-based Packet Classification Algorithms 1 Multi-core Implementation of Decomposition-based Packet Classification Algorithms 1 Shijie Zhou, Yun R. Qu, and Viktor K. Prasanna Ming Hsieh Department of Electrical Engineering, University of Southern

More information

Novel FPGA-Based Signature Matching for Deep Packet Inspection

Novel FPGA-Based Signature Matching for Deep Packet Inspection Novel FPGA-Based Signature Matching for Deep Packet Inspection Nitesh B. Guinde and Sotirios G. Ziavras Electrical & Computer Engineering Department, New Jersey Institute of Technology, Newark NJ 07102,

More information

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing CS 432 Fall 2017 Mike Lam, Professor Finite Automata Conversions and Lexing Finite Automata Key result: all of the following have the same expressive power (i.e., they all describe regular languages):

More information

Highly Space Efficient Counters for Perl Compatible Regular Expressions in FPGAs

Highly Space Efficient Counters for Perl Compatible Regular Expressions in FPGAs Highly Space Efficient Counters for Perl Compatible Regular Expressions in FPGAs Chia-Tien Dan Lo and Yi-Gang Tai Department of Computer Science University of Texas at San Antonio {danlo,ytai}@cs.utsa.edu

More information

Scalable Multi-Pipeline Architecture for High Performance Multi-Pattern String Matching

Scalable Multi-Pipeline Architecture for High Performance Multi-Pattern String Matching Scalable Multi-Pipeline Architecture for High Performance Multi-Pattern String Matching Weirong Jiang, Yi-Hua E. Yang and Viktor K. Prasanna Ming Hsieh Department of Electrical Engineering University of

More information

Hardware Description of Multi-Directional Fast Sobel Edge Detection Processor by VHDL for Implementing on FPGA

Hardware Description of Multi-Directional Fast Sobel Edge Detection Processor by VHDL for Implementing on FPGA Hardware Description of Multi-Directional Fast Sobel Edge Detection Processor by VHDL for Implementing on FPGA Arash Nosrat Faculty of Engineering Shahid Chamran University Ahvaz, Iran Yousef S. Kavian

More information

IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY FPGA

IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY FPGA IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY FPGA Implementations of Tiny Mersenne Twister Guoping Wang Department of Engineering, Indiana University Purdue University Fort

More information

OPTIMAL MULTI-CHANNEL ASSIGNMENTS IN VEHICULAR AD-HOC NETWORKS

OPTIMAL MULTI-CHANNEL ASSIGNMENTS IN VEHICULAR AD-HOC NETWORKS Chapter 2 OPTIMAL MULTI-CHANNEL ASSIGNMENTS IN VEHICULAR AD-HOC NETWORKS Hanan Luss and Wai Chen Telcordia Technologies, Piscataway, New Jersey 08854 hluss@telcordia.com, wchen@research.telcordia.com Abstract:

More information

Fast and Memory-Efficient Traffic Classification with Deep Packet Inspection in CMP Architecture

Fast and Memory-Efficient Traffic Classification with Deep Packet Inspection in CMP Architecture 2010 Fifth IEEE International Conference on Networking, Architecture, and Storage Fast and Memory-Efficient Traffic Classification with Deep Packet Inspection in CMP Architecture Tingwen Liu, Yong Sun

More information

Structural and Syntactic Pattern Recognition

Structural and Syntactic Pattern Recognition Structural and Syntactic Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent

More information

Multi-pattern Signature Matching for Hardware Network Intrusion Detection Systems

Multi-pattern Signature Matching for Hardware Network Intrusion Detection Systems This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 5 proceedings. Multi-pattern Signature Matching for Hardware

More information

DEEP packet inspection, in which packet payloads are

DEEP packet inspection, in which packet payloads are 310 IEEE TRANSACTIONS ON COMPUTERS, VOL. 62, NO. 2, FEBRUARY 2013 Fast Deep Packet Inspection with a Dual Finite Automata Cong Liu and Jie Wu, Fellow, IEEE Abstract Deep packet inspection, in which packet

More information

Lexical Analysis - 2

Lexical Analysis - 2 Lexical Analysis - 2 More regular expressions Finite Automata NFAs and DFAs Scanners JLex - a scanner generator 1 Regular Expressions in JLex Symbol - Meaning. Matches a single character (not newline)

More information

High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm *

High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 29, 595-605 (2013) High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm * JONGWOO BAE 1 AND JINSOO CHO 2,+ 1

More information

Scalable Lookahead Regular Expression Detection System for Deep Packet Inspection

Scalable Lookahead Regular Expression Detection System for Deep Packet Inspection IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 3, JUNE 2012 699 Scalable Lookahead Regular Expression Detection System for Deep Packet Inspection Masanori Bando, Associate Member, IEEE, N. Sertac Artan,

More information

Deep Packet Inspection of Next Generation Network Devices

Deep Packet Inspection of Next Generation Network Devices Deep Packet Inspection of Next Generation Network Devices Prof. Anat Bremler-Barr IDC Herzliya, Israel www.deepness-lab.org This work was supported by European Research Council (ERC) Starting Grant no.

More information

Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased

Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased platforms Damian Karwowski, Marek Domański Poznan University of Technology, Chair of Multimedia Telecommunications and Microelectronics

More information

CHAPTER 9 MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES

CHAPTER 9 MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES CHAPTER 9 MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES This chapter in the book includes: Objectives Study Guide 9.1 Introduction 9.2 Multiplexers 9.3 Three-State Buffers 9.4 Decoders and Encoders

More information

Efficient Assembly of Sparse Matrices Using Hashing

Efficient Assembly of Sparse Matrices Using Hashing Efficient Assembly of Sparse Matrices Using Hashing Mats Aspnäs, Artur Signell, and Jan Westerholm Åbo Akademi University, Faculty of Technology, Department of Information Technologies, Joukahainengatan

More information

Comparing and Contrasting different Approaches of Code Generator(Enum,Map-Like,If-else,Graph)

Comparing and Contrasting different Approaches of Code Generator(Enum,Map-Like,If-else,Graph) Comparing and Contrasting different Approaches of Generator(Enum,Map-Like,If-else,Graph) Vivek Tripathi 1 Sandeep kumar Gonnade 2 Mtech Scholar 1 Asst.Professor 2 Department of Computer Science & Engineering,

More information

Fault Diagnosis Schemes for Low-Energy BlockCipher Midori Benchmarked on FPGA

Fault Diagnosis Schemes for Low-Energy BlockCipher Midori Benchmarked on FPGA Fault Diagnosis Schemes for Low-Energy BlockCipher Midori Benchmarked on FPGA Abstract: Achieving secure high-performance implementations for constrained applications such as implantable and wearable medical

More information

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture

More information

Efficient Parallelization of Regular Expression Matching for Deep Inspection

Efficient Parallelization of Regular Expression Matching for Deep Inspection Efficient Parallelization of Regular Expression Matching for Deep Inspection Zhe Fu, Zhi Liu and Jun Li Department of Automation, Tsinghua University, China Research Institute of Information Technology,

More information

Design and Implementation of 3-D DWT for Video Processing Applications

Design and Implementation of 3-D DWT for Video Processing Applications Design and Implementation of 3-D DWT for Video Processing Applications P. Mohaniah 1, P. Sathyanarayana 2, A. S. Ram Kumar Reddy 3 & A. Vijayalakshmi 4 1 E.C.E, N.B.K.R.IST, Vidyanagar, 2 E.C.E, S.V University

More information

SSA: A Power and Memory Efficient Scheme to Multi-Match Packet Classification. Fang Yu, T.V. Lakshman, Martin Austin Motoyama, Randy H.

SSA: A Power and Memory Efficient Scheme to Multi-Match Packet Classification. Fang Yu, T.V. Lakshman, Martin Austin Motoyama, Randy H. SSA: A Power and Memory Efficient Scheme to Multi-Match Packet Classification Fang Yu, T.V. Lakshman, Martin Austin Motoyama, Randy H. Katz Presented by: Discussion led by: Sailesh Kumar Packet Classification

More information

Overview. Implementing Gigabit Routers with NetFPGA. Basic Architectural Components of an IP Router. Per-packet processing in an IP Router

Overview. Implementing Gigabit Routers with NetFPGA. Basic Architectural Components of an IP Router. Per-packet processing in an IP Router Overview Implementing Gigabit Routers with NetFPGA Prof. Sasu Tarkoma The NetFPGA is a low-cost platform for teaching networking hardware and router design, and a tool for networking researchers. The NetFPGA

More information

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System HU WEI, CHEN TIANZHOU, SHI QINGSONG, JIANG NING College of Computer Science Zhejiang University College of Computer

More information

Regular expression matching with input compression: a hardware design for use within network intrusion detection systems.

Regular expression matching with input compression: a hardware design for use within network intrusion detection systems. Regular expression matching with input compression: a hardware design for use within network intrusion detection systems. Gerald Tripp University of Kent About Author Gerald Tripp is a Lecturer in Computer

More information

Boundary Hash for Memory-Efficient Deep Packet Inspection

Boundary Hash for Memory-Efficient Deep Packet Inspection Boundary Hash for Memory-Efficient Deep Packet Inspection N. Sertac Artan, Masanori Bando, and H. Jonathan Chao Electrical and Computer Engineering Department Polytechnic University Brooklyn, NY Abstract

More information

StriD 2 FA: Scalable Regular Expression Matching for Deep Packet Inspection

StriD 2 FA: Scalable Regular Expression Matching for Deep Packet Inspection StriD FA: Scalale Regular Expression Matching for Deep Packet Inspection Xiaofei Wang Junchen Jiang Yi Tang Yi Wang Bin Liu Xiaojun Wang School of Electronic Engineering, Dulin City University, Dulin,

More information

Software Architecture for a Lightweight Payload Signature-Based Traffic Classification System

Software Architecture for a Lightweight Payload Signature-Based Traffic Classification System Software Architecture for a Lightweight Payload Signature-Based Traffic Classification System Jun-Sang Park, Sung-Ho Yoon, and Myung-Sup Kim Dept. of Computer and Information Science, Korea University,

More information

PERG-Rx: An FPGA-based Pattern-Matching Engine with Limited Regular Expression Support for Large Pattern Database. Johnny Ho

PERG-Rx: An FPGA-based Pattern-Matching Engine with Limited Regular Expression Support for Large Pattern Database. Johnny Ho PERG-Rx: An FPGA-based Pattern-Matching Engine with Limited Regular Expression Support for Large Pattern Database Johnny Ho Supervisor: Guy Lemieux Date: September 11, 2009 University of British Columbia

More information

Hardware Implementations of Finite Automata and Regular Expressions

Hardware Implementations of Finite Automata and Regular Expressions Hardware Implementations of Finite Automata and Regular Expressions Extended Abstract Bruce W. Watson (B) FASTAR Group, Department of Information Science, Stellenbosch University, Stellenbosch, South Africa

More information

Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider

Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider Tamkang Journal of Science and Engineering, Vol. 3, No., pp. 29-255 (2000) 29 Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider Jen-Shiun Chiang, Hung-Da Chung and Min-Show

More information

Line-rate packet processing in hardware: the evolution towards 400 Gbit/s

Line-rate packet processing in hardware: the evolution towards 400 Gbit/s Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 259 268 doi: 10.14794/ICAI.9.2014.1.259 Line-rate packet processing in hardware:

More information

Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations

Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations Nikolai Zamarashkin and Dmitry Zheltkov INM RAS, Gubkina 8, Moscow, Russia {nikolai.zamarashkin,dmitry.zheltkov}@gmail.com

More information

Scanline-based rendering of 2D vector graphics

Scanline-based rendering of 2D vector graphics Scanline-based rendering of 2D vector graphics Sang-Woo Seo 1, Yong-Luo Shen 1,2, Kwan-Young Kim 3, and Hyeong-Cheol Oh 4a) 1 Dept. of Elec. & Info. Eng., Graduate School, Korea Univ., Seoul 136 701, Korea

More information

A MULTI-CHARACTER TRANSITION STRING MATCHING ARCHITECTURE BASED ON AHO-CORASICK ALGORITHM. Chien-Chi Chen and Sheng-De Wang

A MULTI-CHARACTER TRANSITION STRING MATCHING ARCHITECTURE BASED ON AHO-CORASICK ALGORITHM. Chien-Chi Chen and Sheng-De Wang International Journal of Innovative Computing, Information and Control ICIC International c 2012 ISSN 1349-4198 Volume 8, Number 12, December 2012 pp. 8367 8386 A MULTI-CHARACTER TRANSITION STRING MATCHING

More information

An FPGA Implementation of the Powering Function with Single Precision Floating-Point Arithmetic

An FPGA Implementation of the Powering Function with Single Precision Floating-Point Arithmetic An FPGA Implementation of the Powering Function with Single Precision Floating-Point Arithmetic Pedro Echeverría, Marisa López-Vallejo Department of Electronic Engineering, Universidad Politécnica de Madrid

More information

Performance Modeling of Pipelined Linear Algebra Architectures on FPGAs

Performance Modeling of Pipelined Linear Algebra Architectures on FPGAs Performance Modeling of Pipelined Linear Algebra Architectures on FPGAs Sam Skalicky, Sonia López, Marcin Łukowiak, James Letendre, and Matthew Ryan Rochester Institute of Technology, Rochester NY 14623,

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

How to Apply the Geospatial Data Abstraction Library (GDAL) Properly to Parallel Geospatial Raster I/O?

How to Apply the Geospatial Data Abstraction Library (GDAL) Properly to Parallel Geospatial Raster I/O? bs_bs_banner Short Technical Note Transactions in GIS, 2014, 18(6): 950 957 How to Apply the Geospatial Data Abstraction Library (GDAL) Properly to Parallel Geospatial Raster I/O? Cheng-Zhi Qin,* Li-Jun

More information

INTEGER SEQUENCE WINDOW BASED RECONFIGURABLE FIR FILTERS.

INTEGER SEQUENCE WINDOW BASED RECONFIGURABLE FIR FILTERS. INTEGER SEQUENCE WINDOW BASED RECONFIGURABLE FIR FILTERS Arulalan Rajan 1, H S Jamadagni 1, Ashok Rao 2 1 Centre for Electronics Design and Technology, Indian Institute of Science, India (mrarul,hsjam)@cedt.iisc.ernet.in

More information