Large-scale Multi-flow Regular Expression Matching on FPGA*

2012 IEEE 13th International Conference on High Performance Switching and Routing

Yun Qu, Ming Hsieh Dept. of Electrical Eng., University of Southern California
Yi-Hua E. Yang, Network Division, Huawei North America
Viktor K. Prasanna, Ming Hsieh Dept. of Electrical Eng., University of Southern California

Abstract

High-throughput regular expression matching (REM) over a single packet flow for deep packet inspection in routers has been well studied. In many real-world cases, however, packet processing is performed on a large number of packet flows, each supported by many run-time states. To handle a large number of flows, the architecture should support a mechanism that performs rapid context switches without adversely affecting the throughput. As the number of flows increases, large-capacity memory is needed to store the per-flow matching states. In this paper, we propose a hardware-accelerated context switch mechanism for efficiently managing a large number of states in memory. With sufficiently large off-chip memory, a state-of-the-art FPGA device can be multiplexed by millions of packet flows with negligible throughput degradation for large-size packets. Post-place-and-route results show that when 8 characters are matched per cycle, our design can achieve 18 MHz clock rate, leading to a throughput of 11.8 Gbps.

Index Terms: Deep packet inspection, packet flow, context switch, FPGA, off-chip memory

I. INTRODUCTION

High-speed packet processing with a large amount of state information is becoming an essential function of network routers. For example, deep packet inspection (DPI) utilizing regular expression matching (REM) [1, 2] has been used for detecting malicious patterns in packet flows (see Section II-B). Most packet processing tasks, such as DPI using REM, require keeping an increasingly large amount of state per packet flow [3, 4, 5, 6].
Specifically, packet processing engines keep track of the current states and generate outputs based on the saved states and the input, so the states must be recorded at run time. Meanwhile, a major concern has been the rapidly growing number of packet flows and the increasing network bandwidth. The aggregated internet traffic has been experiencing an annual bandwidth growth of 4%~5% from 22 to 21 [7], and the number of concurrent packet flows has increased to millions in backbone routers. As a consequence, an efficient mechanism is needed both to process packets at high throughput and to multiplex the packet processing engine by a large number of packet flows.

In most cases, FPGA-based REM solutions [6, 8] only address the problem of matching a set of regular expressions (regexes) against a single packet flow. The traffic on a high-speed network link, on the other hand, usually consists of thousands of packet flows at any time. In order to multiplex an existing single-flow REM solution [6, 8] by multiple packet flows, the REM system must have high-bandwidth access to the state context (see Section III) of every packet flow at run time [9]. The number of packet flows supported by the REM system is thus restricted by the size of the on-chip memory used for context storage. In general, high-bandwidth on-chip memory (e.g., distributed RAM on FPGA), as used in [9], has limited size and cannot hold the state context for more than hundreds of packet flows. We therefore need to explore off-chip memory in order to store a large amount of run-time state.

This paper focuses on the design of a highly efficient context switch mechanism for multiplexing a high-throughput packet processing engine by multiple packet flows.

* This work is supported by the U.S. National Science Foundation under grant CCR-11881; Equipment grant from Xilinx Inc. is gratefully acknowledged.
Using a high-performance REM solution [6] on FPGA as the example packet processing engine, our design allows the original single-flow REM solution to be multiplexed with single-cycle context-switch overhead. Specifically, our main contributions are as follows:

- We propose a design for the REM circuit to utilize off-chip context memory.
- We propose a deeply-overlapped schedule to manage the context and reduce the switching overhead on the REM throughput.
- We give a detailed implementation and performance evaluation to demonstrate high throughput.

The paper is organized as follows. Section II introduces the background of our problem as well as prior work. Section III gives in detail the proposed architecture and the context management schedule. The performance is evaluated in Section IV. Finally, Section V concludes the paper.

II. BACKGROUND AND PRIOR WORK

A. Regular Expression Matching (REM)

A regular expression (regex) defines a regular language over a fixed alphabet. Given a regex r and a sequence of input characters s = [x_0, x_1, ...], regular expression matching (REM) of r against s is the process of finding and reporting any substring of s which is a member of L(r), the regular language defined by r. In general, REM can be performed with a set of regexes {r_0, r_1, ..., r_{m-1}}, where all regexes are matched against the input packet flow concurrently. A typical construction for the hardware-based regular expression matching engine (REME) uses a non-deterministic finite automaton (NFA), utilizing the massively parallel and reconfigurable logic resources on FPGA to achieve high throughput [3, 5, 10, 11, 12].

B. Multi-flow REM problem

A packet flow is a sequence of packets sent from a particular source to a particular destination. Since the same network link is usually traversed by thousands of packet flows, multi-flow REM needs to be performed at the router. In multi-flow REM, all the regexes are matched against multiple input packet flows coming from the network interface. Although the input to the entire REM system consists of k interleaved packet flows {s_0, s_1, ..., s_{k-1}}, all m regexes {r_0, r_1, ..., r_{m-1}} are matched against each packet flow individually. Any match output associated with a specific regex is clearly identified by the corresponding input packet flow number (between 0 and k - 1) in which the match occurs.

However, large numbers of packet flows and regexes may consume a lot of on-chip resources, and require high-bandwidth access to memory to store and retrieve a large number of states online. As a consequence, it remains a challenge to dynamically update all states at run time.

C. Prior work

1) Single-flow REM: From the hardware point of view, the implementation of NFA-based REM was first studied by Floyd and Ullman in [13].
Later in [11], an algorithm was proposed to translate an arbitrary regular expression directly into its matching circuit on FPGA. [5] proposed a tree structure where character inputs are pipelined and broadcast. Automatic REM circuit construction in VHDL was proposed in [3] and [10], where the NFA structure at the circuit level was later used by most other implementations [3, 4, 5, 10]. Several techniques were proposed to improve the circuit and enhance the performance. Among them, [12] proposed an algorithm to construct multi-stride NFA, and [3] proposed an approach which uses shift-register lookup tables (SRL) for implementing single-character repetitions. However, all of these NFA-based REM designs require a large amount of state to be stored.

[6] proposed an efficient algorithm to construct the NFA structure for single-flow REM. The resulting NFA circuit was mapped into several modules, and multiple modules of the same structure were stacked together to match multiple characters per clock cycle. The resulting circuit structure was a 2-dimensional pipeline, with the character input propagating along different pipelines horizontally and along multiple stages vertically. The total number of state bits can be very large.

Figure 1: Overall Architecture

2) Multi-flow REM: The first paper on multi-flow REM appeared in [10], where a parallel architecture of matching engines for regular patterns was proposed. The system matches each input packet flow individually, using an independent NFA-based matching subsystem per flow. This approach is commonly referred to as the individual solution to the multi-flow REM problem. However, since each packet flow gets its own designated resources, the solution does not scale well with respect to the number of packet flows.

As described in [9], the multiplexing solution is another option for the multi-flow REM problem, where multiple packet flows share a single REM system composed of several REME.
In this approach, multiple packet flows are first time-multiplexed outside of the NFA-based REM circuit. A multiplexer selects a single packet flow as the input to the REM circuit each time, and switches to another packet flow after the current status of the states has been recorded. Since multiple packet flows can share the resources, the scalability issue is mitigated.

In [9], context memory was instantiated using the on-chip distributed RAM of the FPGA because it offers the highest bandwidth among all types of memory. However, as the number of packet flows or regexes becomes larger, another concern arises: the memory size can no longer support that many packet flows and regexes. In real-life scenarios too many packet flows (>>1) and too many regexes (>1, each with ~1 state bits) need to be dealt with, while the maximum on-chip distributed RAM size is below 9Mb, making it impossible to store all contexts on-chip. This shortcoming of the design in [9] is the motivation for this paper.

III. CONTEXT SWITCH USING OFF-CHIP MEMORY

The multiplexing of the REM circuit is facilitated by the context switching operation. A context, possibly consisting of millions of bits, represents the states of the REME corresponding to a particular packet flow at a specific byte offset.

A. Overall Architecture

Unlike in [9], where the high-bandwidth on-chip distributed RAM is used for context memory, in this work we focus on designing the context switch mechanism using the high-capacity off-chip memory. This allows us to multiplex the high-performance REM circuit for a larger number of packet flows. The overall architecture is shown in Figure 1, where we arrange all the REME in a 2-dimensional array. We define the number of pipelines as u and the number of stages per pipeline as v, so the total number of stages is (u × v), where each stage can consist of n REME. Further, each stage in Figure 1 is marked uniquely with a pair of numbers. When a specific packet flow is selected by the off-chip control circuit, the character input is propagated along pipelines and stages in a pipelined fashion as in [6].

To concurrently match against multiple packet flows, the original architecture of the stage proposed in [6] has to be modified accordingly. For a particular stage in Figure 1, a state buffer is attached to support the context switch mechanism. The organization of each stage is shown in Figure 2, where the state buffer is locally connected to the adjacent stages as discussed in Section III-B1.

Figure 2: Stage organization (each stage holds a state buffer connected to the previous and next stages, and several REME, each with state registers and transition logic)

B. Context Switch Mechanism

1) Context access: The context access order is scheduled in a snake-like linear array as shown in Figure 3, where the contexts propagate in a pipelined manner in the direction the arrows indicate. With an even number of pipelines implemented, only the first stage (0,0) and the last stage (u-1,0) are directly connected to the off-chip memory. The context access datapath forms a ring structure, which is different from the datapath of input characters.

Figure 3(a): Loading context
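The snake-like arrangement can be made concrete with a small sketch. It assumes, following Figure 3, that even-numbered pipelines are traversed from stage 0 up to stage v-1 and odd-numbered pipelines back down to stage 0; the function name `snake_order` is hypothetical and not part of the paper.

```python
def snake_order(u, v):
    """Return the stages (pipeline, stage) in the order the snake-like array links them.

    Assumption (read off Figure 3): even pipelines run from stage 0 to v-1,
    odd pipelines run from stage v-1 back to 0.
    """
    order = []
    for i in range(u):
        if i % 2 == 0:
            stages = range(v)
        else:
            stages = range(v - 1, -1, -1)
        order.extend((i, j) for j in stages)
    return order


# With an even number of pipelines the path starts at (0, 0) and ends at (u-1, 0),
# the two stages that are directly connected to the off-chip memory.
path = snake_order(4, 3)
print(path[0], path[-1])
```

Under this reading, an even u is what places both ends of the path on row 0, which is where the memory interface sits.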
The proposed context access order has the following properties:

- The contexts of all stages are loaded from (or offloaded to) the off-chip memory in the reverse order of the snake-like arrows, i.e., stage (u-1,0), stage (u-1,1), ..., stage (u-1,v-1), stage (u-2,v-1), stage (u-2,v-2), ..., stage (0,0).
- During load time, the context of each stage is first loaded to the state buffer at stage (0,0), then shifted through the state buffers of all stages following the snake-like arrows in Figure 3a until reaching its destination. It requires (u × v) cycles to load the context of stage (u-1,0) into its state buffer, and the total load time of all stages is (u × v) cycles.
- During offload time, the context of each stage is first shifted through the state buffers following the snake-like arrows in Figure 3b until reaching the state buffer at stage (u-1,0), then offloaded to the off-chip memory. It requires (u × v) cycles to offload the context of stage (0,0) into the off-chip memory, and the total offload time of all stages is (u × v) cycles.

Figure 3(b): Offloading context
Figure 3: Context access schedule

2) Context switch: After all stages have received the corresponding contexts from the off-chip context memory, for each stage as shown in Figure 2, the next context saved in the state buffer and the current context recorded in the state registers can be swapped in a single clock cycle. Across stages, an efficient way to switch contexts is to pipeline the switching control signals along with the character input in the 2-dimensional architecture, resulting in a diagonal, wavefront-like propagation of the context switch as shown in Figure 4.
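The diagonal propagation can be sketched as follows. The sketch assumes that stage (i, j) swaps its context in cycle i + j + 1, counted from the completion of context access; this formula is our reading of Figure 4 rather than something the paper states, and the names `switch_cycle` and `switch_wavefronts` are hypothetical.

```python
from collections import defaultdict


def switch_cycle(i, j):
    """Cycle (1-based) in which stage (i, j) swaps its context.

    Assumption: the switch signal advances one pipeline or one stage per cycle,
    so stages on the same anti-diagonal switch together.
    """
    return i + j + 1


def switch_wavefronts(u, v):
    """Group the u x v stages by the cycle in which they switch."""
    waves = defaultdict(list)
    for i in range(u):
        for j in range(v):
            waves[switch_cycle(i, j)].append((i, j))
    return dict(waves)


waves = switch_wavefronts(3, 2)
for cycle in sorted(waves):
    print(cycle, waves[cycle])
```

Each stage halts matching only in its own switching cycle, while the full wavefront takes u + v - 1 cycles to cross the array.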
Specifically, in the first cycle after the completion of context access, only stage (0,0) is switching to the next packet flow while halting the REM (otherwise the context in the state registers would be destroyed), and all other stages continue the REM for the current packet flow. In the second cycle, both stage (0,1) and stage (1,0) are switching while halting the REM, and stage (0,0) starts its new REM for the next packet flow. The switching process propagates along the diagonal of the 2-dimensional array until stage (u-1, v-1) is switched in the (u + v - 1)th cycle. The proposed context switch order has the following properties:

- Context switch occurs after loading the next context, and before storing the current context.
- When switching context from flow i to flow i + 1, all stages have to switch after matching to the same character offset in flow i.
- The contexts of all stages can be switched in the same order as the propagation of the input character, as shown in Figure 4.

Figure 4: Context switch order

3) Context update schedule: Since the bandwidth between the on-chip circuit and the off-chip memory usually allows multiple REME to load and offload contexts, a stage can have multiple REME (n) so that each stage can read and write contexts in a single clock cycle. An example of the context update schedule for three packet flows and the entire REM circuit is shown in Figure 5. We have the following observations:

- The load and offload time can be overlapped, assuming the off-chip memory supports concurrent read-and-write access. The context load or offload time for the whole REM circuit is (u × v) cycles due to the snake-like linear array in Figure 3, resulting in a minimum matching time of q = (u × v + u + v - 2) cycles during which the REM should not be interrupted by switching.
- Alternating reads and writes can be slow for some types of memory. If load and offload time cannot be overlapped, then there must be (2 × u × v) cycles between context switches, resulting in a minimum matching time of (2 × u × v + u + v - 2) cycles.
- The switching lasts for (u + v - 1) cycles. The method described in Section III-B2 leads to the stepping slopes during context switch shown in Figure 5.
- The context switch in each stage takes only 1 cycle off the REM process, resulting in a single-cycle context switch overhead.

IV. PERFORMANCE EVALUATION

A. Experimental setup

We conducted experiments using Xilinx ISE 13.1 targeting Virtex 6 HX-565T FPGA chip.
The on-chip circuit can be configured as 8 pipelines, 8 stages per pipeline and 4 REME (footnote 1) per stage to improve area efficiency. Multiple on-chip circuits were constructed for different numbers of regexes to match single or multiple characters per cycle.

Footnote 1: Each REME corresponds to a single regex.

Figure 5: Context update schedule

To simplify the experiments, we only consider DDRII SRAM as the off-chip context memory. In practice, the proposed context switch scheme can be applied to systems with various types of off-chip memory. We used 6 parallel SRAM modules (each 1M × 36 bits, 3 MHz DDRII, dual port access) as the context memory. Since the off-chip memory access bandwidth is limited, the maximum total number of regex states that can be accessed for a single stage is bounded. With 4 REME per stage, the memory access bandwidth can support 1 stage reading and 1 stage writing concurrently, each of up to 432 bits per cycle.

The control circuit and multiplexer outside the REM circuit were excluded from the on-chip design. The design was synthesized (xst) and place-and-routed (par) for the targeted device with either the maximize speed or the minimize area option. Post-place-and-route results are reported.

We used a fixed set of regexes extracted from Snort-rules (published in February 21) [1]. Regexes consisting of a large number of states (> 18 states per regex) were excluded. Note that our REME design methodology as well as the proposed architecture can handle a larger number of states. For fair evaluation of the proposed multi-flow REM, regexes that are too short (< 1 states) were also omitted. Our implementation prototype consists of a set of 256 regexes with on average 72 states per regex, while the longest regex we instantiated consists of 96 states.

B. Evaluation results

The access bandwidth and the size of the off-chip memory are fixed in the experiments. The total memory size in the experiments is able to hold different contexts for 1 million concurrent packet flows.
Using more copies of off-chip SRAM modules can support an even larger number (over millions) of packet flows. Assuming there are sufficient off-chip resources, we conducted experiments mainly on the following parameters:

- Total number of regexes (n × u × v)
- Number of input characters per cycle (m)
- Context switch period (p) (see Section IV-B3)

For each parameter, we analyze its influence on the following metrics:

- Clock rate and throughput
- On-chip resource usage

Figure 6: Throughput vs. number of REME (1-character input)
Figure 7: Throughput vs. no. of input characters (256 REME)
Figure 8: Resource consumption. (a) Number of REME; (b) Number of input characters (256 REME)

We present the experimental results in Figure 6, Figure 7 and Figure 8.

1) Number of regexes: The total number of regexes in our design can be factored into three variables:

- Number of pipelines (u)
- Number of stages per pipeline (v)
- Number of REME per stage (n)

We constructed several circuits to examine the effect of varying the number of regexes. Each circuit was organized with a fixed number of (n = 4) REME per stage, since the number of REME per stage depends on the context access bandwidth. Because varying the number of pipelines is similar to varying the number of stages for the 2-dimensional array in Figure 1, we configured our circuits to have u ≤ 8 pipelines, each pipeline having a fixed number of (v = 8) stages. By varying the number of pipelines (u) only, we obtain the experimental results in Figure 6.

a) Clock rate and throughput: The clock rate and the throughput have a nearly linear relationship apart from a small negligible term (the context switch overhead), so the curve of clock rate is omitted in Figure 6. The trend of the blue curve (throughput) in Figure 6 is not strongly affected by a large number of regexes; our design can achieve a throughput of 2.17 Gbps for 256 REME. For the 256 REME circuit (8-by-8 array), the minimize area option rather than the maximize speed option was used for the place-and-route (par) results.
Therefore we expect the clock rate to drop when optimizing the area efficiency, as shown in Figure 6. As the total number of regexes increases, the minimum matching time also increases as shown in Figure 6, which will be discussed later.

b) On-chip resource usage: The number of occupied slices is measured, since it indicates the area we used on the FPGA. We also measure the number of used I/O pins because it may become a constraint when only a limited number of I/O pins is available on-chip. As shown in Figure 8a, we notice:

- The number of occupied slices or used I/O pins increases linearly with the number of regexes, because more pipelines have to be implemented to accommodate more REME;
- Since each pipeline has the same number of stages and REME, the resource (slices and I/O pins) consumption has a linear relationship with the number of REME;
- I/O pins become the most consumed resource, since a large number of I/O pins have to be used as the memory interface connecting to the off-chip context memory.

2) Number of input characters per cycle: To enhance the throughput, we implemented the multi-character matching mechanism using the method proposed in [6] in our design. Specifically, a single-character REME takes a single character (8 bits) as input per cycle; by stacking the same REME together and removing redundant registers, a multi-character REME can be constructed, where multiple characters can be taken as input to the on-chip circuit per clock cycle. The resulting circuit can have longer routing paths, which affects the clock rate negatively. However, since the m-character REME can match more characters (8 × m bits) per cycle, a higher throughput can be achieved.

Table I: Multi-flow vs. Single-flow (8-character, 256 REME)
REM | Clock (MHz) | Throughput (Gbps) | Min. match. time (cycles) | Occupied slices | I/O pins
Single flow | … | … | any | … (16%) | … (14%)
Multi flow | … | … | … | … (16%) | … (37%)

a) Clock rate and throughput: The green and orange lines in Figure 7 indicate, respectively, the achievable clock rate and throughput of our multi-flow REM design with respect to the number of input characters. We also implemented the single-flow multi-character REM, utilizing the same 2-dimensional structure as shown in Figure 1. We have the following two observations:

- As shown in Figure 7, when more characters are matched per cycle, the clock rate of multi-flow REM decreases due to longer routing paths in each REME; however, the throughput increases due to multi-character input.
- As shown in Table I, the clock rate of our design differs only slightly from that of the single-flow multi-character REM (by <4%), yielding effectively the same throughput.

b) On-chip resource usage: The on-chip resources consumed by the 256 REME circuit for multi-flow REM are listed in Table I, compared with the resource usage of the single-flow REM system. As shown in Table I, we need slightly more on-chip resources to implement the multi-flow REM, and the multi-flow implementation uses considerably more I/O pins. As shown in Figure 8b, to match m characters per cycle, we need to stack m copies of the same REME together, resulting in a sublinear increase of resource consumption with respect to the number of input characters.

3) Context switch period: The context switch period indicates the period of the context switches in our design.
a) Clock rate and throughput: As discussed in Section III-B3, the minimum matching time between two context switches in a particular stage is q cycles. During this period of time, continuous data from a packet flow should be fed into the on-chip circuit. A small value of the minimum matching time is desirable because:

- If p ≥ q + 1, then our REM system can achieve the maximum matching throughput. With sufficiently large p, we only have negligible throughput degradation.
- If p < q + 1, then the REM system achieves lower throughput due to the idle cycles during which the system has to wait for the context load-offload process to complete.

In general, if we denote the single-flow throughput as T_0 and the multi-flow throughput as T, then we have

\[
T =
\begin{cases}
\dfrac{p-1}{p}\, T_0, & \text{if } p \ge q + 1 \\[2ex]
\dfrac{p-1}{q+1}\, T_0, & \text{if } p < q + 1
\end{cases}
\qquad \text{where } q = u \times v + u + v - 2.
\]

b) On-chip resource usage: The designed context switch period has no influence on the resource consumption.

V. CONCLUSION

In this paper, we studied the multiplexing solution to the REM problem for thousands of concurrent packet flows. With an off-chip context memory and an extension to the single-flow REM architecture, we developed an implementation of multi-flow REM that supports a large number of packet flows and a large number of regexes on FPGA. The same approach can be used in any packet processing for multiple network flows whenever a large number of state bits are involved.

REFERENCES

[1] SNORT.
[2] Bro Intrusion Detection System. [Online].
[3] J. Bispo, I. Sourdis, J. M. P. Cardoso, and S. Vassiliadis, "Regular expression matching for reconfigurable packet inspection," in Proc. IEEE Intl. Conf. on Field Programmable Technology (FPT), December 2006.
[4] C. Clark and D. Schimmel, "Scalable pattern matching for high speed networks," in Proc. IEEE Sym. on Field-Programmable Custom Computing Machines (FCCM), April 2004.
[5] B. L. Hutchings, R. Franklin, and D. Carver, "Assisting Network Intrusion Detection with Reconfigurable Hardware," in Proc. IEEE Sym. on Field-Programmable Custom Computing Machines (FCCM), 2002.
[6] Y.-H. E. Yang, W. Jiang, and V. K. Prasanna, "Compact Architecture for High-Throughput Regular Expression Matching on FPGA," in Proc. ACM/IEEE Sym. on Architectures for Networking and Communications Systems (ANCS), November 2008.
[7] Minnesota Internet Traffic Studies (MINTS).
[8] Y.-H. E. Yang and V. K. Prasanna, "Automatic Circuit Construction for Large-Scale Regular Expression Matching on FPGA," in Proc. Intl. Conf. on ReConFigurable Computing and FPGAs, 2008.
[9] Y. Qu, Y.-H. E. Yang, and V. K. Prasanna, "Multi-stream Regular Expression Matching on FPGA," in Proc. Intl. Conf. on ReConFigurable Computing and FPGAs, December 2011.
[10] A. Mitra, W. Najjar, and L. Bhuyan, "Compiling PCRE to FPGA for accelerating SNORT IDS," in Proc. ACM/IEEE Sym. on Architecture for Networking and Communications Systems (ANCS), New York, NY, USA, 2007.
[11] R. Sidhu and V. Prasanna, "Fast Regular Expression Matching Using FPGAs," in Proc. IEEE Sym. on Field-Programmable Custom Computing Machines (FCCM), 2001.
[12] N. Yamagaki, R. Sidhu, and S. Kamiya, "High-Speed Regular Expression Matching Engine Using Multi-Character NFA," in Proc. Intl. Conf. on Field Programmable Logic and Applications (FPL), Aug. 2008.
[13] R. W. Floyd and J. D. Ullman, "The Compilation of Regular Expressions into Integrated Circuits," Journal of the ACM, vol. 29, no. 3.

Automation Framework for Large-Scale Regular Expression Matching on FPGA. Thilan Ganegedara, Yi-Hua E. Yang, Viktor K. Prasanna

Automation Framework for Large-Scale Regular Expression Matching on FPGA. Thilan Ganegedara, Yi-Hua E. Yang, Viktor K. Prasanna Automation Framework for Large-Scale Regular Expression Matching on FPGA Thilan Ganegedara, Yi-Hua E. Yang, Viktor K. Prasanna Ming-Hsieh Department of Electrical Engineering University of Southern California

More information

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089

More information

Automatic compilation framework for Bloom filter based intrusion detection

Automatic compilation framework for Bloom filter based intrusion detection Automatic compilation framework for Bloom filter based intrusion detection Dinesh C Suresh, Zhi Guo*, Betul Buyukkurt and Walid A. Najjar Department of Computer Science and Engineering *Department of Electrical

More information

Highly Space Efficient Counters for Perl Compatible Regular Expressions in FPGAs

Highly Space Efficient Counters for Perl Compatible Regular Expressions in FPGAs Highly Space Efficient Counters for Perl Compatible Regular Expressions in FPGAs Chia-Tien Dan Lo and Yi-Gang Tai Department of Computer Science University of Texas at San Antonio {danlo,ytai}@cs.utsa.edu

More information

Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA

Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA Weirong Jiang, Viktor K. Prasanna University of Southern California Norio Yamagaki NEC Corporation September 1, 2010 Outline

More information

Dynamically Configurable Online Statistical Flow Feature Extractor on FPGA

Dynamically Configurable Online Statistical Flow Feature Extractor on FPGA Dynamically Configurable Online Statistical Flow Feature Extractor on FPGA Da Tong, Viktor Prasanna Ming Hsieh Department of Electrical Engineering University of Southern California Email: {datong, prasanna}@usc.edu

More information

High-throughput Online Hash Table on FPGA*

High-throughput Online Hash Table on FPGA* High-throughput Online Hash Table on FPGA* Da Tong, Shijie Zhou, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 989 Email: datong@usc.edu,

More information

EFFICIENT FAILURE PROCESSING ARCHITECTURE IN REGULAR EXPRESSION PROCESSOR

EFFICIENT FAILURE PROCESSING ARCHITECTURE IN REGULAR EXPRESSION PROCESSOR EFFICIENT FAILURE PROCESSING ARCHITECTURE IN REGULAR EXPRESSION PROCESSOR ABSTRACT SangKyun Yun Department of Computer and Telecom. Engineering, Yonsei University, Wonju, Korea skyun@yonsei.ac.kr Regular

More information

Parallel graph traversal for FPGA

Parallel graph traversal for FPGA LETTER IEICE Electronics Express, Vol.11, No.7, 1 6 Parallel graph traversal for FPGA Shice Ni a), Yong Dou, Dan Zou, Rongchun Li, and Qiang Wang National Laboratory for Parallel and Distributed Processing,

More information

Online Heavy Hitter Detector on FPGA

Online Heavy Hitter Detector on FPGA Online Heavy Hitter Detector on FPGA Da Tong, Viktor Prasanna Ming Hsieh Department of Electrical Engineering University of Southern California Email: {datong, prasanna}@usc.edu Abstract Detecting heavy

More information

Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA

Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA 2010 International Conference on Field Programmable Logic and Applications Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA Weirong Jiang, Viktor K. Prasanna Ming Hsieh Department

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

Efficient Self-Reconfigurable Implementations Using On-Chip Memory

10th International Conference on Field Programmable Logic and Applications, August 2000. Sameer Wadhwa and Andreas Dandalis University

How Much Logic Should Go in an FPGA Logic Block?

Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

Hardware Implementation for Scalable Lookahead Regular Expression Detection

Masanori Bando, N. Sertac Artan, Nishit Mehta, Yi Guan, and H. Jonathan Chao Department of Electrical and Computer Engineering

Regular Expression Acceleration at Multiple Tens of Gb/s

Jan van Lunteren, Jon Rohrer, Kubilay Atasu, Christoph Hagleitner IBM Research, Zurich Research Laboratory 8803 Rüschlikon, Switzerland email: jvl@zurich.ibm.com

A MULTI-CHARACTER TRANSITION STRING MATCHING ARCHITECTURE BASED ON AHO-CORASICK ALGORITHM. Chien-Chi Chen and Sheng-De Wang

International Journal of Innovative Computing, Information and Control ICIC International c 2012 ISSN 1349-4198 Volume 8, Number 12, December 2012 pp. 8367 8386

A Framework for Rule Processing in Reconfigurable Network Systems

Michael Attig and John Lockwood Washington University in Saint Louis Applied Research Laboratory Department of Computer Science and Engineering

TOKEN-BASED DICTIONARY PATTERN MATCHING FOR TEXT ANALYTICS. Raphael Polig, Kubilay Atasu, Christoph Hagleitner

IBM Research - Zurich Rueschlikon, Switzerland email: pol, kat, hle@zurich.ibm.com ABSTRACT

Index Terms- Field Programmable Gate Array, Content Addressable memory, Intrusion Detection system.

Dynamic Based Reconfigurable Content Addressable Memory for FastString Matching N.Manonmani 1, K.Suman 2, C.Udhayakumar 3 Dept of ECE, Sri Eshwar College of Engineering, Kinathukadavu, Coimbatore, India1

ISSN Vol.05,Issue.09, September-2017, Pages:

WWW.IJITECH.ORG ISSN 2321-8665 Vol.05,Issue.09, September-2017, Pages:1693-1697 AJJAM PUSHPA 1, C. H. RAMA MOHAN 2 1 PG Scholar, Dept of ECE(DECS), Shirdi Sai Institute of Science and Technology, Anantapuramu,

Multi-core Implementation of Decomposition-based Packet Classification Algorithms 1

Shijie Zhou, Yun R. Qu, and Viktor K. Prasanna Ming Hsieh Department of Electrical Engineering, University of Southern

Configurable String Matching Hardware for Speeding up Intrusion Detection

Monther Aldwairi, Thomas Conte, Paul Franzon Dec 6, 2004 North Carolina State University {mmaldwai, conte, paulf}@ncsu.edu www.ece.ncsu.edu/erl

High Throughput Energy Efficient Parallel FFT Architecture on FPGAs

Ren Chen Ming Hsieh Department of Electrical Engineering University of Southern California Los Angeles, USA 989 Email: renchen@usc.edu

Single Pass Connected Components Analysis

D. G. Bailey, C. T. Johnston, Single Pass Connected Components Analysis, Proceedings of Image and Vision Computing New Zealand 007, pp. 8 87, Hamilton, New Zealand, December 007.

SCALABLE HIGH-THROUGHPUT SRAM-BASED ARCHITECTURE FOR IP-LOOKUP USING FPGA. Hoang Le, Weirong Jiang, Viktor K. Prasanna

Ming Hsieh Department of Electrical Engineering University of Southern California Los

ReCPU: a Parallel and Pipelined Architecture for Regular Expression Matching

Marco Paolieri, Ivano Bonesana ALaRI, Faculty of Informatics University of Lugano, Lugano, Switzerland {paolierm, bonesani}@alari.ch

Deep Packet Inspection of Next Generation Network Devices

Prof. Anat Bremler-Barr IDC Herzliya, Israel www.deepness-lab.org This work was supported by European Research Council (ERC) Starting Grant no.

Automated Incremental Design of Flexible Intrusion Detection Systems on FPGAs 1

Zachary K. Baker and Viktor K. Prasanna University of Southern California, Los Angeles, CA, USA zbaker@halcyon.usc.edu, prasanna@ganges.usc.edu

Scalable Multi-Pipeline Architecture for High Performance Multi-Pattern String Matching

Weirong Jiang, Yi-Hua E. Yang and Viktor K. Prasanna Ming Hsieh Department of Electrical Engineering University of

An Architecture for IPv6 Lookup Using Parallel Index Generation Units

Hiroki Nakahara, Tsutomu Sasao, and Munehiro Matsuura Kagoshima University, Japan Kyushu Institute of Technology, Japan Abstract. This

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

Deepa S Dev Student, Department of Electronics and Communication, Sree Buddha College of Engineering, University of Kerala, Kerala, India

Memory-efficient and fast run-time reconfiguration of regularly structured designs

Brahim Al Farisi, Karel Heyse, Karel Bruneel and Dirk Stroobandt Ghent University, ELIS Department Sint-Pietersnieuwstraat

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio

Project Proposal ECE 526 Spring 2006 Modified Data Structure of Aho-Corasick Benfano Soewito, Ed Flanigan and John Pangrazio 1. Introduction The internet becomes the most important tool in this decade

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup

Yan Sun and Min Sik Kim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington

Packet Inspection on Programmable Hardware

Abstract Benfano Soewito Information Technology Department, Bakrie University, Jakarta, Indonesia E-mail: benfano.soewito@bakrie.ac.id In the network security

A closer look at network structure:

T1: Introduction 1.1 What is computer network? Examples of computer network The Internet Network structure: edge and core 1.2 Why computer networks 1.3 The way networks work 1.4 Performance metrics: Delay,

Resource-efficient regular expression matching architecture for text analytics

Kubilay Atasu IBM Research - Zurich Presented at ASAP 2014 SystemT: an algebraic approach to declarative information extraction

Hardware Acceleration in Computer Networks. Jan Kořenek Conference IT4Innovations, Ostrava

Outline Motivation for hardware acceleration Longest prefix matching using FPGA Hardware acceleration of time critical operations Framework and applications Contracted

Fast Reconfiguring Deep Packet Filter for 1+ Gigabit Network

Young H. Cho and William H. Mangione-Smith {young,billms}@ee.ucla.edu University of California, Los Angeles Department of Electrical Engineering

Ultra-Fast NoC Emulation on a Single FPGA

The 25 th International Conference on Field-Programmable Logic and Applications (FPL 2015) September 3, 2015 Thiem Van Chu, Shimpei Sato, and Kenji Kise Tokyo

HARDWARE-ACCELERATED REGULAR EXPRESSION MATCHING FOR HIGH-THROUGHPUT TEXT ANALYTICS

Kubilay Atasu, Raphael Polig, Christoph Hagleitner IBM Research - Zurich email: {kat,pol,hle}@zurich.ibm.com Frederick

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

Towards Performance Modeling of 3D Memory Integrated FPGA Architectures

Shreyas G. Singapura, Anand Panangadan and Viktor K. Prasanna University of Southern California, Los Angeles CA 90089, USA, {singapur,

Packet Header Analysis and Field Extraction for Multigigabit Networks

Petr Kobierský Faculty of Information Technology Brno University of Technology Božetěchova 2, 612 66, Brno, Czech Republic Email: ikobier@fit.vutbr.cz

On the parallelization of slice-based Keccak implementations on Xilinx FPGAs

Jori Winderickx, Joan Daemen and Nele Mentens KU Leuven, ESAT/COSIC & iminds, Leuven, Belgium STMicroelectronics Belgium & Radboud

RiceNIC. Prototyping Network Interfaces. Jeffrey Shafer Scott Rixner

RiceNIC Overview Gigabit Ethernet Network Interface Card RiceNIC - Prototyping Network Interfaces 2 RiceNIC Overview Reconfigurable and

Regular Expression Matching for Reconfigurable Packet Inspection

João Bispo, Ioannis Sourdis #,João M.P. Cardoso and Stamatis Vassiliadis # # Computer Engineering, TU Delft, The Netherlands, {sourdis,

Developing a Data Driven System for Computational Neuroscience

Ross Snider and Yongming Zhu Montana State University, Bozeman MT 59717, USA Abstract. A data driven system implies the need to integrate

AN FPGA BASED ARCHITECTURE FOR COMPLEX RULE MATCHING WITH STATEFUL INSPECTION OF MULTIPLE TCP CONNECTIONS

Claudio Greco, Enrico Nobile, Salvatore Pontarelli, Simone Teofili CNIT/University of Rome Tor

Networks-on-Chip Router: Configuration and Implementation

Wen-Chung Tsai, Kuo-Chih Chu * 2 1 Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 413, Taiwan,

NETWORK INTRUSION DETECTION SYSTEMS ON FPGAS WITH ON-CHIP NETWORK INTERFACES

In Proceedings of International Workshop on Applied Reconfigurable Computing (ARC), Algarve, Portugal, February 2005. Christopher

PERG-Rx: An FPGA-based Pattern-Matching Engine with Limited Regular Expression Support for Large Pattern Database. Johnny Ho

Supervisor: Guy Lemieux Date: September 11, 2009 University of British Columbia

Multi-dimensional Packet Classification on FPGA: 100 Gbps and Beyond

Yaxuan Qi, Jeffrey Fong 2, Weirong Jiang 3, Bo Xu 4, Jun Li 5, Viktor Prasanna 6, 2, 4, 5 Research Institute of Information Technology

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator

Stanley Bak Abstract Network algorithms are deployed on large networks, and proper algorithm evaluation is necessary to avoid

DESIGN AND IMPLEMENTATION OF SDR SDRAM CONTROLLER IN VHDL. Shruti Hathwalia* 1, Meenakshi Yadav 2

ISSN 2277-2685 IJESR/November 2014/ Vol-4/Issue-11/799-807 Shruti Hathwalia et al./ International Journal of Engineering & Science Research ABSTRACT

Evaluating Energy Efficiency of Floating Point Matrix Multiplication on FPGAs

Kiran Kumar Matam Computer Science Department University of Southern California Email: kmatam@usc.edu Hoang Le and Viktor K.

Interlaken Look-Aside Protocol Definition

Contents Terms and Conditions This document has been developed with input from a variety of companies, including members of the Interlaken Alliance, all of which

Implementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications

46 IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.3, March 2008

Implementation and Analysis of Large Receive Offload in a Virtualized System

Takayuki Hatori and Hitoshi Oi The University of Aizu, Aizu Wakamatsu, JAPAN {s1110173,hitoshi}@u-aizu.ac.jp Abstract System

Hybrid Regular Expression Matching for Deep Packet Inspection on Multi-Core Architecture

Yan Sun, Haiqin Liu, Victor C. Valgenti, and Min Sik Kim School of Electrical and Computer Engineering Washington

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y.

Published in: Proceedings of the 2010 International Conference on Field-programmable

Design and Implementation of a Software-Based High-Performance Intrusion Detection System

KyoungSoo Park Department of Electrical Engineering, KAIST M. Asim Jamshed *, Jihyung Lee*, Sangwoo Moon*, Insu Yun *, Deokjin Kim, Sungryoul Lee, Yung Yi* Department of Electrical

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman

Maximizing Server Efficiency with ML accelerators Michael

Chapter 1. Introduction

In a packet-switched network, packets are buffered when they cannot be processed or transmitted at the rate they arrive. There are three main reasons that a router, with generic

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

Improving Signature Matching using Binary Decision Diagrams

Liu Yang, Rezwana Karim, Vinod Ganapathy Rutgers University Randy Smith Sandia National Labs Signature matching in IDS Find instances of network

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

On the Deployment of AQM Algorithms in the Internet

PAWEL MROZOWSKI and ANDRZEJ CHYDZINSKI Silesian University of Technology Institute of Computer Sciences Akademicka 16, Gliwice POLAND pmrozo@go2.pl andrzej.chydzinski@polsl.pl

Fast and Reconfigurable Packet Classification Engine in FPGA-Based Firewall

2011 International Conference on Electrical Engineering and Informatics 17-19 July 2011, Bandung, Indonesia Arief Wicaksana #1,

High Ppeed Circuit Techniques for Network Intrusion Detection Systems (NIDS)

The University of Akron IdeaExchange@UAkron Mechanical Engineering Faculty Research Mechanical Engineering Department 2008 Ajay

A Configurable Multi-Ported Register File Architecture for Soft Processor Cores

Mazen A. R. Saghir and Rawan Naous Department of Electrical and Computer Engineering American University of Beirut P.O. Box

Architecture and Performance Models for Scalable IP Lookup Engines on FPGA*

Yi-Hua E. Yang Xilinx Inc. Santa Clara, CA edward.yang@xilinx.com Yun Qu* Dept. of Elec. Eng. Univ. of Southern California yunqu@usc.edu

Efficient Packet Classification for Network Intrusion Detection using FPGA

ABSTRACT Haoyu Song Department of CSE Washington University St. Louis, USA hs@arl.wustl.edu FPGA technology has become widely

AN OCM BASED SHARED MEMORY CONTROLLER FOR VIRTEX 4. Bas Breijer, Filipa Duarte, and Stephan Wong

Computer Engineering, EEMCS Delft University of Technology Mekelweg 4, 2826CD, Delft, The Netherlands email:

Hash-Based String Matching Algorithm For Network Intrusion Prevention systems (NIPS)

VINOD. O & B. M. SAGAR ISE Department, R.V.College of Engineering, Bangalore-560059, INDIA Email Id :vinod.goutham@gmail.com,sagar.bm@gmail.com

Two-Stage Decomposition of SNORT Rules towards Efficient Hardware Implementation

Hao Chen, Douglas H. Summerville, Yu Chen* Dept. of Electrical and Computer Engineering, SUNY Binghamton, Binghamton, NY

Boundary Hash for Memory-Efficient Deep Packet Inspection

N. Sertac Artan, Masanori Bando, and H. Jonathan Chao Electrical and Computer Engineering Department Polytechnic University Brooklyn, NY Abstract

A Modular System for FPGA-Based TCP Flow Processing in High-Speed Networks

David V. Schuehler and John W. Lockwood Applied Research Laboratory, Washington University One Brookings Drive, Campus Box 1045 St.

Highly Memory-Efficient LogLog Hash for Deep Packet Inspection

Masanori Bando, N. Sertac Artan, and H. Jonathan Chao Department of Electrical and Computer Engineering Polytechnic Institute of NYU Abstract

MULTIPLEXER / DEMULTIPLEXER IMPLEMENTATION USING A CCSDS FORMAT

Item Type text; Proceedings Authors Grebe, David L. Publisher International Foundation for Telemetering Journal International Telemetering

A Network Storage LSI Suitable for Home Network

Han-Kyu Lim*, Ji-Ho Han**, and Deog-Kyoon Jeong*** Abstract Storage over (SoE) is

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors

Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,

High Throughput Sketch Based Online Heavy Change Detection on FPGA

Da Tong, Viktor Prasanna Ming Hsieh Department of Electrical Engineering University of Southern California Los Angeles, CA 90089, USA.

PS2 VGA Peripheral Based Arithmetic Application Using Micro Blaze Processor

K.Rani Rudramma 1, B.Murali Krihna 2 1 Assosiate Professor,Dept of E.C.E, Lakireddy Bali Reddy Engineering College, Mylavaram

RiceNIC. A Reconfigurable Network Interface for Experimental Research and Education. Jeffrey Shafer Scott Rixner

Introduction Networking is critical to modern computer systems Role of the network interface

Jakub Cabal et al. CESNET

CONFIGURABLE FPGA PACKET PARSER FOR TERABIT NETWORKS WITH GUARANTEED WIRE-SPEED THROUGHPUT 2018/02/27 FPGA, Monterey, USA Packet parsing INTRODUCTION It is among basic operations

Computer Science at Kent

Regular expression matching with input compression and next state prediction. Gerald Tripp Technical Report No. 3-08 October 2008 Copyright 2008 University of Kent at Canterbury

Area Efficient Z-TCAM for Network Applications

Vishnu.S P.G Scholar, Applied Electronics, Coimbatore Institute of Technology. Ms.K.Vanithamani Associate Professor, Department of EEE, Coimbatore Institute

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations.

A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM

Mansi Jhamb, Sugam Kapoor USIT, GGSIPU Sector 16-C, Dwarka, New Delhi-110078, India Abstract This paper demonstrates an asynchronous

Reconfigurable Computing. On-line communication strategies. Chapter 7

Prof. Dr.-Ing. Jürgen Teich Lehrstuhl für Hardware-Software-Co-Design On-line connection - Motivation Routing-conscious temporal placement algorithms consider

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE

Reiner W. Hartenstein, Rainer Kress, Helmut Reinig University of Kaiserslautern Erwin-Schrödinger-Straße, D-67663 Kaiserslautern, Germany

Automatic Synthesis of Efficient Intrusion Detection Systems on FPGAs 1

Zachary K. Baker and Viktor K. Prasanna zbaker@usc.edu, prasanna@ganges.usc.edu Abstract This paper presents a methodology and a

FPGA Provides Speedy Data Compression for Hyperspectral Imagery

Engineers implement the Fast Lossless compression algorithm on a Virtex-5 FPGA; this implementation provides the ability to keep up with

Scalable Enterprise Networks with Inexpensive Switches

Minlan Yu minlanyu@cs.princeton.edu Princeton University Joint work with Alex Fabrikant, Mike Freedman, Jennifer Rexford and Jia Wang 1 Enterprises

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.

Low Cost Network on Chip Router Design for Torus Topology

IJCSNS International Journal of Computer Science and Network Security, VOL.17 No.5, May 2017 287 Bouraoui Chemli and Abdelkrim Zitouni Electronics

A Hardware Filesystem Implementation for High-Speed Secondary Storage

Dr.Ashwin A. Mendon, Dr.Ron Sass Electrical & Computer Engineering Department University of North Carolina at Charlotte Presented by: