SCALABLE HIGH-THROUGHPUT SRAM-BASED ARCHITECTURE FOR IP-LOOKUP USING FPGA
Hoang Le, Weirong Jiang, Viktor K. Prasanna

Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA
{hoangle, weirongj, ...}
(Supported by the United States National Science Foundation under Grant No. CCF. Equipment grant from Xilinx Inc. is gratefully acknowledged.)

ABSTRACT

Most high-speed Internet Protocol (IP) lookup implementations use tree traversal and pipelining. However, this approach results in inefficient memory utilization. Due to the limited on-chip memory and I/O pins of FPGAs, state-of-the-art designs on FPGAs cannot support the large routing tables arising in backbone routers. Therefore, ternary content addressable memory (TCAM) is widely used. We propose a novel SRAM-based linear pipeline architecture, named DuPI. Using a single Virtex-4, DuPI can support a routing table of up to 228K prefixes, which is 3 times the state-of-the-art. Our architecture can also easily be partitioned, so as to use external SRAM to handle even larger routing tables (up to 2M prefixes), while maintaining a 324 MLPS throughput. The use of SRAM (instead of TCAM) leads to orders-of-magnitude reduction in power dissipation. Employing caching to exploit Internet traffic locality, we can achieve a throughput of 1.3 GLPS (billion lookups per second). Our design also maintains packet input order and supports in-place, nonblocking route updates.

1. INTRODUCTION

Most hardware-based solutions for high-speed packet forwarding in routers fall into two main categories: ternary content addressable memory (TCAM)-based and dynamic/static random access memory (DRAM/SRAM)-based solutions. Although TCAM-based engines can retrieve results in just one clock cycle, their throughput is limited by the relatively low speed of TCAMs. They are expensive and offer little adaptability to new addressing and routing protocols [1]. As shown in Table 1, SRAMs outperform TCAMs with respect to speed, density, and power consumption.

Table 1. Comparison of TCAM and SRAM technologies
                                    TCAM (18 Mb chip)   SRAM (18 Mb chip)
Maximum clock rate (MHz)            266 [2]             400 [3, 4]
Cell size (# transistors/bit) [5]   16                  6
Power consumption (Watts)           [6]                 0.1 [7]

Since SRAM-based solutions utilize some kind of tree traversal, they require multiple cycles to perform a single IP lookup. Several researchers have explored pipelining to improve the throughput. A simple pipelining approach is to map each tree level onto a pipeline stage with its own memory and processing logic, so that one packet can be processed every clock cycle. However, these designs result in inefficient memory utilization, since each node must store the addresses of its child nodes. This inefficiency dictates the size of the routing table that an SRAM-based solution can support. In addition, since each stage needs its own memory, it is not feasible to use external SRAM for all stages, due to the constraint on the number of I/O pins. We thus face two constraints, the number of external SRAM banks and the limited amount of on-chip memory, which are interdependent and make current solutions very difficult to scale to larger routing tables. This scalability has been a dominant issue for any implementation on FPGAs.

The key issues in designing an architecture for IP lookup are (1) the size of the supported routing table, (2) high throughput, (3) in-order packet output, (4) incremental update, and (5) power consumption.
To address these challenges, we propose and implement a scalable, high-throughput SRAM-based dual linear pipeline architecture for IP lookup on FPGAs (DuPI). This paper makes the following contributions. To the best of our knowledge, this architecture is the first binary-tree-based design to use only on-chip FPGA resources to support a large routing table of up to 228K prefixes, which is 3 times the size of a Mae-West routing table (rrc08, 84K prefixes) [8]. DuPI is also the first such architecture that can easily interface with external SRAM; using external SRAM, we can handle up to 2M prefixes, which is 8 times the size of the current largest routing table (rrc11, 250K prefixes) [8].

The implementation results show a sustained throughput of 324 MLPS, whether or not off-chip commodity SRAM is used, for the non-cache design, and 1.3 GLPS for the cache-based design. This is a promising solution for next-generation IP routers.

The rest of the paper is organized as follows. Section 2 covers the background and related work. Section 3 introduces the DuPI architecture. Section 4 describes the DuPI implementation. Section 5 presents implementation results. Section 6 concludes the paper.

2. BACKGROUND AND RELATED WORK

2.1. Trie-based IP Lookup

The IP lookup problem is a longest prefix matching (LPM) problem. The common data structure in algorithmic solutions for LPM is some form of tree, such as a trie [9]. A trie is a binary-tree-like data structure for LPM. Each prefix is represented by a node in the trie, and the value of the prefix corresponds to the path from the root of the tree to that node. The prefix bits are scanned left to right: a 0 bit indicates a child to the left, a 1 bit a child to the right. IP lookup is performed by traversing the trie according to the bits of the IP address. When a leaf is reached, the last prefix seen along the path to the leaf is the longest matching prefix for the IP address. The time to look up a uni-bit trie (which is traversed bit by bit) is proportional to the prefix length. Scanning multiple bits per step increases the search speed; such a trie is called a multi-bit trie.

2.2. Related Work

Since the proposed work addresses FPGA implementation, we summarize related work in this area. TCAM is widely used to simplify the complexity of the designs. However, TCAM results in a lower overall clock speed and increases the power consumption of the entire system. Song et al. [10] introduce an architecture called BV-TCAM, which combines TCAM and the Bit Vector (BV) algorithm to compress the data representations effectively and boost the throughput. Due to the relatively low clock rate of TCAMs, this design can only handle a lookup rate of about 30 MLPS. The fastest IP lookup implementation on FPGAs to date is reported in [11], which achieves a lookup rate of 325 MLPS. This is a bidirectional optimized linear pipeline architecture, named BiOLP, which takes advantage of dual-ported SRAM to map the prefix trie in both directions. By doing this, BiOLP achieves a perfectly balanced memory allocation over all pipeline stages. BiOLP also supports a Mae-West routing table (rrc08, 84K prefixes). Another very fast IP lookup implementation on FPGAs is described in [12], which achieves a lookup rate of 263 MLPS. That architecture takes advantage of both a traditional hashing scheme and reconfigurable hardware: only the colliding prefixes (prefixes that have the same hash value) are implemented on reconfigurable hardware, while the remaining prefixes reside in a main table in memory. This architecture supports a Mae-West routing table (rrc08, 84K prefixes), and can be updated using partial reconfiguration when adding or removing prefixes. However, it does not support incremental update, and the update time is lower-bounded by the time required for partial reconfiguration. It is also unclear how to scale this design to support larger routing tables, due to the nondeterministic behavior of the hash function. Moreover, the power consumption of this design is potentially high, due to the large number of logic resources utilized.
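Returning for a moment to the uni-bit trie of Section 2.1, the following minimal sketch (Python, not part of the paper; the node layout, 32-bit address width, and function names are illustrative assumptions) shows the bit-by-bit descent that remembers the last prefix seen.

    class TrieNode:
        def __init__(self):
            self.children = [None, None]   # index 0 = left child (bit 0), 1 = right child (bit 1)
            self.next_hop = None           # set if a prefix ends at this node

    def insert(root, prefix_bits, next_hop):
        """Insert a prefix given as a string of '0'/'1' characters."""
        node = root
        for b in prefix_bits:
            i = int(b)
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.next_hop = next_hop

    def lookup(root, addr, width=32):
        """Longest prefix match: scan address bits MSB-first, remember the last prefix seen."""
        node, best = root, root.next_hop
        for k in range(width - 1, -1, -1):
            bit = (addr >> k) & 1
            node = node.children[bit]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best

A multi-bit trie simply consumes several bits per step instead of one, trading memory for fewer traversal cycles.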
Baboescu et al. [17] propose a Ring pipeline architecture for tree-based search engines. The pipeline stages are configured as a circular, multi-point-access pipeline so that a search can be initiated at any stage. This architecture is implemented in [18] and achieves a throughput of 125 MLPS. Sangireddy et al. [13] propose two algorithms, Elevator-Stairs and logW-Elevators, which are scalable and memory efficient. However, their designs can only achieve up to MLPS. Meribout et al. [14] present another architecture, with a lookup speed of 66 MLPS. In this design, a commodity random access memory (RAM) is needed, and the achieved lookup rate is relatively low.

3. DuPI ARCHITECTURE

3.1. Binary-tree-based IP Lookup

We propose a memory-efficient data structure based on a binary tree. A binary search tree (BST) is a special binary tree with the following properties: (1) each node has a value; (2) the left subtree of a node contains only values less than the node's value; (3) the right subtree of a node contains only values greater than the node's value. In an optimal binary search tree, an element can be found in at most (1 + log2 N) operations, where N is the number of nodes. Figure 1 illustrates a sample prefix set and its corresponding binary search tree. For simplicity, IP addresses with a length of 8 bits are considered. Prefixes are padded with ones, as shown in the third column. The fourth column is the number of padded bits. The padded prefix and its number of padded bits are concatenated, and this value is used to build the tree. However, the value that is stored in each node is the concatenation of the padded prefix and the length of the original prefix; for example, the value stored at node #4 is its padded prefix concatenated with the original prefix length, not with the number of padded bits. All the prefixes are sorted in descending order, as shown in the last column.

The header of an incoming packet is extracted and enters the tree at its root. At each node, only the k most significant bits of the node's padded prefix and of the packet's IP address are compared, where k is the length of the node's prefix. Given such a binary search tree, IP lookup is performed by traversing left or right depending on the comparison result at each node.

Fig. 1. Sample prefix set and its binary search tree: (a) prefix set (P1-P8), (b) binary search tree
Fig. 2. Block diagram of DuPI architecture

If the packet header IP is smaller than or equal to the node's value, it is forwarded to the left branch, and to the right branch otherwise. For example, assume that a packet arrives whose header IP begins with 010. At the root, the prefix 011, with a length of 3, is compared against the corresponding bits 010 of the header IP, which are smaller; thus the packet traverses to the left. The comparison with the value in node #6 again yields a smaller outcome, hence the packet again traverses to the left. At node #7, the packet header matches the node's prefix and is forwarded to the left to find a longer prefix, if any. However, no match is found at node #8, and hence the prefix at node #7 (P4) is the longest matching prefix.

We must ensure that the proposed algorithm actually finds the longest matching prefix. Given two prefixes P_A and P_B, P_A is a longer matching prefix than P_B iff P_B is included in P_A; hereafter this is referred to as "P_A is longer than P_B".

Property: Given two prefixes P_A and P_B, if P_A is longer than P_B, then P_A belongs to the left branch of P_B.

Let P'_A and P'_B be P_A and P_B after 1-padding, respectively. Since P_A is longer than P_B, and prefixes are 1-padded, all bits of P'_A and P'_B are identical except for the bits that make P_A longer than P_B. If those bits are all 1, we have P'_A = P'_B; otherwise P'_A < P'_B. The number of padded bits breaks the tie: as P_A is longer than P_B, the number of non-prefix (padded) bits of P_A is smaller than that of P_B, so the key of P_A is smaller than that of P_B. By property (2) of the BST, P_A belongs to the left branch of P_B. Therefore, in all cases, the above property is satisfied.
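As a concrete illustration of this encoding and search, the minimal sketch below (Python, not part of the paper; the 8-bit address width, field names, and node layout are assumptions) builds keys by 1-padding each prefix and follows the smaller-or-equal/greater descent rule described above. Tree construction and the padded-bit tie-break are omitted for brevity.

    class BSTNode:
        def __init__(self, padded, length, left=None, right=None):
            self.padded = padded          # 1-padded prefix value
            self.length = length          # original prefix length
            self.left, self.right = left, right

    def make_key(prefix_bits, width=8):
        """1-pad a prefix string (e.g. "010") to 'width' bits; return (padded value, prefix length)."""
        pad = width - len(prefix_bits)
        return (int(prefix_bits, 2) << pad) | ((1 << pad) - 1), len(prefix_bits)

    def bst_lookup(root, ip, width=8):
        """Descend the BST, remembering the longest prefix whose leading bits match the IP."""
        best, best_len, node = None, -1, root
        while node is not None:
            k = node.length
            if (ip >> (width - k)) == (node.padded >> (width - k)):   # k-bit prefix match
                if k > best_len:
                    best, best_len = node, k
            node = node.left if ip <= node.padded else node.right     # smaller-or-equal goes left
        return best

    # Example: make_key("010") == (0b01011111, 3); any IP of the form 010xxxxx matches that node.

Note that whenever the top k bits match, the IP is never greater than the 1-padded prefix, so the search always continues to the left, where any longer matching prefix resides by the property above.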
3.2. DuPI Architecture

A binary tree structure is utilized in our design. To ensure that every IP lookup takes the same number of operations (cycles), the IP address goes through all the comparisons even if a match has already been found. A pipelining technique is used to increase the throughput. The number of pipeline stages is determined by the maximum number of operations needed to traverse the tree. For the design to work, each stage has its own memory (or table). Each table holds one level of the binary tree, so the memory size doubles in each stage: the first stage has one element, the second has two, the third has four, and so on. The maximum number of elements in stage n is 2^n, with stages numbered from 0. The block diagram of the basic architecture and of a single stage is shown in Figure 2. The architecture is configured as dual linear pipelines. At each stage, the memory has dual read/write ports so that two packets can be input every clock cycle. Each memory entry contains (1) the padded prefix of the current node, (2) the length of the prefix, and (3) the status of this prefix (up/down). At each stage, four pieces of data are forwarded from the previous stage: the IP address of the packet, the address used to access the memory, and the length and flow information of the longest prefix matched so far. The memory address is used to retrieve the node's value, which is compared with the packet's IP address to determine the matching status. If there is a match and it is longer than the previously stored match, the length and flow information of the new match replace the old ones. The IP address is forwarded left or right depending on the comparison result, as described in Section 3.1. The comparison result (1 if the packet's IP is greater than the node's padded prefix, 0 otherwise) is concatenated with the current memory address and forwarded to the next stage.
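A behavioral model of one pipeline stage, as just described, might look like the sketch below (Python, not the paper's VHDL; the packet record, table layout, and field names are assumptions). In hardware, each stage performs exactly this work on its own dual-port BRAM in one clock cycle, for two packets at a time.

    from dataclasses import dataclass

    @dataclass
    class Packet:
        ip: int          # destination IP address
        addr: int        # index into this stage's node table
        best_len: int    # length of the longest prefix matched so far
        best_flow: int   # flow information of that prefix

    def stage_step(table, pkt, width=32):
        """One DuPI stage: compare, update the best match, compute the next-stage address."""
        padded, length, flow, active = table[pkt.addr]        # this stage's node entry
        k = length
        match = active and (pkt.ip >> (width - k)) == (padded >> (width - k))
        if match and k > pkt.best_len:                        # longer match found
            pkt.best_len, pkt.best_flow = k, flow
        go_right = 1 if pkt.ip > padded else 0                # comparison result bit
        pkt.addr = (pkt.addr << 1) | go_right                 # concatenate and forward
        return pkt

The concatenation (addr << 1) | bit is exactly the address of the chosen child in the next level's table, so no child pointers need to be stored.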

3.3. Tree Mapping Algorithm

A complete BST is required for efficient memory utilization: all levels must be fully occupied, except possibly the last one, and in the last level all elements must be placed as far to the left as possible. Given an array of elements sorted in ascending order, this BST can easily be built by picking the right pivot as the root and recursively building the left and right subtrees. Two cases of complete BSTs are illustrated in Figure 3.

Fig. 3. Two cases of complete BST

Let N be the number of elements, n the number of levels, and Δ the number of elements in the last level. The total number of nodes in all levels excluding the last one is 2^(n-1) - 1; therefore, the number of nodes in the last level is Δ = N - (2^(n-1) - 1), and the last level holds 2^(n-1) nodes if it is full. If Δ ≤ 2^(n-1)/2, we have the complete BST of Figure 3(a), and that of Figure 3(b) otherwise. Let x be the index of the desired root: x = 2^(n-2) - 1 + Δ for case (a), and x = 2^(n-1) - 1 for case (b). The complete BST can be built recursively, as described in Algorithm 1.

Algorithm 1 COMPLETEBST(sorted array)
Input: array of N elements sorted in ascending order
Output: complete BST
1: n = ceil(log2(N + 1)), Δ = N - (2^(n-1) - 1)
2: if Δ ≤ 2^(n-1)/2 then
3:   x = 2^(n-2) - 1 + Δ
4: else
5:   x = 2^(n-1) - 1
6: end if
7: Pick element x as the root
8: Left branch of x = COMPLETEBST(left-of-x sub-array)
9: Right branch of x = COMPLETEBST(right-of-x sub-array)

3.4. Cache-based DuPI

The proposed architecture can be utilized as the lookup engine in a cache-based DuPI architecture, as shown in Figure 4. Our experiments (Section 5.1) show that 4 inputs per pipeline is the optimal number. Therefore, the cache-based architecture consists of seven modules: cache (x2), 4-1 FIFO (x2), DuPI, Delay, and 5-4 FIFO (x2), where the notation m-n denotes a FIFO with m inputs and n outputs.

Fig. 4. Top level block diagram of cache-based DuPI

At the front are the two cache modules, which take in up to 8 IP addresses at a time. These modules take advantage of Internet traffic locality, which arises from the TCP mechanism and application characteristics [15]. The most recently searched packets are cached, and any arriving packet accesses the cache first. If a cache hit occurs, the packet skips traversing the pipeline; otherwise, the packet must traverse the pipeline. For IP lookup, only the destination IP of the packet is used to index the cache. A cache update is triggered either when a route update affects a cached entry, or after a packet that previously had a cache miss retrieves its search result from the pipeline. Any replacement algorithm can be used to update the cache; the Least Recently Used (LRU) algorithm is used in our implementation. We can insert multiple packets per clock cycle as long as there are enough copies of the cache for those packets to access simultaneously; we can insert at most four packets per pipeline during one clock cycle.

Without caching, the packet input order is maintained due to the linear architecture. With caching, however, a packet that has a cache hit skips the pipeline and may go out of order. We therefore add buffers to delay outputting the packets with a cache hit, as shown in Figure 4 (Delay module). The length of the delay buffer is equal to the sum of the pipeline depth and the queue length. By these means, the packet input order is preserved. Packets coming out of the Delay module, as well as out of each pipeline, are buffered in the 5-4 output FIFOs.

3.5. Route Update

We define two types of updates: in-place updates and new-route updates. Once the FPGA is configured and a routing table is stored, all updates to these prefixes are defined as in-place updates. These updates include changing flow information and bringing a prefix up or down.
We can perform an in-place update by inserting write bubbles [16] (Figure 5). The new content of the memory is computed offline. When an in-place update is initiated, a write bubble is inserted into the pipeline. Each write bubble is assigned an ID, and each stage has a write bubble table (WBT) that stores the update information associated with each write bubble ID. When it arrives at the stage prior to the stage to be updated, the write bubble uses its ID to look up the WBT and retrieves (1) the memory address to be updated in the next stage, (2) the new content for that memory location, and (3) a write enable bit. If the write enable bit is set, the write bubble uses the new content to update the memory location in the next stage.

Fig. 5. Route update using write bubbles

For a new-route update, if the structure of the tree changes, the BST must be rebuilt and the entire memory content of each pipeline stage must be reloaded.
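When the tree must be rebuilt for a new-route update, the per-stage tables can be regenerated offline with Algorithm 1. One way to realize COMPLETEBST and lay the nodes out one level per pipeline stage is sketched below (Python; the key-array input, dict-per-stage output, and function names are assumptions, not the paper's code).

    import math

    def complete_bst_root(N):
        """Index of the root of a complete BST over N sorted elements (Algorithm 1, lines 1-6)."""
        n = math.ceil(math.log2(N + 1))          # number of levels
        delta = N - (2 ** (n - 1) - 1)           # elements in the last level
        if delta <= 2 ** (n - 1) // 2:
            return 2 ** (n - 2) - 1 + delta
        return 2 ** (n - 1) - 1

    def complete_bst(sorted_keys, stages=None, stage=0, slot=0):
        """Recursively place sorted keys into per-stage tables: stages[s][i] is node i of level s."""
        if stages is None:
            depth = math.ceil(math.log2(len(sorted_keys) + 1))
            stages = [dict() for _ in range(depth)]
        if sorted_keys:
            x = complete_bst_root(len(sorted_keys))
            stages[stage][slot] = sorted_keys[x]                             # pick element x as root
            complete_bst(sorted_keys[:x], stages, stage + 1, 2 * slot)       # left branch
            complete_bst(sorted_keys[x + 1:], stages, stage + 1, 2 * slot + 1)  # right branch
        return stages

Slot i of stage s corresponds to the memory address a packet carries when it reaches that stage (Section 3.2), so each returned table can be written directly into the corresponding stage memory.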

4. DuPI IMPLEMENTATION

4.1. Cache and FIFO Modules

As mentioned above, full associativity and LRU replacement are used to implement the cache module. Our experiments show that a 16-entry cache is sufficient for our purposes. Since this is a tree-based architecture, all the prefixes are stored in block RAM (BRAM), and it is desirable to reserve as much BRAM as possible for routing tables. Therefore, the cache module is implemented using only registers and logic.

There are two types of FIFOs in the architecture, as shown in Figure 4. Even though they have different numbers of input and output ports, their functionalities are identical. Since the number of inputs differs from the number of outputs in each FIFO, and there is only one clock for both the writing and reading sides, these are synchronous FIFOs with width conversion. For simplicity, no handshaking is implemented, and therefore any further arriving packet is dropped when the FIFO is full. Similar to the cache, the two FIFOs are implemented using only registers and logic, to save BRAM for routing tables. Details of these implementations are not described in this paper due to page limitations.

4.2. DuPI

As mentioned earlier, the memory size in each stage doubles that of the previous stage: if the stages are full, stage 0 has one entry, stage 1 has two entries, ..., and stage n has 2^n entries. Each entry includes (1) a padded prefix (32 bits), (2) a prefix length (5 bits), (3) flow information (4 bits), and (4) an active status bit (1 bit), for a total of 42 bits per entry. On a Xilinx FPGA, BRAM comes in blocks of 18 Kb, each of which can hold up to 438 entries. Hence, stages 0 to 8 need one block of BRAM each, or 9 blocks in total. This inefficiency could be avoided by using distributed RAM, whose unit block is only 16 bits; however, that optimization adds only 4K prefixes to the total, which is not significant. Our target chip, the Virtex-4 FX140, has 9936 Kb of BRAM on chip, or 552 blocks. With this amount of memory, we can have 17 full stages, which hold 128K prefixes, and one half-full stage, which holds 100K entries, for a total of 228K prefixes. The distribution of prefixes and BRAMs over stages 9 to 17 is shown in Table 2.

Table 2. Number of prefixes and BRAMs in stages 9 to 17
Stage #       9    10   11   12   13   14   15   16   17
# prefixes    512  1K   2K   4K   8K   16K  32K  64K  100K
# BRAMs       2    3    5    10   19   38   75   150  228

Our mapping algorithm ensures that all entries of the last stage are left-aligned, and thus can be safely mapped onto 228 memory blocks without out-of-bound memory accesses. Our design utilizes 539 memory blocks (311 for the first 17 stages, 228 for the last stage), or about 97.6% of the available memory. Let S be the size of a routing table, and assume that 128K < S < 228K. The number of memory blocks needed for the last stage is then ceil((S - 128K)/438), in addition to the 311 blocks used by the first 17 stages, since each 18 Kb block holds 438 entries.
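The BRAM budget above follows from a little arithmetic; the sketch below (Python, purely illustrative) reproduces the 438-entries-per-block, 311-block, and 539-block figures of Section 4.2.

    ENTRY_BITS = 42                  # padded prefix (32) + length (5) + flow (4) + active (1)
    BLOCK_BITS = 18 * 1024           # one Virtex-4 BRAM block
    PER_BLOCK = BLOCK_BITS // ENTRY_BITS            # 438 entries per block

    def blocks(entries):
        """Ceiling of entries / PER_BLOCK, with at least one block per stage."""
        return max(1, -(-entries // PER_BLOCK))

    full = sum(blocks(2 ** s) for s in range(17))   # stages 0..16, fully populated -> 311 blocks
    last_blocks = 228                               # blocks assigned to the half-full stage 17
    last_entries = last_blocks * PER_BLOCK          # ~100K prefixes in the last stage
    total_prefixes = (2 ** 17 - 1) + last_entries   # ~128K + ~100K, the paper's 228K figure
    total_blocks = full + last_blocks               # 539 of the 552 available blocks (~97.6%)
    print(PER_BLOCK, full, last_entries, total_prefixes, total_blocks)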
In our design, external SRAM can be used to handle larger routing tables. Since only the last stage is allowed to be less than full, we use 311 blocks (or 56%) of on-chip BRAM for the first 17 full stages, and move the subsequent stages onto external SRAM. In the current market, SRAM comes in 2 Mb to 32 Mb packages [4], with data widths of 18, 32, or 36 bits and a minimum access frequency of 250 MHz. Since each entry needs 42 bits, we must use two chips to provide 50 bits (18 + 32). Each stage uses dual-port memory, which requires two address ports and two data ports. Stage #17, which stores 128K prefixes, needs 6 Mb of SRAM and 270 pins (2x17-bit address + 2x50-bit data) going into the FPGA. Similarly, stage #18, which stores 256K prefixes, needs 12 Mb of SRAM and 272 pins. The largest Virtex-4 package, which has 1517 I/O pins, can interface with up to four banks of dual-port SRAM. Using this package type, we can have up to 21 full stages, which hold a routing table of 2M prefixes. Moreover, since the access frequency of the SRAM is twice as fast as our target frequency, the use of external SRAM should not adversely affect the performance of our design. Table 3 describes the relationship between the number of prefixes supported, the amount of external SRAM needed, and the number of external stages.

Table 3. Number of prefixes and amount of SRAM
# external stages    1      2      3      4
# prefixes           256K   512K   1M     2M

5. IMPLEMENTATION RESULTS

5.1. Throughput

We implemented the proposed architecture in VHDL, using Xilinx ISE 9.1, with the Virtex-4 FX140 as the target device. The implementation results show a maximum clock frequency of 162 MHz. Thus, DuPI can handle a lookup rate of 324 MLPS, or over 100 Gbps of data (with the minimum packet size of 40 bytes), which is 7.5 times the OC-256 rate. We conducted experiments on the cache size to analyze its impact on the throughput and found that caching is effective in improving it. Even with only 1% of the routing entries cached, the throughput reached almost 4 packets processed per cycle (PPC) per pipeline, or 8 PPC in total. Hence, the overall throughput was as high as 162 MHz x 8 ≈ 1.3 G packets per second, i.e., 416 Gbps for the minimum packet size of 40 bytes, which is 2.6 times the OC-3072 rate. Such a throughput is also 144% higher than that of the state-of-the-art TCAM-based IP lookup engines [6].
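The throughput figures follow directly from the clock rate and the packets processed per cycle; the short worked sketch below (Python, illustrative only) reproduces them.

    CLOCK_HZ = 162e6          # reported maximum clock frequency
    MIN_PKT_BITS = 40 * 8     # minimum IP packet size of 40 bytes

    base_lookups = CLOCK_HZ * 2          # dual pipelines, 1 packet each per cycle -> 324 MLPS
    base_gbps = base_lookups * MIN_PKT_BITS / 1e9        # ~103.7 Gbps (over 100 Gbps)

    cached_lookups = CLOCK_HZ * 8        # ~4 PPC per pipeline with caching -> ~1.3 GLPS
    cached_gbps = cached_lookups * MIN_PKT_BITS / 1e9    # ~415 Gbps (reported as 416 Gbps)

    print(base_lookups / 1e6, base_gbps, cached_lookups / 1e9, cached_gbps)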

5.2. Performance Comparison

Two key comparisons were performed, with respect to the size of the supported routing table and the throughput. The two candidates were (1) the Ring architecture [17, 18] and (2) the state-of-the-art architecture on FPGA [12], since they can support the largest routing tables to date and have the highest throughput. All the resource data were normalized to the Virtex-4 FX140, as shown in Table 4.

Table 4. Performance comparison
Architecture    # slices          BRAM    # prefixes   Throughput
1 ([17, 18])    1405 (2.3%)               80K          125 MLPS
2 ([12])        14274 (22.7%)             80K          263 MLPS
3 (USC)         2009 (3.2%)       539     228K         324 MLPS
4 (USC)         7982 (12.7%)      539     228K         1.3 GLPS
5 (USC)         1813 (2.9%)       311     2M           324 MLPS
6 (USC)         7713 (12.3%)      311     2M           1.3 GLPS
(Our proposed architectures: (3) non-cache-based; (4) cache-based; (5) non-cache-based with SRAM; (6) cache-based with SRAM.)

With a 324 MLPS throughput, our design is faster than Architecture 1 (125 MLPS) and Architecture 2 (263 MLPS). Using only BRAM, DuPI also outperforms these two architectures with respect to the size of the supported routing table (228K vs. 80K). The resource utilization of our design is slightly higher than that of Architecture 1, but about half that of Architecture 2. Furthermore, our architecture supports incremental update at run time without any CAD tool involvement (as does Architecture 1), by inserting a write bubble whenever there is an update. In contrast, Architecture 2 relies on partial reconfiguration, which requires modifying, re-synthesizing, and re-implementing the code to generate the partial bitstream. Our design also delivers in-order output packets and has lower power consumption, owing to its lower resource utilization. With regard to scalability, DuPI can be partitioned to use BRAM plus SRAM, as discussed in Section 4.2, to support larger routing tables of up to 2M prefixes, without sacrificing the sustained throughput. The results are shown in Table 4 as Architectures 5 and 6.

5.3. Routing table size vs. throughput trade-off

As shown in Table 4, our proposed architecture uses at most 12.3% of the chip area. Hence, the same design can be duplicated to take advantage of the available resources. As described in Section 4.2, up to 4 banks of dual-port SRAM can be connected to the largest Virtex-4 package, so the design can be duplicated 2x or 4x. When the design is duplicated 2x, due to the limited amount of BRAM, we can fit only 16 stages on chip and 2 stages on external SRAM for each copy. This configuration supports routing tables of up to 1M prefixes, with a throughput of 648 MLPS for the non-cache-based design and 2.6 GLPS for the cache-based design. Similarly, with 4x duplication, 15 stages fit on chip and 1 stage on external SRAM for each copy. This configuration supports routing tables of up to 512K prefixes, with a throughput of 1.3 GLPS for the non-cache-based design and 5.2 GLPS for the cache-based design. The resource utilization is approximately 25% and 50% for cache-based implementations with 2x and 4x duplication, respectively.
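The scaling rule of Section 5.3 is summarized by the sketch below (Python, illustrative; the prefix capacities are quoted from the text, and only the throughput numbers are derived).

    BASE_MLPS = 324                    # dual-pipeline, non-cache throughput of one DuPI copy
    CACHE_PPC = 4                      # ~4 packets/cycle/pipeline with caching (Section 5.1)
    MIN_PKT_BITS = 40 * 8

    # (duplication factor, supported table size from Section 5.3)
    configs = [(1, "2M"), (2, "1M"), (4, "512K")]
    for dup, prefixes in configs:
        mlps = dup * BASE_MLPS                         # non-cache lookup rate
        glps = mlps * CACHE_PPC / 1000                 # cache-based lookup rate
        print(f"{dup}x: {prefixes} prefixes, {mlps} MLPS, {glps:.1f} GLPS cached, "
              f"{mlps * 1e6 * MIN_PKT_BITS / 1e9:.0f} Gbps at 40-byte packets")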
6. CONCLUDING REMARKS

This paper proposed and implemented an SRAM-based dual linear pipeline architecture for IP lookup, named DuPI, that does not use external TCAM. By using a binary search tree, the addresses of child nodes need not be stored, resulting in very efficient memory utilization. As a result, DuPI can support large routing tables of up to 228K prefixes using on-chip BRAM only, which is 3 times the size of a Mae-West routing table (rrc08). Using external SRAM, DuPI can handle even larger routing tables of up to 2M prefixes, which is 8 times the size of the current largest routing table (rrc11, 250K prefixes). DuPI also maintains the packet input order and supports nonblocking route updates. By employing packet caching to improve the throughput, DuPI can achieve a throughput of up to 416 Gbps, i.e., 2.6 times the OC-3072 rate. If necessary, our architecture can be duplicated to double or quadruple the throughput, with a 2x or 4x reduction in the size of the supported routing table, respectively. We plan to enhance the architecture to support IPv6, as well as packet classification, and to evaluate its performance in real-life scenarios.

7. REFERENCES

[1] F. Baboescu, S. Rajgopal, L. Huang, and N. Richardson, "Hardware implementation of a tree based IP lookup algorithm for OC-768 and beyond," in Proc. DesignCon '05, 2005.
[2] Renesas CAM. [Online].
[3] Cypress SRAMs. [Online].
[4] Samsung SRAMs. [Online].
[5] M. J. Akhbarizadeh, M. Nourani, D. S. Vijayasarathi, and T. Balsara, "A non-redundant ternary CAM circuit for network search engines," IEEE Trans. VLSI Syst., vol. 14, no. 3.
[6] K. Zheng, C. Hu, H. Lu, and B. Liu, "A TCAM-based distributed parallel IP lookup scheme and performance analysis," IEEE/ACM Trans. Netw., vol. 14, no. 4.
[7] CACTI. [Online].
[8] RIS Raw Data. [Online].
[9] M. A. Ruiz-Sanchez, E. W. Biersack, and W. Dabbous, "Survey and taxonomy of IP address lookup algorithms," IEEE Network, vol. 15, no. 2, pp. 8-23.
[10] H. Song and J. W. Lockwood, "Efficient packet classification for network intrusion detection using FPGA," 2005.
[11] H. Le, W. Jiang, and V. K. Prasanna, "A SRAM-based architecture for trie-based IP lookup using FPGA," in Proc. FCCM '08.
[12] H. Fadishei, M. S. Zamani, and M. Sabaei, "A novel reconfigurable hardware architecture for IP address lookup," in Proc. ANCS '05.
[13] R. Sangireddy, N. Futamura, S. Aluru, and A. K. Somani, "Scalable, memory efficient, high-speed IP lookup algorithms," IEEE/ACM Trans. Netw., vol. 13, no. 4, 2005.
[14] M. Meribout and M. Motomura, "A new hardware algorithm for fast IP routing targeting programmable routers," in Network Control and Engineering for QoS, Security and Mobility II. Kluwer Academic Publishers, 2003.
[15] J. Verdú, J. García, M. Nemirovsky, and M. Valero, "Architectural impact of stateful networking applications," in Proc. ANCS '05.
[16] A. Basu and G. Narlikar, "Fast incremental updates for pipelined forwarding engines," in Proc. INFOCOM '03.
[17] F. Baboescu, D. M. Tullsen, G. Rosu, and S. Singh, "A tree based router search engine architecture with single port memories," in Proc. ISCA '05.
[18] W. Jiang and V. K. Prasanna, "A memory-balanced linear pipeline architecture for trie-based IP lookup," in Proc. HOTI '07, 2007.


More information

Cisco Nexus 9508 Switch Power and Performance

Cisco Nexus 9508 Switch Power and Performance White Paper Cisco Nexus 9508 Switch Power and Performance The Cisco Nexus 9508 brings together data center switching power efficiency and forwarding performance in a high-density 40 Gigabit Ethernet form

More information

A Pipelined IP Address Lookup Module for 100 Gbps Line Rates and beyond

A Pipelined IP Address Lookup Module for 100 Gbps Line Rates and beyond A Pipelined IP Address Lookup Module for 1 Gbps Line Rates and beyond Domenic Teuchert and Simon Hauger Institute of Communication Networks and Computer Engineering (IKR) Universität Stuttgart, Pfaffenwaldring

More information

LONGEST prefix matching (LPM) techniques have received

LONGEST prefix matching (LPM) techniques have received IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 14, NO. 2, APRIL 2006 397 Longest Prefix Matching Using Bloom Filters Sarang Dharmapurikar, Praveen Krishnamurthy, and David E. Taylor, Member, IEEE Abstract We

More information

IP lookup with low memory requirement and fast update

IP lookup with low memory requirement and fast update Downloaded from orbit.dtu.dk on: Dec 7, 207 IP lookup with low memory requirement and fast update Berger, Michael Stübert Published in: Workshop on High Performance Switching and Routing, 2003, HPSR. Link

More information

Data Structures for Packet Classification

Data Structures for Packet Classification Presenter: Patrick Nicholson Department of Computer Science CS840 Topics in Data Structures Outline 1 The Problem 2 Hardware Solutions 3 Data Structures: 1D 4 Trie-Based Solutions Packet Classification

More information

15-744: Computer Networking. Routers

15-744: Computer Networking. Routers 15-744: Computer Networking outers Forwarding and outers Forwarding IP lookup High-speed router architecture eadings [McK97] A Fast Switched Backplane for a Gigabit Switched outer Optional [D+97] Small

More information

Line-rate packet processing in hardware: the evolution towards 400 Gbit/s

Line-rate packet processing in hardware: the evolution towards 400 Gbit/s Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 259 268 doi: 10.14794/ICAI.9.2014.1.259 Line-rate packet processing in hardware:

More information

High Throughput Energy Efficient Parallel FFT Architecture on FPGAs

High Throughput Energy Efficient Parallel FFT Architecture on FPGAs High Throughput Energy Efficient Parallel FFT Architecture on FPGAs Ren Chen Ming Hsieh Department of Electrical Engineering University of Southern California Los Angeles, USA 989 Email: renchen@usc.edu

More information

Resource-Efficient SRAM-based Ternary Content Addressable Memory

Resource-Efficient SRAM-based Ternary Content Addressable Memory Abstract: Resource-Efficient SRAM-based Ternary Content Addressable Memory Static random access memory (SRAM)-based ternary content addressable memory (TCAM) offers TCAM functionality by emulating it with

More information

ENERGY EFFICIENT PARAMETERIZED FFT ARCHITECTURE. Ren Chen, Hoang Le, and Viktor K. Prasanna

ENERGY EFFICIENT PARAMETERIZED FFT ARCHITECTURE. Ren Chen, Hoang Le, and Viktor K. Prasanna ENERGY EFFICIENT PARAMETERIZED FFT ARCHITECTURE Ren Chen, Hoang Le, and Viktor K. Prasanna Ming Hsieh Department of Electrical Engineering University of Southern California, Los Angeles, USA 989 Email:

More information

Routers: Forwarding EECS 122: Lecture 13

Routers: Forwarding EECS 122: Lecture 13 Routers: Forwarding EECS 122: Lecture 13 epartment of Electrical Engineering and Computer Sciences University of California Berkeley Router Architecture Overview Two key router functions: run routing algorithms/protocol

More information

CHAPTER 4 BLOOM FILTER

CHAPTER 4 BLOOM FILTER 54 CHAPTER 4 BLOOM FILTER 4.1 INTRODUCTION Bloom filter was formulated by Bloom (1970) and is used widely today for different purposes including web caching, intrusion detection, content based routing,

More information

Forwarding and Routers : Computer Networking. Original IP Route Lookup. Outline

Forwarding and Routers : Computer Networking. Original IP Route Lookup. Outline Forwarding and Routers 15-744: Computer Networking L-9 Router Algorithms IP lookup Longest prefix matching Classification Flow monitoring Readings [EVF3] Bitmap Algorithms for Active Flows on High Speed

More information

CS 268: Route Lookup and Packet Classification

CS 268: Route Lookup and Packet Classification Overview CS 268: Route Lookup and Packet Classification Packet Lookup Packet Classification Ion Stoica March 3, 24 istoica@cs.berkeley.edu 2 Lookup Problem Identify the output interface to forward an incoming

More information

Disjoint Superposition for Reduction of Conjoined Prefixes in IP Lookup for Actual IPv6 Forwarding Tables

Disjoint Superposition for Reduction of Conjoined Prefixes in IP Lookup for Actual IPv6 Forwarding Tables Disjoint Superposition for Reduction of Conjoined Prefixes in IP Lookup for Actual IPv6 Forwarding Tables Roberto Rojas-Cessa, Taweesak Kijkanjanarat, Wara Wangchai, Krutika Patil, Narathip Thirapittayatakul

More information

Routers: Forwarding EECS 122: Lecture 13

Routers: Forwarding EECS 122: Lecture 13 Input Port Functions Routers: Forwarding EECS 22: Lecture 3 epartment of Electrical Engineering and Computer Sciences University of California Berkeley Physical layer: bit-level reception ata link layer:

More information