ENERGY EFFICIENT INTERNET INFRASTRUCTURE


CHAPTER 1

ENERGY EFFICIENT INTERNET INFRASTRUCTURE

Weirong Jiang, Ph.D. 1, and Viktor K. Prasanna, Ph.D. 2
1 Juniper Networks Inc., Sunnyvale, California
2 University of Southern California, Los Angeles, California

1.1 INTRODUCTION

The Internet is built as a packet-switching network. The core function of the Internet infrastructure, including routers and switches, is to forward packets received from one subnet to another. Packet forwarding is accomplished by using the header information extracted from a packet to look up the forwarding table maintained in the routers/switches. Due to the rapid growth of network traffic, packet forwarding has long been a performance bottleneck in routers/switches.

Figure 1.1 shows the block diagram of a modern Internet router. A router contains two main architectural components: a routing engine and a packet forwarding engine. The routing engine on the control plane processes routing protocols, receives inputs from network administrators, and produces the forwarding table. The packet forwarding engine on the data plane receives packets, matches the header information of each packet against the forwarding table to identify the corresponding action, and applies the action to the packet. The routing engine and the forwarding engine perform their tasks independently, although they constantly communicate through high-throughput links [26].

Figure 1.1 Block diagram of the router system architecture.

The core function of network routers is IP lookup, where the destination IP address of each packet is matched against the entries in the routing table. Each routing entry consists of a prefix and its corresponding next-hop interface. Table 1.1 shows a simple routing table where we assume 8-bit IP addresses. A prefix in the routing table represents a subset of IP addresses that share the same prefix, and the prefix length is denoted by the number following the slash. The nature of IP lookup is longest prefix matching (LPM) [41]. In other words, an IP address may match multiple prefixes, but only the longest

prefix is used to retrieve the next-hop information. For example, a packet whose destination IP address begins with 110 will match the prefixes 110* and 11* in Table 1.1, but 110* becomes the LPM. Therefore, that packet is forwarded to the corresponding next-hop interface, P6.

Table 1.1 Example IP lookup table

Prefix/Length    Next-Hop Interface
     /1              P
     /3              P
     /2              P
     /3              P
     /2              P

Performance Challenges

As the Internet becomes even more pervasive, the performance of the network infrastructure that supports this universal connectivity becomes critical with respect to throughput and power. Traditionally, performance has been achieved by increasing the maximum network throughput to handle bursty traffic. The harsh truth about the power-throughput relationship of today's network infrastructure is that power efficiency is often sacrificed in order to obtain higher throughput through brute-force expansion. A single high-end core router that switches 640 Gbps of full-duplex network traffic can consume over 10 kW of power [9, 25], whereas a high-end service gateway capable of multi-Gbps routing and firewall throughput can take 1-5 kW of power [8, 24]. Both the large amount of total energy and the high power density cause serious problems for the industry and the environment through the operation and maintenance of this network equipment: The historical trend (Figure 1.2) shows that the capacity of backbone routers had doubled every 18 months until 5 years ago. Today, terabit

routers drawing kilowatts of power are at the limit due to the power density. As a result, a thirty-fold shortfall in capacity will be seen by 2015 as compared to the historical trend for single-rack routers [36].

Figure 1.2 Router capacity limited by power [36].

The high power density imposes a strenuous burden on the cooling of the network equipment. According to [44], a high-density hardware component can require 1.3x to 2.3x additional power for its cooling. In other words, every watt that is saved from the critical operation of the network equipment reduces up to 3.3 watts of total power dissipation. The high power and cooling also imply high monetary investments and energy costs. This is especially true for network infrastructure, where the equipment (nodes) often needs to be placed strategically near the center of metropolitan areas. At 500 watts per square foot, it will cost $5,000/ft2, or $250 million in total, to equip a 50,000-square-foot facility as a data center [2]. Some recent investigations [36, 7] show that power dissipation has become the major limiting factor for next-generation routers and predict that expensive liquid cooling may be needed in future routers. Packet forwarding has

been a major performance bottleneck for network infrastructure [13]. Power/energy consumption by forwarding engines has become an increasingly critical concern [47, 14]. Recent analysis by researchers from Bell Labs [36] reveals that almost two thirds of the power dissipation inside a core router is due to IP forwarding engines.

Existing Packet Forwarding Approaches

Software Approaches

The nature of IP lookup is longest prefix matching (LPM). The most common data structure in algorithmic solutions for performing LPM is some form of trie [41]. A trie is a binary tree in which each prefix is represented by a node; the value of the prefix corresponds to the path from the root of the tree to that node. Branching decisions are made based on the consecutive bits in the prefix. If only one bit is used to make a branching decision at a time, the trie is called a uni-bit trie. The prefix set in Figure 1.3 (a) corresponds to the uni-bit trie in Figure 1.3 (b). For example, the prefix 010* corresponds to the path that starts at the root and ends in node P3: first a left turn (0), then a right turn (1), and finally a turn to the left (0). Each trie node contains two fields: the represented prefix and the pointer to the child nodes. By using the optimization called leaf-pushing [46], each node needs only one field: either the pointer to the next-hop address or the pointer to the child nodes. Figure 1.3 (c) shows the leaf-pushed uni-bit trie derived from Figure 1.3 (b). Given a leaf-pushed uni-bit trie, IP lookup is performed by traversing the trie according to the bits of the IP address. When a leaf is reached, the prefix associated with the leaf is the longest matched prefix for that IP address. The corresponding next-hop information of that prefix is then retrieved. The time to look up a uni-bit trie is equal to the prefix length. The use of multiple bits in one scan can increase the search speed.
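The uni-bit trie lookup described above can be sketched in a few lines. This is a minimal illustration, not the chapter's implementation; the prefix set and the next-hop names (P1, P3, P5, P6) are partly assumed, chosen to be consistent with the 110*/11* example discussed earlier.

```python
class TrieNode:
    """Uni-bit trie node: one child per bit, plus an optional next-hop."""
    def __init__(self):
        self.children = [None, None]  # indexed 0 / 1 by the next address bit
        self.next_hop = None          # set if a prefix ends at this node

def insert(root, prefix_bits, next_hop):
    """Insert a prefix (e.g. '010') with its next-hop interface."""
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Longest prefix match: remember the last prefix seen on the path."""
    node, best = root, None
    for b in addr_bits:
        node = node.children[int(b)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop      # a longer match overrides a shorter one
    return best

# Hypothetical prefix set over 8-bit addresses, as in the chapter's example
root = TrieNode()
for p, nh in [("0", "P1"), ("010", "P3"), ("11", "P5"), ("110", "P6")]:
    insert(root, p, nh)

print(lookup(root, "11000000"))  # matches 11* and 110*; the LPM is 110* -> P6
print(lookup(root, "01000000"))  # -> P3
```

Scanning one bit per step makes the lookup depth equal to the prefix length, which is exactly what the multi-bit tries discussed next try to reduce.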
A trie that scans multiple bits at a time is called a multi-bit trie. The number of bits scanned at a time is called the stride. Figure 1.3 (d) shows the multi-bit trie for the prefix entries in Figure 1.3 (a). The root node uses a stride of 3, while the node that contains P3 uses a stride

of 2.

Figure 1.3 (a) Prefix set; (b) Uni-bit trie; (c) Leaf-pushed trie; (d) Multi-bit trie.

Multi-bit tries that use a larger stride usually result in a much larger memory requirement, though several optimization schemes have been proposed for memory compression [13, 45]. The well-known tree bitmap algorithm [13] uses a pair of bitmaps for each node in a multi-bit trie. One bitmap represents the children that are actually present, and the other represents the next-hop information associated with the given node. Children of a node are stored in consecutive memory locations, which allows each node to use just a single child pointer. Similarly, another single pointer is used to reference the next-hop information associated with a node. This representation allows every node in the multi-bit trie to occupy a small amount of memory.

Hardware Approaches

With the advances in optical networking technology, link rates in Internet infrastructure are being pushed from OC-768 (40 Gbps) to even higher rates [50]. Such high rates demand that IP lookup

in routers must be performed in hardware. For instance, 40 Gbps links require one lookup every 8 ns for a minimum-size (40-byte) packet. Such throughput is impossible using existing software-based solutions [41, 16]. Most hardware-based solutions for high-speed IP lookup fall into two main categories: TCAM (ternary content addressable memory)-based and DRAM/SRAM (dynamic/static random access memory)-based solutions. Although TCAM-based engines can retrieve IP lookup results in just one clock cycle, their throughput is limited by the relatively low speed of TCAMs. They are expensive and offer little flexibility for adapting to new addressing and routing protocols [48]. As shown in Table 1.2, SRAM outperforms TCAM with respect to speed, density, and power consumption. However, traditional SRAM-based solutions, most of which can be regarded as some form of tree traversal, need multiple clock cycles to complete a lookup. For example, the trie [41], a tree-like data structure representing a collection of prefixes, is widely used in SRAM-based solutions. It needs multiple memory accesses to search a trie to find the longest matched prefix for an IP packet.

Table 1.2 Comparison of TCAM and SRAM technologies

                                          TCAM (18 Mb chip)   SRAM (18 Mb chip)
Maximum clock rate (MHz)                  250 [18]            450 [11, 43]
Cell size (# of transistors per bit) [1]  16                  6
Power consumption (Watts)                     [54]            0.1 [5]

1.2 SRAM-BASED PIPELINED IP LOOKUP ARCHITECTURES: ALTERNATIVE TO TCAMS

Several researchers have explored pipelining to significantly improve the throughput of SRAM-based IP lookup engines. Taking trie-based solutions as an example, a simple pipelining approach is to map each trie level

onto a pipeline stage with its own memory and processing logic. One IP lookup can be performed every clock cycle. However, this approach results in an unbalanced trie node distribution over the pipeline stages. Memory imbalance has been identified as a dominant issue for pipelined architectures [15, 4]. In an unbalanced pipeline, the fattest stage, which stores the largest number of trie nodes, becomes a bottleneck. It adversely affects the overall performance of the pipeline for the following reasons. First, it needs more time to access the larger local memory, which reduces the global clock rate. Second, a fat stage results in many updates, due to the proportional relationship between the number of updates and the number of trie nodes stored in that stage. Particularly during the update process caused by intensive route insertion, the fattest stage can also suffer memory overflow. Furthermore, since it is unclear at hardware design time which stage will be the fattest, memory of the maximum size must be allocated for each stage, which results in memory wastage. Basu et al. [4] and Kim et al. [29] both reduce the memory imbalance by using variable strides to minimize the largest trie level. However, even with their schemes, the memory sizes of different stages can vary widely. As an improvement upon [29], Lu et al. [34] propose a tree-packing heuristic to further balance the memory, but it does not solve the fundamental problem of how to retrieve a node's descendants that are not allocated in the following stage. Furthermore, a variable-stride multi-bit trie is difficult to implement in hardware, especially if incremental updating is needed [4]. Baboescu et al. [3] propose a Ring pipeline architecture for trie-based IP lookup. The memory stages are configured in a circular, multi-point access pipeline, so that lookups can be initiated at any stage.
The trie is split into many small subtries of equal size. These subtries are then mapped to different stages to create a balanced pipeline. Some subtries have to wrap around if their roots are mapped to the last several stages. Though all IP packets enter the pipeline from the first stage, their lookup processes may be activated at different stages. Hence, all the IP lookup packets must traverse the pipeline

twice to complete the trie traversal. The throughput is thus 0.5 lookups per clock cycle. Kumar et al. [31] extend the circular pipeline with a new architecture called the Circular, Adaptive and Monotonic Pipeline (CAMP). It has multiple entrance and exit points, so that the throughput can be increased at the cost of output disorder and delay variation. It employs several request queues to manage access conflicts between a new request and the one from the preceding stage. It can achieve a worst-case throughput of 0.8 lookups per clock cycle, while maintaining balanced memory across pipeline stages. Due to their non-linear structure, neither the Ring pipeline nor CAMP can maintain a worst-case throughput of one lookup per clock cycle. Also, neither of them properly supports the write bubble proposed in [4] for incremental route updates. Jiang et al. [19, 23] propose a fine-grained node-to-stage mapping scheme for linear pipeline architectures. It is based on a heuristic that allows any two nodes on the same level of a trie to be mapped onto different stages of a pipeline. Balanced memory distribution across pipeline stages is achieved, while a high throughput of one packet per clock cycle is sustained. The linear pipeline architecture supports incremental route updates without disrupting ongoing IP lookup operations. Figure 1.4 shows the mapping result for the uni-bit trie in Figure 1.3 (b) using the fine-grained mapping scheme. To allow two nodes on the same trie level to be mapped to different stages, each node stored in the local memory of a pipeline stage has two fields. One is the memory address of its child node in the pipeline stage where the child node is stored. The other is the distance to that pipeline stage. As a packet passes through the pipeline, its distance value is decremented by 1 at each stage.
When the distance value becomes 0, the child node's address is used to access the memory in that stage.
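The distance-based traversal described above can be simulated in software. The sketch below uses a made-up node layout and a hypothetical mapping of a small prefix set (0* -> P1, 010* -> P3, 11* -> P5, 110* -> P6); it only illustrates the idea that each node carries (distance, address) pointers and that a stage's memory is touched only when the packet's distance reaches zero.

```python
from collections import namedtuple

# Node fields: next_hop (or None) and children, a dict mapping the next
# address bit to (distance, address) -- distance counts stages ahead,
# address indexes that stage's local memory.
Node = namedtuple("Node", "next_hop children")

def pipeline_lookup(stages, addr_bits):
    """Walk the stages once, left to right, as a packet would."""
    remaining, address = 0, 0        # the root is assumed at stage 0, address 0
    best, depth = None, 0
    for stage in stages:
        if address is None:
            continue                 # lookup finished; packet drifts to the exit
        if remaining > 0:
            remaining -= 1           # pass through without touching this memory
            continue
        node = stage[address]        # remaining == 0: access this stage's SRAM
        if node.next_hop is not None:
            best = node.next_hop     # a longer match overrides a shorter one
        child = node.children.get(addr_bits[depth]) if depth < len(addr_bits) else None
        depth += 1
        if child is None:
            address = None
        else:
            d, address = child
            remaining = d - 1        # the child lives d stages ahead of this one
    return best

# Hypothetical mapping; node "1" is deliberately placed two stages ahead of
# the root to show that siblings need not sit in the same stage.
stages = [
    {0: Node(None, {"0": (1, 0), "1": (2, 1)})},                   # root
    {0: Node("P1", {"1": (1, 0)})},                                # "0"
    {0: Node(None, {"0": (1, 0)}), 1: Node(None, {"1": (1, 1)})},  # "01", "1"
    {0: Node("P3", {}), 1: Node("P5", {"0": (1, 0)})},             # "010", "11"
    {0: Node("P6", {})},                                           # "110"
]
print(pipeline_lookup(stages, "11000000"))  # -> P6
print(pipeline_lookup(stages, "01000000"))  # -> P3
```

Note how every packet traverses all stages exactly once, so one lookup can complete per clock cycle regardless of where its nodes are mapped.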

Figure 1.4 Mapping the uni-bit trie (shown in Figure 1.3 (b)) onto the linear pipeline [19, 23].

1.3 DATA STRUCTURE OPTIMIZATION FOR POWER EFFICIENCY

Though SRAM-based pipeline architectures have been proposed as a promising alternative to power-hungry TCAMs for IP lookup engines in next-generation routers [6, 23], they may still suffer from high power/energy consumption, due to the large number of memory accesses for each IP lookup [52]. The overall power consumption for each IP lookup in SRAM-based engines can be expressed as in Equation 1.1:

Power_overall = Σ_{i=1}^{H} [P_m(S_i) + P_l(i)]    (1.1)

Here, H denotes the number of memory accesses, P_m(·) the power dissipation of a memory access (which usually has a positive correlation with the memory size), S_i the size of the ith memory being accessed, and P_l(i) the power consumption of the logic associated with the ith memory access. Since the logic dissipates much less power than the memories in these memory-dominant architectures [31, 39], the main focus of our work is on reducing the power consumption of the memory accesses. Note that the power consumption of a single memory is affected by many other factors, such as the fabrication technology and sub-bank organization, which are beyond the scope of our work. Since the energy consumption for each memory access is the product of the power consumption and the memory access time, the energy consumption per IP lookup equals its power consumption times the clock period. In this

chapter, power, if not specified, refers to the power/energy consumption per IP lookup.

Problem Formulation

Little work has been done on data structure optimization for power-efficient SRAM-based IP lookup engines. In this section we focus on fixed-stride multi-bit tries, where all nodes at the same level have the same stride. Fixed-stride multi-bit tries are attractive for hardware implementation due to their ease of route update [13]. We use the following notation. Let W denote the maximum prefix length; W = 32 for IPv4. Let S = {s_0, s_1, ..., s_{k-1}} denote the sequence of strides for building a k-level multi-bit trie, and let |S| denote the number of strides in S (|S| = k), with Σ_{i=0}^{k-1} s_i = W. Considering the hardware implementation for tree-bitmap-coded multi-bit tries, we cap the strides at s_i ≤ B_s, i = 1, ..., k-1, where B_s is a predefined parameter called the stride bound.

Non-Pipelined and Pipelined Engines

An SRAM-based non-pipelined IP lookup engine stores the entire trie in a single memory. Any IP lookup may need to access the memory multiple times. Hence, the worst-case power consumption of an SRAM-based non-pipelined IP lookup engine can be modeled by Equation (1.2), where Power_memory and Power_logic denote the power consumption of the memory and of the logic, respectively.

Power = (Power_memory + Power_logic) × k    (1.2)

The logic dissipates much less power than the memories in these memory-intensive architectures [31, 17, 22]. For example, [22] shows that the memory dissipates almost an order of magnitude more power than the logic in an FPGA implementation of a pipelined IP lookup engine. Thus, we do not consider the power consumption of the logic. The optimal stride problem can be formulated as:

min_{k=1,2,...,W} min_{S(k)} P_m(M(S(k))) × k    (1.3)

where M(S) denotes the memory requirement of the multi-bit trie built using S, and P_m(M) is the power function of an SRAM of size M. For an SRAM-based pipelined IP lookup engine, the worst-case power consumption can be modeled by Equation (1.4), where H denotes the pipeline depth, i.e., the number of pipeline stages, and Power_memory(i) and Power_logic(i) denote the power consumption of the memory and of the logic in the ith stage, respectively.

Power = Σ_{i=1}^{H} [Power_memory(i) + Power_logic(i)]    (1.4)

As for the non-pipelined engine, we omit the power consumption of the logic. Also, assuming that the memory distribution across the pipeline stages is balanced, the optimal stride problem can be formulated as:

min_{S} { P_m(M(S)/H) × max(|S|, H) }    (1.5)

where M(S) denotes the memory requirement of the multi-bit trie built using S, and P_m(M) is the power function of an SRAM of size M. The number of memory accesses is determined by |S| and H. When H < |S|, multiple clock cycles are needed to access a stage. To achieve high throughput, we let H ≥ |S|. Since |S| = k ≤ W, we can rewrite (1.5) as:

min_{k=1,2,...,W} min_{S(k)} P_m(M(S(k))/H) × H    (1.6)

To solve (1.3) and (1.6), we can first fix k and find the optimal S(k) so that the power consumption is minimized for the given k. Then, we compare the power consumption for different k to obtain the overall optimal S.

Power Function of SRAM

Before we solve the above optimization problems, we need to determine the power function of the SRAM with respect to its size M: P_m(M). There is some published work on comprehensive power models of SRAM [12, 32, 49]. But these detailed white-box models do not show the direct relationship between the power consumption and the memory

size. We use the CACTI tool [49] to evaluate both the dynamic and the static power consumption of SRAMs of different sizes and then obtain the function parameters through curve fitting (black-box modeling). According to [12, 32], when the word width is constant, both the high-level dynamic and static power consumption of SRAMs can be approximately represented in the form:

P(M) = A · M^B    (1.7)

where M is the memory size, and A and B are parameters whose values are different for dynamic and static power. We vary the SRAM size from 256 bytes to 8 Mbytes while keeping the word width at 8 bytes, and obtain the power consumption using the CACTI tool [49]. After curve fitting, we obtain A_dynamic = ..., B_dynamic = 0.50, A_static = ..., and B_static = .... The results from CACTI and curve fitting are both shown in Figure 1.5.

Figure 1.5 Power vs. SRAM size.

Hence P_m(M) = A_dynamic · M^{B_dynamic} + A_static · M^{B_static} ≈ 10^{-6} (207 · M^{0.5} + ... · M). Then, (1.3) and (1.6) become (1.8) and (1.9), respectively:

min_{k=1,2,...,W} min_{S(k)} (207 · M(S(k))^{0.5} + ... · M(S(k))) × k    (1.8)

min_{k=1,2,...,W} min_{S(k)} (207 · M(S(k))^{0.5} · H^{0.5} + ... · M(S(k)))    (1.9)
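The black-box fitting step can be illustrated with a few lines of code. Since the exact CACTI numbers are not reproduced in this text, the sketch below generates synthetic data from an exact power law using the dynamic-power parameters reported above (A = 207e-6, B = 0.5) and recovers them by least squares in log-log space; with real CACTI measurements the same routine would return the fitted A and B.

```python
import math

def fit_power_law(sizes, powers):
    """Fit P(M) = A * M**B by linear least squares in log-log space."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(p) for p in powers]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    B = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))   # slope = exponent B
    A = math.exp(ybar - B * xbar)              # intercept gives coefficient A
    return A, B

# Synthetic stand-in for the CACTI measurements: exact A = 207e-6, B = 0.5
sizes = [256 * 4 ** i for i in range(8)]       # 256 B ... 4 MB
powers = [207e-6 * m ** 0.5 for m in sizes]
A, B = fit_power_law(sizes, powers)
print(round(A, 6), round(B, 2))                # recovers 0.000207 and 0.5
```

Because the data here lie exactly on a power law, the regression recovers the parameters exactly; CACTI output would fit only approximately.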

For a given k, when M(S(k)) is minimized, the power consumption is also minimized. Thus, the above problems reduce to finding the optimal strides that minimize the memory requirement.

Special Case: Uniform Stride

In the original tree bitmap paper [13], the authors suggest using the same stride for all the nodes except the root node. We call such a special fixed-stride multi-bit trie a multi-bit trie with uniform stride. The stride used by the root, s_0, is called the initial stride. Given k, we can find the optimal S by exhaustive search over different initial strides. In each iteration, s_i = ⌈(W - s_0)/(k - 1)⌉ for i = 1, 2, ..., k-1, with the last stride absorbing any remainder.

Dynamic Programming

Srinivasan and Varghese [46] developed a dynamic-programming-based solution to minimize the memory requirement of a k-level multi-bit trie. Sahni and Kim [42] made further improvements to reduce the complexity of the algorithms. However, those algorithms focused on the naive implementation of multi-bit tries, without considering the tree bitmap coding technique [13] for compressing the memory requirement of multi-bit tries. Similar to [46] and [42], we use the following notation. O denotes the uni-bit trie for the given set of prefixes. nnode(i) denotes the number of nodes at the ith level of O. nPrefix(i, j) denotes the total number of prefixes contained between the ith and the jth levels of O. T(j, r), r ≤ j+1, denotes the cost (the memory requirement) of the best way to cover levels 0 through j of O using exactly r expansion levels.

The dynamic programming recurrence for T is:

T(j, r) = min_{m = max(r-2, j-B_s), ..., j-1} { T(m, r-1) + nnode(m+1) · 2^{B_s} · 2/32 + nnode(j+1) + nPrefix(m+1, j) }    (1.10)

T(j, 1) = 2^{j+1}    (1.11)

Note that all the strides except the initial stride are capped by the stride bound (B_s). In hardware implementation, the length of the bitmaps is determined by B_s. Algorithm FixedStride(W, k), as shown in Figure 1.6, computes T(W-1, k), which is the minimum memory requirement to build a k-level tree-bitmap-coded multi-bit trie. The complexity of Algorithm FixedStride(W, k) is O(k · W · B_s). After obtaining T(W-1, k), we can follow the track of the corresponding M(·, ·) to find the optimal S in O(k) time.

Performance Evaluation

We used seventeen real-life backbone routing tables from the Routing Information Service (RIS) [40]. Their characteristics are shown in Table 1.3. Note that the routing tables rrc08 and rrc09 are much smaller than the others, since the collection of these two data sets ended in September 2004 and February 2004, respectively [40]. We conducted experiments for both non-pipelined and pipelined architectures. We evaluated the impact of different architecture parameters on the power-optimal design of the data structure. The architecture parameters include the stride type, the stride bound, and the pipeline depth. For fixed-stride tree-bitmap-coded multi-bit tries, two stride types are considered. The first uses uniform strides, as described in "Special Case: Uniform Stride". The second uses optimal strides capped by B_s, as discussed in "Dynamic Programming".

Results for Non-Pipelined Architecture

First, we set B_s = 4 and examined the results for the non-pipelined architecture using uniform and op-

Input: O
Output: T(W-1, k), S(k)
1: // Compute T(W-1, k)
2: for j = 0 to W-1 do
3:   T(j, 1) = 2^{j+1}
4: end for
5: for r = 2 to k do
6:   for j = r-1 to W-1 do
7:     mincost = MaxValue
8:     for m = max(r-2, j-B_s) to j-1 do
9:       cost = T(m, r-1) + nnode(m+1) · 2^{B_s} · 2/32 + nnode(j+1) + nPrefix(m+1, j)
10:      if cost < mincost then
11:        mincost = cost
12:        T(j, r) = mincost
13:        M(j, r) = m
14:      end if
15:    end for
16:  end for
17: end for
18: // Compute S(k) = {s_i}, i = 0, 1, ..., k-1
19: m = W-1
20: for r = k down to 2 do
21:   s_{r-1} = m - M(m, r)
22:   m = M(m, r)
23: end for
24: s_0 = m + 1

Figure 1.6 Algorithm FixedStride(W, k).
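A direct Python transcription of the dynamic program might look as follows. The per-node cost term nnode(m+1) · 2^{B_s} · 2/32 follows recurrence (1.10) as reconstructed here, so treat it as an assumption; nnode must cover levels 0 through W, nprefix(i, j) is supplied by the caller, and the tiny trie profile in the usage example is made up.

```python
def fixed_stride(nnode, nprefix, W, k, Bs):
    """Dynamic program for the minimum-memory k-level fixed-stride trie.

    nnode[i]      -- number of uni-bit-trie nodes at level i (length >= W+1)
    nprefix(i, j) -- number of prefixes on levels i..j
    Returns (T[W-1][k], stride sequence).
    """
    INF = float("inf")
    T = [[INF] * (k + 1) for _ in range(W)]
    M = [[None] * (k + 1) for _ in range(W)]
    for j in range(W):                       # base case: one expansion level
        T[j][1] = 2 ** (j + 1)
    for r in range(2, k + 1):
        for j in range(r - 1, W):
            for m in range(max(r - 2, j - Bs), j):
                cost = (T[m][r - 1]
                        + nnode[m + 1] * 2 ** Bs * 2 / 32  # assumed bitmap cost
                        + nnode[j + 1]
                        + nprefix(m + 1, j))
                if cost < T[j][r]:
                    T[j][r] = cost
                    M[j][r] = m              # remember the best split point
    # Recover the stride sequence from the recorded split points
    strides, m = [], W - 1
    for r in range(k, 1, -1):
        strides.append(m - M[m][r])
        m = M[m][r]
    strides.append(m + 1)                    # initial stride covers levels 0..m
    return T[W - 1][k], list(reversed(strides))

# Made-up profile of a tiny trie: W = 3, one root, growing levels
cost, strides = fixed_stride([1, 2, 3, 4], lambda i, j: 1, 3, 2, 2)
print(cost, strides)   # -> 7.5 [1, 2]
```

Running it for each k = 1, ..., W and plugging each minimum memory into the fitted power function yields the power-optimal stride sequence, as in the evaluation below.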

Table 1.3 Representative routing tables (snapshot on 2009/04/01)

Routing table   # of prefixes   # of prefixes w/ length < 16
rrc00                           (0.79%)
rrc01                           (0.83%)
rrc02                           (0.78%)
rrc03                           (0.83%)
rrc04                           (0.81%)
rrc05                           (0.84%)
rrc06                           (0.82%)
rrc07                           (0.84%)
rrc08                           (0.59%)
rrc09                           (0.75%)
rrc10                           (0.83%)
rrc11                           (0.83%)
rrc12                           (0.83%)
rrc13                           (0.81%)
rrc14                           (0.83%)
rrc15                           (0.79%)
rrc16                           (0.82%)

timal strides. The results are shown in Figure 1.7. In both cases, the power was minimized when k = 5 and S = {16, 4, 4, 4, 4}.

Figure 1.7 Power results of the non-pipelined architecture using (a) the uniform stride and (b) the optimal stride.

Figure 1.8 Power results of the non-pipelined architecture using (a) B_s = 2 and (b) B_s = 6.

Then we varied the stride bound (B_s). Figure 1.8 shows the results using two different stride bounds, B_s = 2 and B_s = 6, for the architecture that uses the optimal stride. For B_s = 2, the power was minimized when k = 9

and S = {16, 2, 2, 2, 2, 2, 2, 2, 2}. For B_s = 6, the minimal power was achieved when k = 4 and S = {17, 5, 5, 5}.

Results for Pipelined Architecture

Figure 1.9 shows the power consumption of the pipelined architecture using the two stride types. The pipeline depth (h) was set equal to k. The stride bound B_s was 4. Both cases achieved the optimal power performance when k = 6 and S = {13, 4, 4, 4, 4, 3}.

Figure 1.9 Power results of the pipelined architecture using (a) the uniform stride and (b) the optimal stride.

Figure 1.10 shows the results using different stride bounds for the optimal stride. We set h = k. For B_s = 2, the power was minimized when k = 9 and S = {16, 2, 2, 2, 2, 2, 2, 2, 2}. For B_s = 6, the minimal power was achieved when k = 5 and S = {16, 4, 4, 4, 4}. We also conducted experiments using different pipeline depths. Both cases achieved the minimal power consumption when k = 6 and S = {13, 4, 4, 4, 4, 3}, the same as the results for h = k (Figure 1.9(b)). This means that the pipeline depth has little impact on determining the optimal strides for pipelined architectures.

Figure 1.10 Power results of the pipelined architecture using (a) B_s = 2 and (b) B_s = 6.

1.4 ARCHITECTURAL OPTIMIZATION TO REDUCE DYNAMIC POWER DISSIPATION

This section exploits several characteristics of Internet traffic and of the pipeline architecture to reduce the dynamic power consumption of SRAM-based IP forwarding engines. First, as observed in [20], Internet traffic contains a large amount of locality, where most packets belong to a few flows. By caching the recently forwarded IP addresses, the number of memory accesses can be reduced, lowering the power consumption. Unlike previous caching schemes, most of which attach an external cache to the main forwarding engine, we integrate the caching function into the pipeline architecture itself. As a result, we do away with complicated cache replacement hardware and eliminate the power consumption of the hot cache [53]. Second, since the traffic rate varies over time, we freeze the logic when no packet is input. We propose a local clocking scheme where each stage is driven by an independent clock and is activated only under certain conditions. The local clocking scheme can also improve the caching performance. Third, we note that different packets may access different stages of the pipeline, which leads

to varying access frequencies across the stages. Thus we propose a fine-grained memory enabling scheme that puts the memory in a stage to sleep when the incoming packet is not accessing it. Our simulation results show that the proposed schemes can reduce the power consumption by up to fifteen-fold. We prototyped our design on a commercial field-programmable gate array (FPGA) device and show that the logic usage is low while the backbone throughput requirement (40 Gbps) is met.

Analysis and Motivation

We obtained four backbone Internet traffic traces from the Cooperative Association for Internet Data Analysis (CAIDA) [10]. The trace information is shown in Table 1.4, where the numbers in parentheses are the ratio of the number of unique destination IP addresses to the total number of packets in each trace.

Table 1.4 Real-life IP header traces

Trace               Date   # of packets   # of unique IPs
equinix-chicago-a                         (6.93%)
equinix-chicago-b                         (6.48%)
equinix-sanjose-a                         (6.73%)
equinix-sanjose-b                         (5.24%)

Traffic Locality. According to Table 1.4, regardless of the length of the packet trace, the number of unique destination IP addresses is always much smaller than the number of packets. These results coincide with those of previous work on Internet traffic characterization [33]. Due to TCP burstiness, some destination IP addresses are accessed very frequently within a short time span. Hence, caching has been used effectively to exploit such traffic locality to either

improve the IP forwarding speed [33] or help balance the load among multiple forwarding engines [20]. This chapter employs the caching scheme to reduce the number of memory accesses so that the power consumption can be lowered.

Traffic Rate Variation. We analyze the traffic rate in terms of the number of packets at different times. The results for the four traces are shown in Figure 1.11, where the X axis indicates the time intervals and the Y axis the number of packets within each time interval. As observed in other papers [28], the traffic rate varies over time. Although the router capacity is designed for the maximum traffic rate, power consumption of the IP forwarding engine can be reduced by exploiting such real-life traffic rate variation.

Figure 1.11 Traffic rate variation over time.

Access Frequency on Different Stages. The unique feature of the SRAM-based pipelined IP forwarding engine is that different stages contain different sets of trie nodes. Given various input traffic, the access frequency of different stages can vary significantly. For example, we used a backbone routing table from the Routing Information Service (RIS) [40] to generate a trie, mapped the trie onto a 25-stage pipeline, and measured the total number of memory accesses

on each stage for the four input traffic traces. The results are shown in Figure 1.12, where the access frequency of each stage is calculated by dividing the number of memory accesses on that stage by the number on the first stage. The first stage is always accessed by all packets, while the last few stages are seldom accessed. Based on this observation, we should disable the memory access in a stage whenever the packet is not accessing the memory in that stage.

Figure 1.12 Access frequency on each stage (Y axis: access frequency; X axis: stage ID; traces: equinix-chicago-a/b, equinix-sanjose-a/b)

Architecture-Specific Techniques

We propose a caching-enabled SRAM-based pipeline architecture for power-efficient IP forwarding, as shown in Figure 1.13. Let H denote the pipeline depth, i.e., the number of stages in the pipeline. These H stages store the mapped trie nodes. Every time the architecture receives a packet, the incoming packet compares its destination IP address with those of the packets already in the pipeline. A match is considered a cache hit, even though the matched packet has not yet retrieved its next-hop information. To preserve the packet order, a packet with a cache hit still goes

through the pipeline. However, no memory access is needed for this packet, so the power consumed for this packet is reduced.

Figure 1.13 Pipeline with (a) inherent caching and (b) local clocking

Inherent Caching. Most existing caching schemes add an external cache to the forwarding engine. However, the cache itself can be power-intensive [53] and also needs extra logic to support cache replacement. The relatively long pipeline delay can also result in low cache hit rates in traditional caching schemes [20]. Our architecture implements the caching function without appending extra caches. As shown in Figure 1.13 (a), the pipeline itself acts as a fully associative cache, where the packets in all the stages are matched against the arriving packet. If the arriving packet (denoted Pkt_new) matches a packet (denoted Pkt_exist) that is already in the pipeline, Pkt_new has a cache hit even though Pkt_exist has not yet retrieved its next-hop information. Pkt_new then goes through the pipeline with its cache hit signal set to 1. On the other hand, Pkt_new does not obtain the next-hop information until Pkt_exist exits the pipeline. As shown in Figure 1.13 (a), the packet exiting the pipeline forwards its IP address and its retrieved next-hop information to all the previous stages. The packets in the previous stages compare their destination addresses with the forwarded IP address. The packet matching the forwarded IP address takes the forwarded next-hop information as its own and carries it along when traversing the rest of the pipeline.
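The inherent-caching idea can be sketched behaviorally in a few lines of Python. This is an illustrative model only, not the authors' hardware design: the packet fields, the one-packet-per-cycle advance, and the flat charge of one SRAM read per stage on a miss are all simplifying assumptions (in the real engine a packet stops reading memory once its next hop is found).

```python
# Behavioral sketch: the pipeline itself is searched like a fully
# associative cache, so a newly arriving packet whose destination IP
# matches any in-flight packet is marked as a hit and skips all SRAM reads.

H = 8  # pipeline depth (number of stages); value chosen for illustration

def run(trace):
    """Advance the pipeline one packet per cycle; return (accesses, hits)."""
    pipeline = [None] * H          # slot i models the packet in stage i
    accesses = hits = 0
    for ip in trace:
        in_flight = {p["ip"] for p in pipeline if p is not None}
        hit = ip in in_flight      # associative match against all stages
        if hit:
            hits += 1
        pipeline.pop()             # packet in the last stage exits
        pipeline.insert(0, {"ip": ip, "hit": hit})
        # A packet with a cache hit still traverses the stages (preserving
        # packet order) but performs no memory reads; only misses pay for
        # SRAM accesses (simplified here as one read in every stage).
        accesses += 0 if hit else H
    return accesses, hits

# Bursty traffic (repeated destinations close together) hits often:
print(run([1, 2, 1, 1, 3, 2, 2, 1]))  # -> (24, 5): 3 misses, 5 hits
```

With all-distinct destinations the hit count drops to zero and every packet pays the full per-stage access cost, which is why this scheme leans on the traffic locality measured in Table 1.4.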

Local Clocking. Most existing pipelined IP lookup engines are driven by a global clock. The logic in a stage is active even when there is no packet to be processed, which results in unnecessary power consumption. Furthermore, since the pipeline keeps forwarding packets from one stage to the next at the highest clock frequency, the pipeline contains few packets when the traffic rate is low. Because the pipeline serves as a cache that is dynamic and sensitive to the input traffic, few packets in the pipeline means a small number of cached entries, which results in a low cache hit rate. To address this issue, we propose a local clocking scheme in which each stage is driven by an individual clock. Only one constraint must be met to prevent any packet loss:

Constraint 1: If the clock of the previous stage is active and there is a packet in the current stage, the clock of the current stage must be active.

Hence, we design the local clocking as shown in Figure 1.13 (b). The clock of a stage is not activated until the stage contains a valid packet and its preceding stage is forwarding data to the current stage. To prevent clock skew, some delay logic is added in the path of the clock signal of the previous stage. In a real implementation, we do not use an AND gate, which may produce glitches; instead, we use the clock buffer primitives provided by the Xilinx design tools [51].

Fine-Grained Memory Enabling. As discussed earlier, the access frequency to different stages within the pipeline varies. Current pipelined IP forwarding engines keep all memories active for all packets, which results in unnecessary power consumption. Our fine-grained memory enabling scheme gates the clock signal with the read enable signal for the memory in each stage. The read enable signal becomes active only when the packet needs to access the memory in the current stage.
In other words, the read enable signal will remain inactive in any of the following four cases: no packet is arriving, the distance value of the arriving packet is larger than 0, the packet has already retrieved its next-hop information, or the cache hit signal carried by the packet is set to 1.
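The four conditions above amount to a small predicate on a packet's state. The sketch below is an assumed encoding (field names are illustrative; per the text, the "distance" field counts how many stages remain before the packet reaches its next trie node), not the actual enable logic on the FPGA.

```python
# Sketch of the fine-grained memory enabling rule: the stage's SRAM
# read-enable (which gates its clock) stays inactive in the four cases
# listed in the text. Packet field names here are illustrative.

def read_enable(packet):
    """Return True only if this stage must read its SRAM for `packet`."""
    if packet is None:                  # case 1: no packet is arriving
        return False
    if packet["distance"] > 0:          # case 2: target node is in a later stage
        return False
    if packet["next_hop"] is not None:  # case 3: next hop already retrieved
        return False
    if packet["cache_hit"]:             # case 4: resolved by the inherent cache
        return False
    return True

print(read_enable({"distance": 0, "next_hop": None, "cache_hit": False}))  # True
print(read_enable(None))                                                   # False
```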

Performance Evaluation

We prototyped our design (denoted "Proposed") and a baseline pipeline (denoted "Baseline") that did not integrate the proposed schemes on FPGA, using the Xilinx ISE 10.1 development tools. The target device was a Xilinx Virtex-5 XC5VFX200T with -2 speed grade. Table 1.5 shows the post place-and-route results, where "Both" denotes both designs.(1) Although our design used more logic resources than the baseline, it still consumed only a small fraction of the overall on-chip logic resources. Both designs achieved a clock frequency of 125 MHz while using the same amount of Block RAMs. Such a clock frequency results in a throughput of 40 Gbps for minimum-size (40-byte) packets, which meets the current backbone network rate.

Table 1.5 Resource utilization

  Resource               Design     Used    Available    Utilization
  Number of Slices       Baseline
  Number of Slices       Proposed
  Number of bonded IOBs  Both
  Number of Block RAMs   Both

The dynamic power consumption of a pipelined IP lookup engine can be modeled as Equation (1.12), where p denotes a packet to be looked up, H the pipeline depth, N(p) the number of packets, and Power_memory(i, p) and Power_logic(i, p) the power consumed by the memory and by the logic in the i-th stage for p, respectively.

(1) Due to the limited size of the on-chip memory, both designs supported 70K prefixes, which is about 1/4 of the current largest backbone routing table. However, our architecture can be extended by using external SRAMs.

Power = ( Σ_p Σ_{i=1}^{H} [ Power_memory(i, p) + Power_logic(i, p) ] ) / N(p)    (1.12)

We profiled the power consumption of the memory and of the logic based on our FPGA implementation results. Using the XPower Analyzer tool provided by Xilinx, we obtained the power consumption of the baseline and of our design, as shown in Figure 1.14. As expected, the power consumed by memory dominated the overall power dissipation of the pipelined IP lookup engine.

Figure 1.14 Profiling of dynamic power consumption in a pipelined IP lookup engine

Based on the profiled power data of the architecture, we developed a cycle-accurate simulator for our pipelined IP lookup engine. We conducted experiments using the four real-life backbone traffic traces given in Table 1.4 and evaluated the overall power consumption. First, we examined the impact of the fine-grained memory enabling scheme. We disabled both inherent caching and local clocking, and then ran the simulation under two conditions: (1) without fine-grained memory enabling (denoted "wo/ FME") and (2) with fine-grained memory enabling (denoted "w/ FME"). Figure 1.15 compares the results, where the results of (2) are taken as the baseline and the results of (1) are divided by those of (2).

According to Figure 1.15, fine-grained memory enabling alone can achieve up to a 12-fold reduction in power consumption.

Figure 1.15 Power reduction with fine-grained memory enabling

Second, we evaluated the impact of the inherent caching and local clocking schemes. We enabled the fine-grained memory enabling scheme, and then ran the simulation under three conditions: (1) without either inherent caching or local clocking (denoted "Baseline"); (2) with inherent caching but without local clocking (denoted "Cache only"); and (3) with both schemes (denoted "Cache + LC"). The results are shown in Figure 1.16, where the results without either scheme are taken as the baseline. Without local clocking, the power reduction from caching was marginal because of the low cache hit rate (e.g., 1.65% for equinix-chicago-b). Local clocking improved the cache hit rate (e.g., the hit rate for equinix-chicago-b increased to 45.9%), which resulted in a larger reduction in power consumption. Overall, when all three proposed schemes were enabled, the architecture achieved 6.3-, 15.2-, 8.1-, and 7.8-fold reductions in power consumption for the four traffic traces, respectively.
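The per-packet accounting behind these comparisons follows Equation (1.12): memory plus logic power, summed over all stages and packets, divided by the number of packets. The sketch below illustrates that model with made-up per-stage costs (the constants and packet sets are assumptions, not the profiled FPGA values).

```python
# Illustrative evaluation of the average-power model of Equation (1.12).
# Per-access power numbers below are invented for the example; the point is
# that memory power dominates and is only paid in stages a packet reads.

H = 4                      # pipeline depth (illustrative)
P_MEM, P_LOGIC = 5.0, 1.0  # assumed per-access power (mW) for memory / logic

def avg_power(packets):
    """packets: list of per-packet sets of stage indices whose SRAM is read."""
    total = 0.0
    for accessed in packets:
        for i in range(1, H + 1):
            # Logic in every stage handles the packet; memory power is paid
            # only in stages whose SRAM the packet actually reads (fine-
            # grained memory enabling zeroes it elsewhere).
            total += (P_MEM if i in accessed else 0.0) + P_LOGIC
    return total / len(packets)   # divide by the number of packets, N(p)

baseline = [set(range(1, H + 1))] * 2   # every packet reads every stage
gated    = [{1, 2}, {1}]                # packets read only the stages needed
print(avg_power(baseline), avg_power(gated))  # -> 24.0 11.5
```

Zeroing the memory term for skipped stages is exactly where the 12-fold reduction of Figure 1.15 comes from: the logic term is small, so average power tracks how many SRAMs each packet actually touches.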

Figure 1.16 Power reduction with inherent caching and local clocking

1.5 RELATED WORK

Reducing the power consumption of network routers has been a topic of significant interest [14, 7, 38]. Most of the existing work focuses on system- and network-level optimizations. Chabarek et al. [7] enumerate the power demands of two widely used Cisco routers, and further use mixed integer optimization techniques to determine the optimal configuration at each router in their sample network for a given traffic matrix. Nedevschi et al. [38] assume that the underlying hardware in network equipment supports sleeping and dynamic voltage and frequency scaling, and propose shaping the traffic into small bursts at edge routers to facilitate sleeping and rate adaptation.

Power-efficient IP lookup engines have been studied from various angles. However, to the best of our knowledge, little work has been done on pipelined SRAM-based IP lookup engines. Some TCAM-based solutions [52, 54] propose schemes that partition a routing table into several blocks and perform IP lookup on only one of the blocks; similar ideas can be applied to SRAM-based multi-pipeline architectures [21]. Such partitioning-based solutions for power-efficient SRAM-based IP lookup engines consider neither the underlying data structure nor the traffic characteristics, and are orthogonal to the solutions proposed in this chapter. Kaxiras et al. [27] propose an SRAM-based approach called IPStash for power-efficient IP lookup. IPStash replaces the full associativity of TCAMs with set-associative SRAMs to reduce power consumption. However, the set associativity depends on the routing table size and thus may not be scalable: for large routing tables, the set associativity remains large, which results in a low clock rate and high power consumption.

Traffic rate variation has been exploited in several recent papers to reduce power consumption in multi-core processor based IP lookup engines. In [35], clock gating is used to turn off the clock of unneeded processing engines of multi-core network processors to save dynamic power under low traffic workloads. In [30], a more aggressive approach that turns off these processing engines entirely is used to reduce both dynamic and static power consumption. Dynamic frequency scaling and dynamic voltage scaling are used in [28] and [37], respectively, to reduce the power consumption of the processing engines. However, those schemes still consume considerable power in the worst case, when the traffic rate is consistently high. Some of them require large buffers to store the input packets so that the traffic rate can be determined or predicted, but large packet buffers themselves incur high power consumption. Moreover, these schemes do not account for the latency of state transitions, which can result in packet loss under bursty traffic.

1.6 SUMMARY

Power (energy) consumption has emerged as a new challenge in the design of packet forwarding engines for next-generation Internet infrastructure. Though TCAMs are the de facto solution for high-speed packet forwarding, they suffer from high power consumption.
We propose mapping state-of-the-art algorithmic solutions onto parallel architectures that are based on low-power memory such as SRAM.

We exploited data structure optimization to reduce the power consumption. We formulated the problems by revisiting the conventional time-space trade-off in multi-bit tries. To minimize the worst-case power consumption for a given architecture, a dynamic programming framework was developed to determine the optimal strides for constructing tree bitmap coded multi-bit tries. Simulation using real-life backbone routing tables showed that careful design of the data structure, with awareness of the underlying architecture, could achieve a dramatic reduction in power consumption. Different architectures could result in different optimal data structures with respect to power efficiency.

We also proposed several novel architecture-specific techniques to reduce the dynamic power dissipation in SRAM-based pipelined IP lookup engines. First, the pipeline was built as an inherent cache that effectively exploited the traffic locality with minimal overhead. Second, a local clocking scheme was proposed to exploit the traffic rate variation and to improve the caching performance. Third, a fine-grained memory enabling scheme was used to eliminate unnecessary memory accesses for the input packets. Simulation using real-life traffic traces showed that our solution achieved up to a fifteen-fold reduction in dynamic power dissipation.

REFERENCES

1. Mohammad J. Akhbarizadeh, Mehrdad Nourani, Deepak S. Vijayasarathi, and T. Balsara. A non-redundant ternary CAM circuit for network search engines. IEEE Trans. VLSI Syst., 14(3).
2. Gary Anthes. Data Centers Get A Makeover. November.
3. Florin Baboescu, Dean M. Tullsen, Grigore Rosu, and Sumeet Singh. A tree based router search engine architecture with single port memories. In Proc. ISCA.

4. Anindya Basu and Girija Narlikar. Fast incremental updates for pipelined forwarding engines. In Proc. INFOCOM, pages 64-74.
5. CACTI.
6. Lorenzo De Carli, Yi Pan, Amit Kumar, Cristian Estan, and Karthikeyan Sankaralingam. Flexible lookup modules for rapid deployment of new protocols in high-speed routers. In Proc. SIGCOMM.
7. Joseph Chabarek, Joel Sommers, Paul Barford, Cristian Estan, David Tsiang, and Stephen Wright. Power awareness in network design and routing. In Proc. INFOCOM.
8. Cisco ASR 1000 Series Aggregation Services Routers.
9. Cisco CRS-1 Carrier Routing System.
10. Colby Walsworth, Emile Aben, kc claffy, and Dan Andersen. The CAIDA anonymized 2009 Internet traces dataset.
11. Cypress Sync SRAMs.
12. Minh Q. Do, Mindaugas Drazdziulis, Per Larsson-Edefors, and Lars Bengtsson. Parameterizable architecture-level SRAM power model using circuit-simulation backend for leakage calibration. In Proc. ISQED.
13. Will Eatherton, George Varghese, and Zubin Dittia. Tree bitmap: hardware/software IP lookups with incremental updates. SIGCOMM Comput. Commun. Rev., 34(2):97-122.
14. Maruti Gupta and Suresh Singh. Greening of the Internet. In Proc. SIGCOMM, pages 19-26.
15. Pankaj Gupta, Steven Lin, and Nick McKeown. Routing lookups in hardware at memory access speeds. In Proc. INFOCOM.
16. Pankaj Gupta and Nick McKeown. Algorithms for packet classification. IEEE Network, 15(2):24-32.
17. Jahangir Hasan and T. N. Vijaykumar. Dynamic pipelining: making IP-lookup truly scalable. In Proc. SIGCOMM, 2005.

18. IDT Network Search Engines.
19. Weirong Jiang and Viktor K. Prasanna. A memory-balanced linear pipeline architecture for trie-based IP lookup. In Proc. Hot Interconnects (HotI '07), pages 83-90.
20. Weirong Jiang and Viktor K. Prasanna. Parallel IP lookup using multiple SRAM-based pipelines. In Proc. IPDPS.
21. Weirong Jiang and Viktor K. Prasanna. Towards green routers: Depth-bounded multi-pipeline architecture for power-efficient IP lookup. In Proc. IPCCC.
22. Weirong Jiang and Viktor K. Prasanna. Reducing dynamic power dissipation in pipelined forwarding engines. In Proc. ICCD.
23. Weirong Jiang, Qingbo Wang, and Viktor K. Prasanna. Beyond TCAMs: An SRAM-based parallel multi-pipeline architecture for terabit IP lookup. In Proc. INFOCOM.
24. Juniper Networks SRX 5000 Services Gateways.
25. Juniper Networks T-series Routing Platforms.
26. Juniper Networks T1600 Core Router.
27. Stefanos Kaxiras and Georgios Keramidas. IPStash: a set-associative memory approach for efficient IP-lookup. In Proc. INFOCOM.
28. Alan Kennedy, Xiaojun Wang, Zhen Liu, and Bin Liu. Low power architecture for high speed packet classification. In Proc. ANCS.
29. Kun Suk Kim and Sartaj Sahni. Efficient construction of pipelined multibit-trie router-tables. IEEE Trans. Comput., 56(1):32-43.
30. Ravi Kokku, Upendra B. Shevade, Nishit S. Shah, Mike Dahlin, and Harrick M. Vin. Energy-Efficient Packet Processing.
31. Sailesh Kumar, Michela Becchi, Patrick Crowley, and Jonathan Turner. CAMP: fast and efficient IP lookup architecture. In Proc. ANCS, pages 51-60, 2006.


More information

Performance Evaluation and Improvement of Algorithmic Approaches for Packet Classification

Performance Evaluation and Improvement of Algorithmic Approaches for Packet Classification Performance Evaluation and Improvement of Algorithmic Approaches for Packet Classification Yaxuan Qi, Jun Li Research Institute of Information Technology (RIIT) Tsinghua University, Beijing, China, 100084

More information

AN EFFICIENT HYBRID ALGORITHM FOR MULTIDIMENSIONAL PACKET CLASSIFICATION

AN EFFICIENT HYBRID ALGORITHM FOR MULTIDIMENSIONAL PACKET CLASSIFICATION AN EFFICIENT HYBRID ALGORITHM FOR MULTIDIMENSIONAL PACKET CLASSIFICATION Yaxuan Qi 1 and Jun Li 1,2 1 Research Institute of Information Technology (RIIT), Tsinghua University, Beijing, China, 100084 2

More information

Scalable Packet Classification on FPGA

Scalable Packet Classification on FPGA 1668 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 9, SEPTEMBER 2012 Scalable Packet Classification on FPGA Weirong Jiang, Member, IEEE, and Viktor K. Prasanna, Fellow,

More information

High Performance Architecture for Flow-Table Lookup in SDN on FPGA

High Performance Architecture for Flow-Table Lookup in SDN on FPGA High Performance Architecture for Flow-Table Lookup in SDN on FPGA Rashid Hatami a, Hossein Bahramgiri a and Ahmad Khonsari b a Maleke Ashtar University of Technology, Tehran, Iran b Tehran University

More information

Forwarding and Routers : Computer Networking. Original IP Route Lookup. Outline

Forwarding and Routers : Computer Networking. Original IP Route Lookup. Outline Forwarding and Routers 15-744: Computer Networking L-9 Router Algorithms IP lookup Longest prefix matching Classification Flow monitoring Readings [EVF3] Bitmap Algorithms for Active Flows on High Speed

More information

CS 268: Route Lookup and Packet Classification

CS 268: Route Lookup and Packet Classification Overview CS 268: Route Lookup and Packet Classification Packet Lookup Packet Classification Ion Stoica March 3, 24 istoica@cs.berkeley.edu 2 Lookup Problem Identify the output interface to forward an incoming

More information

Multi-dimensional Packet Classification on FPGA: 100 Gbps and Beyond

Multi-dimensional Packet Classification on FPGA: 100 Gbps and Beyond Multi-dimensional Packet Classification on FPGA: 00 Gbps and Beyond Yaxuan Qi, Jeffrey Fong 2, Weirong Jiang 3, Bo Xu 4, Jun Li 5, Viktor Prasanna 6, 2, 4, 5 Research Institute of Information Technology

More information

Towards High-performance Flow-level level Packet Processing on Multi-core Network Processors

Towards High-performance Flow-level level Packet Processing on Multi-core Network Processors Towards High-performance Flow-level level Packet Processing on Multi-core Network Processors Yaxuan Qi (presenter), Bo Xu, Fei He, Baohua Yang, Jianming Yu and Jun Li ANCS 2007, Orlando, USA Outline Introduction

More information

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator Stanley Bak Abstract Network algorithms are deployed on large networks, and proper algorithm evaluation is necessary to avoid

More information

Problem Statement. Algorithm MinDPQ (contd.) Algorithm MinDPQ. Summary of Algorithm MinDPQ. Algorithm MinDPQ: Experimental Results.

Problem Statement. Algorithm MinDPQ (contd.) Algorithm MinDPQ. Summary of Algorithm MinDPQ. Algorithm MinDPQ: Experimental Results. Algorithms for Routing Lookups and Packet Classification October 3, 2000 High Level Outline Part I. Routing Lookups - Two lookup algorithms Part II. Packet Classification - One classification algorithm

More information

Tree-Based Minimization of TCAM Entries for Packet Classification

Tree-Based Minimization of TCAM Entries for Packet Classification Tree-Based Minimization of TCAM Entries for Packet Classification YanSunandMinSikKim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington 99164-2752, U.S.A.

More information

Content Addressable Memory with Efficient Power Consumption and Throughput

Content Addressable Memory with Efficient Power Consumption and Throughput International journal of Emerging Trends in Science and Technology Content Addressable Memory with Efficient Power Consumption and Throughput Authors Karthik.M 1, R.R.Jegan 2, Dr.G.K.D.Prasanna Venkatesan

More information

Multiroot: Towards Memory-Ef cient Router Virtualization

Multiroot: Towards Memory-Ef cient Router Virtualization Multiroot: Towards Memory-Ef cient Router Virtualization Thilan Ganegedara, Weirong Jiang, Viktor Prasanna University of Southern California 3740 McClintock Ave., Los Angeles, CA 90089 Email: {ganegeda,

More information

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck

More information

Dynamically Configurable Online Statistical Flow Feature Extractor on FPGA

Dynamically Configurable Online Statistical Flow Feature Extractor on FPGA Dynamically Configurable Online Statistical Flow Feature Extractor on FPGA Da Tong, Viktor Prasanna Ming Hsieh Department of Electrical Engineering University of Southern California Email: {datong, prasanna}@usc.edu

More information

CSE 123A Computer Networks

CSE 123A Computer Networks CSE 123A Computer Networks Winter 2005 Lecture 8: IP Router Design Many portions courtesy Nick McKeown Overview Router basics Interconnection architecture Input Queuing Output Queuing Virtual output Queuing

More information

Recursively Partitioned Static IP Router-Tables *

Recursively Partitioned Static IP Router-Tables * Recursively Partitioned Static IP Router-Tables * Wencheng Lu Sartaj Sahni {wlu,sahni}@cise.ufl.edu Department of Computer and Information Science and Engineering University of Florida, Gainesville, FL

More information

Binary Search Schemes for Fast IP Lookups

Binary Search Schemes for Fast IP Lookups 1 Schemes for Fast IP Lookups Pronita Mehrotra, Paul D. Franzon Abstract IP route look up is the most time consuming operation of a router. Route lookup is becoming a very challenging problem due to the

More information

Scalable Lookup Algorithms for IPv6

Scalable Lookup Algorithms for IPv6 Scalable Lookup Algorithms for IPv6 Aleksandra Smiljanić a*, Zoran Čiča a a School of Electrical Engineering, Belgrade University, Bul. Kralja Aleksandra 73, 11120 Belgrade, Serbia ABSTRACT IPv4 addresses

More information

Computer Sciences Department

Computer Sciences Department Computer Sciences Department Flexible Lookup Modules for Rapid Deployment of New Protocols in High-Speed Routers Lorenzo De Carli Yi Pan Amit Kumar Cristian Estan Karthikeyan Sankaralingam Technical Report

More information

Novel Hardware Architecture for Fast Address Lookups

Novel Hardware Architecture for Fast Address Lookups Novel Hardware Architecture for Fast Address Lookups Pronita Mehrotra Paul D. Franzon Department of Electrical and Computer Engineering North Carolina State University {pmehrot,paulf}@eos.ncsu.edu This

More information

Homework 1 Solutions:

Homework 1 Solutions: Homework 1 Solutions: If we expand the square in the statistic, we get three terms that have to be summed for each i: (ExpectedFrequency[i]), (2ObservedFrequency[i]) and (ObservedFrequency[i])2 / Expected

More information

Journal of Network and Computer Applications

Journal of Network and Computer Applications Journal of Network and Computer Applications () 4 Contents lists available at ScienceDirect Journal of Network and Computer Applications journal homepage: www.elsevier.com/locate/jnca Memory-efficient

More information

Introduction. Router Architectures. Introduction. Introduction. Recent advances in routing architecture including

Introduction. Router Architectures. Introduction. Introduction. Recent advances in routing architecture including Introduction Router Architectures Recent advances in routing architecture including specialized hardware switching fabrics efficient and faster lookup algorithms have created routers that are capable of

More information

Updating Designed for Fast IP Lookup

Updating Designed for Fast IP Lookup Updating Designed for Fast IP Lookup Natasa Maksic, Zoran Chicha and Aleksandra Smiljanić School of Electrical Engineering Belgrade University, Serbia Email:maksicn@etf.rs, cicasyl@etf.rs, aleksandra@etf.rs

More information

Design of a High Speed FPGA-Based Classifier for Efficient Packet Classification

Design of a High Speed FPGA-Based Classifier for Efficient Packet Classification Design of a High Speed FPGA-Based Classifier for Efficient Packet Classification V.S.Pallavi 1, Dr.D.Rukmani Devi 2 PG Scholar 1, Department of ECE, RMK Engineering College, Chennai, Tamil Nadu, India

More information

Parallel-Search Trie-based Scheme for Fast IP Lookup

Parallel-Search Trie-based Scheme for Fast IP Lookup Parallel-Search Trie-based Scheme for Fast IP Lookup Roberto Rojas-Cessa, Lakshmi Ramesh, Ziqian Dong, Lin Cai, and Nirwan Ansari Department of Electrical and Computer Engineering, New Jersey Institute

More information

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including Router Architectures By the end of this lecture, you should be able to. Explain the different generations of router architectures Describe the route lookup process Explain the operation of PATRICIA algorithm

More information

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture EECS : Introduction to Computer Networks Switch and Router Architectures Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley,

More information

High-Performance Network Data-Packet Classification Using Embedded Content-Addressable Memory

High-Performance Network Data-Packet Classification Using Embedded Content-Addressable Memory High-Performance Network Data-Packet Classification Using Embedded Content-Addressable Memory Embedding a TCAM block along with the rest of the system in a single device should overcome the disadvantages

More information

AN ASSOCIATIVE TERNARY CACHE FOR IP ROUTING. 1. Introduction. 2. Associative Cache Scheme

AN ASSOCIATIVE TERNARY CACHE FOR IP ROUTING. 1. Introduction. 2. Associative Cache Scheme AN ASSOCIATIVE TERNARY CACHE FOR IP ROUTING James J. Rooney 1 José G. Delgado-Frias 2 Douglas H. Summerville 1 1 Dept. of Electrical and Computer Engineering. 2 School of Electrical Engr. and Computer

More information

Hardware Assisted Recursive Packet Classification Module for IPv6 etworks ABSTRACT

Hardware Assisted Recursive Packet Classification Module for IPv6 etworks ABSTRACT Hardware Assisted Recursive Packet Classification Module for IPv6 etworks Shivvasangari Subramani [shivva1@umbc.edu] Department of Computer Science and Electrical Engineering University of Maryland Baltimore

More information

FPGA Based Packet Classification Using Multi-Pipeline Architecture

FPGA Based Packet Classification Using Multi-Pipeline Architecture International Journal of Wireless Communications and Mobile Computing 2015; 3(3): 27-32 Published online May 8, 2015 (http://www.sciencepublishinggroup.com/j/wcmc) doi: 10.11648/j.wcmc.20150303.11 ISSN:

More information

Load Balancing with Minimal Flow Remapping for Network Processors

Load Balancing with Minimal Flow Remapping for Network Processors Load Balancing with Minimal Flow Remapping for Network Processors Imad Khazali and Anjali Agarwal Electrical and Computer Engineering Department Concordia University Montreal, Quebec, Canada Email: {ikhazali,

More information

Scaling routers: Where do we go from here?

Scaling routers: Where do we go from here? Scaling routers: Where do we go from here? HPSR, Kobe, Japan May 28 th, 2002 Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University nickm@stanford.edu www.stanford.edu/~nickm

More information

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture Generic Architecture EECS : Introduction to Computer Networks Switch and Router Architectures Computer Science Division Department of Electrical Engineering and Computer Sciences University of California,

More information

Scalable Ternary Content Addressable Memory Implementation Using FPGAs

Scalable Ternary Content Addressable Memory Implementation Using FPGAs Scalable Ternary Content Addressable Memory Implementation Using FPGAs Weirong Jiang Xilinx Research Labs San Jose, CA, USA weirongj@acm.org ABSTRACT Ternary Content Addressable Memory (TCAM) is widely

More information

Scalable Packet Classification on FPGA

Scalable Packet Classification on FPGA Scalable Packet Classification on FPGA 1 Deepak K. Thakkar, 2 Dr. B. S. Agarkar 1 Student, 2 Professor 1 Electronics and Telecommunication Engineering, 1 Sanjivani college of Engineering, Kopargaon, India.

More information

IP LOOK-UP WITH TIME OR MEMORY GUARANTEE AND LOW UPDATE TIME 1

IP LOOK-UP WITH TIME OR MEMORY GUARANTEE AND LOW UPDATE TIME 1 2005 IEEE International Symposium on Signal Processing and Information Technology IP LOOK-UP WITH TIME OR MEMORY GUARANTEE AND LOW UPDATE TIME 1 G.T. Kousiouris and D.N. Serpanos Dept. of Electrical and

More information

PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES

PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES Greg Hankins APRICOT 2012 2012 Brocade Communications Systems, Inc. 2012/02/28 Lookup Capacity and Forwarding

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

Efficient Packet Classification for Network Intrusion Detection using FPGA

Efficient Packet Classification for Network Intrusion Detection using FPGA Efficient Packet Classification for Network Intrusion Detection using FPGA ABSTRACT Haoyu Song Department of CSE Washington University St. Louis, USA hs@arl.wustl.edu FPGA technology has become widely

More information

Efficient Construction Of Variable-Stride Multibit Tries For IP Lookup

Efficient Construction Of Variable-Stride Multibit Tries For IP Lookup " Efficient Construction Of Variable-Stride Multibit Tries For IP Lookup Sartaj Sahni & Kun Suk Kim sahni, kskim @ciseufledu Department of Computer and Information Science and Engineering University of

More information

Scalable IP Routing Lookup in Next Generation Network

Scalable IP Routing Lookup in Next Generation Network Scalable IP Routing Lookup in Next Generation Network Chia-Tai Chan 1, Pi-Chung Wang 1,Shuo-ChengHu 2, Chung-Liang Lee 1, and Rong-Chang Chen 3 1 Telecommunication Laboratories, Chunghwa Telecom Co., Ltd.

More information

PARALLEL ALGORITHMS FOR IP SWITCHERS/ROUTERS

PARALLEL ALGORITHMS FOR IP SWITCHERS/ROUTERS THE UNIVERSITY OF NAIROBI DEPARTMENT OF ELECTRICAL AND INFORMATION ENGINEERING FINAL YEAR PROJECT. PROJECT NO. 60 PARALLEL ALGORITHMS FOR IP SWITCHERS/ROUTERS OMARI JAPHETH N. F17/2157/2004 SUPERVISOR:

More information

Fast IP Routing Lookup with Configurable Processor and Compressed Routing Table

Fast IP Routing Lookup with Configurable Processor and Compressed Routing Table Fast IP Routing Lookup with Configurable Processor and Compressed Routing Table H. Michael Ji, and Ranga Srinivasan Tensilica, Inc. 3255-6 Scott Blvd Santa Clara, CA 95054 Abstract--In this paper we examine

More information

Midterm Review. Congestion Mgt, CIDR addresses,tcp processing, TCP close. Routing. hierarchical networks. Routing with OSPF, IS-IS, BGP-4

Midterm Review. Congestion Mgt, CIDR addresses,tcp processing, TCP close. Routing. hierarchical networks. Routing with OSPF, IS-IS, BGP-4 Midterm Review Week 1 Congestion Mgt, CIDR addresses,tcp processing, TCP close Week 2 Routing. hierarchical networks Week 3 Routing with OSPF, IS-IS, BGP-4 Week 4 IBGP, Prefix lookup, Tries, Non-stop routers,

More information

Routing Lookup Algorithm for IPv6 using Hash Tables

Routing Lookup Algorithm for IPv6 using Hash Tables Routing Lookup Algorithm for IPv6 using Hash Tables Peter Korppoey, John Smith, Department of Electronics Engineering, New Mexico State University-Main Campus Abstract: After analyzing of existing routing

More information

Dynamic Pipelining: Making IP-Lookup Truly Scalable

Dynamic Pipelining: Making IP-Lookup Truly Scalable Dynamic Pipelining: Making IP-Lookup Truly Scalable Jahangir Hasan T. N. Vijaykumar {hasanj, vijay} @ecn.purdue.edu School of Electrical and Computer Engineering, Purdue University Abstract A truly scalable

More information

DESIGN AND IMPLEMENTATION OF OPTIMIZED PACKET CLASSIFIER

DESIGN AND IMPLEMENTATION OF OPTIMIZED PACKET CLASSIFIER International Journal of Computer Engineering and Applications, Volume VI, Issue II, May 14 www.ijcea.com ISSN 2321 3469 DESIGN AND IMPLEMENTATION OF OPTIMIZED PACKET CLASSIFIER Kiran K C 1, Sunil T D

More information

Multiway Range Trees: Scalable IP Lookup with Fast Updates

Multiway Range Trees: Scalable IP Lookup with Fast Updates Multiway Range Trees: Scalable IP Lookup with Fast Updates Subhash Suri George Varghese Priyank Ramesh Warkhede Department of Computer Science Washington University St. Louis, MO 63130. Abstract In this

More information

Episode 5. Scheduling and Traffic Management

Episode 5. Scheduling and Traffic Management Episode 5. Scheduling and Traffic Management Part 3 Baochun Li Department of Electrical and Computer Engineering University of Toronto Outline What is scheduling? Why do we need it? Requirements of a scheduling

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction In a packet-switched network, packets are buffered when they cannot be processed or transmitted at the rate they arrive. There are three main reasons that a router, with generic

More information

Scaling Internet Routers Using Optics Producing a 100TB/s Router. Ashley Green and Brad Rosen February 16, 2004

Scaling Internet Routers Using Optics Producing a 100TB/s Router. Ashley Green and Brad Rosen February 16, 2004 Scaling Internet Routers Using Optics Producing a 100TB/s Router Ashley Green and Brad Rosen February 16, 2004 Presentation Outline Motivation Avi s Black Box Black Box: Load Balance Switch Conclusion

More information