Dynamic Pipelining: Making IP-Lookup Truly Scalable

Jahangir Hasan, T. N. Vijaykumar. School of Electrical and Computer Engineering, Purdue University. SIGCOMM 05. Rung-Bo-Su, 10/26/05 1

0. Abstract: An IP-lookup scheme must address five challenges of scalability: routing-table size, lookup throughput, implementation cost, power dissipation, and routing-table update cost. 2

Outline 1. Introduction 2. Background 3. Pipelined and Scalable IP-Lookup 4. Brief Review of TCAM-based Schemes 5. Methodology 6. Experimental Results 7. Conclusions 3

1. Introduction: Fiber optics are enabling high line-rates, which creates two major problems for IP-lookup. First, a lookup must complete in about 2 ns per packet (for a 160 Gbps line-rate and a minimum packet size of 40 bytes). Second, the routing table holds a large number of prefixes. 4
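
The 2 ns figure follows directly from the line-rate and the minimum packet size. A minimal sketch of the arithmetic (the function name is illustrative, not from the paper):

```python
def per_packet_budget_ns(line_rate_gbps: float, min_packet_bytes: int) -> float:
    """Time available per minimum-sized packet, in nanoseconds."""
    bits = min_packet_bytes * 8
    # 1 Gbit/s equals 1 bit per nanosecond, so bits / Gbps is already in ns.
    return bits / line_rate_gbps


# Values from the slide: 160 Gbps line-rate, 40-byte minimum packet.
print(per_packet_budget_ns(160, 40))  # 2.0
```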

1. Introduction: The key component is the routing-table memory, which is searched to locate the prefix that matches the incoming packet. 5

1. Introduction: Five key scaling requirements: (1) the memory required for the routing table; (2) lookup throughput that keeps up with the ever-increasing line-rates; (3) power dissipation, keeping the complexity of heat removal and the cost of cooling reasonable; (4) route-update cost; (5) implementation cost and complexity. 6

1. Introduction: Two categories of IP-lookup schemes: trie-based schemes and TCAM-based schemes. 7

1. Introduction: Tries scale well in power, but they do not scale well in throughput unless they are pipelined. Two approaches for pipelining tries are hardware-level pipelining (HLP) and data-structure-level pipelining (DLP). To solve DLP's problems, we propose scalable dynamic pipelining (SDP). 8

2. Background: Requirements: (1) To avoid denial-of-service attacks and instabilities in the network, sustain lookups for minimum-sized packets streaming in at full line-rate. (2) Provide enough memory. (3) Choose the prefix with the longest match. 9

2. Background: Trie-Based IP-lookup Schemes 10
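
As a rough illustration of trie-based longest-prefix matching, here is a minimal 1-bit trie sketch. The class names, the use of bit strings for addresses, and the next-hop labels are assumptions made for readability, not the paper's data layout.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # '0' or '1' -> TrieNode
        self.next_hop = None  # set only if a prefix ends at this node


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix_bits, next_hop):
        """Add a prefix given as a bit string, e.g. '10' for 10*."""
        node = self.root
        for bit in prefix_bits:
            node = node.children.setdefault(bit, TrieNode())
        node.next_hop = next_hop

    def lookup(self, addr_bits):
        """Walk one bit per level, remembering the longest match seen so far."""
        node = self.root
        best = node.next_hop
        for bit in addr_bits:
            node = node.children.get(bit)
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best


trie = BinaryTrie()
for prefix, hop in [("", "default"), ("0", "A"), ("10", "B"), ("1010", "C")]:
    trie.insert(prefix, hop)
print(trie.lookup("10100000"))  # C
```

Each level of the walk costs one memory access, which is why an unpipelined trie struggles to keep up with the per-packet time budget.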

2. Background: Multiple-bit Stride Tries (striding) 11
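
A multi-bit stride trie consumes several address bits per level, so a lookup needs fewer memory accesses. The sketch below assumes a fixed stride of 2 and handles prefixes whose length is not a multiple of the stride by expanding them over all matching slots; the names and the fixed stride are illustrative choices, not taken from the slides.

```python
STRIDE = 2  # bits consumed per trie level (assumed for this sketch)


class StrideNode:
    def __init__(self):
        size = 1 << STRIDE
        self.next_hop = [None] * size
        self.hop_len = [-1] * size  # length of the prefix that filled each slot
        self.child = [None] * size


def insert(root, prefix_bits, next_hop):
    node, bits = root, prefix_bits
    # Descend while more than one stride of prefix bits remains.
    while len(bits) > STRIDE:
        idx = int(bits[:STRIDE], 2)
        if node.child[idx] is None:
            node.child[idx] = StrideNode()
        node = node.child[idx]
        bits = bits[STRIDE:]
    # Expand the remaining 0..STRIDE bits over every slot they cover.
    pad = STRIDE - len(bits)
    base = (int(bits, 2) if bits else 0) << pad
    for i in range(base, base + (1 << pad)):
        # A longer original prefix must not be overwritten by a shorter one.
        if len(prefix_bits) >= node.hop_len[i]:
            node.next_hop[i] = next_hop
            node.hop_len[i] = len(prefix_bits)


def lookup(root, addr_bits):
    node, best, pos = root, None, 0
    while node is not None and pos + STRIDE <= len(addr_bits):
        idx = int(addr_bits[pos:pos + STRIDE], 2)
        if node.next_hop[idx] is not None:
            best = node.next_hop[idx]
        node = node.child[idx]
        pos += STRIDE
    return best


root = StrideNode()
for prefix, hop in [("", "default"), ("0", "A"), ("10", "B"), ("100", "D"), ("1010", "C")]:
    insert(root, prefix, hop)
print(lookup(root, "10010000"))  # D
```

The cost of striding is visible in `StrideNode`: every node carries 2^STRIDE slots even when few are used.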

2. Background: The Need for Pipelined Tries. One memory access may take longer than the packet inter-arrival time. The problem is aggravated in tries, which perform multiple memory accesses for one lookup. 12

3. Pipelined and Scalable IP-Lookup: The observation that pipelining can be used to solve the scalability problem of IP-lookup is not new. Two existing schemes are the hardware-level pipelined (HLP) scheme and the data-structure-level pipelined (DLP) scheme. 13

3. Pipelined and Scalable IP-Lookup: Hardware-Level Pipelining. Let k be the number of levels in the multi-bit trie, d the total delay of one memory access, and t the time between lookups (one lookup every t seconds). HLP pipelines, at the hardware level, the entire memory holding the trie into k*d/t stages. 14
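
To get a feel for the k*d/t expression, the sketch below evaluates it for sample numbers. Only t = 2 ns comes from the slides (the 160 Gbps case); k = 4 levels and d = 8 ns are assumed values chosen purely for illustration, and rounding up to a whole number of stages is also an assumption.

```python
import math


def hlp_stages(k: int, d_ns: float, t_ns: float) -> int:
    """Pipeline stages HLP needs: k accesses of d ns each, one lookup every t ns."""
    return math.ceil(k * d_ns / t_ns)


# Assumed: 4 trie levels, 8 ns per memory access. From the slides: 2 ns per lookup.
print(hlp_stages(4, 8.0, 2.0))  # 16
```

The stage count grows as t shrinks, which is the reason HLP needs ever deeper hardware pipelining at higher line-rates.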

3. Pipelined and Scalable IP-Lookup: Hardware-Level Pipelining 15

3. Pipelined and Scalable IP-Lookup: Hardware-Level Pipelining. [Figure: memory access path with decoder (X, Y), memory array access (2^X, 2^Y), and output multiplex.] 16

3. Pipelined and Scalable IP-Lookup: Hardware-Level Pipelining. [Figure: decoder (X, Y), memory array access (2^Y, 2^X), and multiplex.] 17

3. Pipelined and Scalable IP-Lookup: Data-Structure-Level Pipelining. DLP places each level of the trie in a different memory, so that each memory is accessed only once per packet lookup. Because it does not rely on expensive memory technologies or deep hardware pipelining, it scales well in power and implementation cost. 18
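
The idea of one memory per trie level can be sketched as a small software simulation. Here each stage is modelled as a dictionary holding the nodes of one level of a 1-bit trie, and several lookups are in flight at once, each touching exactly one stage per cycle. The dictionary representation, the 1-bit stride, and all names are assumptions for illustration only.

```python
def build_stage_memories(prefixes, depth):
    """stages[i] maps a length-i bit string (a level-i node) to its next hop or None."""
    stages = [dict() for _ in range(depth + 1)]
    for bits, hop in prefixes.items():
        for i in range(len(bits) + 1):
            stages[i].setdefault(bits[:i], None)  # create nodes along the path
        stages[len(bits)][bits] = hop
    return stages


def pipelined_lookup(stages, addresses):
    """Advance every in-flight lookup by one stage per cycle; results keep input order."""
    pipe = [None] * len(stages)  # pipe[i] is the packet currently at stage i
    pending = list(addresses)
    results = []
    while pending or any(slot is not None for slot in pipe):
        finished = pipe[-1]
        if finished is not None:
            results.append(finished["best"])
        pipe = [None] + pipe[:-1]  # shift all packets one stage forward
        if pending:
            pipe[0] = {"addr": pending.pop(0), "best": None, "alive": True}
        # Each stage memory is read at most once in this cycle.
        for i, pkt in enumerate(pipe):
            if pkt is None or not pkt["alive"]:
                continue
            key = pkt["addr"][:i]
            if key in stages[i]:
                if stages[i][key] is not None:
                    pkt["best"] = stages[i][key]
            else:
                pkt["alive"] = False  # fell off the trie; keep the best match so far
    return results


stages = build_stage_memories({"": "default", "0": "A", "10": "B", "1010": "C"}, depth=4)
print(pipelined_lookup(stages, ["10100000", "10010000", "11000000", "01000000"]))
# ['C', 'B', 'default', 'A']
```

Once the pipeline is full, one lookup completes per cycle regardless of trie depth, but each stage must be provisioned for the largest level it might ever hold.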

3. Pipelined and Scalable IP-Lookup 19

3. Pipelined and Scalable IP-Lookup: Three remaining challenges for DLP: scalability in memory size, in route-update cost, and in lookup throughput. 20

3. Pipelined and Scalable IP-Lookup: DLP's Scalability Problems in Memory Size. Each memory stage must be large enough for any prefix distribution. For the prefix distribution shown in Figure 4, its worst-case memory size would be no better. 21

3. Pipelined and Scalable IP-Lookup: DLP's Scalability Problems in Route-update Cost. Multibit trie: an update may change arbitrarily many nodes. Tree Bitmap: almost doubles the size of each trie node. 22

3. Pipelined and Scalable IP-Lookup: DLP's Non-Scalability in Throughput. Scalable Dynamic Pipelining (SDP). [Figure: example trie with prefixes 00*, 0*, *, 10*, 1*, 000*, 100*, 1010*.] 23

3. Pipelined and Scalable IP-Lookup: DELETE: 24

3. Pipelined and Scalable IP-Lookup: Jump Nodes: a node with a stride of k bits must have an array of 2^k pointers. Often there may be only one child, and the remaining pointers are null. 25
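
The saving from skipping over single-child nodes can be illustrated with generic path compression on a 1-bit trie: each chain of nodes that has one child and no next hop is replaced by a single edge labelled with the skipped bits. This is a stand-in technique in the spirit of jump nodes, not the paper's exact node layout, and all names are hypothetical.

```python
def build_trie(prefixes):
    """Plain 1-bit trie as nested dicts: {'hop': ..., 'kids': {bit: node}}."""
    root = {"hop": None, "kids": {}}
    for bits, hop in prefixes.items():
        node = root
        for b in bits:
            node = node["kids"].setdefault(b, {"hop": None, "kids": {}})
        node["hop"] = hop
    return root


def compress(node):
    """Collapse chains of single-child, hop-less nodes into one multi-bit edge."""
    new_kids = {}
    for bit, child in node["kids"].items():
        label = bit
        while child["hop"] is None and len(child["kids"]) == 1:
            (b, nxt), = child["kids"].items()
            label += b  # remember the bits being jumped over
            child = nxt
        compress(child)
        new_kids[label] = child
    node["kids"] = new_kids
    return node


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node["kids"].values())


def lookup(node, addr):
    best, pos = node["hop"], 0
    while True:
        for label, child in node["kids"].items():
            if addr.startswith(label, pos):  # all jumped bits must match
                node, pos = child, pos + len(label)
                if node["hop"] is not None:
                    best = node["hop"]
                break
        else:
            return best


prefixes = {"": "default", "0": "A", "1010": "C", "10111": "D"}
plain = build_trie(prefixes)
print(count_nodes(plain))            # 8
print(count_nodes(compress(plain)))  # 5
```

In this small example the compressed trie needs 5 nodes instead of 8, and lookups still return the longest matching prefix.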

3. Pipelined and Scalable IP-Lookup: Jump Nodes: 26

3. Pipelined and Scalable IP-Lookup: Per-Stage Memory Bound. (a) Binary search tree with N leaves. (b) Memory size of a trie with jump-nodes for the worst-case prefix distribution of Figure 4, compared to the size of a 1-bit trie. 27

3. Pipelined and Scalable IP-Lookup: Per-Stage Memory Bound. (c) The space taken at various levels by a trie with jump-nodes, for various prefix distributions. 28

3. Pipelined and Scalable IP-Lookup: System Architecture. Shadow trie: a copy of the trie containing all the required auxiliary information. It is accessed only during the construction or update of the trie, so it can use slow and cheap memory (DRAM). Modifications access only the shadow trie, and IP-lookups access only the SDP trie. 29

3. Pipelined and Scalable IP-Lookup: Updates are applied so that no read operation may encounter the data structure in an inconsistent or erroneous state. 30

3. Pipelined and Scalable IP-Lookup: Optimum Cost Incremental Route-updates 31

3. Pipelined and Scalable IP-Lookup: Memory Management Overhead; Scalability in Lookup Rate 32

4. Brief Review of TCAM-based Schemes: Content Addressable Memory (CAM): compares all memory locations against the input key to find matching entries. Ternary Content Addressable Memory (TCAM): supports wildcard bits in the entries and finds the longest matching prefix in one operation. 33
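
A software model can make the ternary match concrete. In the sketch below, each entry stores a value and a mask (mask bits set to 0 act as wildcards), and entries are kept sorted by decreasing prefix length so that the first hit is the longest match. Real TCAM hardware compares all entries in parallel and uses a priority encoder; the sequential loop, the 8-bit width, and the names are assumptions for illustration.

```python
WIDTH = 8  # address width in bits (assumed small for readability)


def make_entry(prefix_bits, next_hop, width=WIDTH):
    """Encode a prefix such as '10' (meaning 10*) as (value, mask, length, hop)."""
    plen = len(prefix_bits)
    value = (int(prefix_bits, 2) << (width - plen)) if plen else 0
    mask = ((1 << plen) - 1) << (width - plen)  # 1 = compare this bit, 0 = wildcard
    return (value, mask, plen, next_hop)


def tcam_lookup(entries, addr):
    """Return the next hop of the first (highest-priority) matching entry."""
    for value, mask, _plen, hop in entries:
        if (addr & mask) == value:
            return hop
    return None


table = sorted(
    [make_entry(p, h) for p, h in [("", "default"), ("0", "A"), ("10", "B"), ("1010", "C")]],
    key=lambda e: -e[2],  # longest prefixes first
)
print(tcam_lookup(table, 0b10010000))  # B
```

Every lookup conceptually touches every entry, which is the source of the high power dissipation of TCAMs.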

4. Brief Review of TCAM-based Schemes: Because a single access activates all memory locations, as opposed to just one, a TCAM dissipates much more power than RAM. TCAMs are pipelined at the hardware level. TCAMs do not scale well in power and implementation cost at high line-rates. 34

5. Methodology: The evaluation uses CACTI 3.2, a tool that accurately models SRAM and CAM structures. Modeling is only for 100 nm CMOS technology. 35

6. Experimental Results: (a) Worst-case per-stage memory versus trie levels for DLP. 36

6. Experimental Results: (b) Worst-case total memory versus trie levels for HLP. 37

6. Experimental Results: (c) A comparison of total worst-case memory versus routing-table size for various IP-lookup schemes. 38

6. Experimental Results: Comparison of power dissipation versus line-rate for various schemes with table sizes of (a) 250,000, (b) 500,000, and (c) 1 million prefixes. 39

6. Experimental Results: Comparison of chip area versus line-rate for various schemes with table sizes of (a) 250,000, (b) 500,000, and (c) 1 million prefixes. 40

6. Experimental Results: Summary of Results. HLP does not scale well in total memory size, power dissipation, route-update cost, and implementation cost. DLP does not scale well in total memory size, lookup throughput, and route-update cost. TCAMs do not scale well in implementation cost and power dissipation. 41

7. Conclusions: The paper proposes scalable dynamic pipelining (SDP), with three key innovations: (1) a proven worst-case per-stage memory bound that is significantly tighter than those of previous schemes; (2) incremental route-updates at optimum cost; (3) scalability at both the data-structure level and the hardware level. 42

7. Conclusions: SDP naturally scales in power and implementation cost. Detailed hardware simulation shows that SDP is the only scheme that achieves all five scalability requirements. 43