Scalable IP Routing Lookup in Next Generation Network

Chia-Tai Chan(1), Pi-Chung Wang(1), Shuo-Cheng Hu(2), Chung-Liang Lee(1), and Rong-Chang Chen(3)

(1) Telecommunication Laboratories, Chunghwa Telecom Co., Ltd., 7F, No. 9 Lane 74 Hsin-Yi Rd. Sec. 4, Taipei, Taiwan 106, R.O.C. {ctchan,abu,chlilee}@cht.com.tw
(2) Department of Info. Management, Ming-Hsin University of Science and Technology, 1 Hsin-Hsing Rd. Hsin-Fong, Hsinchu, Taiwan 304, R.O.C. schu@mis.must.edu.tw
(3) Department of Logistics Engineering and Management, National Taichung Institute of Technology, No. 129, Sec. 3, Sanmin Rd., Taichung, Taiwan 404, R.O.C. rcchens@ntit.edu.tw

Abstract. Ternary content-addressable memory (TCAM) has been widely used to perform fast routing lookups. It solves the best matching prefix problem in O(1) time, regardless of the number of prefixes and their lengths. Compared with software-based solutions, TCAM offers sustained throughput and a simple system architecture, which makes it attractive for IPv6 routing lookup. However, it also has several shortcomings, such as a limited number of entries, high cost, and high power consumption. Accordingly, we propose an efficient algorithm to reduce the required TCAM size. The proposed scheme can eliminate 98 percent of TCAM entries by adding a small amount of DRAM. We also address related issues in supporting IPv6 anycasting.

1 Introduction

Due to the advance of the World Wide Web and the promise of future e-commerce, Internet access continues to grow exponentially, and the Internet is facing the depletion of IPv4 addresses. Network administrators must increasingly rely on network address translation (NAT) to deploy networks. However, NAT complicates network management and breaks the end-to-end principle of the Internet; some applications, such as IPsec, cannot work across a NAT device.
Furthermore, Internet hosts are no longer just computers but a whole new range of information appliances that will require global IP addresses. All these issues make the large address space of IPv6 the main driving force for its adoption. For example, the current IPv6 address allocation policy recommends allocating a 48-bit prefix to every site on the Internet, whether homes, small offices, or large enterprise sites. The 48-bit prefix allows 2^16 = 65,536 subnets within each site, each of which could accommodate a virtually unlimited number of hosts. IPv6 also brings benefits such as stateless auto-configuration, more efficient mobility management, and integrated IPsec.

H.-K. Kahng (Ed.): ICOIN 2003, LNCS 2662, pp. 46-55, 2003. © Springer-Verlag Berlin Heidelberg 2003

The major obstacle in the design of high-speed routers is the relatively slow IP lookup. To forward packets toward their destinations, a router must make a forwarding decision, that is, determine the next hop for each incoming packet, based on the information gathered by the routing protocols. Since the introduction of CIDR in 1993 [1], IP routes have been identified by a (route prefix, prefix length) pair, where the prefix length varies from 1 to 32 bits. Because prefix lengths vary, the search for the best matching prefix (BMP) may be time-consuming for a backbone router with a large number of table entries. The exponential growth of Internet hosts further stresses the routing system, and it is difficult for the packet-forwarding rate to keep up with the increasing traffic demand. Specifically, the address lookup operation is a major bottleneck in the forwarding performance of today's routers.

Ternary content-addressable memory (TCAM) is a popular hardware device for performing fast IP lookups. Compared with software-based solutions, TCAM offers sustained throughput and a simple system architecture, which makes it attractive for IPv6 routing lookup. However, it also has several shortcomings, such as a limited number of entries, high power consumption, and high cost. In particular, holding the same number of entries for 128-bit IPv6 addresses requires four times the TCAM capacity needed for 32-bit IPv4 addresses.
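The BMP problem can be stated in one line: among all prefixes that contain the destination address, pick the longest. The minimal sketch below assumes a small hypothetical IPv6 table (built on the documentation prefix 2001:db8::/32, including a /48 site allocation) and uses Python's standard ipaddress module; the function and table names are illustrative only. Its linear scan costs O(N) per lookup, which is exactly the per-prefix work that a TCAM performs in parallel.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical IPv6 forwarding table: (prefix, next-hop label).
ROUTES: List[Tuple[str, str]] = [
    ("::/0", "default"),
    ("2001:db8::/32", "A"),
    ("2001:db8:1234::/48", "B"),    # one site's /48 allocation
    ("2001:db8:1234:5::/64", "C"),  # one of that site's 2**16 /64 subnets
]


def best_matching_prefix(addr: str,
                         routes: List[Tuple[str, str]] = ROUTES) -> Optional[str]:
    """Linear scan that keeps the longest prefix containing addr."""
    a = ipaddress.ip_address(addr)
    best_len, best_hop = -1, None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        # Several prefixes may contain addr; only the longest one wins.
        if a in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    return best_hop
```

For instance, 2001:db8:1234:5::1 falls inside all four prefixes and resolves to the /64 ("C"), while 2001:db8:1234:6::1 misses the /64 and falls back to the site's /48 ("B").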
In this article, we propose an efficient algorithm that eliminates 98 percent of TCAM entries by adding a small amount of DRAM, and we also address related issues in supporting IPv6 anycasting. The rest of the paper is organized as follows. The related algorithms are introduced in Section 2. Section 3 presents the proposed algorithm. The experimental results are presented in Section 4. Finally, a summary is given in Section 5.

2 Related Works

There have been extensive studies on constructing routing tables during the past few years, including both hardware and software solutions. In [2], Degermark et al. use a trie-like data structure. The main idea of their work is to quantize the prefix lengths to levels of 16, 24, and 32 bits and to expand each prefix in the table to the next higher level. The scheme can compact a large routing table with 40,000 entries into a table of 150-160 Kbytes. In a hardware implementation, the minimum and maximum numbers of memory accesses for a lookup are two and nine, respectively. Gupta et al. presented fast routing-lookup schemes based on a huge DRAM [3]. The scheme accomplishes a routing lookup with at most two memory accesses in a forwarding table of 33 megabytes. By adding an intermediate-length table, the forwarding table can be reduced to 9 megabytes; however, the maximum number of memory accesses for a lookup increases to three. When implemented in a hardware pipeline, the scheme can complete one route lookup per memory access, which yields approximately 20 million packets per second (MPPS). Huang et al. further improve it by fitting the forwarding table into SRAM [4].

Regarding software solutions, algorithms based on trees, hashing, or binary search have been proposed. Srinivasan and Varghese [5] present a trie-based data structure with multiple branches per node. By using a standard trie representation with arrays of child pointers, insertions and deletions of prefixes are supported; however, dynamic programming is needed to minimize the size of the trie. In [6], Nilsson and Karlsson solve the BMP problem with LC-tries and linear search. Waldvogel et al. propose a lookup scheme based on binary search over prefix lengths [7]. This scheme scales very well as the address size and the routing tables grow. It requires log2(address bits) hash lookups in the worst case; thus, five hash lookups are needed for IPv4 and seven for IPv6 (128-bit). The software-based schemes can be further improved by using multiway and multicolumn search techniques [8]. Although these approaches have certain advantages, they either use complicated data structures [1, 3, 5, 8] or are not scalable for IPv6 [2, 3, 6].

3 IPv6-aware Router Design

3.1 Router Architecture

Figure 1 schematically depicts the architecture of the IPv6-aware router. With the advent of high-capacity switches and increased traffic volume, distributed routing architectures have become a significant improvement for achieving high capacity in router design. Such a router mainly consists of a network processor module and forwarding engines (FEs) interconnected through the switching fabric. The network processor executes the routing protocols, such as BGP and OSPF, and maintains a routing table (RT).
In each line card, the forwarding engine uses a forwarding table (FT) to make the routing decision for packet forwarding. The forwarding table is derived from the routing table and maps each IP prefix to an outgoing interface. This separation ensures that routing instability does not affect the performance of the packet forwarding engine. The conceptual configuration of the FE is shown in Figure 2. For an incoming IP packet, header verification, header update, and the route-lookup process (based on the destination IP address) are initiated simultaneously. If the IP header is not correct, the packet is dropped and the lookup is terminated. Otherwise, the packet header is updated (TTL decrement and checksum update), and the route-lookup process provides the next hop (port number) to which the packet should be forwarded. The MAC-address substitution module then replaces the source and destination MAC addresses of the packet before it is forwarded to the interface port. Since the route-lookup process is the bottleneck of the FE, our study focuses on the design of a fast and scalable IP lookup scheme.
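The header-update step of Figure 2 (TTL decrement and checksum update) is commonly done incrementally instead of re-summing the whole header. The sketch below is a minimal IPv4 illustration that uses the incremental update of RFC 1624 (Eqn. 3), a technique we chose for illustration rather than one the paper specifies; all function names are hypothetical. IPv6 has no header checksum, so there only the Hop Limit would be decremented.

```python
def ones_complement_sum(words) -> int:
    """16-bit one's-complement sum with end-around carry."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def header_checksum(header: bytes) -> int:
    """Internet checksum over the header as stored; 0 means the header verifies."""
    words = [int.from_bytes(header[i:i + 2], "big")
             for i in range(0, len(header), 2)]
    return ~ones_complement_sum(words) & 0xFFFF


def decrement_ttl(header: bytearray) -> None:
    """Decrement TTL and patch the checksum field per RFC 1624, Eqn. 3."""
    if header[8] <= 1:
        # A router would discard the packet (and normally send ICMP Time Exceeded).
        raise ValueError("TTL expired")
    old_word = int.from_bytes(header[8:10], "big")  # TTL | protocol
    header[8] -= 1
    new_word = int.from_bytes(header[8:10], "big")
    hc = int.from_bytes(header[10:12], "big")
    # HC' = ~(~HC + ~m + m'), all in 16-bit one's-complement arithmetic
    s = ones_complement_sum([~hc & 0xFFFF, ~old_word & 0xFFFF, new_word])
    header[10:12] = (~s & 0xFFFF).to_bytes(2, "big")
```

On a sample 20-byte header with TTL 0x40 and checksum 0xB861, header_checksum returns 0 (the header verifies); after decrement_ttl the TTL is 0x3F, the checksum field becomes 0xB961, and the header still verifies, without touching the other eight header words.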

Fig. 1. Architecture of the IPv6-aware router (components: network processor with RT and FT; line cards 1 to N, each with an FE; switching fabric)

3.2 TCAM Entry-Reduction Algorithm

With its compelling technical advantages, TCAM has become a preferred solution in networking devices for fast, sophisticated IP packet forwarding. The trend had humble beginnings. The first binary content-addressable memories (CAMs) were brought to market in the early 1990s and suffered from various limitations: they were too expensive, and software-based lookup solutions were adequate for the modest traffic-forwarding loads of that time. However, as faster line rates began choking the legacy table-lookup solutions, CAM-based designs became increasingly attractive, because their massively parallel and highly deterministic search characteristics allow IP lookups to be performed rapidly. The introduction of TCAM opened new possibilities, particularly for the BMP problem. TCAM devices can prioritize search results so that multiple matches, corresponding to different prefix lengths, are resolved in accordance with BMP requirements.

We use an example to explain the TCAM operation, as shown in Figure 3. There are six routing prefixes, expressed in the form <prefix/prefix length>. For an incoming packet with destination address 140.113.215.207, the TCAM searches all the prefixes in parallel. Several prefixes may match the destination address (P1 and P4 in this case). A priority encoder then selects the longest matching entry as the result.

Fig. 2. Function components of the IPv4/IPv6 forwarding engine (IP header; header verification; TTL decrement and checksum update; IPv4/IPv6 route lookup; MAC address substitution)

Fig. 3. Solving the BMP problem by sorting the prefixes in decreasing order of length. TCAM entries, in priority order:
  (P1) 140.113.215/24   next hop 210.19.4.33
  (P2) 140.113.147/24   next hop 210.19.4.10
  (P3) 120.3.96/20      next hop 188.3.20.3
  (P4) 140.113/16       next hop 210.19.4.10
  (P5) 96.40/16         next hop 300.1.1.3
  (P6) 12/8             next hop 8.2.100.9
For destination address 140.113.215.207, the match indicators feed the priority encoder, which selects (P1) 140.113.215/24 with next hop 210.19.4.33.

Although the application of TCAM technology is growing gradually, it still has some limitations. Compared with SRAM, TCAM operates at a lower clock rate and has much higher power consumption, a higher price, and a larger package. In next-generation networks, TCAM will suffer from its limited number of entries.

Table management is another issue for TCAM. As described above, the prefixes in the TCAM are kept in sorted order. However, forwarding tables in routers are dynamic; prefixes can be inserted or deleted as the network status changes. These changes can occur at rates as high as 1,000 prefixes per second [9], so it is desirable to update the TCAM quickly and to keep the incremental update time as small as possible. Forwarding-table updates complicate keeping the prefixes sorted in the TCAM. Consider the example in Figure 3, and assume that a new prefix <120.3.128/18> is inserted into the forwarding table. It must be stored between prefixes P3 (<120.3.96/20>) and P4 (<140.113/16>) to maintain the sorted order. In the worst case, the inserted prefix is compared with all existing prefixes. Furthermore, inserting the prefix in its correct place involves moving other entries. This naive solution is not efficient for a forwarding engine with a large number of entries. Another solution is to reserve a few free locations for future updates. In [9], Shah and Gupta proposed the PLO_OPT algorithm.
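The ordering rule of Figure 3 and the cost of a naive insertion can be modeled in a few lines of software. In this minimal sketch (assuming Python's ipaddress module; all names are hypothetical), the list index stands in for TCAM priority, taking the minimum matching index plays the role of the priority encoder, and the insertion routine reports how many entries must shift when <120.3.128/18> is added. Next hops are replaced by the labels P1 to P6.

```python
import ipaddress
from typing import List, Optional, Tuple

Entry = Tuple[ipaddress.IPv4Network, str]

# Figure 3 prefixes written out in full; next hops replaced by labels.
FIG3 = [
    ("140.113.215.0/24", "P1"), ("140.113.147.0/24", "P2"),
    ("120.3.96.0/20", "P3"), ("140.113.0.0/16", "P4"),
    ("96.40.0.0/16", "P5"), ("12.0.0.0/8", "P6"),
]


def build_tcam(prefixes: List[Tuple[str, str]]) -> List[Entry]:
    """Lay entries out in decreasing prefix-length order (stable for ties)."""
    entries = [(ipaddress.ip_network(p), name) for p, name in prefixes]
    return sorted(entries, key=lambda e: -e[0].prefixlen)


def tcam_lookup(tcam: List[Entry], addr: str) -> Optional[str]:
    """Compare against every entry, then let the 'priority encoder' pick the
    lowest index, which is the longest match because of the ordering."""
    a = ipaddress.ip_address(addr)
    match_lines = [i for i, (net, _) in enumerate(tcam) if a in net]
    return tcam[min(match_lines)][1] if match_lines else None


def naive_insert(tcam: List[Entry], prefix: str, name: str) -> Tuple[int, int]:
    """Insert while keeping the order; return (slot, number of shifted entries)."""
    net = ipaddress.ip_network(prefix)
    slot = 0
    while slot < len(tcam) and tcam[slot][0].prefixlen >= net.prefixlen:
        slot += 1
    tcam.insert(slot, (net, name))
    return slot, len(tcam) - slot - 1
```

Looking up 140.113.215.207 matches P1 and P4 and returns P1. Inserting 120.3.128.0/18 lands in slot 3, between P3 and P4, and shifts the three entries below it; in a table with tens of thousands of entries, this shifting is the cost that update schemes such as PLO_OPT try to bound.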
By keeping all the unused entries in the center of the TCAM, PLO_OPT limits each prefix insertion or deletion to swaps of prefixes between different length groups; hence, the worst-case update time is W/2, where W is the address length in bits.

Reducing the number of TCAM entries improves a TCAM-based design in terms of power consumption, price, and board usage. Consequently, we introduce a novel TCAM entry-reduction algorithm. The new scheme reduces the number of TCAM entries and also enables current TCAMs to support large IPv6 routing tables. The basic idea is to merge multiple routing prefixes into one representative prefix. The generated prefixes are inserted into the TCAM, and the original prefixes are recorded in an extra memory, such as DRAM. A routing lookup then consists of one TCAM access and one DRAM access. Although one extra memory access is required, a pipelined design can alleviate the performance degradation, while the system cost can be reduced dramatically, since DRAM is much cheaper and consumes less power.

Fig. 4. Representation of the routing prefixes with binary tree (panels: the prefix list, with the default route and P1 to P9; the binary tree representation)

To begin the proposed algorithm, we construct a binary tree from the routing prefixes. Assume there are 10 prefixes in the routing table, including the default route; the TCAM would then need ten entries to record the complete routing information. The binary tree is constructed according to the bit streams of the routing prefixes, as shown in Figure 4. The binary tree can then be divided into multiple subtrees. If the height of the subtrees is set to 2, the binary tree can be divided into four subtrees, as shown in Figure 5. Each routing prefix is contained in at least one subtree. Consequently, the four bit streams corresponding to the roots of the subtrees are inserted into the TCAM instead of the original prefixes.

Fig. 5. Four subtrees with a height of two bits

The detailed subtree construction algorithm is listed below, where H is the subtree height. It selects subtrees starting from the leaves of the binary tree, because every leaf (i.e., routing prefix) must be covered; this bottom-up construction can obtain the minimum number of subtrees.

Subtree Construction Algorithm:
Input: the root of the binary tree constructed from the routing table
Output: the constructed subtrees and the corresponding bit streams

Constructor(Parent, Current.Root, Current.Depth)
Begin
    If ((depth of the deepest child in the subtree - Current.Depth) == H)
        Generate Subtree(Parent, Current.Root)
    Else
        Constructor(New.Parent, Current.Root->Left.Child, Current.Depth + 1)
        Constructor(New.Parent, Current.Root->Right.Child, Current.Depth + 1)
        If ((length of the longest unprocessed prefix - Current.Depth) == H)
            Generate Subtree(Parent, Current.Root)
        Endif
    Endif
End

Routing Lookup: The routing lookup procedure is divided into two steps. The first step finds the best matching subtree; the second step extracts the best matching prefix within that subtree. Since each subtree is small, it can be expanded completely and stored in DRAM; in our example, each subtree is expanded to four entries. Each TCAM entry indicates the address of its corresponding subtree, as shown in Figure 6. The H bits following the matched bits are then used to select the entry within the subtree. For example, consider a search for address 0101. It matches TCAM entry P3 (01/2), whose corresponding subtree is S2. The third and fourth bits (0 and 1, respectively) then select the second entry of S2, and the best matching prefix is P7 (0101).
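The two-step lookup can be prototyped in software. The sketch below is our own greedy reading of the bottom-up construction (root a height-H subtree H levels above the deepest uncovered prefix, or at the tree root), not a transcription of the Constructor pseudocode. The table FIG4 and all function names are hypothetical, and its bit strings are an assumed reading of Figure 4. Each expanded slot stores the BMP over the whole table. This keeps DRAM answers correct even when a prefix is covered by another subtree, because any longer matching prefix sits under a longer root, which the TCAM step would have selected instead.

```python
from typing import Dict, List, Optional

# Hypothetical table modeled on Figure 4 (assumed bit strings); "" = default.
FIG4: Dict[str, str] = {
    "Default": "", "P1": "0000", "P2": "0001", "P3": "001", "P4": "01",
    "P5": "010000", "P6": "010001", "P7": "0101", "P8": "0111", "P9": "11",
}


def build_subtrees(prefixes: Dict[str, str], h: int) -> List[str]:
    """Greedy bottom-up cover: the deepest uncovered prefix roots a height-h
    subtree h levels above it; every prefix under that root whose length is
    at most root depth + h becomes covered."""
    uncovered = set(prefixes)
    roots: List[str] = []
    while uncovered:
        # Ties are broken by bit string so the result is deterministic.
        deepest = max(uncovered, key=lambda n: (len(prefixes[n]), prefixes[n]))
        bits = prefixes[deepest]
        depth = max(0, len(bits) - h)
        root = bits[:depth]
        roots.append(root)
        uncovered = {n for n in uncovered
                     if not (prefixes[n].startswith(root)
                             and len(prefixes[n]) <= depth + h)}
    return roots


def expand(prefixes: Dict[str, str], roots: List[str],
           h: int) -> Dict[str, List[Optional[str]]]:
    """DRAM image: each root owns 2**h slots; slot i holds the BMP of
    root + (i written in h bits), taken over the whole table."""
    dram: Dict[str, List[Optional[str]]] = {}
    for root in roots:
        slots: List[Optional[str]] = []
        for i in range(2 ** h):
            bits = root + format(i, f"0{h}b")
            best: Optional[str] = None
            for name, p in prefixes.items():
                if bits.startswith(p) and (
                        best is None or len(p) > len(prefixes[best])):
                    best = name
            slots.append(best)
        dram[root] = slots
    return dram


def lookup(addr: str, roots: List[str],
           dram: Dict[str, List[Optional[str]]], h: int) -> Optional[str]:
    """Step 1 (TCAM model): longest root that prefixes addr.
    Step 2 (DRAM): the h bits after the root index the expanded subtree."""
    for root in sorted(roots, key=len, reverse=True):
        if addr.startswith(root):
            d = len(root)
            return dram[root][int(addr[d:d + h], 2)]
    return None
```

With h = 2, this sketch produces the roots 0100, 01, 00, and the empty root (*/0), that is, four TCAM entries in place of ten prefixes, the same set listed for Figure 6. The 6-bit address 010100 follows the path of the 0101 example in the text: it hits the 01 root and reads that subtree's second slot, P7.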
Fig. 6. Reorganizing the binary tree into four subtrees with a height of two bits. TCAM entries (prefix, subtree): (P1) 0100/4, S4; (P2) 00/2, S3; (P3) 01/2, S2; (P4) */0, S1. Each subtree is expanded into four DRAM entries, indexed 00, 01, 10, and 11.

Route Update: Since the original routing prefixes are encapsulated in the subtrees, each route update must first look up the best matching subtree to check whether the updated prefix can be covered by it. If so, the related entries in the DRAM are modified. Otherwise, the routing prefix is inserted into the TCAM directly. After a certain number of such entries have accumulated, the subtrees can be rebuilt to keep the number of TCAM entries small. The worst-case update time is max(W/2, 2^H), where W is the address length in bits and H is the height of the subtrees.

Routing Lookup for IPv6 Anycasting: Anycasting is a new addressing scheme in IPv6. Each anycast address may identify a set of servers, and any connection to the anycast address is routed to the nearest server. This helps balance the server load and improves network resiliency. A route to an anycast address can be treated as a host route, so the routing lookup for anycasting is exact matching rather than best matching. We can handle anycast addresses by using an additional CAM (or hash table). The routing lookup for each packet is performed in both the TCAM and the CAM, and the CAM result takes priority.

4 Performance Evaluation

Through experiments, we demonstrate that the proposed scheme requires far fewer TCAM entries at the cost of a small amount of extra DRAM. Currently, IPv6 routing tables contain only a few hundred prefixes. To evaluate the scalability of the proposed scheme, we therefore use real data available from the IPMA [10] and NLANR [11] projects; these data provide a daily snapshot of the routing tables used by some major Network Access Points (NAPs). The main performance metrics are the number of generated TCAM entries and the required DRAM storage. Figure 7 shows the results for different routing tables with the subtree height set to four and eight. For the largest table, with 102,271 prefixes (NLANR), the scheme generates 30,260 and 6,010 TCAM entries, requiring 473 Kbytes and 1.5 Mbytes of DRAM for heights 4 and 8, respectively. As expected, the number of TCAM entries decreases with larger subtrees, while the required DRAM increases. Both performance metrics grow roughly in proportion to the number of prefixes. By changing the height of the subtrees, the number of TCAM entries and the required DRAM can be adjusted to suit the practical environment.
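The reported DRAM sizes follow from simple arithmetic if every TCAM entry owns one fully expanded subtree of 2^H slots and each slot takes one byte (for example, a next-hop index). The one-byte slot width is our inference from the numbers, not something the text states, and dram_bytes is a hypothetical helper.

```python
def dram_bytes(tcam_entries: int, height: int, slot_bytes: int = 1) -> int:
    """Each TCAM entry points to one fully expanded subtree of 2**height
    slots; slot_bytes = 1 (a next-hop index) is an assumption."""
    return tcam_entries * (2 ** height) * slot_bytes


# NLANR table with 102,271 prefixes: TCAM entry counts as reported in the text.
for entries, height in [(30_260, 4), (6_010, 8)]:
    size = dram_bytes(entries, height)
    print(f"H={height}: {entries:,} TCAM entries -> {size / 1024:.1f} KB "
          f"({size / 2**20:.2f} MB)")
```

Under this model, the NLANR figures come out at about 473 KB for H = 4 and about 1.47 MB for H = 8, and the path-compressed result reported below (3,536 entries at H = 8) gives exactly 884 KB. Because the DRAM cost per TCAM entry is 2^H slots, each extra bit of subtree height doubles it; this is the price paid for fewer TCAM entries.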
Fig. 7. The performance metrics for different routing tables (x-axis: number of prefixes; left y-axis: number of TCAM entries; right y-axis: required DRAM in Kbytes; curves: TCAM entries and memory required for H = 4 and H = 8)

It has been observed that routing tables contain a large number of single-path prefixes [6]. The sparseness caused by single-path prefixes makes the subtree construction inefficient. We can adopt path compression to eliminate the single-path prefixes before the subtree construction. As shown in Figure 8, the performance improves significantly: the number of TCAM entries can be reduced to as few as 3,536 with only 884 Kbytes of DRAM (subtree height = 8), a dramatic improvement. Note that, because of path compression, the entry found may not actually match the destination address; an extra memory access to its shorter prefix would then be necessary.

Fig. 8. The performance metrics for different routing tables (subtree height = 4, 8; path compression enabled; same axes and curves as Fig. 7)

5 Conclusion

This study investigates the related issues in TCAM-based FE design. To use TCAM for IPv6 routing lookup, a more efficient approach is needed. The proposed algorithm reduces the number of TCAM entries by merging routing prefixes into subtrees according to their positions in the binary tree. The subtree roots and their interior information are stored in the TCAM and DRAM, respectively. Each routing lookup consists of one TCAM access and one DRAM access, which can be pipelined. By adjusting the height of the subtrees, we can control the number of generated TCAM entries and the required DRAM size. If path compression is applied, both the required TCAM and DRAM can be further reduced, at the cost of one extra memory access due to a possible incorrect match. In the experiments, we illustrated various performance metrics under different settings. In the best results, the proposed algorithm eliminates 98 percent of TCAM entries by adding only 2.2 Mbytes of DRAM. We also discussed the handling of anycast addresses. We believe that this scheme would simplify the design of IPv6 routers by reducing the TCAM cost dramatically.

References
[1] Y. Rekhter and T. Li, An Architecture for IP Address Allocation with CIDR, RFC 1518, Sept. 1993.
[2] M. Degermark, A. Brodnik, S. Carlsson, and S. Pink, Small Forwarding Tables for Fast Routing Lookups, in Proc. ACM SIGCOMM '97, pp. 3-14, Cannes, France, Sept. 1997.
[3] P. Gupta, S. Lin, and N. McKeown, Routing Lookups in Hardware at Memory Access Speeds, in Proc. IEEE INFOCOM '98, San Francisco, USA, March 1998.
[4] N. F. Huang, S. M. Zhao, and J. Y. Pan, A Fast IP Routing Lookup Scheme for Gigabit Switch Routers, in Proc. IEEE INFOCOM '99, New York, USA, March 1999.
[5] V. Srinivasan and G. Varghese, Fast IP Lookups Using Controlled Prefix Expansion, ACM Trans. on Computer Systems, 17(1):1-40, 1999.
[6] S. Nilsson and G. Karlsson, IP-Address Lookup Using LC-Tries, IEEE JSAC, 17(6):1083-1092, June 1999.
[7] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, Scalable High Speed IP Routing Lookups, in Proc. ACM SIGCOMM '97, pp. 25-36, Cannes, France, Sept. 1997.
[8] B. Lampson, V. Srinivasan, and G. Varghese, IP Lookups Using Multiway and Multicolumn Search, IEEE/ACM Trans. on Networking, 7(4):324-334, June 1999.
[9] D. Shah and P. Gupta, Fast Updating Algorithms for TCAMs, IEEE Micro, 21(1):36-47, Jan.-Feb. 2001.
[10] Merit Networks, Inc., Internet Performance Measurement and Analysis (IPMA) Statistics and Daily Reports, http://www.merit.edu/ipma/routing_table/.
[11] NLANR Project, http://moat.nlanr.net/.