An Efficient Parallel TCAM Scheme for the Forwarding Engine of the Next-generation Router

Bin Zhang, Jiahai Yang, Jianping Wu, Qi Li, Donghong Qin
Network Research Center, Tsinghua University
Tsinghua National Laboratory for Information Science and Technology (TNList)
Beijing, China

Abstract: Ternary Content-Addressable Memory (TCAM) is a popular hardware device for fast IP address lookup. The high link transmission speed of the Internet backbone demands a more powerful IP address lookup engine. Restricted by memory access speed, the lookup engine for next-generation routers must exploit parallelism among multiple TCAM chips. However, most existing schemes improve lookup performance and reduce power consumption but ignore update efficiency. In this paper, we propose a crossed address range division and shared caching scheme. We improve update efficiency significantly with a buddy update method while keeping power dissipation low by decreasing the number of TCAM accesses triggered in each lookup operation. The lookup throughput is kept very high through adaptive load balancing. Our simulation results show that the proposed scheme can achieve an average lookup speedup factor greater than 11 with 12 TCAM chips, at the cost of 10% more memory space and an additional cache chip.

I. INTRODUCTION

Fast lookup is a key technique for the development of high performance routers [1]. Existing schemes can be classified into Trie-based software approaches and TCAM-based hardware approaches. Although many improved methods have been proposed to optimize the original Trie structures [2], [3], Trie-based IP lookup schemes still require several memory accesses per lookup, and those accesses may be serialized. It is hard to further improve lookup speed in Trie-based approaches because of intrinsic characteristics such as the access speed of DRAM/SRAM.
In contrast, TCAM can perform a lookup operation in a single cycle owing to its parallel access characteristics. Moreover, updating the forwarding table in TCAM-based schemes is generally simpler than in trie-based schemes. Therefore, TCAM-based lookup has received much attention recently. However, challenges in TCAM-based lookup arise for the following reasons: a) an IP address may match multiple prefixes in the forwarding table, and the longest matching prefix (LMP) must be chosen; b) an 18MB TCAM can only operate at a speed of up to 266MHz and perform 133 million lookups per second [4], which is barely enough to keep up with today's 40Gbps line rate; c) the size of the routing table has been increasing at a rate of about 10-50k entries per year in the past few years [5], and more storage will be needed when IPv6 is widely deployed; d) the prefixes in the routing table change frequently because of the instability of the Internet, and frequent updates may consume many computation cycles in the IP lookup engine and degrade the whole system; e) high power consumption is also traditionally one of the major problems in building a lookup engine. Hence there are three major issues to consider when building a lookup engine: a) lookup performance; b) update time; c) power consumption. Existing parallel schemes [6], [7], [8], [9], [10] were put forward to partially improve system efficiency: [6], [7] focus only on the update time; [8], [9], [10] focus on both lookup throughput and power consumption. It seems hard to deal with all three issues simultaneously. Motivated by this situation, in this paper we propose a crossed address range division and shared caching scheme to improve the lookup performance, and use a buddy update method to improve the update efficiency. Besides, the scheme keeps power consumption low by decreasing the number of TCAM accesses triggered in each lookup operation.
The main ideas of our scheme are as follows: 1) calculate the number of TCAM chips b needed on the basis of the number of forwarding table entries, the chip size and the percentage of free space on each chip in the initial state; 2) partition the forwarding table IP prefixes into b roughly equal parts based on address range, with every part overlapping its neighbors to balance traffic among them; 3) update prefixes efficiently by organizing the prefixes well and using the buddy update method; 4) add a cache TCAM chip to balance the whole traffic; the proposed cache mechanism is based on statistical features of the traffic. Our scheme is easy to implement, scalable and flexible. The rest of the paper is organized as follows. Related work on parallel TCAM schemes is described in section 2. In section 3 we propose our parallel scheme and describe the functionality of each component. We present our theoretical performance analysis in section 4 and simulation results in section 5. Finally, we conclude our work in section 6.

II. RELATED WORKS

Chip-level parallel TCAMs are deployed to circumvent the limitation of a single TCAM, but some problems remain: 1) traffic load balance; 2) update time; 3) power dissipation.

Fig. 1. Schematics of the complete implementation architecture

Much research has been conducted to optimize the performance of TCAM-based lookup engines. In [6], the forwarding table is divided by output port into several smaller tables; the number of tables equals the number of output ports on the router, and each table holds all the entries that map to its output port. Since all entries in a partitioned table map to the same output port, there is no longer a need to keep the entries sorted. When a search occurs, each TCAM looks up the IP address in parallel, and each table outputs the matched lengths to a selection logic. After the selection logic chooses the longest length, the packet is forwarded to the output port based on which table had the longest prefix match. The basic idea of [7] is to partition all prefixes into different sets based on the relationship among them. For a given destination IP address, the prefixes that have an ancestor-descendant relation will be matched simultaneously. The depth of a prefix search trie currently does not exceed 7, even including the default prefix, so there can be at most 7 matches. If the forwarding table is partitioned into several TCAMs so that there is no ancestor-descendant relation within each partitioned TCAM, then it is guaranteed that there is at most one match in each TCAM. [6], [7] focus only on the update time: the update cost of both schemes is O(1), but the lookup performance is poor and the power consumption is high. Panigrahy et al. [8] partition all prefixes equally into 8 parts based on the address range and put each part in a TCAM chip. When a search occurs, each TCAM looks up the IP address in parallel based on its address range, and the packet is forwarded to the output port based on which table had the longest prefix match. Hence the lookup performance can be improved.
Power consumption can be saved by decreasing the number of TCAM accesses triggered in each lookup operation, but the improvement in lookup performance is very limited because there is no further balancing mechanism. To further improve the lookup performance, Zheng et al. [9] proposed a parallel lookup architecture in which each TCAM chip has a redundant partition to balance the traffic load. It was assumed that the lookup traffic distribution among IP prefixes can be derived from long-time statistics of traffic traces. However, Internet traffic can be very bursty. Mapping routing table partitions onto chips based on long-term traffic distribution cannot effectively balance the workload of individual chips during bursty periods. When the traffic is temporarily biased toward a limited number of routing prefixes, the multiple selectors will frequently access the same block, which drastically hampers the whole throughput. To address this issue, Lin et al. [10] proposed a logical cache scheme. Instead of mapping routing table partitions onto TCAM chips relying on long-term traffic statistics, they reserve a partition on each TCAM chip as a logical cache for dynamic load balancing, which can balance the traffic load adaptively. Both schemes [9], [10] neglect an important factor, the update time, when they try to improve the lookup and power efficiency. The inner-chip partition mechanism of [9], [10] forces the whole system to be reconstructed when one partition on any of the parallel TCAM chips becomes full during update. Frequent reconstruction can decrease the whole system's efficiency dramatically; the inefficient update of TCAM is the performance bottleneck of the whole system, and it is very hard to directly improve the update efficiency on top of the parallel schemes in [9], [10]. How to redesign the whole parallel scheme to keep high throughput and low power consumption while at the same time achieving fast updates is the goal of our scheme.

III. PROPOSED CHIP-LEVEL PARALLEL SCHEME

The detailed implementation architecture of the parallel lookup engine is presented in Fig. 1. The basic ideas of our scheme are: 1) the number of entries on each TCAM chip is approximately equal and well organized for fast update and least reconstruction; 2) the distribution of prefixes on a TCAM chip is based on address range, and the address range of a TCAM chip overlaps with its neighbors to balance their traffic; 3) a cache TCAM chip is reserved to hold popular destination IP addresses to balance all traffic. When an incoming packet arrives, the destination IP address is extracted and delivered to the indexing logic for comparison. The indexing logic returns a partition number indicating which TCAM (or two neighboring TCAMs) may contain the matching prefixes and tags the packet with it. Before being sent to the adaptive load balance logic, the packet is tagged with a time stamp by the time stamp logic, because incoming packets can be processed in a non-FIFO order due to different queuing lengths and cache misses. The packet is finally sent to the adaptive load balance logic, which decides which TCAM's input queue the packet should be sent to. The adaptive load balance logic selects the idlest among the matching TCAMs and the cache TCAM. There are three rules: 1) if the queuing length of the matching TCAM equals that of the cache TCAM, the matching TCAM (the so-called home TCAM) has priority; 2) if the queuing lengths of two matching TCAMs are equal, either one is selected randomly; 3) otherwise, the idlest one is selected. After a packet is served by a TCAM, the result is sent to the re-ordering logic, which maintains the original sequence using the attached time stamp. After re-ordering, the final result is output. Notice that when a cache miss occurs, the packet should be sent back to the adaptive load balance logic instead of directly to the matching TCAM as suggested in [10], because the packet may hit the cache in the next cycle. Our scheme tries to balance the traffic distribution across the parallel TCAM chips and improve their update efficiency, so the following three key issues must be considered: 1) how to partition all prefixes so that they are distributed equally among the TCAM chips; 2) how to organize the entries on each TCAM to improve update efficiency and how to update the TCAM chip; 3) how to balance the traffic on all TCAM chips.

A. Partition Method

The first thing in this approach is the partition rule.
A good partition rule must meet two primary conditions: 1) it must create roughly equal partitions; 2) it must be simple, so that the indexing logic is simple and fast. In other words, it should be possible to locate the target partition for a given search key in one cycle using simple hardware. Generally, three kinds of methods for partitioning the entire routing table have been proposed, i.e., key-id based [9], prefix Trie-based [11] and range-based [8], [10] partitioning methods. The key-id approach suffers from uneven sub-table sizes and uncontrolled redundancy, which results in higher memory and power cost. Trie-based partitioning can lower the redundancy and unify sub-table sizes, but it requires an extra index TCAM to perform the block selection; as a result, two TCAM accesses are needed for each lookup request. Range-based partitioning can lower the redundancy and unify sub-table sizes most effectively by splitting the routing table into multiple buckets of identical size according to the address range; it was introduced by Panigrahy et al. [8] and implemented by Lin et al. using the pre-order splitting algorithm [10]. Thus, range-based partitioning can satisfy condition 1).

Fig. 2. Schematics of the Indexing Logic
Fig. 3. Dividing Prefixes into 4 Equal Size Partitions (B0-B4 mark the partition boundaries; a prefix may fall into a single partition or across multiple partitions)

TABLE I. PARAMETERS IN OUR SCHEME
x: TCAM number
s: TCAM chip capability (overall number of prefixes it can hold)
n: TCAM chip initial occupancy percentage
w: Whole number of prefix entries
q: Percentage of the overlapping area
k: Number of entries on each chip (excluding the overlapping area)

Fig. 2 illustrates the structure of the indexing logic for TCAM chip selection. It includes pairs of parallel comparing logics and an index table. Each pair of parallel comparing logic corresponds to one TCAM chip and is composed of two registers that store the boundary address range of that TCAM partition.
Next to the parallel comparing logics is an index table with an encoder. The index table stores the partition distribution information and, via the encoded information, returns a partition number indicating which chip or two neighboring chips may contain the prefix matching the IP address. Because the data width of the indexing logic is fixed and only a simple compare operation is executed, it can work at very high speed. Hence range-based partitioning can also meet condition 2). Based on this analysis, we use the range-based partitioning method to divide the routing table. Before partitioning we must determine the number of partitions. Only one bucket/partition is put into a TCAM chip in our scheme, which means the number of buckets/partitions equals the number of TCAM chips (excluding the cache TCAM chip). This achieves four advantages: 1) it improves the update efficiency by reducing the number of reconstructions; 2) it reduces the TCAM cost by not needing support for the partition-disable technique; 3) it saves memory by reducing the number of repeated entries in each TCAM chip; 4) it reduces the number of registers in the indexing logic. As Fig. 3 illustrates, there are at most 32 × 2 = 64 boundary
prefixes for IPv4 that need to be present in more than one TCAM chip (these prefixes fall into multiple ranges), because any prefix that falls into multiple ranges must be on the path from one of these boundaries to the root; otherwise, it strictly lies in the interior of one of the regions carved out by these paths. In our scheme, the overlapping entries between two neighbors are far more than 64, so we ignore the 64 boundary prefixes in the following calculation. The parameters of our scheme are listed in Table I. Since the first and last TCAM chips each have only one overlapping area of q/2, we get:

w = x * s * (n - q/2) + s * q/2    (1)

When designing the lookup engine, we know the TCAM chip capability s and the number of prefix entries w, and choose the TCAM initial occupancy percentage n and the percentage of the overlapping area q based on demands; then we can calculate the chip number x by (1). For example, if s = 40K × 36b, n = 50%, w = 200K and q = 10%, we get x = 11; in addition, there is one chip for the cache with the same size, hence we get 12 chips altogether. After that we calculate the number of entries k in the non-overlapping part of each chip by (2):

k = (w + s * q/2) / x - s * q/2    (2)

In the example above, we get k ≈ 16756; in addition, there are 2048 (= 10% × 40K / 2, with 40K = 40960) entries in each overlapping area. Summing over the 11 TCAMs, 11 × 16756 + 10 × 2048 = 204796, so there are 4 entries left compared with the total number of entries w (w = 200K = 204800). We put the residual 4 entries in the last partition. The initial occupancy percentages of the first and last chips are about 45.9%; the initial occupancy percentage of the other TCAM chips is about 50.9%. We get a scalable and flexible scheme, since we can derive the number of parallel TCAM chips from the total number of prefixes and the TCAM chip capability, and can make the number of entries on every TCAM roughly equal by (1) and (2). We fill the TCAM chips with the pre-order splitting method originally proposed in [10].
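As a sanity check on (1) and (2), the sizing calculation can be sketched in a few lines (a sketch only; the function name and the rounding choices are ours, and 40K is taken as 40960 entries, which reproduces the 2048-entry overlap areas in the example):

```python
import math

def plan_chips(s, n, w, q):
    """Size the parallel TCAM array from Table I's parameters.

    s: chip capacity (entries), n: initial occupancy, w: total prefixes,
    q: overlap percentage. Returns (chip count x, non-overlap entries k,
    entries per overlap area)."""
    # Eq. (1) solved for x:  w = x*s*(n - q/2) + s*q/2
    x = math.ceil((w - s * q / 2) / (s * (n - q / 2)))
    # Eq. (2): entries in the non-overlapping part of each chip
    k = (w + s * q / 2) / x - s * q / 2
    overlap = s * q / 2  # entries in each overlapping area
    return x, math.floor(k), overlap

x, k, overlap = plan_chips(s=40 * 1024, n=0.50, w=200 * 1024, q=0.10)
print(x, k, overlap)  # 11 16756 2048.0
```

With these inputs the division in (1) happens to be exact (x = 11); the `ceil` covers the general case where the last chip would otherwise be over-filled.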
The main function of the pre-order splitting method is to partition the prefixes into 11 roughly equal parts and record the address range of each part.

B. Entries Organization and Update Mechanism

Fig. 4. Overlapping range-based TCAM architecture

As shown in Fig. 4, we put the buckets obtained from the pre-order splitting method into the TCAMs. The overlapping part is put at the top of one TCAM and at the bottom of the next TCAM, and the non-overlapping part is put in the middle of the TCAM. The sizes of the two empty spaces are made equal for efficient update. Bucket1 corresponds to TCAM1, ..., and bucket11 corresponds to TCAM11. Each partition overlaps with its neighbors, so we need to save two references, the high and the low. The high reference indicates the high boundary of the non-overlapping part (which is also the low boundary of the overlapping area shared with the next chip); the low reference indicates the low boundary of the non-overlapping part (which is also the high boundary of the overlapping area shared with the previous chip). We can use these references to divide each partition into the overlapping part and the non-overlapping part, and put them into the positions illustrated above. From Fig. 4 we may draw several properties that benefit efficient update: 1) entries in the TCAM are ordered by the length of the IP prefix; 2) a small part of the address range overlaps with the neighbors; 3) the positions of the different parts of the address range are well organized. We propose a buddy update method (Fig. 5) for TCAM update based on this entries organization. It works as follows. The first case of prefix update is that the update happens in a non-full TCAM. The prefix P is compared with the boundary points to find the corresponding partition. If P falls in a non-overlapping partition, it is written to the corresponding position on insertion and simply removed on deletion. We must keep the IP prefixes ordered during update, so we need to adjust entry positions when inserting or deleting an entry.
If the number of entries above the updated prefix P is greater than the number below, we move the entries below; otherwise, we move those above. For a prefix in an overlapping area, the update operation must be done in both of the two neighboring TCAMs. During update, when one of the two free spaces on a TCAM chip is exhausted, we need to re-adjust
the entries to make the sizes of the two free spaces equal again. The update cost in this case is easy to bound: if the total number of entries is n and the number of TCAMs is x, the update cost is O(n/2x). The second case of prefix update is that the update happens in a full TCAM chip. The full chip needs to get free space from one of its neighbors: it inspects the free space of all its neighbors to find the larger one, which we call its buddy. It is not difficult to balance the sizes of two adjacent TCAM chips dynamically using the pre-order algorithm; new reference nodes and new address ranges of the two buddies are recomputed afterwards, and the two buddies end up with equal free space. A threshold on the free space size is set, and buddy adjustment is done only when the buddy's free space is greater than the threshold. The reason we set a threshold is that we should not perform buddy adjustment when all the neighbors are nearly full, which would be too inefficient. The system has to be reconstructed when the free space of all neighbors is less than the threshold. Since we take care of future updates when designing the chips' initial occupancy percentage, reconstruction is very unlikely to happen; normally, in this situation, one should consider whether the hardware needs to be upgraded to accommodate the growing number of prefixes. The update cost is O(2n/x) in this case.

C. Traffic Balance

There are two mechanisms to balance traffic: the redundancy mechanism and the cache mechanism. The chip-level TCAM redundancy mechanism derives originally from [9]. Different from [9], the redundancy mechanism in our scheme overlaps the address ranges of neighbors. The redundancy mechanism may work well if traffic is evenly distributed or slightly uneven, but system performance will decline sharply under bursty traffic.
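The interplay between the overlapping ranges and the arbitration rules of Section III can be illustrated with a small sketch (our own simplification; the function name and data layout are hypothetical, real hardware compares boundary registers in parallel, and ties between two home TCAMs are broken randomly rather than deterministically as here):

```python
def select_chip(dip, ranges, queues, cache_queue):
    """Pick the chip to serve a destination IP address (DIP).

    ranges[i] = (low, high) address range of chip i; neighbors overlap,
    so a DIP in an overlap area matches two home TCAMs. queues[i] is
    chip i's input-queue length; cache_queue is the cache TCAM's."""
    matches = [i for i, (lo, hi) in enumerate(ranges) if lo <= dip <= hi]
    # Rule 2: among matching home TCAMs, take the idlest (ties: first).
    best = min(matches, key=lambda i: queues[i])
    # Rule 1: on a tie with the cache, the home TCAM wins, so the cache
    # is chosen only when it is strictly idler.
    return "cache" if cache_queue < queues[best] else best

# A DIP of 95 falls in the overlap of chips 0 and 1: chip 1 is idler.
print(select_chip(95, [(0, 99), (90, 199)], [4, 2], 9))  # 1
```

A DIP in an overlap area thus always has two candidate home chips plus the cache, which is what lets neighboring chips absorb each other's bursts.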
It is natural to use a cache scheme to further improve the performance. A logical cache structure was put forward in [10], but a logical cache update stops the lookup operation of the chip, which dramatically decreases lookup efficiency due to frequent cache updates. In our scheme we propose a shared cache TCAM to prevent this. Following the traditional cache schemes [12], [13], [14], a straightforward design would deploy a first-stage cache working in front of second-stage data TCAMs. An obvious drawback of this conventional approach is that two TCAM accesses are needed for each lookup request; hence the cache is required to operate at N times the speed of the TCAMs if there are N parallel TCAMs in the system, which is impractical [10]. Different from the traditional cache schemes, our shared cache works in parallel with the other TCAM chips, which means only one TCAM access is needed for each lookup request. The cache TCAM chip does not hinder the operation of the other TCAM chips, so the cache chip can have the same speed as the other TCAM chips, which greatly saves hardware cost.
There are two kinds of caching algorithms proposed for route lookup: caching destination IP addresses (DIPs) [12], [13], [14] and caching prefixes [15]. We use the DIP-caching method, since caching prefixes must take the LMP into account. Although Mohammad et al. [15] put forward the RRC-ME algorithm to tackle the LMP, immediate cache update in the event of a cache miss is very difficult, since the system has to evaluate the minimal expansion prefix decomposition of the corresponding prefix.

Fig. 5. Buddy Update Method

Buddy_update(P, op) {
    /* Compare prefix P with all boundary references to detect the home TCAMs that need updating */
    for (i = 0; i <= max(reference_index); i++) {
        if (P >= lReference_node[i](lowRange)) and (P <= hReference_node[i](highRange)) then {
            // P is in the non-overlapping part of TCAM[i]
            Tcam_update(P, i, op);
        }
        if (P >= hReference_node[i](highRange) + 1) and (P <= lReference_node[i+1](lowRange) - 1) then {
            // P is in the overlapping part of TCAM[i] and TCAM[i+1]: update both
            Tcam_update(P, i, op);
            Tcam_update(P, i+1, op);
        }
    }
}

Tcam_update(P, id, op) {
    Search P and find its position;
    if (op = "insert") and (P exists) then {
        cover(P);    // if P already exists, just overwrite the old entry
        exit;
    }
    if (tcam[id] is not full) then {
        if (upper_space(size) = 0) or (lower_space(size) = 0) then
            move entries to make upper_space(size) = lower_space(size);
        if (upper_entries(P) > lower_entries(P)) then {
            // fewer entries below P: shift the lower part
            if (op = "insert") then { move lower entries downwards; Insert(P); }
            else if (op = "delete") then { Delete(P); move lower entries upwards; }
        } else {
            // fewer entries above P: shift the upper part
            if (op = "insert") then { move upper entries upwards; Insert(P); }
            else if (op = "delete") then { Delete(P); move upper entries downwards; }
        }
    } else {    // TCAM is full
        if (tcam[id-1](free_size) > tcam[id+1](free_size)) then buddy = id-1;
        else buddy = id+1;
        if (tcam[buddy](free_size) > threshold) then {
            Preorder_split(2);    // re-split the prefixes of the two buddies
            Adjust the two buddies;
        } else {
            Preorder_split(b);    // b is the number of all partitions
            Reconstruct the whole system;
        }
    }
}
Caching DIPs can also save hardware cost, since the cache TCAM chip can be replaced by a cheaper binary CAM chip if only DIPs are cached. Another reason we cache DIPs is the high degree of locality of DIPs [14]. Based on the adaptive load balance logic rules in our scheme, the unpopular DIPs are more likely to be sent to their home TCAM chips, and the popular DIPs tend to be sent to the cache TCAM. If the distribution of traffic DIPs is heavy-tailed, meaning that a large number of DIPs occur only a few times while a few DIPs occur many times, our arbitration mechanism obviously benefits from it. We need to study the distribution of traffic DIPs for our cache
mechanism. The traces we used were collected from the core router of the China Education and Research Network (CERNET) located at Tsinghua University from 2007/07/07 to 2007/07/17. CERNET is a national academic computer network and one of the biggest networks in China, with over 18 million users. Over 1300 schools, research and government institutions in 31 provinces connect to CERNET via 8 regional network centers. The traces were collected by a special hardware device and contain only IP headers to save space. Through long-time statistics of the DIP distribution in different traces, we found similar distributions at different times and on different time scales (below half an hour; we did not test longer time scales). We plot the frequency versus the number of occurrences of DIPs on a log-log scale in Fig. 6. Fig. 6 a)-d) present the DIP distributions of traces of increasing length, from 10000 continuous packets upward. From Fig. 6 we notice that the DIP distribution approximately follows a power law and that the tail is much heavier than the power law on different time scales, which implies not only that a high degree of temporal locality exists, but also that the temporal locality can last a long time and be very steady. We believe that DIP lookup speed can be increased by taking advantage of this steady temporal locality: if an address is referenced, it will likely be used again soon and for a period of time. From Fig. 6 d) we can see that more than 10^5 DIPs occur only once while 1 DIP occurs 10^5 times. We propose the cache TCAM to capture commonly referenced addresses for fast lookup. Our statistics show that we can achieve an average hit rate over 98% with the immediate-update TCAM cache mechanism and the LRU algorithm. One reason we propose immediate update is the temporal locality; another is that, in our scheme, the packet is sent back to the adaptive load balance logic on a cache miss.
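The immediate-update DIP cache with LRU replacement can be sketched as follows (a software model only; the class name and interface are ours, and an `OrderedDict` stands in for the cache TCAM/BCAM hardware):

```python
from collections import OrderedDict

class DipCache:
    """LRU cache of destination IP addresses, updated immediately on a miss."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # DIP -> next hop, oldest first

    def lookup(self, dip):
        if dip in self.entries:
            self.entries.move_to_end(dip)  # refresh LRU position on a hit
            return self.entries[dip]
        return None  # miss: the packet goes back to the balance logic

    def update(self, dip, next_hop):
        # Immediate update, so the same packet can hit on its next cycle.
        self.entries[dip] = next_hop
        self.entries.move_to_end(dip)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used DIP

cache = DipCache(capacity=2)
cache.update(0x0A000001, "port3")  # filled on a miss, hit thereafter
```

Because entire DIPs rather than prefixes are cached, a hit needs no LMP resolution, which is what permits the single-cycle immediate update.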
The adaptive load balance logic may send the missing packet back to the cache TCAM again, and the packet may hit the cache in the next cycle thanks to the immediate update. On the contrary, in [10] the missing packet is sent directly to the home TCAM because of the slow-update mechanism: a cache miss will always happen if the missing packet is sent to another TCAM's logical cache during the update interval. This seriously increases the burden on the home TCAM during the slow-update interval due to temporal locality.

IV. PERFORMANCE ANALYSIS

A. Lookup Performance

The speedup factor is 1 in the worst case in [9], [10]. The worst case in [9] happens when a bursty stream hits only one TCAM chip during an interval. The worst case in [10] happens when a bursty stream hits one TCAM chip and the matching prefixes are not in the logical cache during an interval; since the logical cache size is very limited and the cache mechanism adopted in [10] is slow-update, in which the cache cannot be updated during the slow-update interval, cache misses happen all the time in the worst case. In our scheme the worst case also happens when a bursty stream hits only one TCAM.

Fig. 6. The Destination IP Address Distribution (panels a)-d): frequency versus number of occurrences for traces of consecutive packets)

The cache size in our architecture is big enough to hold all DIPs during such an interval, and the cache is updated immediately when a cache miss happens, which ensures that the cache TCAM works in parallel with the home TCAM in the worst case, so the worst-case speedup factor of our scheme is 2. The speedup factor of our scheme is apparently n (where n denotes the number of all TCAM chips) in the best case, as in [9], [10], when all TCAMs work in parallel. The average-case speedup factor depends on the characteristics of the incoming traffic. Zheng et al.
[9] use queuing theory to model the lookup engine and assume that the arrival process of the incoming stream is a Poisson process, which is not the case for real Internet traffic [16]. Lin et al. [10] calculated a lower bound on the speedup factor, which can reach n - 1 on the basis that the average hit rate is no less than (n - 2)/(n - 1). Here we do not give a theoretical analysis of the average-case speedup factor, because it is very hard for real traffic; we give an experimental result for the average speedup factor in section 5.

B. Update Efficiency

One TCAM chip is divided into several buckets in [9], [10]. The whole system has to be reconstructed when any bucket on any chip becomes full, even if there is plenty of free space in other buckets on the same chip, which drastically hampers the whole system's performance. If the total number of entries is n and the number of buckets is y, the update cost is O(n/y) for inner-bucket update, and the system has to be reconstructed for inter-bucket update. There is only one bucket on each chip in our system, so the system only needs inner-chip updates before the corresponding chip becomes full; even if the chip is full, we can use buddy adjustment to get free space from its buddy. Reconstruction of our system is very unlikely to happen, which increases the update efficiency of the whole system dramatically. If the total number of entries is n and the number of TCAMs is x,
the update cost is O(n/2x) for inner-chip update and O(2n/x) for inter-chip update on average.

TABLE II. TRACES AND RESULTS: 15 ten-minute traces collected at 6:00, 10:00, 12:00, 13:00, 14:00, 15:00, 17:30, 18:00, 18:30, 19:00, 19:30, 20:00, 21:00, 23:00 and 23:30, with the average speedup factor of each trace

Fig. 7. Trace1-3 Distribution on Each TCAM and Speedup Factors: a) trace1 on chips; b) trace2 on chips; c) trace3 on chips; d) speedup factor distribution of trace1-3
Fig. 8. Trace4-15 Speedup Factors Distribution: a) trace4-6; b) trace7-9; c) trace10-12; d) trace13-15

C. Power Consumption

There are two methods to decrease the number of entries triggered during a lookup operation. One is to store the entries in multiple small chips instead of a single large one. The other is to partition one chip into small blocks and trigger only one of them during a lookup operation, which requires the hardware (TCAM) to support the partition-disable function. Our proposed scheme uses the first method and does not need the partition-disable function. Let n denote the number of chips and x denote the number of chips working in parallel. Assuming the power consumption of an unpartitioned TCAM search is P, the power consumption is Px/n at any time. The maximum power consumption is P when all chips work in parallel, and the lookup speedup factor is apparently n in this case; note that the system power consumption is proportional to the system throughput. The schemes in [9], [10] use the second method to decrease the power consumption. Let s denote the total number of blocks in all chips and t denote the number of blocks working in parallel; the power consumption is then Pt/s at any time in their schemes. The power consumption of our scheme is a little higher compared with the schemes of [9], [10], but we gain much more: 1) the system cost is reduced, as the hardware needs no partition-disable support and the indexing logic is simpler; 2) most importantly, the number of reconstructions is sharply decreased.
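The power model above is simple enough to state directly (symbols follow the text; the numeric values are purely illustrative, not measurements):

```python
def power(P, n, x):
    """Instantaneous power with x of n chips searching in parallel.

    P is the power of an unpartitioned TCAM search; each of the n
    small chips holds 1/n of the entries, so x busy chips draw P*x/n."""
    assert 0 <= x <= n
    return P * x / n

print(power(P=12.0, n=12, x=3))   # 3.0: quarter load, quarter power
print(power(P=12.0, n=12, x=12))  # 12.0: full power, but speedup is also n
```

The same proportionality (busy fraction times P) gives Pt/s for the block-partitioned schemes of [9], [10], with t of s blocks active.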
V. PERFORMANCE EVALUATION

A. Experiment Setup

We conduct a series of experiments and simulations in MATLAB to measure the system's performance and its adaptability to different traffic load distributions. A program simulates the processing of our proposed forwarding engine, driven by real traffic traces and a real route table from a core router in CERNET. The packet traces were collected on the core router of CERNET from 2007/07/07 to 2007/07/17, and the router's route table contains about 200k entries. Following our analysis in Section III-A, we partition the route table into 11 parts using the pre-order splitting method and load them into the TCAM chips as illustrated in Fig. 4. Assume the input queue length is 10 and the arrival rate is stationary for a traffic trace,

to evaluate the speedup factor we approximately divide the whole serving process into discrete subprocesses. The first subprocess handles the first 120 packets of a given trace; the second handles the next 120 packets, composed of the packets remaining from the first subprocess and the successive packets of the trace; and so on. We evaluate the speedup factor of a trace as the average of all its subprocess speedup factors.

B. Results

We use the first day's traces for illustration. Table II presents 15 ten-minute traces collected on 2007/07/07 at different times. Fig. 7 a)-c) show the traffic distribution over the parallel TCAM chips for the first three traces. In the ideal condition each TCAM chip handles 10 of the 120 packets, in which case the lookup speedup factor reaches 12 compared with non-parallel schemes. In reality, as Fig. 7 a)-c) show, after the time the ideal condition would need, 10 packets of trace1, 7 of trace2 and 8 of trace3 remain. In addition, there are 10 × (1 − 98%) = 0.2 cache-miss packets on the cache TCAM chip for each trace, giving totals of 10.2, 7.2 and 8.2 remaining packets. The speedup factors are therefore (120 − 10.2)/10 = 10.98, (120 − 7.2)/10 = 11.28 and (120 − 8.2)/10 = 11.18 for trace1-3. We continue this process until all packets of trace1-3 are served. Fig. 7 d) presents the speedup factor distribution of the first 200 subprocesses of trace1-3, and Fig. 8 a)-d) present the distributions for the remaining traces. The average speedup factor of each trace, given in Table II, is greater than 11 for every trace. Another interesting observation is that the working-hour traces achieve higher speedup factors than the spare-time traces.
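The subprocess speedup computation above can be reproduced with a short script (a sketch; the packet counts are the illustrative figures from the text, and `CACHE_LOAD` assumes the 10 cache-bound packets per subprocess implied by the 10 × (1 − 98%) term):

```python
# Per-subprocess speedup factor of the proposed engine (12 TCAM chips).
# Ideally each chip serves 10 of the 120 packets, giving speedup 12.
BATCH = 120        # packets handled per subprocess
CHIPS = 12         # parallel TCAM chips
CACHE_HIT = 0.98   # cache TCAM hit rate used in the text
CACHE_LOAD = 10    # packets directed to the cache chip per subprocess

def speedup(remaining_on_chips):
    """Speedup once the ideal service time has elapsed.

    remaining_on_chips: packets still queued on the range-partitioned
    chips; cache misses add CACHE_LOAD * (1 - CACHE_HIT) = 0.2 more.
    """
    remaining = remaining_on_chips + CACHE_LOAD * (1 - CACHE_HIT)
    # Packets actually served, scaled by the ideal per-chip quota of 10.
    return (BATCH - remaining) / (BATCH / CHIPS)

# Leftover packets of trace1-3 reported in the text: 10, 7 and 8.
print(round(speedup(10), 2))  # 10.98
print(round(speedup(7), 2))   # 11.28
print(round(speedup(8), 2))   # 11.18
```

With zero leftover packets and no cache misses the formula returns exactly 12, the ideal speedup for 12 chips.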
The likely reason is that more flows are superposed in working hours, which makes the DIP distribution of the traffic more dispersed than in spare time. We have tested traces collected over 10 days, and the results show that the system achieves an average lookup speedup factor greater than 11. The lookup performance of our scheme is about the same as that of the scheme in [10] and superior to that of the scheme in [9]. Our scheme is thus able to maintain good dynamic load balance under bursty Internet traffic.

VI. CONCLUSION

Increasing the lookup throughput and improving the update efficiency while reducing power consumption are the three primary issues addressed in this paper. Our main contributions are as follows: 1) we proposed an efficient buddy update method for the well-organized entries on TCAM chips; 2) we put forward a crossed address range division and shared caching scheme for dynamic traffic balancing; 3) we observed a distinctive feature of the DIP distribution and exploited it in our cache mechanism. The lookup performance of our scheme can be further improved if the cache TCAM chip is faster than the others. The power consumption of our scheme is decreased by reducing the number of TCAM accesses triggered during a lookup operation. Its power consumption may be slightly higher than that of the schemes in [9], [10], but the update efficiency is greatly improved and the cost is lower. Our scheme is flexible and scalable, and can easily be extended to IPv6 address lookup. Our test results show that under real traffic traces, the proposed scheme achieves an average lookup speedup factor greater than 11 with 12 TCAM chips, at the cost of 10% more memory space.

ACKNOWLEDGMENT

The authors would like to express their appreciation to Bin Liu [9], [10], Dong Lin [10] and Hui Wang for their discussions and suggestions. This work is supported by the National Basic Research Program of China under Grant No.
2009CB320505, the National Science and Technology Supporting Plan of China under Grant No. 2008BAH37B05, and the National High-Tech Research and Development Plan of China under Grant Nos. 2008AA01A303 and 2009AA01Z251.

REFERENCES

[1] M. Ruiz-Sanchez, E. Biersack, and W. Dabbous, "Survey and taxonomy of IP address lookup algorithms," IEEE Network, vol. 15, pp. 8-23, Mar.-Apr.
[2] N. Huang, M. Zhao, and J. Pan, "A fast IP routing lookup scheme for gigabit switching routers," in IEEE INFOCOM.
[3] D. Yu and B. S. B. Wei, "Forwarding engine for fast routing lookups and updates," in IEEE GLOBECOM.
[4] CYPRESS. [Online]. Available:
[5] G. Huston. Route table analysis reports. [Online]. Available:
[6] E. Ng and G. Lee, "Eliminating sorting in IP lookup devices using partitioned table," in the 16th IEEE International Conf. on Application-Specific Systems, Architectures and Processors.
[7] K. Jinsoo and K. Junghwan, "An efficient IP lookup architecture with fast update using single-match TCAMs," in WWIC.
[8] R. Panigrahy and S. Sharma, "Reducing TCAM power consumption and increasing throughput," in IEEE HotI 10.
[9] K. Zheng, C. Hu, H. Liu, and B. Liu, "An ultra-high throughput and power efficient TCAM-based IP lookup engine," in IEEE INFOCOM.
[10] D. Lin, Y. Zhang, C. Hu, B. Liu, X. Zhang, and D. Pao, "Route table partitioning and load balancing for parallel searching with TCAMs," in the 21st IEEE International Parallel and Distributed Processing Symposium.
[11] F. Zane, G. Narlikar, and A. Basu, "CoolCAMs: Power-efficient TCAMs for forwarding engines," in IEEE INFOCOM, vol. 1, Mar./Apr. 2003.
[12] W. Shyu, C. Wu, and T. Hou, "Efficiency analyses on routing cache replacement algorithms," in IEEE International Conference on Communications.
[13] T. Chiueh and P. Pradhan, "High-performance IP routing table lookup using CPU caching," in IEEE INFOCOM.
[14] B. Talbot, T. Sherwood, and B. Lin, "IP caching for terabit speed routers," in IEEE GLOBECOM.
[15] M. J. Akhbarizadeh and M. Nourani, "Efficient prefix cache for network processors," in the 12th Annual IEEE Symposium on High Performance Interconnects.
[16] V. Paxson and S. Floyd, "Wide-area traffic: The failure of Poisson modeling," IEEE/ACM Transactions on Networking, vol. 3, 1995.


More information

An Architecture for IPv6 Lookup Using Parallel Index Generation Units

An Architecture for IPv6 Lookup Using Parallel Index Generation Units An Architecture for IPv6 Lookup Using Parallel Index Generation Units Hiroki Nakahara, Tsutomu Sasao, and Munehiro Matsuura Kagoshima University, Japan Kyushu Institute of Technology, Japan Abstract. This

More information

Frequency-based NCQ-aware disk cache algorithm

Frequency-based NCQ-aware disk cache algorithm LETTER IEICE Electronics Express, Vol.11, No.11, 1 7 Frequency-based NCQ-aware disk cache algorithm Young-Jin Kim a) Ajou University, 206, World cup-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 443-749, Republic

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Scalable High Throughput and Power Efficient IP-Lookup on FPGA

Scalable High Throughput and Power Efficient IP-Lookup on FPGA Scalable High Throughput and Power Efficient IP-Lookup on FPGA Hoang Le and Viktor K. Prasanna Ming Hsieh Department of Electrical Engineering University of Southern California Los Angeles, USA {hoangle,

More information

A FORWARDING CACHE VLAN PROTOCOL (FCVP) IN WIRELESS NETWORKS

A FORWARDING CACHE VLAN PROTOCOL (FCVP) IN WIRELESS NETWORKS A FORWARDING CACHE VLAN PROTOCOL (FCVP) IN WIRELESS NETWORKS Tzu-Chiang Chiang,, Ching-Hung Yeh, Yueh-Min Huang and Fenglien Lee Department of Engineering Science, National Cheng-Kung University, Taiwan,

More information

Improving Range Query Performance on Historic Web Page Data

Improving Range Query Performance on Historic Web Page Data Improving Range Query Performance on Historic Web Page Data Geng LI Lab of Computer Networks and Distributed Systems, Peking University Beijing, China ligeng@net.pku.edu.cn Bo Peng Lab of Computer Networks

More information

An Enhanced Dynamic Packet Buffer Management

An Enhanced Dynamic Packet Buffer Management An Enhanced Dynamic Packet Buffer Management Vinod Rajan Cypress Southeast Design Center Cypress Semiconductor Cooperation vur@cypress.com Abstract A packet buffer for a protocol processor is a large shared

More information

LOAD SCHEDULING FOR FLOW-BASED PACKET PROCESSING ON MULTI-CORE NETWORK PROCESSORS

LOAD SCHEDULING FOR FLOW-BASED PACKET PROCESSING ON MULTI-CORE NETWORK PROCESSORS LOAD SCHEDULING FOR FLOW-BASED PACKET PROCESSING ON MULTI-CORE NETWORK PROCESSORS Fei He,2, Yaxuan Qi,2, Yibo Xue 2,3 and Jun Li 2,3 Department of Automation, Tsinghua University, Beijing, China 2 Research

More information

On Achieving Fairness and Efficiency in High-Speed Shared Medium Access

On Achieving Fairness and Efficiency in High-Speed Shared Medium Access IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 11, NO. 1, FEBRUARY 2003 111 On Achieving Fairness and Efficiency in High-Speed Shared Medium Access R. Srinivasan, Member, IEEE, and Arun K. Somani, Fellow, IEEE

More information

A priority based dynamic bandwidth scheduling in SDN networks 1

A priority based dynamic bandwidth scheduling in SDN networks 1 Acta Technica 62 No. 2A/2017, 445 454 c 2017 Institute of Thermomechanics CAS, v.v.i. A priority based dynamic bandwidth scheduling in SDN networks 1 Zun Wang 2 Abstract. In order to solve the problems

More information

Message Switch. Processor(s) 0* 1 100* 6 1* 2 Forwarding Table

Message Switch. Processor(s) 0* 1 100* 6 1* 2 Forwarding Table Recent Results in Best Matching Prex George Varghese October 16, 2001 Router Model InputLink i 100100 B2 Message Switch B3 OutputLink 6 100100 Processor(s) B1 Prefix Output Link 0* 1 100* 6 1* 2 Forwarding

More information

ANALYSIS OF A DYNAMIC LOAD BALANCING IN MULTIPROCESSOR SYSTEM

ANALYSIS OF A DYNAMIC LOAD BALANCING IN MULTIPROCESSOR SYSTEM International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR) ISSN 2249-6831 Vol. 3, Issue 1, Mar 2013, 143-148 TJPRC Pvt. Ltd. ANALYSIS OF A DYNAMIC LOAD BALANCING

More information

Multi-path Routing for Mesh/Torus-Based NoCs

Multi-path Routing for Mesh/Torus-Based NoCs Multi-path Routing for Mesh/Torus-Based NoCs Yaoting Jiao 1, Yulu Yang 1, Ming He 1, Mei Yang 2, and Yingtao Jiang 2 1 College of Information Technology and Science, Nankai University, China 2 Department

More information

Fast IP Routing Lookup with Configurable Processor and Compressed Routing Table

Fast IP Routing Lookup with Configurable Processor and Compressed Routing Table Fast IP Routing Lookup with Configurable Processor and Compressed Routing Table H. Michael Ji, and Ranga Srinivasan Tensilica, Inc. 3255-6 Scott Blvd Santa Clara, CA 95054 Abstract--In this paper we examine

More information

A Proposal for a High Speed Multicast Switch Fabric Design

A Proposal for a High Speed Multicast Switch Fabric Design A Proposal for a High Speed Multicast Switch Fabric Design Cheng Li, R.Venkatesan and H.M.Heys Faculty of Engineering and Applied Science Memorial University of Newfoundland St. John s, NF, Canada AB X

More information

Chapter 24 Congestion Control and Quality of Service 24.1

Chapter 24 Congestion Control and Quality of Service 24.1 Chapter 24 Congestion Control and Quality of Service 24.1 Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 24-1 DATA TRAFFIC The main focus of congestion control

More information

Incorporation of TCP Proxy Service for improving TCP throughput

Incorporation of TCP Proxy Service for improving TCP throughput Vol. 3, 98 Incorporation of Proxy Service for improving throughput G.P. Bhole and S.A. Patekar Abstract-- slow start algorithm works well for short distance between sending and receiving host. However

More information

Application of TRIE data structure and corresponding associative algorithms for process optimization in GRID environment

Application of TRIE data structure and corresponding associative algorithms for process optimization in GRID environment Application of TRIE data structure and corresponding associative algorithms for process optimization in GRID environment V. V. Kashansky a, I. L. Kaftannikov b South Ural State University (National Research

More information

TOWARDS HIGH-PERFORMANCE NETWORK APPLICATION IDENTIFICATION WITH AGGREGATE-FLOW CACHE

TOWARDS HIGH-PERFORMANCE NETWORK APPLICATION IDENTIFICATION WITH AGGREGATE-FLOW CACHE TOWARDS HIGH-PERFORMANCE NETWORK APPLICATION IDENTIFICATION WITH AGGREGATE-FLOW CACHE Fei He 1, 2, Fan Xiang 1, Yibo Xue 2,3 and Jun Li 2,3 1 Department of Automation, Tsinghua University, Beijing, China

More information

Router s Queue Management

Router s Queue Management Router s Queue Management Manages sharing of (i) buffer space (ii) bandwidth Q1: Which packet to drop when queue is full? Q2: Which packet to send next? FIFO + Drop Tail Keep a single queue Answer to Q1:

More information

Design of a High Speed FPGA-Based Classifier for Efficient Packet Classification

Design of a High Speed FPGA-Based Classifier for Efficient Packet Classification Design of a High Speed FPGA-Based Classifier for Efficient Packet Classification V.S.Pallavi 1, Dr.D.Rukmani Devi 2 PG Scholar 1, Department of ECE, RMK Engineering College, Chennai, Tamil Nadu, India

More information

DESIGN AND IMPLEMENTATION OF OPTIMIZED PACKET CLASSIFIER

DESIGN AND IMPLEMENTATION OF OPTIMIZED PACKET CLASSIFIER International Journal of Computer Engineering and Applications, Volume VI, Issue II, May 14 www.ijcea.com ISSN 2321 3469 DESIGN AND IMPLEMENTATION OF OPTIMIZED PACKET CLASSIFIER Kiran K C 1, Sunil T D

More information

QUT Digital Repository:

QUT Digital Repository: QUT Digital Repository: http://eprints.qut.edu.au/ Gui, Li and Tian, Yu-Chu and Fidge, Colin J. (2007) Performance Evaluation of IEEE 802.11 Wireless Networks for Real-time Networked Control Systems. In

More information

Switched Network Latency Problems Solved

Switched Network Latency Problems Solved 1 Switched Network Latency Problems Solved A Lightfleet Whitepaper by the Lightfleet Technical Staff Overview The biggest limiter to network performance is the control plane the array of processors and

More information

Exploiting Graphics Processors for High-performance IP Lookup in Software Routers

Exploiting Graphics Processors for High-performance IP Lookup in Software Routers Exploiting Graphics Processors for High-performance IP Lookup in Software Routers Jin Zhao, Xinya Zhang, Xin Wang School of Computer Science Fudan University Shanghai, China Email:{jzhao,06300720198,xinw}@fudan.edu.cn

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Review on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors. By: Anvesh Polepalli Raj Muchhala

Review on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors. By: Anvesh Polepalli Raj Muchhala Review on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors By: Anvesh Polepalli Raj Muchhala Introduction Integrating CPU and GPU into a single chip for performance

More information