TUPLE PRUNING USING BLOOM FILTERS FOR PACKET CLASSIFICATION

Tuple pruning for packet classification provides fast search and a low implementation complexity. The tuple pruning algorithm reduces the search space to a subset of tuples determined by individual field lookups that cause off-chip memory accesses. The authors propose a tuple-pruning algorithm that reduces the search space through Bloom filter queries, which do not require off-chip memory accesses.

Hyesook Lim
So Yeon Kim
Ewha Womans University

Packet classification enables routers to support various value-added services, such as blocking traffic from insecure sites, giving preferential treatment to premium traffic, and routing based on traffic type and source. Routers classify arriving packets by comparing them to a set of predefined rules and finding the highest priority rule or the rule that best matches the packet header fields (the best matching rule, or BMR). A rule consists of a set of fields made up of the IP source prefix, the IP destination prefix, the source port range, the destination port range, and the protocol type and flags.

The difficulty of packet classification is in performing multiple field lookups at wire speed for every incoming packet, given that the packet arrival rate can be several million packets per second. Various algorithms have attempted to find an effective solution. Most of these efforts use high bandwidth and a small on-chip memory, while locating the rule database in a slower and higher capacity off-chip memory. [2] Many packet classification algorithms, such as tuple space pruning, [3] cross-producting, coarse-grained tuple space, [5] and modified cross-producting, [6] perform a separate lookup on each field to narrow the search space. Hence, these algorithms cause off-chip memory accesses for both the individual field lookups and the final combined lookup.
We propose to replace each field lookup with an on-chip Bloom filter and to add a new tuple Bloom filter that further reduces unnecessary off-chip memory accesses. The proposed idea can be applied to any of the above algorithms. Here, we show how we apply it to tuple space pruning. (See the "Related work in using tuple spaces for packet classification" sidebar for a discussion of this approach.)

Published by the IEEE Computer Society 272-732//$26. c 2 IEEE

Sidebar: Related work in using tuple spaces for packet classification

A tuple space search algorithm converts a packet classification problem into multiple exact-match problems in tuple space. A tuple is defined as a vector of $D$ lengths for $D$-dimensional (or $D$-field) packet classification. It is denoted as $(i_1, i_2, \ldots, i_D)$, where $i_d$ (for $d = 1, \ldots, D$) is the length of the $d$th field. For example, a 2D packet classification using two IP prefix fields can have $33 \times 33$ different tuples, given lengths of 0 to 32 bits. Hence, a packet arriving at a link can be queried with each tuple. In searching a tuple, we can use an exact-match method, such as hashing, since a tuple is composed of known lengths.

Because it is too time consuming to query every tuple in classifying a packet, Srinivasan et al. introduced a practical solution, called tuple space pruning (TSP). Assume a two-field TSP algorithm, which, for a given input, performs individual field lookups. If it finds matches at $p$ different lengths for one field and at $q$ different lengths for the other field, the number of intersected tuples that should be queried is reduced to $p \times q$. Additionally, because many of these tuples are likely to be inactive (not associated with any rule), the number becomes less than $p \times q$. Another interesting pruning approach is to partition the tuple space into coarse-grained tuples with dissimilar rules to limit the number of subsets to be searched. [2] Yet another pruning approach uses precomputed markers to direct the search to the next tuple in the search space. [3]

The TSP algorithm is easy to realize and provides fast search performance. However, it can be improved. First, it requires individual field lookups, which cause off-chip memory accesses. Second, the intersected list of tuples includes unnecessary tuples, since the list is generated only by combining the lengths, without considering the values. We describe these issues in more detail in the main article.

References
1. V. Srinivasan, S. Suri, and G. Varghese, "Packet Classification Using Tuple Space Search," Proc. ACM SIGCOMM, ACM Press, 1999, pp. 135-146.
2. H. Song, J. Turner, and S. Dharmapurikar, "Packet Classification Using Coarse-Grained Tuple Spaces," Proc. Architecture for Networking and Comm. Systems (ANCS), ACM Press, 2006, pp. 41-50.
3. P. Wang et al., "Scalable Packet Classification for Enabling Internet Differentiated Services," IEEE Trans. Multimedia, vol. 8, no. 6, 2006, pp. 1239-1249.

Bloom filter theory

Many current networking applications use Bloom filters. [5-7] A Bloom filter is a space-efficient data structure consisting of a bit vector that is used to test whether an element is a member of a set. A Bloom filter supports two operations: programming and querying.

Programming

A Bloom filter, which represents a set $P = \{x_1, x_2, \ldots, x_n\}$ of $n$ elements, is described by an array of $m$ bits, initially all set to 0. For each element in $P$, $k$ different hash functions are computed in such a way that each resulting hash index $j$ (which is a pointer to a Bloom filter bit location) is in the range $0 \le j < m$. All the Bloom filter bit locations corresponding to $j$ are set to 1.

Querying

To test whether an element $x$ is a member of $P$, we perform the following query. For input $x$, we generate $k$ hash indices using the same hash functions we used to program the filter. We then check the bit locations in the Bloom filter corresponding to the hash indices. If at least one of the locations is 0, the element is definitely not a member of $P$: if it were a member of the set, the bit location corresponding to the hash index would have been set to 1 during programming. This result is called negative. If all the bit locations are set to 1, the result is called positive. However, even if all the bit locations are set to 1, they might not have been set by the queried element; other elements could have set them. This type of positive result is termed a false positive. On the whole, a Bloom filter might produce false positives but never false negatives.

The proposed algorithm

Here, we show the 2D version of our proposed algorithm. We can extend this algorithm to an arbitrary number of fields. Figure 1 shows the overall architectures of the tuple space pruning (TSP) algorithm and our proposed algorithm (using the source and destination prefix fields).
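The programming and querying operations described above can be sketched in a few lines of Python. This is only an illustration: the class name `BloomFilter`, the use of salted SHA-256 digests as the $k$ hash functions, and the sample bit-string elements are assumptions made for the example, not part of the article.

```python
import hashlib


class BloomFilter:
    """Bit-vector Bloom filter with k hash indices per element (illustrative)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m            # number of bits in the vector
        self.k = k            # number of hash functions
        self.bits = [0] * m   # all bits start at 0

    def _indices(self, element: bytes):
        # Assumption: the k hash functions are SHA-256 salted with the
        # hash number; any k independent hashes with 0 <= j < m would do.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + element).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def program(self, element: bytes) -> None:
        # Programming: set every indexed bit to 1.
        for j in self._indices(element):
            self.bits[j] = 1

    def query(self, element: bytes) -> bool:
        # Querying: one 0 bit proves non-membership (negative);
        # all 1s is a positive, which may be a false positive.
        return all(self.bits[j] for j in self._indices(element))


bf = BloomFilter(m=64, k=2)
for prefix in (b"10", b"110", b"0"):   # hypothetical elements
    bf.program(prefix)
```

Every programmed element queries positive, which is the "never false negatives" property; a query for an element that was never programmed is usually negative but can be a false positive.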
Compared to TSP, the proposed architecture replaces each individual field lookup with a Bloom filter: either a source Bloom filter (src-Bloom filter) or a destination Bloom filter (dst-Bloom filter). We also add a tuple Bloom filter (tuple-Bloom filter).

Figure 1. A comparison of the overall structures of the algorithms: the tuple space pruning (TSP) algorithm (a), and the proposed tuple pruning algorithm (b). [In (a), the input packet goes through source and destination prefix lookups in off-chip tables, and the matched lengths form an intersected list that is used to access the off-chip hash table. In (b), on-chip source and destination Bloom filters produce positive lengths, the intersected list is queried against the on-chip tuple Bloom filter, and only the positive tuples access the off-chip hash table.]

IEEE MICRO, MAY/JUNE

When programming a fixed number of elements into a Bloom filter, the Bloom filter's size and the number of hash functions affect performance, which is determined by the false positive rate. Analytic discussions on Bloom filter characteristics are available elsewhere. [7,8] Determining the optimum size of a Bloom filter and the optimum number of hash functions needed to minimize the false positives is beyond this article's scope. Because using between 2 and 5 hash functions for a small-size Bloom filter (up to about 8 times the given number of elements) minimizes the probability of a false positive, [7] for simplicity we describe the proposed algorithm using an example with fixed-size Bloom filters and two hash functions.

Any arbitrary hash generator can serve as a hash function. The Bloom filter used in the proposed algorithm must accommodate prefixes of arbitrary lengths in a single Bloom filter, so it requires a hash generator that produces hash indices for variable-length inputs. A cyclic redundancy check (CRC) generator suits our purpose. [9] CRC generators scramble the bits of a given input and produce a fixed-length binary sequence known as a CRC code, regardless of the input length. We can easily obtain any number of hash indices (each of which is used as a pointer to a Bloom filter bit location or to a hash table entry) from the generated CRC code by selecting different combinations of bits.

Let $P_t$ be a rule set composed of source and destination prefix pairs, and $L_t$ be the set of the distinct length pairs of $P_t$. If $P_t = \{R_1(*, *), R_2(*, *), R_3(*, *), R_4(*, *), R_5(*, *), R_6(*, *)\}$, then $L_t = \{(2, 0), (1, 2), (2, 3), (3, 3), (3, 2), (1, 0)\}$. Let $P_1$ and $P_2$ be the set of distinct source prefixes and the set of distinct destination prefixes included in $P_t$, and $L_1$ and $L_2$ be the sets of their distinct lengths. Then, $P_1 = \{*, *, *, *\}$, $P_2 = \{*, *, *, *\}$, $L_1 = \{1, 2, 3\}$, and $L_2 = \{0, 2, 3\}$.

Figure 2 shows the detailed structure of the proposed algorithm. Figure 2a shows the CRC-8 generator.
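To make the CRC-based hashing concrete, the following sketch shifts a variable-length bit string through an 8-bit CRC register and then takes two hash indices from the first and last bits of the code. The generator polynomial (`0x07`), the initial register value (`0xA5`), and the function names are assumptions chosen for illustration; the article does not state them here.

```python
def crc8_bits(bits: str, init: int = 0xA5, poly: int = 0x07) -> int:
    """Feed a string of '0'/'1' characters bit-serially into a CRC-8 register.

    Assumptions: polynomial x^8 + x^2 + x + 1 (0x07) and an arbitrary
    initial register value; both are illustrative choices.
    """
    reg = init
    for b in bits:
        feedback = ((reg >> 7) & 1) ^ int(b)  # MSB of register XOR input bit
        reg = (reg << 1) & 0xFF               # shift left, keep 8 bits
        if feedback:
            reg ^= poly
    return reg


def two_indices(code: int, width: int = 8, idx_bits: int = 3):
    """Pick two hash indices: the first and the last idx_bits of the code."""
    first = code >> (width - idx_bits)
    last = code & ((1 << idx_bits) - 1)
    return first, last


# Prefixes of different lengths all map to 3-bit indices (0..7),
# so one 8-bit Bloom filter can hold prefixes of every length.
for prefix in ("1", "10", "110"):          # hypothetical prefixes
    print(prefix, two_indices(crc8_bits(prefix)))
```

Because the register has a fixed width, the code length does not depend on the input length, which is the property the text relies on for storing prefixes of arbitrary length in a single filter.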
A random number initializes the registers of the 8-bit CRC generator. Figure 2b shows the proposed architecture programmed for $P_t$. Figure 2c shows the hash table's resultant entry structure when we use the two prefix fields for tuple pruning. It is a simple hash table storing 5D rules without any additional data structure. Its width is 22 bytes. A port range is represented by a start and an end in a single entry. We assume the rules mapped to the same hash table entry are stored in a linked list in decreasing order of priority (as shown in Figure 2b for rules $R_1$ and $R_6$), and the last field stores a pointer for the linked list.

Figure 2. The proposed architecture programmed for an example rule set: the cyclic redundancy check (CRC)-8 generator (a), the programmed Bloom filters and the off-chip hash table (b), and the entry structure of the off-chip hash table (c). [In (b), the on-chip src-Bloom filter, dst-Bloom filter, and tuple-Bloom filter are each addressed through a CRC-8 generator, and the off-chip hash table holds rules $R_1$ to $R_6$. In (c), the entry fields are, in order: entry valid, rule number, source prefix length (6 bits), source prefix (32 bits), destination prefix length (6 bits), destination prefix (32 bits), source port start, source port end, destination port start, destination port end (16 bits each), protocol wild, protocol type (8 bits), and linked list pointer.]

When programming the src-Bloom filter for $P_1$, we enter a source prefix bit-serially into the CRC generator. After entering the last bit of the prefix, we obtain a CRC code. We choose two hash indices from the generated CRC code, repeat this procedure for all elements in $P_1$, and remember $L_1$ to use in the search procedure. Similarly, we program $P_2$ into the dst-Bloom filter and remember $L_2$. Table 1 shows the CRC codes and hash indices we used for programming the proposed architecture. We programmed the 8-bit src-Bloom filter and dst-Bloom filter in Figure 2b using two hash indices chosen from the first and last three bits of the CRC code. We can use any combination of the two prefixes to program the tuple-Bloom filter. Here we use the concatenated strings of the two prefixes. For the 16-bit tuple-Bloom filter shown in Figure 2b, we chose the hash indices from the first and the last four bits of the CRC code, as Table 1 shows. Assuming that the number of hash table entries is $2^{\lceil \log_2 N \rceil}$, where $N$ is the number of rules, we need a 3-bit hash index to store a rule in the hash table. We programmed the hash table in Figure 2b using the last three bits of the CRC code, as shown in Table 1.

Table 1. The distinct prefix, cyclic redundancy check (CRC) code, and hash indices used for programming the proposed architecture.
Columns: distinct prefixes or concatenated prefixes; CRC code; Bloom filter indices (decimal); hash table index (decimal).
Source: *, 7 N/A * 6, 7 *, 6 * 2, 2
Destination: *, 7 N/A * 7, 3 * 2, 6
Tuple: R1 (2, 0) * 3, 5 7; R2 (1, 2) * 5, 3 3; R3 (2, 3) *, 3 5; R4 (3, 3) * 2, ; R5 (3, 2) * 5, 9; R6 (1, 0) * 2, 5 7

Figure 3 describes the proposed algorithm's search procedure. Assume a search with a short input length (it is 32 bits in IPv4), where the input packet has a source and destination address pair $(A_1, A_2)$. Table 2 shows the CRC code (generated when we enter the substrings of the input into the CRC generator), the corresponding Bloom filter indices, and the Bloom filter result. For $L_1 = \{1, 2, 3\}$, referring to the src-Bloom filter in Figure 2b, the 1-bit string of the source has a negative result since the bit value of entry 3 (one of the Bloom filter indices of this string in Table 2) is zero. The 2- and 3-bit strings have positive results, so $L_1(A_1) = \{2, 3\}$. Similarly, for $L_2 = \{0, 2, 3\}$, $L_2(A_2) = \{0, 2, 3\}$. Note that a zero length cannot be programmed into the Bloom filter, so it is always positive. Therefore, the intersected list is $L_1(A_1) \times L_2(A_2) = \{(2, 0), (2, 2), (2, 3), (3, 0), (3, 2), (3, 3)\}$. However, the $(2, 2)$ and $(3, 0)$ tuples are inactive tuples that are not included in $L_t$. Hence, $L_c = \{(2, 0), (2, 3), (3, 2), (3, 3)\}$. Table 2 also shows the queried tuple, the concatenated string for the tuple, the corresponding CRC code, the Bloom filter indices, and the Bloom filter results.
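The pruning step in this walk-through, intersecting the positive lengths and discarding inactive tuples, can be written out as a short sketch. The function name `candidate_tuples` is hypothetical, and the length sets below are illustrative values chosen to mirror the worked example.

```python
from itertools import product


def candidate_tuples(active_tuples, src_positive, dst_positive):
    """Compute L_c: the cross product of the positive source and
    destination lengths, restricted to the active tuples L_t."""
    crossed = set(product(src_positive, dst_positive))
    return sorted(crossed & set(active_tuples))


# Illustrative values mirroring the example: active length pairs L_t,
# positive source lengths L_1(A_1), positive destination lengths L_2(A_2).
L_t = {(1, 0), (1, 2), (2, 0), (2, 3), (3, 2), (3, 3)}
L_c = candidate_tuples(L_t, src_positive={2, 3}, dst_positive={0, 2, 3})
print(L_c)
```

Only the tuples in `L_c` are then queried against the tuple Bloom filter; the inactive pairs from the cross product never reach it.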
Since tuples $(2, 0)$, $(3, 2)$, and $(3, 3)$ turn out to be negative, as seen by referring to the tuple-Bloom filter bits in Figure 2b corresponding to the Bloom filter indices in Table 2, the hash table is not accessed for these tuples. Hence the off-chip hash table needs to be accessed only once, for the tuple $(2, 3)$. Table 2 also shows the corresponding hash table index. The algorithm compares the input to the entry at index 5 of the hash table shown in Figure 2b, and finds that it matches $R_3$ in the first two fields. The rule $R_3$ is returned as the BMR if all the remaining fields also match.

If we apply the TSP algorithm to the same example, the source lookup against $P_1$ produces a match of length 2, and the destination lookup against $P_2$ produces matches of lengths 0 and 3. Each lookup will cause at least 4 to 5 off-chip memory accesses. The intersected list of tuples is $(2, 0)$ and $(2, 3)$, and the TSP algorithm will access the hash table for these two tuples.

D_Bloom_filter_search (A_i) {
    for (l = 32; l > 0; l--) {
        if (l is not a member of L_i)
            continue;   // l is an inactive length, so skip it
        else {
            CRC_code = CRC_32 (S(A_i, l));
            ind1 = CRC_code [32 : 32 - l_i + 1];   // most significant l_i bits of the CRC code, where the size of a src-Bloom filter (or dst-Bloom filter) is 2^(l_i)
            ind2 = CRC_code [l_i - 1 : 0];         // least significant l_i bits
            if ((D_Bloom_filter[ind1] & D_Bloom_filter[ind2]) == 1)   // positive
                put l into L_i(A_i);
        }
    }
    return L_i(A_i);
}

Search (in_packet) {
    BMR = N - 1;   // default BMR is the lowest priority rule
    L_1(A_1) = D_Bloom_filter_search (A_1);   // L_1(A_1) = {l | l ∈ L_1 and S(A_1, l) is a positive}
    L_2(A_2) = D_Bloom_filter_search (A_2);   // L_2(A_2) = {l | l ∈ L_2 and S(A_2, l) is a positive}
    L_c = L_t ∩ (L_1(A_1) × L_2(A_2));        // L_1 × L_2 is the intersected set of L_1(A_1) and L_2(A_2)
    while (L_c is not empty) {   // for each element of L_c, where (l_1, l_2) is the tuple of the element
        tuple_value = concate (S(A_1, l_1), S(A_2, l_2));
        CRC_code = CRC_64 (tuple_value);
        ind1 = CRC_code [63 : 63 - l_t + 1];   // most significant l_t bits of the CRC code, where the size of the tuple-Bloom filter is 2^(l_t)
        ind2 = CRC_code [l_t - 1 : 0];         // least significant l_t bits
        if ((tuple_Bloom_filter[ind1] & tuple_Bloom_filter[ind2]) == 1) {   // positive
            ind_hash = CRC_code [l_h - 1 : 0];   // least significant l_h bits of the CRC code, where the size of the hash table is 2^(l_h)
            rule = Hash_Table [ind_hash];
            if ((in_packet == rule) & (priority(rule) is higher than BMR))
                BMR = priority(rule);
        }
        remove the current element from L_c;
    }
    return BMR;
}

Figure 3. The proposed algorithm's search procedure, where $A_1$ and $A_2$ are the given source and destination addresses of an input packet, and $S(A_i, l)$ is the substring of the most significant $l$ bits of $A_i$.

Therefore, the TSP algorithm requires 10 to 12 off-chip accesses, whereas the proposed algorithm requires only one. Note that the tuple $(2, 0)$ caused an unnecessary access in the TSP algorithm, but was filtered out by the tuple-Bloom filter in the proposed algorithm.
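As a rough, runnable companion to the search procedure, the sketch below models the three on-chip Bloom filters and uses a Python dictionary in place of the off-chip hash table. Several choices are assumptions made for the example rather than details from the article: the names `Bloom` and `TuplePruner`, `zlib.crc32` as the CRC generator, the `"|"` separator between the two prefixes (the article concatenates them directly), addresses given as bit strings, and a lower rule index meaning higher priority.

```python
import zlib


def indices(key: str, size: int):
    """Two hash indices taken from the high and low halves of a CRC-32 code."""
    code = zlib.crc32(key.encode())
    return (code >> 16) % size, (code & 0xFFFF) % size


class Bloom:
    def __init__(self, size: int) -> None:
        self.size, self.bits = size, [0] * size

    def program(self, key: str) -> None:
        for j in indices(key, self.size):
            self.bits[j] = 1

    def query(self, key: str) -> bool:
        return all(self.bits[j] for j in indices(key, self.size))


class TuplePruner:
    def __init__(self, rules, bloom_size: int = 64) -> None:
        # rules: list of (src_prefix, dst_prefix) bit strings; index = priority.
        self.src_bf, self.dst_bf, self.tuple_bf = (Bloom(bloom_size) for _ in range(3))
        self.src_lens = {len(s) for s, _ in rules}
        self.dst_lens = {len(d) for _, d in rules}
        self.active = {(len(s), len(d)) for s, d in rules}   # L_t
        self.table = {}   # stands in for the off-chip hash table
        for prio, (s, d) in enumerate(rules):
            if s:
                self.src_bf.program(s)
            if d:
                self.dst_bf.program(d)
            self.tuple_bf.program(s + "|" + d)
            self.table.setdefault((s, d), []).append(prio)

    @staticmethod
    def _positive_lengths(bf, lengths, addr):
        # A zero-length prefix cannot be programmed, so it is always positive.
        return {l for l in lengths if l == 0 or bf.query(addr[:l])}

    def search(self, src: str, dst: str):
        """Return (best priority or None, number of hash-table accesses)."""
        ls = self._positive_lengths(self.src_bf, self.src_lens, src)
        ld = self._positive_lengths(self.dst_bf, self.dst_lens, dst)
        best, accesses = None, 0
        for l1, l2 in self.active:
            if l1 in ls and l2 in ld:                  # tuple is in L_c
                s, d = src[:l1], dst[:l2]
                if self.tuple_bf.query(s + "|" + d):   # tuple-Bloom positive
                    accesses += 1                      # off-chip access
                    for prio in self.table.get((s, d), []):
                        if best is None or prio < best:
                            best = prio
        return best, accesses


rules = [("10", ""), ("1", "01"), ("10", "011")]   # hypothetical rule set
clf = TuplePruner(rules)
best, accesses = clf.search("1011", "0110")
```

The sketch only matches on the two prefix fields; a fuller model would store the remaining fields in each table entry and compare them after the hash-table access, as the article describes.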
In the TSP algorithm, a tuple consists only of lengths, and values are not used in determining the tuples to be accessed. However, in the proposed algorithm, the tuple-Bloom filter uses the combined prefix values and filters out unnecessary tuples, such as the tuple $(2, 0)$ in this example.

For a $D$-dimensional packet classification with $D > 2$, we extend the proposed algorithm by using Bloom filters for each field after converting the port ranges to prefixes. As more fields are involved in the tuple space, the number of tuples increases, which negatively affects search performance. Alternatively, using a single field minimizes the number of tuples (strictly speaking, this is no longer a tuple, since a tuple generally involves more than one field), but the number of rules associated with a tuple increases. In this case, different rules can have the same value for the chosen field. These rules are mapped to the same hash table entry, degrading the search performance. For efficient tuple space search, the number of tuples should be small while the field values composing the tuple space should have great variety. Hence, it is necessary to select the proper number and type of fields when composing the tuple space. We are currently investigating how to determine the number and type of fields that will optimize the tuple pruning performance.

Table 2. The Bloom filter query results for the example input.
Columns: length; prefix; CRC code; Bloom filter indices; Bloom filter result; hash table index.
Source address: 1 * 3, 6 Negative N/A; 2 *, 6 Positive; 3 * 2, 7 Positive
Destination address: 2 * 7, 7 Positive N/A; 3 * 7, 3 Positive
Tuple: (2, 0) * 8, Negative N/A; (2, 3) *, 3 Positive 5; (3, 2) *, 9 Negative N/A; (3, 3) * 8, Negative N/A

Performance evaluation

We performed simulations for rule sets created by ClassBench, which is widely used in evaluating the performance of packet classification algorithms. [5,6] In these simulations, we used two prefix fields for tuple pruning. We stored the remaining fields, including the port ranges, in an off-chip hash table, and compared all fields with a given input when the hash entry was accessed. Because the proposed method doesn't necessarily convert a port range into a number of prefixes, it is simpler to implement. We generated three types of 5D rule sets: access control list (ACL), IP chain (IPC), and firewall (FW), with two sets for each type, one with about 1,000 rules and the other with about 5,000 rules. We also generated input traces.

Table 3. The number of distinct prefixes (or tuples) and the distinct lengths.
Columns: type; rule set; no. of rules $N$; source $n(P_1)$, $n(L_1)$; destination $n(P_2)$, $n(L_2)$; tuple $n(P_t)$, $n(L_t)$.
ACL: ACL 92 7 3 266 25 6 65; ACL5,66 6 898 3 2,3 2
IPC: IPC 972 282 9 8 2 876 62; IPC5,68 2 33 6 33 2,876 68
FW: FW 852 33 25 67 5 37 9; FW5,35 6 33 2 33, 579
Table 3 shows the characteristics of the rule sets. Let $n(S)$ be the number of elements included in a set $S$. ACL has the smallest and IPC the largest number of distinct tuples, $n(L_t)$. FW has the smallest variety and IPC the largest variety in $n(P_t)$. These characteristics affect the Bloom filter performance and the off-chip hash table performance, as we will show.

Let $\langle n(P) \rangle$ be the smallest power of 2 that is equal to or greater than $n(P)$, that is, $\langle n(P) \rangle = 2^{\lceil \log_2 n(P) \rceil}$. We can adjust the sizes of the source, destination, and tuple Bloom filters in proportion to $\langle n(P_1) \rangle$, $\langle n(P_2) \rangle$, and $\langle n(P_t) \rangle$, respectively.

We first investigated the performance of the individual Bloom filters (the src-Bloom filter and the dst-Bloom filter) in relation to their size. We fixed the tuple-Bloom filter size at $\langle n(P_t) \rangle$, and increased the size of the individual Bloom filters by a factor of 4, 8, 16, and 32. Figure 4 shows the number of tuple-Bloom filter queries, negatives, and positives, all in terms of the average per packet. As the size of the individual Bloom filters increases, the number of queries to the tuple-Bloom filter decreases quickly. Therefore, the individual Bloom filters' tuple pruning performance improves with the filters' size, and a large size effectively filters out the lengths that cannot match. The number of positives (that is, the sum of true and false positives) is directly related to the number of off-chip hash table accesses, and this number decreases as the sizes of the individual Bloom filters increase. The number of true positives is the number of rules matching the two prefix fields among all the positives, and the number of true matches is the number of rules matching all the fields among the true positives.

Our next simulation sought to determine the effectiveness of the tuple-Bloom filter in relation to its size, since the false positive rate is inversely related to a Bloom filter's size. For fixed-size source and destination Bloom filters, $\langle n(P_1) \rangle$ and $\langle n(P_2) \rangle$, Figure 5 shows the number of tuple-Bloom filter negatives, positives, true positives, false positives, and true matches, all in terms of the average per packet, as the size of the tuple-Bloom filter increases by a factor of 1, 2, ..., 32.
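The sizing rule and the effect of the scaling factor can be explored with a small calculation. The helper names below are hypothetical, the element count of 1,000 is an illustrative value, and the false positive estimate $(1 - e^{-kn/m})^k$ is the standard textbook approximation for a Bloom filter with $m$ bits, $n$ elements, and $k$ hash functions, not a result taken from the article's simulations.

```python
import math


def pow2_ceiling(n: int) -> int:
    """<n> = 2^ceil(log2 n): the smallest power of 2 that is >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def false_positive_rate(m_bits: int, n_elements: int, k: int = 2) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k of the false positive rate."""
    return (1.0 - math.exp(-k * n_elements / m_bits)) ** k


n = 1000                      # hypothetical number of distinct prefixes
base = pow2_ceiling(n)        # 1024 bits at factor 1
for factor in (1, 2, 4, 8, 16, 32):
    m = factor * base
    print(f"factor {factor:2d}: {m:6d} bits, "
          f"estimated false positive rate {false_positive_rate(m, n):.4f}")
```

With two hash functions the estimate falls quickly for the first few doublings and then flattens, which matches the qualitative trend the simulations describe.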
Because the size of the individual Bloom filters is fixed, the number of queries to the tuple-Bloom filter is constant: 3., 6.8, and .5 for the ACL5, IPC5, and FW5, respectively. As the tuple-Bloom filter size increases, the number of negatives increases. Thus, the tuple-Bloom filter effectively filters out unnecessary tuples. The number of positives (and the number of false positives) decreases as the tuple-Bloom filter size increases. For a tuple-Bloom filter size of $8\langle n(P_t) \rangle$, the average number of tuple-Bloom filter positives per packet is ., .5, and 9.6 for the ACL5, IPC5, and FW5, respectively.

Figure 4. The number of queries, negatives, true positives, false positives, positives, and true matches in terms of the average per packet in the tuple-Bloom filter for the 5D rule sets: access control list (ACL5) (a), IP chain (IPC5) (b), and firewall (FW5) (c). [Horizontal axis: source/destination Bloom filter size; vertical axis: number.]

Figure 5. The number of negatives, positives, true positives, false positives, and true matches in terms of the average per packet in the tuple-Bloom filter for the 5D rule sets as the tuple-Bloom filter's size increases by a factor of 1 to 32: access control list (ACL5) (a), IP chain (IPC5) (b), and firewall (FW5) (c). [Horizontal axis: tuple Bloom filter size; vertical axis: number.]

Table 4 compares the proposed algorithm's performance with other algorithms in terms of memory requirements and the average and worst-case search performance per packet, measured by the number of memory accesses. We can determine the proper sizes of the Bloom filters for our proposed algorithm based on the decreasing rate of false positives in Figures 4 and 5. Because we used two hash indices in this simulation, the false positives do not decrease much for anything bigger than a factor of 8. The simulation results in Table 4 use Bloom filters with sizes $8\langle n(P_1) \rangle$, $8\langle n(P_2) \rangle$, and $8\langle n(P_t) \rangle$. The number of hash table entries is $\langle n(N) \rangle$, where $\langle n(N) \rangle = 2^{\lceil \log_2 n(N) \rceil}$ and $N$ is the number of rules.

In an ideal case, the average number of off-chip hash accesses is equal to the average number of tuple-Bloom filter positives in our proposed algorithm, but it is shown to be larger. Notably, the FW type has many more hash accesses than tuple-Bloom filter positives. One reason is that in our simulation, we assume the rules mapped to a single hash entry are compared sequentially.
Because the FW has the smallest variety in $P_t$, as Table 3 shows, many different rules share the same source-destination pair and map to a single hash entry; comparing them sequentially causes many accesses. We can solve this issue by applying some of the other fields to tuple pruning. The other reason is that two different rules with different source-destination pairs collide in the same hash entry. We can solve this issue by finding a better hash function that distributes the rules more uniformly. Techniques to organize the hash table to reduce the number of collided entries are described elsewhere.8,9

In the TSP algorithm, each field lookup consumes up to 5 off-chip memory accesses using the algorithm of binary search on levels. Moreover, the TSP algorithm does not use the combined prefix values in reducing the number of tuples. Hence, its search performance is much worse than that of the proposed algorithm. Detailed descriptions of the hierarchical trie (H-trie), area-based quad-trie (AQT), bit-vector (BV), and priority-based quad-trie (PQT) algorithms can be found in previous work.,2

Table 4. The performance comparison to the other algorithms. Each row lists, in order: rule set, number of rules (N), proposed, TSP 3, H-trie, area-based quad-trie, priority-based quad-trie 2, bit vector.

Memory requirement (Kbytes)
ACL 92 22. 62.8 82.9 56. 29.9 53.3
ACL5 ,66 7.2 273.2 .5 2.2 5.6 2,793.
IPC 972 23. 63.8 2.6 7.2 3.9 5.3
IPC5 ,68 72.7 8.2 22.7 23.3 39.9 2,53.
FW 852 22. 2.9 39. 35.2 27.2 .9
FW5 ,35 7.3 72. 9. 79.8 36. 2,3.

Memory accesses per packet (average)
ACL 92 6.8 7.5 77.2 38.6 35.6 66.
ACL5 ,66 . 9.2 8. 5. 59.6 6.
IPC 972 7.9 27.8 7.9 9.5 73.6 63.6
IPC5 ,68 8.7 36.3 85.6 3.8 22. 5.9
FW 852 33.6 5. 52. 369.3 97.9 96.6
FW5 ,35 39.5 .2 69.2 66.5 57. 738.8

Memory accesses per packet (worst case)
ACL 92 55 65 2 6 75 68
ACL5 ,66 59 59 77 9 3 76
IPC 972 9 2 28 9 6 8
IPC5 ,68 65 92 5 295 23
FW 852 75 93 7 293 38
FW5 ,35 85 89 6 93 999,

Even though the simulation result is for the simplest case of our proposed algorithm, which uses two hash functions and assumes that rules mapped to the same entry are compared sequentially, the proposed algorithm shows the best performance in all metrics. The proposed algorithm's memory requirements, shown in Table 4, are the sum of the memory for the Bloom filters (2 to 6 Kbytes) and for the off-chip hash table (2 to 68 Kbytes). If we use a 4-bit counter for each Bloom filter bit to support the incremental deletion of rules, the Bloom filters would require 8 to 24 Kbytes, still small enough to fit on a chip.

For the off-chip hash table, if we use a 250-MHz QDRII SRAM 2 with a 36-bit width, each 22-byte hash entry is read through five accesses, taking 20 nanoseconds. Since the proposed algorithm consumes 7 to 40 memory accesses per packet, it takes 140 to 800 ns. Hence, the average throughput is 1.25 to 7.14 million packets per second (Mpps).
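The timing arithmetic in the preceding paragraph can be retraced with a short sketch. It assumes one 36-bit transfer per clock period at 250 MHz, which is the reading under which a 22-byte entry takes five accesses and 20 ns; the function names are hypothetical and the model ignores pipelining and on-chip Bloom filter latency.

```python
import math


def accesses_per_entry(entry_bytes: int, bus_bits: int) -> int:
    """Number of bus transfers needed to read one hash entry."""
    return math.ceil(entry_bytes * 8 / bus_bits)


def entry_read_ns(entry_bytes: int, bus_bits: int, clock_mhz: float) -> float:
    """Time to read one hash entry, assuming one transfer per clock period."""
    period_ns = 1000.0 / clock_mhz
    return accesses_per_entry(entry_bytes, bus_bits) * period_ns


def throughput_mpps(entries_per_packet: float, ns_per_entry: float) -> float:
    """Packets per microsecond, i.e. million packets per second."""
    return 1000.0 / (entries_per_packet * ns_per_entry)


if __name__ == "__main__":
    per_entry = entry_read_ns(entry_bytes=22, bus_bits=36, clock_mhz=250.0)
    print(f"one hash entry: {per_entry:.0f} ns")
    for entries in (7, 40):  # average hash-entry accesses per packet
        total_ns = entries * per_entry
        print(f"{entries:2d} accesses: {total_ns:.0f} ns, "
              f"{throughput_mpps(entries, per_entry):.2f} Mpps")
```

The same helper can be applied to any other per-packet access count, for example the averages in the table, to see how strongly throughput depends on the number of off-chip reads.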
In an ideal case, considering the average of 4 to 12 tuple-Bloom filter positives per packet shown in Figure 5, the proposed algorithm could achieve a throughput of 4.17 to 12.5 Mpps.

Because the performance of tuple pruning depends on the number and type of selected fields, and because the Bloom filter is a probabilistic data structure, it is not easy to formulate the performance of our proposed algorithm. The mathematical analysis of optimizing the pruning performance through the Bloom filters should be investigated further.

Recently, new network applications demanding multimatch packet classification have emerged. In these applications, all matching results, including the BMR, must be returned. We therefore need efficient algorithms that can perform both highest-priority-match and multimatch packet classification. Because it is simple to return all matching results in the hash table lookups, the proposed algorithm naturally enables both the highest-priority match and multimatch packet classification.
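A minimal sketch of why a hash-based lookup supports both modes: once the candidate hash entries for a packet are known, the rules in each entry are compared sequentially, and the caller either keeps every match or only the best one. The `Rule` layout, the convention that a smaller `priority` value wins, and the function names are assumptions for illustration; the article does not specify its entry format at this level.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record holding the fields left after tuple pruning."""
    priority: int               # assumption: smaller value = higher priority
    src_port: Tuple[int, int]   # inclusive range
    dst_port: Tuple[int, int]   # inclusive range
    protocol: Optional[int]     # None acts as a wildcard

    def matches(self, sport: int, dport: int, proto: int) -> bool:
        return (self.src_port[0] <= sport <= self.src_port[1]
                and self.dst_port[0] <= dport <= self.dst_port[1]
                and self.protocol in (None, proto))


def scan_entry(entry: List[Rule], sport: int, dport: int, proto: int) -> List[Rule]:
    """Sequentially compare every rule stored in one hash entry."""
    return [rule for rule in entry if rule.matches(sport, dport, proto)]


def classify(entries: List[List[Rule]], sport: int, dport: int, proto: int,
             multimatch: bool = False):
    """Return all matches (multimatch) or only the highest-priority match.

    `entries` stands for the hash entries selected for this packet,
    one per tuple-Bloom filter positive.
    """
    matches = [rule for entry in entries
               for rule in scan_entry(entry, sport, dport, proto)]
    if multimatch:
        return sorted(matches, key=lambda rule: rule.priority)
    return min(matches, key=lambda rule: rule.priority, default=None)
```

Both modes perform the same memory reads; the only difference is whether the comparison results are reduced to a single rule at the end.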

Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) through a grant funded by the Korean government (2-83) and by the Korean Ministry of Knowledge Economy under the HNRC-ITRC support program supervised by the NIPA (NIPA-2-C9-- ).

References
1. H.J. Chao, Next Generation Routers, Proc. IEEE, vol. 9, no. 9, 2002, pp. 58-588.
2. H. Yu and R. Mahapatra, A Memory-Efficient Hashing by Multi-Predicate Bloom Filters for Packet Classification, Proc. IEEE Int'l Conf. Computer Comm. (INFOCOM), IEEE Press, 2008, pp. 267-275.
3. V. Srinivasan, S. Suri, and G. Varghese, Packet Classification Using Tuple Space Search, Proc. ACM SIGCOMM, ACM Press, 1999, pp. 35-6.
4. V. Srinivasan et al., Fast and Scalable Layer Four Switching, Proc. ACM SIGCOMM, ACM Press, 1998, pp. 9-22.
5. H. Song, J. Turner, and S. Dharmapurikar, Packet Classification Using Coarse-Grained Tuple Spaces, Proc. Architecture for Networking and Comm. Systems (ANCS), ACM Press, 2006, pp. -5.
6. S. Dharmapurikar et al., Fast Packet Classification Using Bloom Filters, Proc. Architecture for Networking and Comm. Systems (ANCS), ACM Press, 2006, pp. 6-7.
7. S. Dharmapurikar, P. Krishnamurthy, and D. Taylor, Longest Prefix Matching Using Bloom Filters, IEEE/ACM Trans. Networking, vol., no. 2, 2006, pp. 397-9.
8. H. Song et al., Fast Hash Table Lookup Using Extended Bloom Filter: An Aid of Network Processing, Proc. ACM SIGCOMM, 2005, pp. 8-92.
9. C. Martinez, D. Pandya, and W. Lin, On Designing Fast Non-uniformly Distributed IP Address Lookup Hashing Algorithms, IEEE/ACM Trans. Networking, vol. 7, no. 6, 2009, pp. 96-925.
10. F. Yu and T.V. Lakshman, Efficient Multimatch Packet Classification and Lookup with TCAM, IEEE Micro, vol. 25, no., 2005, pp. 5-59.
11. D.E. Taylor and J.S. Turner, ClassBench: A Packet Classification Benchmark, IEEE/ACM Trans. Networking, vol. 5, no. 3, 2007, pp. 99-5.
12. H. Lim, M. Kang, and C. Yim, Two-dimensional Packet Classification Algorithm Using a Quad-tree, Computer Comm., vol. 3, no. 6, 2007, pp. 396-5.

Hyesook Lim is an associate professor in the Department of Electronics Engineering at Ewha Womans University, Seoul, Korea. Her research interests include router design issues such as IP address lookup, packet classification, and deep packet inspection. Lim has a PhD in electrical and computer engineering from the University of Texas at Austin. She is a member of IEEE.

So Yeon Kim is a research engineer at Samsung Electronics. Her research interests include fast IP address lookup and packet classification algorithms. Kim has an MS from the Department of Electronics Engineering at Ewha Womans University, Seoul, Korea.

Direct questions and comments about this article to Hyesook Lim, Ewha Womans University, Seoul 2-75, Korea; hlim@ewha.ac.kr.