A Scalable Approach for Packet Classification Using Rule-Base Partition

CNIR Journal, Volume (5), Issue (1), Dec., 2005

Mr. S J Wagh (1) and Dr. T. R. Sontakke (2)

(1) Assistant Professor in Information Technology, Army Institute of Technology (affiliated to University of Pune, MS). E-Mail ID: sjwagh@rediffmail.com
(2) Director & Professor, S.G.G.S. College of Engineering and Technology, Vishnupuri, Nanded-431603 MS. E-Mail ID: trsontakke@yahoo.com

Abstract: This paper presents a new direction for packet classification that can substantially improve the performance of a classifier by decreasing the rule-base lookup latency. The classifier partitions the rule-base into smaller independent sub-rule-bases by hashing on a key selected from the classification space. We apply the concept of maximum entropy to select the hash key for optimal partitioning of the rule-base. We performed detailed simulations of the proposed algorithm on synthetic rule-bases of 1K to 200K entries using packet traces. The simulation results show that the algorithm reduces the size of a rule-base by more than four orders of magnitude with just two levels of partitioning. Both the space and time complexity of the algorithm are linear in the size of the rule-base. The proposed idea offers a scalable solution for packet classification with a large rule-base.

Keywords: Packet classification, Scalability, Lookup latency, Rule-bases, Space & time complexity

1. Introduction

The proposed algorithm rests on the observation that a given packet matches only a few rules even in large classifiers [1]. This strongly implies that most of the rules in any given rule-base are independent. Thus, we can partition the rule-base into many smaller independent sub-rule-bases.
As long as the matching sub-rule-base can be identified quickly, the performance of the rule-base lookup can be substantially improved, since the lookup needs to be performed only in the final sub-rule-base. This is achieved by hierarchically decomposing the original rule-base into many smaller independent sub-rule-bases based on the rule definitions.

Our algorithm works in two phases: preprocessing and classification. In the preprocessing phase we hierarchically partition the original rule-base into many smaller independent sub-rule-bases by hashing on bit fields selected from the classification space. The degree of partitioning depends on the density of a sub-rule-base in the classification space: the denser the sub-rule-base, the more partitioning is needed. This hierarchical partitioning continues until all the sub-rule-bases are small enough. Then, during the classification phase, the classifier inspects each incoming packet using the same hash key used in the preprocessing phase and identifies the sub-rule-base relevant to the packet. The search for a matching rule is performed only in the final sub-rule-base, where any existing lookup algorithm can be employed.

To evaluate the performance of our classification algorithm, we ran it in a simulator that replays packet traces against synthetic rule-bases of 1K to 200K rules. The results show that the algorithm can reduce the size of the original rule-base by several orders of magnitude with only two levels of partitioning, which requires only a couple of memory lookups. For example, a rule-base with 100K rules can be reduced to a sub-rule-base with only 7.6 rules on average and 258 rules in the worst case. In terms of memory accesses, our algorithm requires two to three times fewer memory lookups than the best classification algorithms known so far. Furthermore, the algorithm scales in both its memory requirement and its classification performance as the size of the rule-base increases.
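As a quick check of the figures quoted above, the 100K-rule example can be written out as reduction ratios. The arithmetic uses only the numbers given in this section; the symbols $\rho_{\mathrm{avg}}$ and $\rho_{\mathrm{max}}$ are shorthand introduced here for illustration and do not appear elsewhere in the paper.

```latex
\rho_{\mathrm{avg}} = \frac{7.6}{100\,000} = 7.6 \times 10^{-5},
\qquad
\rho_{\mathrm{max}} = \frac{258}{100\,000} = 2.58 \times 10^{-3}
```

On average the final sub-rule-base is therefore smaller than the original by a factor of about $1.3 \times 10^{4}$, i.e. more than four orders of magnitude, while in the worst case it is smaller by a factor of about 390.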
2. Theory

The packet classification problem can be defined as follows. Given a rule base, which is a set of rules, a packet classifier matches one or more fields of an incoming packet in order to identify the applicable rule. Each rule is specified by a range of values in one or more fields of a packet header. Specifically, in d-dimensional packet classification, each rule $r_i$ is defined over d fields. Formally, $r_i$ is defined by a tuple $(C_i, A_i)$, where $C_i$ is called the classification space and $A_i$ is the associated action of rule $r_i$. The classification space is defined by the cross-product $C_i = F_1^i \times F_2^i \times \cdots \times F_d^i$, where $F_k^i$ is the range of values that field k must take. A rule $r_i$ matches a packet $p = \{b_1, b_2, \ldots, b_d\}$ if $b_k \in F_k^i$ for all $k = 1, \ldots, d$, where each $b_k$ is a singleton.

Multiple rules can match a packet. Thus, a classifier must identify the highest-priority rule among all the matching rules. Intuitively, this requires the classifier to look up the header fields of an incoming packet and to compare them against the rules in the rule base one by one in order of decreasing priority. When n, the number of rules, is large or the arrival rate of incoming packets is high, this serial process is time-consuming and limits the speed of the classifier. Thus, the essence of the problem is to find a classification function that is fast yet scalable in both time and space.

2.1 Proposed Algorithm

This classification algorithm is based on the conjecture that, in a rule base, only a few rules can possibly match a given packet. Let us look at the rule base example of a typical firewall shown in Table 1 (please refer to Appendix-1), where the inner network serves several application services such as HTTP, Telnet and FTP. Rules R1, R2, and R3 grant these connection requests, while R0 protects the inner network against spoofing attacks. D is the default deny rule for all other communications. The protocol field in Table 1 suggests that a packet using the UDP protocol can be matched only to R0 or D. Thus, R1, R2 and R3 need not be matched against a UDP packet. Our algorithm works in two phases, preprocessing and classification, based on a divide-and-conquer approach.
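The matching condition of Section 2 and the UDP observation above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a simplified three-field view of Table 1 (protocol, source port, destination port), leaves out R0 because that rule is defined on IP addresses, and the names `Rule`, `matches` and `classify_linear` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple

Range = Tuple[int, int]  # inclusive (low, high) range of values for one field


class Rule(NamedTuple):
    name: str
    fields: Tuple[Range, ...]  # one range per field: the classification space C_i
    action: str                # the associated action A_i


ANY_PROTO: Range = (0, 255)
ANY_PORT: Range = (0, 65535)
TCP: Range = (6, 6)            # IP protocol number 6 is TCP, 17 is UDP

# Assumption: simplified view of Table 1, listed in decreasing priority.
# R0 is omitted because it is defined on the IP address fields.
RULES: List[Rule] = [
    Rule("R1", (TCP, (1024, 65535), (80, 80)), "accept"),
    Rule("R2", (TCP, (1024, 65535), (23, 23)), "accept"),
    Rule("R3", (TCP, (1024, 65535), (21, 21)), "accept"),
    Rule("D", (ANY_PROTO, ANY_PORT, ANY_PORT), "deny"),
]


def matches(rule: Rule, packet: Tuple[int, ...]) -> bool:
    """A rule matches if every header value b_k lies in the rule's range for field k."""
    return all(low <= b <= high for (low, high), b in zip(rule.fields, packet))


def classify_linear(rules: List[Rule], packet: Tuple[int, ...]) -> Optional[Rule]:
    """Baseline serial search: compare against rules in decreasing priority."""
    for rule in rules:
        if matches(rule, packet):
            return rule
    return None


# Rules whose protocol range admits UDP (17): only the default rule remains here.
udp_candidates = [r.name for r in RULES if r.fields[0][0] <= 17 <= r.fields[0][1]]
```

The serial loop in `classify_linear` is the cost the partitioning scheme aims to avoid: the list `udp_candidates` shows that, for a UDP packet, most of the comparisons in that loop could never succeed.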
In the preprocessing phase we divide the original rule base into many smaller independent sub-rule bases based on the values of the classification fields over which each rule is defined. This division of the original rule base is shown in Figure 1 (please refer to Appendix-2). From Table 1 (please refer to Appendix-1), a rule $r_i$ overlaps with a rule $r_j$ if $C_i \cap C_j \neq \emptyset$. Intuitively, two rules overlap if there exists any instance of a packet that matches both rules. Since sub-rule bases differ at least in those bits that are selected as the hash key, a packet cannot match rules in two sub-rule bases at the same time. Thus, the independence among sub-rule bases is guaranteed. Therefore, we need to look up only the relevant sub-rule base after inspecting a packet on the same bit fields.

2.2 Preprocessing Phase

The preprocessing phase starts by partitioning the original rule base into many independent sub-rule bases. For instance, if the rule base is partitioned on the protocol field, then by looking up the protocol field of an incoming packet we only need to search the sub-rule base with the same protocol. We can select any of the bits in the classification fields as a hash key. If we select 8 bits, then we create a hash table with $2^8 = 256$ entries, each of which points to a sub-rule base. Intuitively, two rules may overlap if they map to the same sub-rule base, while rules mapped to different sub-rule bases never overlap, which implies that they are independent. Sub-rule bases larger than a threshold value can be repartitioned with another hash key, which must be different from the previously used hash key. This hierarchical partitioning continues until all the sub-rule bases are small enough. The experimental results show that two levels of partitioning are enough for a rule base of fewer than 200K rules. Both the space and time complexity of this classification algorithm depend on the number of nodes and the depth of the partitioning hierarchy.
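The partitioning step described above can be sketched on the four rules of Table 3 (Appendix-1). This is an illustrative sketch under stated assumptions, not the authors' code: rules are written as ternary strings over bits b0 to b7, a rule with a don't-care bit in the hash key is copied into every bucket it may fall in (rule spreading), and the function names `bucket_indices`, `partition` and `oversized` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Rules of Table 3 as ternary strings over bits b0..b7 ('*' is a don't-care bit).
TABLE3: List[Tuple[str, str]] = [
    ("R0", "00000000"),
    ("R1", "01100000"),
    ("R2", "10000000"),
    ("R3", "1*100000"),
]


def bucket_indices(pattern: str, key_bits: Sequence[int]) -> List[str]:
    """Hash table indices a rule maps to; a '*' in a key bit spreads the rule."""
    choices = [("0", "1") if pattern[b] == "*" else (pattern[b],) for b in key_bits]
    return ["".join(combo) for combo in product(*choices)]


def partition(rules: Sequence[Tuple[str, str]],
              key_bits: Sequence[int]) -> Dict[str, List[str]]:
    """Build the hash table: index -> sub-rule base (rule names in priority order)."""
    table: Dict[str, List[str]] = {}
    for name, pattern in rules:
        for index in bucket_indices(pattern, key_bits):
            table.setdefault(index, []).append(name)
    return table


def oversized(table: Dict[str, List[str]], threshold: int) -> List[str]:
    """Indices of sub-rule bases that exceed the threshold and need repartitioning."""
    return sorted(index for index, bucket in table.items() if len(bucket) > threshold)


by_b0_b1 = partition(TABLE3, (0, 1))  # R3 is spread into buckets 10 and 11
by_b0_b2 = partition(TABLE3, (0, 2))  # one rule per bucket, no spreading
```

With the hash key (b0, b1), bucket 10 holds two rules and would be flagged by `oversized(by_b0_b1, 1)` for a second level of partitioning with a different key; with (b0, b2) every bucket holds exactly one rule. This is the difference in partitioning efficiency that the hash key selection is meant to capture.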
To reduce the number of partitioning levels, we need to partition a rule base into sub-rule bases as evenly as possible, so that the number of empty sub-rule bases is minimized and the number of rules per sub-rule base follows a uniform distribution. This partitioning efficiency depends on the hash key selection algorithm.

2.3 Classification Phase

After partitioning a rule base and constructing hash tables during the preprocessing stage, a classifier narrows down the rule base lookup by mapping an incoming packet into the corresponding sub-rule base where the packet can be applied. The classifier looks up the hash table by using the hash key extracted from the packet header. Let us consider the rule base example shown in Table 2 (please refer to Appendix-1). Assume that rules R0 to R3 are listed in decreasing order of priority. We assume 5-dimensional classification, which uses 104 bits of header fields: protocol (8), source port (16), destination port (16), source IP address (32) and destination IP address (32). In Table 2, we show only the 8 most significant bits (MSBs) of the classification space, which may represent any header field such as the protocol. We partition the rule base into 256 buckets by using the 8 MSBs and create the hash table as shown in Table 2. Rules in one sub-rule base do not overlap with rules in other sub-rule bases. When a packet arrives, the classifier extracts the 8 MSBs from the header and uses them as an index into the hash table. If the hash table entry is not empty, then the classification is performed within the sub-rule base. Otherwise, the default rule is the matching rule.

3. Hash Key Selection Algorithm

A hash key is a bit position selected from the rule space, used to divide the original large rule base into smaller sub-rule bases. The partition of the rule base is carried out on the basis of a hash table, which is formed after selecting the desired hash keys. The size of the hash table depends on the number of hash keys selected. If we select n hash keys (bit positions), then there are $2^n$ entries in the hash table.

3.1 Rule-base Partition

In the first-level partitioning, we only consider the protocol and port numbers as a hash key, since these fields naturally classify rules by the Internet services that the rules govern. For example, the HTTP service corresponds to protocol 6 and server port 80. The classification space at this level is comprised of the protocol (8), source port (16), and destination port (16) numbers. In order to reduce the size of the hash table, we select a subset of the classification space as a hash key. Suppose a 17-bit hash key is used; this implies a hash table with $2^{17}$ = 128K entries. There is a tradeoff between the memory space and the depth of the partitioning hierarchy, depending on the size of the hash key. Assuming each entry contains either a 32-bit address or a NULL pointer, the size of the table is 512 Kbytes.

The 17-bit hash key is built by first selecting 6 bits from the protocol field using the entropy-maximizing key selection algorithm. Since only two protocols, TCP and UDP, need to specify port numbers, we select up to 11 additional bits from the port numbers for these protocols. Since a port number is bi-directional, i.e. either source or destination, and is specified by a range with upper and lower bounds, we select one of the port fields with an additional bit that denotes the direction, and then select an additional 10 LSBs or 6 MSBs from that port field by using precision-directed grouping. Typically, a server port (dense area) designates a specific port number between 0 and 1023, while a client port (sparse area) uses a random port number from 1024 to 65535. Thus, the lower 10 bits are used for a server port while the upper 6 bits are used for a client port. The hash key is therefore concatenated from [protocol field], [direction bit] and [10 LSBs or 6 MSBs of one of the port fields]. Since a server port has a higher partitioning efficiency than a client port, we use the server port regardless of direction if a rule specifies a server port. If a rule specifies client ports in both the source and destination ports, we use the 6 MSBs and spread the rule over both the source and destination hash tables.

In the second-level partitioning, which applies to buckets larger than the threshold after the first-level partitioning, we consider the source and destination IP addresses, since the second-level hash key must be disjoint from the first-level hash key. To limit the size of the hash table, we only select a subset of the 64 bits of these fields as the second-level hash key.

3.2 Algorithm for Selecting Hash Key

The number of hash keys required for partitioning a large rule base depends on the original size of the rule base and the threshold value (maximum size) of a sub-rule base. The formulas for calculating the number of hash keys are as follows:

$N = S / T$ and $N = 2^H$, so that $H = \log_2 (S / T)$

where
N = Number of sub-rule bases
S = Size of original rule base
T = Threshold value of sub-rule base
H = Total number of hash keys required

There are four different types of hash key selection algorithm: the MSB pattern (MSB), the exponential growing pattern (Exp), the mask distribution pattern (Mask) and the entropy-maximizing pattern (Ent).

The entropy-maximizing pattern algorithm is the heart of this classification algorithm. With it we find a good hash key by using the concept of entropy from information theory. As is widely known, the entropy is maximized when all the entries have the same probability of occurrence. Thus, we can find a good hash key through the calculation of the entropy. Using the entropy technique, a hash key $K_k$ of length k can be expressed recursively by $K_k = K_{k-1} \circ q$, where $\circ$ is the concatenation operator and q is the bit from the classification space that produces the maximum entropy. The algorithm starts by calculating the entropy for a hash key of length 1 and determines the bit position that produces the maximum entropy value. Then, the algorithm repeats this process for a hash key of length 2, and so on, until the length of the hash key reaches s or the entropy does not increase further. Based on this algorithm, we select the required number of hash keys, half from the source address field and half from the destination address field. The time complexity of this algorithm is a function of w, n and s, where
w = the length of the classification space,
n = the total number of rules in a rule base,
s = the length of a hash key.

3.3 Demo on the Four Hash Key Selection Algorithms

We have simulated the four algorithms mentioned above on only 16 randomly generated rules and compared the results. The 16 rules are generated by multiplying 76 with the output of a random function, so that the rule set changes on each run. In the real world the standard size of one rule is up to 104 bits, but here we take the size of one rule to be 8 bits. We have provided 4 options, one for each hash key selection algorithm. The original rule base (with 16 rules) is divided into 4 groups, so we require a total of 2 hash keys for partitioning the rule base according to the formula $N = 2^H$. The result of the entropy-maximizing pattern is shown in Figure 2 (please refer to Appendix-2). In this case, bits B5 and B3 are selected as the hash key. The selection of these keys is carried out according to the entropy-maximizing algorithm, which was briefly explained above. Comparing the results obtained for the four algorithms, we conclude that the entropy-maximizing pattern algorithm gives the most efficient results.

4. Experimentation and Results

In this paper, we demonstrate the performance of the proposed algorithms for 3-dimensional classification rules in a Visual Basic environment. Since it is difficult to obtain large real-life classification rule bases, we synthesized large rule bases by using the random function that is readily available in the Visual Basic language. To create a synthetic rule base that resembles real-life rule bases, we followed the rule base characteristics observed in real-life firewall applications. All of the experiments were performed on a 1.8 GHz Pentium IV system with 256 MB of memory running the Windows XP operating system.

Figure 3 shows the window through which the rules are accepted. According to the figure, one rule consists of 3 fields in total:
1. Source port address (16 bit)
2. Destination port address (16 bit)
3. Protocol number (8 bit)
In a rule, each bit takes a value from the set {0, 1, *}, where * is a don't-care bit that can be 0 or 1. Variation in the rule base is obtained by entering a random value before the rules are accepted.

Figure 4 shows the results of the first-level partitioning by displaying the average and maximum size of a sub-rule base after the partitioning. The partitioning reduces the average size of a rule base substantially. For rule bases with 1K, 5K and 10K rules, the reduction ratios are 0.0029, 0.0014, and 0.0014 respectively. This is very significant, since we can reduce the size of a rule base by more than two orders of magnitude with a single memory lookup to the corresponding hash table. However, as can be seen from the maximum size of a sub-rule base in the figure, rules are not evenly distributed over the partitioned rule bases. The largest sub-rule base contains about 24% of the rules of the original rule base in all the rule bases tested.
As expected, these rules are related to the HTTP service, which corresponds to protocol 6 and port 80. The numbers of sub-rule bases over the threshold (16 rules per sub-rule base) are 35, 200, and 150 for the 1K, 5K, and 10K cases. For these sub-rule bases we perform the second-level partitioning. All of the first-level partitioning is completed in less than one second on our experimentation platform. As a side effect of the first-level partitioning, we observe that the partitioning more than doubles the total number of rules due to rule spreading. The actual inflation ratio is 2.42.

Figure 5 and Figure 6 (please refer to Appendix-2) show the graphical results of the second-level partitioning with the various hash key selection algorithms. Figure 5 shows the average number of rules per sub-rule base, while Figure 6 shows the size of the largest sub-rule base. Assuming entropy-maximizing key selection, the second-level partitioning further reduces the sub-rule bases by reduction ratios of 0.054, 0.054, and 0.052 for the 1K, 5K, and 10K rule bases. When the first-level and the second-level partitioning are combined, the 1K, 5K, and 10K rule bases are reduced to 1.6, 7.6, and 36.6 rules per sub-rule base on average, which corresponds to reduction ratios of 0.00016, 0.000076, and 0.000073. This is very significant, since we can reduce the size of a rule base by more than four orders of magnitude with just two memory lookups to the hash tables. The second-level partitioning is also very effective in reducing the largest sub-rule base, which contained 24% of the entire original rule base after the first-level partitioning. Assuming entropy-maximizing key selection, the second-level partitioning reduces the size of the largest sub-rule base to 80, 100, and 150 rules for the 1K, 5K, and 10K rule bases respectively. This suggests that for a 10K rule base we only need to compare a packet against those 150 rules in the worst case during the classification phase.

5. Conclusion

Most existing work has been carried out on relatively small classifiers, which consist of fewer than 20K rules. Beyond this size, the existing packet classification schemes may not scale, due to either memory explosion or slowdown of classification. The proposed classification algorithm achieves scalability by hierarchically partitioning a rule-base into many smaller independent sub-rule bases. By using the same hash key used in the partitioning, a classifier can inspect an incoming packet and find its relevant sub-rule base with a few memory lookups to the hash tables. By using the concept of entropy we obtain an effective hash key, with which the sub-rule bases can be made much smaller than the original rule base. The experimental results show that two levels of partitioning can substantially reduce the size of a rule base. Owing to the partitioning phase, in the classification phase a classifier needs to access only 4.2, 20.4, and 207 rules on average for rule bases with 5K, 50K, and 200K rules. The simulation results for this algorithm are very promising, since one of the best-known algorithms [1] requires at least 13 memory accesses, while the proposed algorithm requires only 4.2 memory accesses on average for 5K rules. Furthermore, the resulting graphs show that the proposed algorithm scales in both space and time as the size of the rule base increases.

6. Acknowledgements

We would like to acknowledge the technical support of SRES College of Engineering, Kopargaon and SGGSCOE&T, Nanded for the practical work of this paper.

7. References:

[1] A. Feldmann and S. Muthukrishnan, Tradeoffs for Packet Classification, In Gigabit Networking Workshop of the Proceedings of the IEEE INFOCOM '00, March 2000.
[2] M. M. Buddhikot, S. Suri, and M. Waldvogel, Space Decomposition Techniques for Fast Layer-4 Switching, In Proceedings of the IFIP Sixth International Workshop on Protocols for High Speed Networks, Vol. 66, No. 6, pp. 277-283, August 1999.
[3] P. Gupta and N. McKeown, Packet Classification on Multiple Fields, In Proceedings of the ACM SIGCOMM '99, Vol. 29, Issue 4, August 1999.
[4] Robert B. Ash, Information Theory, Dover Publications, 1st edition, November 1990.
[5] S. J. Wagh, P. M. Yawalkar and D. B. Kshirasgar, Taxonomical Survey of IP Address Lookup Algorithms, Proc. National Conference on Latest Trends in Information Technology, North Maharashtra University, Jalgaon, (MS) India, Oct 2002.
[6] S. J. Wagh, P. M. Yawalkar and S. R. Patil, Hierarchical Intelligent Cuttings: A Packet Classification Technique, Proc. National Conference on Signal Processing, Intelligent Systems and Networking - SPIN 2003, Dec 4-5, 2003, Bangalore, India.
[7] T. V. Lakshman and D. Stiliadis, High-speed Policy-based Packet Forwarding using Efficient Multi-dimensional Range Matching, In Proceedings of the ACM SIGCOMM '98, Vol. 28, pp. 191-202, 1998.
[8] V. Srinivasan, S. Suri, G. Varghese, and M. Waldvogel, Fast and Scalable Layer Four Switching, In Proceedings of the ACM SIGCOMM '98, Vol. 28, pp. 203-214, 1998.
[9] T. Woo, A Modular Approach to Packet Classification: Algorithms and Results, In Proceedings of the IEEE INFOCOM '00, March 2000.

APPENDIX-1 [TABLES]

Table 1: A rule base example of a firewall.

Rules | Protocol | Src. Port | Dest. Port | Src. IP | Dest. IP | Action | Description
R0 | * | * | * | Inner side | | Deny | Protection against Spoofing Attack
R1 | TCP | 1024~65535 | 80 | Outer side | | Accept | HTTP Service
R2 | TCP | 1024~65535 | 23 | Outer side | | Accept | Telnet Service
R3 | TCP | 1024~65535 | 21 | Outer side | | Accept | FTP Service
D | * | * | * | * | * | Deny | Default Rule

Inner side: local network protected by the firewall. Outer side: network separated from the inner side network by the firewall.

Table 2: A rule base example and its hash tables.

Rule | Classification space (b0,b1,...,b103), 8 MSBs shown
R0 | 0000 0110
R1 | 0000 0110
R2 | 0001 0001
R3 | 0000 0001
D | **** ****

Hash key 8 MSBs (b0,b1,b2,b3,b4,b5,b6,b7), selected entries | Sub-rule base
0000 0000 | Null Entry
0000 0001 | R3
0000 0010 | Null Entry
0000 0110 | R0, R1
0000 1100 | Null Entry
0001 0001 | R2
0011 1100 | Null Entry
1111 1111 | Null Entry

Hash key b3,b5 | Sub-rule base
00 | R3
01 | R0, R1
10 | R2
11 | Null Entry

*: denotes a don't-care bit

Table 3: A rule base and its hash tables with two different hash keys of length 2.

Rule | Field description (b0,b1,b2,b3,b4,b5,b6,b7)
R0 | 0000 0000
R1 | 0110 0000
R2 | 1000 0000
R3 | 1*10 0000

Index | Hash key b0,b1 | Hash key b0,b2
00 | R0 | R0
01 | R1 | R1
10 | R2, R3 | R2
11 | R3 | R3
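The classification phase of Section 2.3 can be sketched against the rules of Table 2, using the two-bit hash key (b3, b5) shown there. This is a minimal sketch under stated assumptions rather than the authors' implementation: headers are reduced to the 8 MSBs shown in the table, the search inside a sub-rule base is a plain priority-ordered scan (the paper allows any existing lookup algorithm at this stage), and the names `matches` and `classify` are hypothetical.

```python
from typing import Dict, List, Tuple

# 8 MSBs of each rule in Table 2 ('*' would denote a don't-care bit).
RULES: Dict[str, str] = {
    "R0": "00000110",
    "R1": "00000110",
    "R2": "00010001",
    "R3": "00000001",
}
DEFAULT_RULE = "D"               # default rule, matches every packet
KEY_BITS: Tuple[int, int] = (3, 5)  # hash key b3,b5 from Table 2

# Hash table of Table 2 for the key b3,b5; each bucket is in priority order.
HASH_TABLE: Dict[str, List[str]] = {
    "00": ["R3"],
    "01": ["R0", "R1"],
    "10": ["R2"],
    "11": [],                    # Null Entry
}


def matches(pattern: str, header: str) -> bool:
    """A ternary pattern matches if every bit is '*' or equals the header bit."""
    return all(p in ("*", h) for p, h in zip(pattern, header))


def classify(header: str) -> str:
    """Extract the hash key from the header, then search only that sub-rule base."""
    index = "".join(header[b] for b in KEY_BITS)
    for name in HASH_TABLE.get(index, []):
        if matches(RULES[name], header):
            return name
    # Empty bucket, or no rule in the bucket matched: fall back to the default rule.
    return DEFAULT_RULE
```

For example, the header `00000110` has b3 = 0 and b5 = 1, so only the bucket holding R0 and R1 is searched, while `00111100` maps to the empty bucket 11 and is handled by the default rule without any rule comparison.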

APPENDIX-2 [FIGURES]

Figure 1: Demo on hash key selection Algorithms
Figure 2: Result of Entropy Maximizing Patterns
Figure 3: Window for accepting synthetic rules
Figure 4: The result of the first level partitioning
Figure 5: The average size of a sub-rule base with different key selection algorithms
Figure 6: The maximum size of a sub-rule base with different key selection algorithms

APPENDIX-3 [ALGORITHM]

The entropy-maximizing key selection algorithm is described as follows.

Consider:
N = Number of sub-rule bases
S = Size of original rule base
T = Threshold value of sub-rule base
H = Total number of hash keys required
n = Total number of bits present in one rule

ALGORITHM:
Step 1: S1 = S    /* S1 is the number of considerable rules */
Step 2: For (I = 0; I < H; I++)
        {
          For (J = 0; J < n; J++)
          {
            A[J] = 0
            For (K = 0; K < S; K++)
            { If rule number K is considerable and has 1 or * at the J-th bit position, then A[J] = A[J] + 1 }
          }
          For (J = 0; J < n; J++)
          { A[J] = absolute(A[J] - S1 / 2) }
          /* Now scan the array A to find the smallest element and store the index of that element in the resultant array R at the I-th position. This index is the hash key number. */
Step 3:   For (K1 = 0; K1 < S; K1++)
          { Mark rule number K1 as non-considerable if it has 0 at bit position R[I] }
          S1 = S - Number of non-considerable rules
        }
Step 4: Stop.    /* After this algorithm, array R contains the hash keys. */
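The greedy selection described in Section 3.2 can also be sketched directly in terms of entropy. This is a sketch under stated assumptions, not a line-by-line translation of the step-wise listing above (which instead picks the bit whose count of 1 or * entries is closest to S1 / 2). The assumptions are: rules are ternary strings, a rule with a don't-care bit in the key is counted in every bucket it spreads to, the probability of a bucket is its share of all rule entries after spreading, and the entropy is the Shannon entropy of that distribution. The function names are hypothetical.

```python
from itertools import product
from math import log2
from typing import Dict, List, Sequence

# Rules of Table 3 over bits b0..b7 ('*' is a don't-care bit).
TABLE3: List[str] = ["00000000", "01100000", "10000000", "1*100000"]


def num_key_bits(size: int, threshold: int) -> int:
    """Smallest H with 2**H >= ceil(S / T), from N = S / T and N = 2**H."""
    sub_rule_bases = -(-size // threshold)  # ceiling division
    return (sub_rule_bases - 1).bit_length()


def bucket_sizes(rules: Sequence[str], key_bits: Sequence[int]) -> Dict[str, int]:
    """Number of rule entries per hash index, with spreading on '*' key bits."""
    sizes: Dict[str, int] = {}
    for pattern in rules:
        choices = [("0", "1") if pattern[b] == "*" else (pattern[b],) for b in key_bits]
        for combo in product(*choices):
            index = "".join(combo)
            sizes[index] = sizes.get(index, 0) + 1
    return sizes


def entropy(rules: Sequence[str], key_bits: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the bucket occupancy for a candidate key."""
    sizes = bucket_sizes(rules, key_bits)
    total = sum(sizes.values())
    return -sum((c / total) * log2(c / total) for c in sizes.values())


def select_key_bits(rules: Sequence[str], key_len: int) -> List[int]:
    """Greedy K_k = K_{k-1} o q: append the bit q that maximizes the entropy."""
    width = len(rules[0])
    key: List[int] = []
    best = 0.0
    while len(key) < key_len:
        chosen, chosen_entropy = None, best
        for bit in range(width):
            if bit in key:
                continue
            h = entropy(rules, key + [bit])
            if h > chosen_entropy:
                chosen, chosen_entropy = bit, h
        if chosen is None:  # entropy does not increase further: stop early
            break
        key.append(chosen)
        best = chosen_entropy
    return key
```

On the rules of Table 3, `select_key_bits(TABLE3, 2)` first takes b0 and then prefers b2 over b1, because the key (b0, b2) places one rule in each of the four buckets while (b0, b1) spreads R3 into two buckets. `num_key_bits(16, 4)` corresponds to the demo of Section 3.3, where 16 rules divided into 4 groups require 2 hash keys.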