Problem Statement. Algorithm MinDPQ (contd.) Algorithm MinDPQ. Summary of Algorithm MinDPQ. Algorithm MinDPQ: Experimental Results.

Transcription:

Algorithms for Routing Lookups and Packet Classification
Pankaj Gupta
Department of Computer Science, Stanford University
pankaj@stanford.edu
http://www.stanford.edu/~pankaj
October 3, 2000

High Level Outline
  Part I. Routing Lookups - Two lookup algorithms
  Part II. Packet Classification - One classification algorithm

Routing Lookups: Outline of Part I
  Background
  Motivation
  Definition of the problem
  Algorithm #1
  Algorithm #2

Internet: Mesh of Routers
  [Figure: IP edge routers attached to the Internet core, which is a mesh of IP core routers; two endpoints are labeled A and C.]

Inside an IP Router
  1. Accept packet arriving on an incoming link.
  2. Look up packet destination address in the forwarding table, to identify outgoing port(s).
  3. Manipulate packet header: e.g., decrement TTL, update header checksum.
  4. Send packet to the outgoing port(s).
  5. Buffer packet in the queue.
  6. Transmit packet onto outgoing link.

Part I of the Talk
  Fast and efficient algorithms that an IP router uses to look up the destination address in order to decide where to forward packets next.
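Step 3 of "Inside an IP Router" (decrement TTL, update header checksum) can be sketched as follows. This is only an illustration, assuming a 20-byte IPv4 header without options; the function names and the sample header bytes are not from the slides.

```python
import struct

def checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit words, complemented (IPv4 header checksum)."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_header(header: bytes) -> bytes:
    """Decrement the TTL (byte 8) and recompute the checksum (bytes 10-11)."""
    hdr = bytearray(header)
    if hdr[8] <= 1:
        raise ValueError("TTL expired; packet would be dropped")
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"                 # checksum field is zero while summing
    struct.pack_into("!H", hdr, 10, checksum(bytes(hdr)))
    return bytes(hdr)

# Illustrative sample header (assumption, not taken from the slides).
SAMPLE = bytes.fromhex("4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7")
```

A header whose checksum field is already correct makes `checksum` return 0, which gives a quick validity check before and after the update.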

Forwarding Engine
  [Figure: a packet (header and payload) enters the router; the forwarding engine takes the destination address from the header, consults the routing lookup data structure, and returns the outgoing port.]
  Forwarding Table
    Dest-network     Port
    65.0.0.0/8       3
    128.9.0.0/16     1
    149.12.0.0/19    7

The Search Operation is not a Direct Lookup
  [Figure: direct lookup, where (incoming port, label) indexes a memory that returns (outgoing port, label).]
  IP addresses: 32 bits long, so a directly indexed table would need 4G entries.

The Search Operation is also not an Exact Match Search
  Exact match search: search for a key in a collection of keys of the same length.
  Relatively well studied data structures:
    Hashing
    Balanced binary search trees

Example Forwarding Table
    Destination IP Prefix   Outgoing Port
    65.0.0.0/8              3
    128.9.0.0/16            1
    142.12.0.0/19           7
  IP prefix: 0-32 bits (the number after the slash is the prefix length).
  [Figure: the prefixes 65.0.0.0/8, 128.9.0.0/16 and 142.12.0.0/19 drawn as intervals on the address line from 0 to 2^32-1; 65.0.0.0/8 spans 65.0.0.0 to 65.255.255.255, and the address 128.9.16.14 is marked.]

Prefixes can Overlap
  [Figure: the prefixes 65.0.0.0/8, 128.9.0.0/16, 128.9.16.0/21, 128.9.172.0/21, 128.9.176.0/24 and 142.12.0.0/19 on the address line from 0 to 2^32-1, with the address 128.9.16.14 and its longest matching prefix marked.]
  Routing lookup: Find the longest matching prefix (aka the most specific route) among all prefixes that match the destination address.

Difficulty of Longest Prefix Match
  [Figure: the same prefixes plotted by prefix length (axis marked 8, 24, 32) against the address line from 0 to 2^32-1, with the address 128.9.16.14 marked.]
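The longest-matching-prefix definition above can be illustrated with a deliberately naive linear scan. This is a sketch for clarity, not one of the algorithms of the talk. The first three entries come from the example forwarding table; the port for 128.9.16.0/21 is a made-up value, since the slides give no port for it.

```python
import ipaddress

# (prefix, outgoing port); the port 5 for the /21 entry is hypothetical.
TABLE = [
    ("65.0.0.0/8", 3),
    ("128.9.0.0/16", 1),
    ("142.12.0.0/19", 7),
    ("128.9.16.0/21", 5),
]

def longest_prefix_match(dest, table=TABLE):
    """Return (prefix, port) of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, port in table:
        net = ipaddress.ip_network(prefix)
        # Keep the match with the greatest prefix length seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return None if best is None else (str(best[0]), best[1])
```

For the address 128.9.16.14 both 128.9.0.0/16 and 128.9.16.0/21 match, and the scan returns the /21 entry because it is the longer prefix. The scan costs one comparison per prefix, which is what the lookup algorithms in Part I are designed to avoid.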

Metrics for Lookup Algorithms
  Preprocessing time
  Storage requirements
  Lookup rate
  Update time

Lookup Rate Required
    Year      Line     Line-rate (Gbps)   40B packets (Mpps)
    1998-99   OC12c    0.622              1.94
    1999-00   OC48c    2.5                7.81
    2000-01   OC192c   10.0               31.25
    2002-03   OC768c   40.0               125
  31.25 Mpps leaves roughly 33 ns per lookup. Memory access times: DRAM 50-80 ns, SRAM 5-10 ns.

Size of the Forwarding Table
  [Figure: number of prefixes (0 to 100,000) versus year (95 to 00), annotated "10,000/year" and "Superlinear". Source: http://www.telstra.net/ops/bgptable.html]

Routing Lookups: Outline
  Algorithm #1
    Motivation
    Details
    Performance
  Algorithm #2

Binary Search on Prefix Intervals [1]
    Prefix         Interval
    P1   /0        0000-1111
    P2   00/2      0000-0011
    P3   1/1       1000-1111
    P4   1101/4    1101-1101
    P5   001/3     0010-0011
  [Figure: prefixes P1 to P5 drawn over the 4-bit address line from 0000 to 1111, which they partition into the intervals I1 to I6.]
  [1] Lampson et al., Proc. Infocom, 1998

Alphabetic Tree
  [Figure: a binary search tree over the intervals I1 to I6, with internal keys 0001, 0011, 0111, 1100 and 1101 and leaf access probabilities 1/2, 1/4, 1/8, 1/16, 1/32 and 1/32.]
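The "Binary Search on Prefix Intervals" table can be turned into a small runnable sketch. It assumes 4-bit addresses as in the example, derives the disjoint intervals from the five prefixes, labels each interval with its longest covering prefix, and then looks an address up by binary search over the interval end points. The helper names are illustrative, and this is a simplification of the scheme in [1].

```python
from bisect import bisect_left

WIDTH = 4  # address width in bits, as in the 4-bit example
PREFIXES = {"P1": "", "P2": "00", "P3": "1", "P4": "1101", "P5": "001"}

def prefix_range(bits, width=WIDTH):
    """Smallest and largest address covered by a prefix given as a bit string."""
    return int(bits.ljust(width, "0"), 2), int(bits.ljust(width, "1"), 2)

def build_intervals(prefixes, width=WIDTH):
    """Return (ends, owners): end point of each interval and its longest covering prefix."""
    ranges = {name: prefix_range(bits, width) for name, bits in prefixes.items()}
    limit = 1 << width
    # An interval starts wherever a prefix range starts or has just ended.
    starts = sorted({lo for lo, _ in ranges.values()}
                    | {hi + 1 for _, hi in ranges.values() if hi + 1 < limit})
    ends, owners = [], []
    for i, lo in enumerate(starts):
        hi = (starts[i + 1] if i + 1 < len(starts) else limit) - 1
        covering = [n for n, (a, b) in ranges.items() if a <= lo and hi <= b]
        owners.append(max(covering, key=lambda n: len(prefixes[n]), default=None))
        ends.append(hi)
    return ends, owners

def lookup(addr, ends, owners):
    """Binary search for the first interval whose end point is >= addr."""
    return owners[bisect_left(ends, addr)]

ENDS, OWNERS = build_intervals(PREFIXES)
```

For these prefixes the interval end points come out as 0001, 0011, 0111, 1100, 1101 and 1111, and the first five are the keys that appear in the alphabetic trees.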

Another Alphabetic Tree
  [Figure: a different binary search tree over the intervals I1 to I6 (access probabilities 1/2, 1/4, 1/8, 1/16, 1/32, 1/32), using the keys 0001, 0011, 0111, 1100 and 1101.]

Yet Another Alphabetic Tree
  [Figure: a third tree over the same intervals and keys.]

Original Alphabetic Tree
  [Figure: the tree rooted at key 0111, drawn above the prefixes P1 to P5 and the intervals I1 to I6 on the address line from 0000 to 1111.]
  Avgtime = 2.85
  Maxtime = 3

Optimal Alphabetic Tree
  [Figure: a skewed tree in which I1 (1/2) is nearest the root and I5 and I6 (1/32 each) are deepest.]
  Avgtime = 1.94
  Maxtime = 5

Optimal Depth-constrained Alphabetic Tree
  Depth Constraint = 4
  Avgtime = 2
  Maxtime = 4

Expected Result
  [Figure: average lookup time plotted against maximum lookup time, with log n marked on both axes.]
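The Avgtime and Maxtime figures quoted for the three trees can be checked with a few lines of arithmetic. The leaf depths below are assumptions read off the tree shapes suggested by the figures (they are not stated numerically on the slides); the probabilities are the ones given for I1 to I6.

```python
from fractions import Fraction

# Access probabilities of the intervals I1..I6, as given on the slides.
PROBS = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8),
         Fraction(1, 16), Fraction(1, 32), Fraction(1, 32)]

# Depth of each leaf I1..I6 in each tree (assumed from the figures).
TREES = {
    "original": [3, 3, 2, 3, 3, 2],
    "optimal": [1, 2, 3, 4, 5, 5],
    "depth-constrained": [1, 2, 4, 4, 4, 4],
}

def lookup_times(depths, probs=PROBS):
    """Return (average lookup time, maximum lookup time) for a tree."""
    average = sum(d * p for d, p in zip(depths, probs))
    return float(average), max(depths)

RESULTS = {name: lookup_times(depths) for name, depths in TREES.items()}
```

Under these assumed depths the averages are 2.84375, 1.9375 and 2.0, which agree with the quoted 2.85, 1.94 and 2 up to rounding, and the maxima are 3, 5 and 4.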

Problem Statement
  Minimize the average lookup time
    $C = \sum_i l_i p_i$ subject to $l_i \le D$ for all $i$,
  where $l_i$ is the access time to reach leaf $i$, $p_i$ is the probability of accessing leaf $i$, and $D$ is the depth constraint.
  Previous Work:
    Depth-constrained Huffman trees
    Optimal solutions [Larmore and Przytycka94]: $O(nD \log n)$ with large constant factors.

Goal: Near-optimal Depth-constrained Alphabetic Tree
  Why near-optimal?
    Simpler to find than an optimal solution.
    Probabilities are approximate.

Algorithm MinDPQ
  Fact [Yeung91]: Given $\{p_k\}$, can choose $\{l_k\}$ such that
    $H(p) \le C < H(p) + 2$,
  where $C$ is the average lookup time and $H(p) = -\sum_i p_i \log_2 p_i$, by taking
    $l_k = \lceil -\log_2 p_k \rceil$ for $k = 1, n$
    $l_k = \lceil -\log_2 p_k \rceil + 1$ for $1 < k < n$
  But: $p_k < 2^{-D} \Rightarrow l_k > D$, so the depth constraint (D) is violated.

Algorithm MinDPQ (contd.)
  Transform probabilities: the original distribution $\{p_k\}$, possibly with $p_{\min} < 2^{-D}$, is replaced by a transformed distribution $\{q_k\}$ with $q_{\min} \ge 2^{-D}$.
  Explicit solution:
    $q_k^* = \max(p_k / \mu, 2^{-D})$, where $\mu$ is s.t. $\sum_k q_k^* = 1$
  $\mu$ can be found in $O(n \log n)$ time and $O(n)$ space.
    $C = \sum_k p_k l_k \le D(p \| q^*) + H(p) + 2 \le C_{opt} + 2$
  (here $D(p \| q^*)$ is the relative entropy between $p$ and $q^*$, not the depth constraint).
  Within 2 memory accesses of optimal!

Algorithm MinDPQ: Experimental Results
  [Figure: average mem-accesses plotted against maximum mem-accesses.]

Summary of Algorithm MinDPQ
  A practical algorithm to minimize average lookup time while simultaneously keeping maximum lookup time bounded.
  Provably within two memory accesses of the optimal algorithm.
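The probability transformation in Algorithm MinDPQ can be sketched as below. The slides only say that $\mu$ can be found in $O(n \log n)$ time; the sorted threshold search used here is one possible way to do that and is an assumption, not necessarily the authors' procedure. The function name `transform` is also illustrative.

```python
def transform(p, D):
    """Return (q, mu) with q_k = max(p_k / mu, 2**-D) and sum(q) == 1."""
    n = len(p)
    floor = 2.0 ** -D
    if n * floor > 1.0:
        raise ValueError("more than 2**D leaves cannot all fit within depth D")
    ordered = sorted(p, reverse=True)
    running = 0.0
    mu = 1.0
    # Guess that exactly the m largest probabilities stay above the floor.
    for m, pm in enumerate(ordered, start=1):
        running += pm
        remaining_mass = 1.0 - (n - m) * floor   # mass left for the scaled entries
        if remaining_mass <= 0.0:
            continue
        candidate = running / remaining_mass
        nxt = ordered[m] if m < n else 0.0
        # The guess is consistent if entry m stays above the floor and entry m+1 is clipped.
        if pm / candidate >= floor and nxt / candidate <= floor:
            mu = candidate
            break
    return [max(pk / mu, floor) for pk in p], mu

# Probabilities of the six intervals from the alphabetic-tree example, with D = 4.
Q, MU = transform([1/2, 1/4, 1/8, 1/16, 1/32, 1/32], 4)
```

For this input the three smallest probabilities are raised to $2^{-4}$ and the other three are scaled down by $\mu = 14/13$, so the transformed values again sum to 1. Turning $\{q_k\}$ into leaf depths is not shown here.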

Routing Lookups: Outline
  Algorithm #1
  Algorithm #2
    Motivation
    Previous Work
    Details
    Performance
  What if we are simply interested in the fastest worst-case lookup algorithm?

Previous Work
  K. Sklower. A tree-based packet routing table for Berkeley unix, Proc. Usenix, pp 93-9, 1991.
  W. Doeringer, G. Karjoth and M. Nassehi. Routing on longest-matching prefixes, IEEE/ACM Transactions on Networking, vol. 4, no. 1, pp 86-97, 1996.
  M. Degermark, A. Brodnik, S. Carlsson, S. Pink. Small forwarding tables for fast routing lookups, Proc. Sigcomm, pp 3-14, 1997.
  M. Waldvogel, G. Varghese, J. Turner, B. Plattner. Scalable high-speed IP routing lookups, Proc. Sigcomm, pp 25-36, 1997.

Previous Work (contd.)
  B. Lampson, V. Srinivasan, G. Varghese. IP lookups using multiway and multicolumn search, Proc. Infocom, vol. 3, pp 1248-56, 1998.
  V. Srinivasan, G. Varghese. Fast IP lookups using controlled prefix expansion, Sigmetrics, 1998.
  S. Nilsson, G. Karlsson. IP-address lookup using LC-tries, IEEE JSAC, vol. 17, no. 6, pp 1083-92, 1999.
  Fastest: 298 ns (3.3 Mpps) with 2 MB for forwarding table with 38K prefixes (300 MHz Pentium-II with 512 KB cache)

Lookups with Ternary CAM
  [Figure: the destination address is presented to a TCAM with entries labeled P32, P31 and P8 (1.23.11.3 shown at P32, 10.x.x.x at P8); the matching location (0, 1, 2, 3, up to M) selects an entry in a memory array.]
  Pros
    Fast: 15-20 ns
    Simple to understand
  Cons
    High power: 6-8 W
    Expensive (low density): 0.25 MB at 50 MHz costs $30-$75.
    Biggest TCAM in production today holds 64K 32-bit entries.

Motivation: Speed and Simplicity
  Optimized for implementation in dedicated hardware:
    Routing lookup function fairly well-defined
    Seems necessary for highest performance anyway
  Goal: One routing lookup every memory access
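The ternary CAM lookup can be mimicked in software to show how value/mask matching plus entry ordering yields the longest prefix. A real TCAM compares all entries in parallel, whereas this sketch scans them in order; the routes and port numbers are invented for illustration.

```python
import ipaddress

def make_tcam(routes):
    """Build (value, mask, port) entries ordered from longest to shortest prefix."""
    entries = []
    for prefix, port in routes:
        net = ipaddress.ip_network(prefix)
        entries.append((net.prefixlen, int(net.network_address), int(net.netmask), port))
    entries.sort(key=lambda e: e[0], reverse=True)
    return [(value, mask, port) for _, value, mask, port in entries]

def tcam_lookup(dest, tcam):
    """Return the port of the first (highest-priority) matching entry, or None."""
    key = int(ipaddress.ip_address(dest))
    for value, mask, port in tcam:
        # Bits where the mask is 0 are "don't care", like the x in 10.x.x.x.
        if key & mask == value:
            return port
    return None

# Hypothetical routes; only 1.23.11.3 and 10.x.x.x echo the figure.
TCAM = make_tcam([("10.0.0.0/8", 1), ("1.23.11.3/32", 2), ("10.1.0.0/16", 3)])
```

Because longer prefixes are stored first, the first match is also the longest match, which is why keeping entries ordered matters when a TCAM is updated.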

Key Idea #1
  [Figure: number of prefixes (log scale, 10 to 100000) versus prefix length (0 to 32) for the MAE-EAST routing table (source: www.merit.edu); part of the plot is annotated "< 0.04%".]

Key Idea #2
  Memory is cheap (approx $1/MByte), and getting cheaper.
  Makes sense to use memory inefficiently to gain speed.

Routing Lookups in Hardware
  Prefixes expanded to 24 bits.
  [Figure: a table of 2^24 = 16M entries indexed by the first 24 bits of the address. The address 102.19.6.14 indexes entry 102.19.6, which holds a port. The prefix 210.12/16 is expanded into the entries 210.12.0 through 210.12.255, each holding its port.]

Routing Lookups in Hardware
  (24,8) split.
  [Figure: the first table holds prefixes up to 24 bits and is indexed by the first 24 bits of the address (128.3.72 for the address 128.3.72.14). Each entry carries a one-bit flag and either a port (next hop) or a pointer. A pointer gives the base of a block in a second table that holds prefixes longer than 24 bits; the last 8 bits of the address (14) are the offset into that block, whose entry holds the port.]
  Throughput of one routing lookup every memory access (when pipelined).

Routing Lookups in Hardware
  Pros
    Simple hardware implementation
    20 Mpps with 50ns DRAM
    Unlimited number of prefixes less than or equal to 24 bits long
  Cons
    Large memory required (7-33 MB)
    Depends on prefix-length distribution
    Slow worst-case updates

Routing Lookups: Outline
  Algorithm #1
  Previous work
  Algorithm #2
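The (24,8) split described under "Routing Lookups in Hardware" can be sketched in software as below. Dictionaries stand in for the 16M-entry array so the example stays small, the class and field names are illustrative, and the routes with their port numbers are invented; the hardware encodes the port-or-pointer choice in a flag bit rather than a tag string.

```python
import ipaddress

class SplitTable:
    """Sparse model of a (24,8) split: a 24-bit table plus 256-entry blocks."""

    def __init__(self, routes):
        self.tbl24 = {}    # first 24 bits -> ("port", p) or ("ptr", block index)
        self.blocks = []   # each block has 256 slots, indexed by the last 8 bits
        # Insert shorter prefixes first so longer ones overwrite them.
        nets = sorted(((ipaddress.ip_network(p), port) for p, port in routes),
                      key=lambda item: item[0].prefixlen)
        for net, port in nets:
            base = int(net.network_address)
            if net.prefixlen <= 24:
                first = base >> 8
                # Prefix expansion: a /L prefix fills 2**(24-L) consecutive entries.
                for idx in range(first, first + (1 << (24 - net.prefixlen))):
                    self.tbl24[idx] = ("port", port)
            else:
                idx = base >> 8
                kind, val = self.tbl24.get(idx, ("port", None))
                if kind == "port":
                    # New block, pre-filled with the shorter prefix's port.
                    self.blocks.append([val] * 256)
                    val = len(self.blocks) - 1
                    self.tbl24[idx] = ("ptr", val)
                lo = base & 0xFF
                for off in range(lo, lo + (1 << (32 - net.prefixlen))):
                    self.blocks[val][off] = port

    def lookup(self, dest):
        key = int(ipaddress.ip_address(dest))
        kind, val = self.tbl24.get(key >> 8, ("port", None))
        if kind == "port":
            return val                          # resolved by the first table
        return self.blocks[val][key & 0xFF]     # second table: base + offset

# Hypothetical routes and ports; the /28 forces a second-table block.
TABLE = SplitTable([("210.12.0.0/16", 2), ("102.19.6.0/24", 4),
                    ("128.3.72.0/24", 1), ("128.3.72.0/28", 9)])
```

A lookup touches the first table once and, only for addresses under a prefix longer than 24 bits, the second table once. The prefix expansion in the constructor is also where the memory cost and the slow worst-case updates listed under "Cons" come from.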

Summary of Contributions
  Algorithm to minimize average lookup time while keeping worst case bounded: of independent interest in information theory.
  Hardware lookup algorithm: first proposed algorithm that performs a routing lookup in one memory access.

High Level Outline
  Part I. Routing Lookups - Two lookup algorithms
  Part II. Packet Classification - One classification algorithm

Packet Classification: Outline
  Background
  Motivation
  Problem definition
  Previous work
  Proposed algorithm

Background
  Traditional Internet provides a best-effort service, and treats all packets going to the same destination identically.
  [Figure: IP edge routers attached to the Internet core of IP core routers; nodes labeled A, C and R.]

Motivation: Desire for Additional Services
  [Figure: networks ISP1, ISP2 and ISP3 meeting at a NAP, with node E1 and interfaces X, Y and Z.]
    Service                  Example
    Differentiated Service   Ensure that traffic from ISP2 is given higher priority than traffic from ISP3.
    Packet Filtering         Deny all web traffic from ISP3 at interface X.
    Policy-based routing     Ensure that all web traffic from ISP3 is sent via interface Z.
  Header fields highlighted in the examples: Src IP address, Transport Layer Protocol.

Packet Header Fields
    Transport layer header:  L4-SP, L4-DP, L4-PROT
    Network layer header:    L3-SA, L3-DA, L3-PROT
    MAC header:              L2-SA, L2-DA
  DA = Destination address, SA = Source address, PROT = Protocol, SP = Source port, DP = Destination port
  L2 = layer 2 (e.g., Ethernet), L3 = layer 3 (e.g., IP), L4 = layer 4 (e.g., TCP)

Multi-field Packet Classification
    Rule     Field 1         Field 2           ...   Field k   Action
    Rule 1   5.3.40.0/21     2.13.8.11/32      ...   UDP       A1
    Rule 2   5.168.3.0/24    152.133.0.0/16    ...   TCP       A2
    ...
    Rule N   5.168.0.0/16    152.0.0.0/8       ...   ANY       AN
  Example: packet (5.168.3.32, 152.133.171.71, ..., TCP)
  Packet Classification: Find the action associated with the highest priority rule matching an incoming packet header.

Routing Lookup: Instance of 1D Classification
  One-dimension (destination address)
    Forwarding table      corresponds to   classifier
    Routing table entry   corresponds to   rule
    Outgoing port         corresponds to   action
    Prefix-length         corresponds to   priority
  Example of multi-dimensional classification: Firewall for packet-filtering

Geometric Interpretation
  [Figure: rules R1 to R7 drawn as rectangles in the plane of Dimension 1 and Dimension 2, e.g. (144.24/16, 64/24), with packets P1 and P2 drawn as points, e.g. (128.16.46.23, *).]
  Packet classification problem: Find the highest priority rectangle containing an incoming point.

Goal: Packet Classification Algorithms
  Small preprocessing time
  Low storage requirements
  High speed
  Scale to multiple header fields

Packet Classification: Outline
  Previous work
  Proposed algorithm

Previous Work
  T.V. Lakshman, D. Stiliadis. High-speed policy-based packet forwarding using efficient multi-dimensional range matching, Proc. Sigcomm, pp 191-202, 1998.
  V. Srinivasan, S. Suri, G. Varghese and M. Waldvogel. Fast and scalable layer four switching, Proc. Sigcomm, pp 203-214, 1998.
  V. Srinivasan, G. Varghese, S. Suri. Packet classification using tuple space search, Proc. Sigcomm, pp 135-146, 1999.
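The multi-field classification example can be made concrete with a linear search over the three rules shown. This is a baseline sketch, not the proposed algorithm. It assumes that rules are listed in decreasing priority, that Field 1 and Field 2 are source and destination prefixes, and that Field k is the protocol; the fields elided on the slide are ignored.

```python
import ipaddress

# (Field 1, Field 2, Field k, action), highest priority first (assumed ordering).
RULES = [
    ("5.3.40.0/21", "2.13.8.11/32", "UDP", "A1"),
    ("5.168.3.0/24", "152.133.0.0/16", "TCP", "A2"),
    ("5.168.0.0/16", "152.0.0.0/8", "ANY", "AN"),
]

def classify(field1, field2, proto, rules=RULES):
    """Return the action of the first (highest-priority) matching rule, or None."""
    a = ipaddress.ip_address(field1)
    b = ipaddress.ip_address(field2)
    for prefix1, prefix2, rule_proto, action in rules:
        if (a in ipaddress.ip_network(prefix1)
                and b in ipaddress.ip_network(prefix2)
                and rule_proto in ("ANY", proto)):
            return action
    return None
```

The example packet (5.168.3.32, 152.133.171.71, TCP) matches both Rule 2 and Rule N, and under the assumed ordering the result is A2. The cost grows linearly with the number of rules, which is the motivation for the faster schemes surveyed in the previous work.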

Previous Work (contd.)
  M. M. Buddhikot, S. Suri, and M. Waldvogel. Space decomposition techniques for fast layer-4 switching, PfHSN 99, pp 25-41, 1999.
  A. Feldmann and S. Muthukrishnan. Tradeoffs for packet classification, Proc. Infocom, vol. 3, pp 1193-202, 2000.
  T. Woo. A modular approach to packet classification: algorithms and results, Proc. Infocom, vol. 3, pp 1203-22, 2000.

Previous Algorithms: Summary
  Good for two fields, but do not scale to more than two fields, OR
  Good for very small classifiers (< 50 rules) only, OR
  Have non-deterministic classification time, OR
  Either too slow or consume too much storage

Classification Algorithms: Speed vs Storage Tradeoff
  Point Location: Lower bounds for N regions in d dimensions.
    $O(\log N)$ time with $O(N^d)$ storage, or
    $O(\log^{d-1} N)$ time with $O(N)$ storage
  N = 100, d = 4: $N^d$ = 100 MBytes and $\log^{d-1} N$ = 350 memory accesses

Recursive Flow Classification: Motivation
  Lower bounds are achieved by pathological classifier datasets.
  Real-life datasets have structure and redundancy.
  Good heuristics may do better than worst-case bounds for real-life datasets.
  Goal: A practical algorithm that exploits the structure of real-life datasets to achieve both high speed and low storage requirements.

Packet Classification: Outline
  Previous work
  Algorithm: Recursive Flow Classification
    Motivation
    Real-life datasets: characteristics and structure
    Algorithm details
    Performance
    Pros and cons

Classifier Dataset
  793 classifiers from 101 ISP and enterprise networks with a total of 41,505 rules
  40 classifiers: more than 100 rules. Biggest classifier had 1733 rules
  Maximum of 4 fields per rule: source IP address, destination IP address, protocol and destination transport port number
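The figures quoted for N = 100 and d = 4 in the speed vs storage tradeoff can be approximately reproduced with a short calculation. This is only a sanity check under two assumptions that the slides do not state: one byte of storage per region, and the logarithm taken base 2 and rounded up before exponentiation.

```python
import math

def point_location_costs(n_regions, dims):
    """Return (storage units for the fast scheme, accesses for the compact scheme)."""
    storage = n_regions ** dims                                   # N^d
    accesses = math.ceil(math.log2(n_regions)) ** (dims - 1)      # (log N)^(d-1)
    return storage, accesses

STORAGE, ACCESSES = point_location_costs(100, 4)
```

This gives 10^8 storage units, i.e. 100 MBytes at one byte each, and 7^3 = 343 accesses, close to the 350 quoted.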

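Structure of the Classifiers
  [Figure: three rules R1, R2 and R3 drawn as rectangles in the plane of Dimension 1 and Dimension 2. In one arrangement they create 4 regions; in another, overlapping arrangement they create 7 regions, including {R1, R2}, {R2, R3} and {R1, R2, R3}.]

Structure of the Classifiers
  Dataset: 1733 rule classifier = 4316 distinct regions (worst case is 10^11!)

Recursive Flow Classification (RFC): Basic Idea
  S = header width, T = log2(#actions)
  One-step reduction: $2^S = 2^{128}$ to $2^T = 2^{12}$ (a 128-bit header mapped directly to a 12-bit action).
  Multi-step reduction: $2^S = 2^{128}$ to $2^{64}$ to $2^{32}$ to $2^T = 2^{12}$.

RFC: Packet Flow
  [Figure: the header is split into 16-bit and 8-bit chunks that are looked up in Phase 0; the results are combined and reduced through Phase 1, Phase 2 and Phase 3, with the total width shrinking from 128 to 64 to 32 to 16 bits.]

RFC: Classification Speed
  Pipelined hardware: 31 Mpps (worst case OC192) using two 0.5 Mbyte SRAMs and two 8 Mbyte SDRAMs at 125 MHz.

RFC: Storage Requirements
  [Figure: storage in Mbytes versus number of rules, for Three Phases and Four Phases.]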
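The reduce-then-combine idea behind RFC can be shown on a toy problem. The sketch assumes two 4-bit fields, rules given as inclusive ranges in decreasing priority, and a single combination phase; the real algorithm works on chunks of the packet header over several phases, and the rules here are invented.

```python
# (range in field 0, range in field 1, action), highest priority first.
RULES = [((0, 7), (0, 15), "A1"), ((4, 11), (8, 15), "A2"), ((0, 15), (0, 3), "A3")]

def build_rfc(rules, width=4):
    """Precompute per-field equivalence-class tables and the combination table."""
    phase0, classes = [], []
    for f in range(2):
        table, sets, ids = [], [], {}
        for v in range(1 << width):
            # Rules whose range in field f contains the value v.
            matching = frozenset(i for i, r in enumerate(rules) if r[f][0] <= v <= r[f][1])
            if matching not in ids:          # new equivalence class
                ids[matching] = len(sets)
                sets.append(matching)
            table.append(ids[matching])      # value -> class ID
        phase0.append(table)
        classes.append(sets)
    # Combination: intersect the rule sets of every pair of class IDs.
    phase1 = []
    for set_a in classes[0]:
        row = []
        for set_b in classes[1]:
            both = set_a & set_b
            row.append(rules[min(both)][2] if both else None)
        phase1.append(row)
    return phase0, phase1

def classify(pkt, phase0, phase1):
    """Two phase-0 lookups followed by one phase-1 lookup."""
    return phase1[phase0[0][pkt[0]]][phase0[1][pkt[1]]]

def linear(pkt, rules=RULES):
    """Reference answer: first matching rule in priority order."""
    for (lo0, hi0), (lo1, hi1), action in rules:
        if lo0 <= pkt[0] <= hi0 and lo1 <= pkt[1] <= hi1:
            return action
    return None

PHASE0, PHASE1 = build_rfc(RULES)
```

Here the 16 possible values of each field collapse to 4 and 3 equivalence classes, so the combination table has 12 entries instead of 256. The saving depends on how much rules overlap, which is the structure of real-life classifiers that RFC relies on.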

[Figure: storage in Mbytes versus number of rules for Two Phases, Three Phases and Four Phases; the two-phase curve is labeled "Two Phase RFC Crossproducting [Srini98]".]

RFC: Pros and Cons
  Pros
    Exploits structure of real-life datasets
    Scales to multiple fields
    Fast classification (designed for parallel and pipelined accesses)
  Cons
    Depends on structure of classifiers
    Large pre-processing time
    Slow incremental insertions

Packet Classification: Outline
  Previous work
  Proposed algorithm
    Motivation
    Real-life classifiers: characteristics and structure
    Algorithm details
    Performance
    Pros and cons

Summary of Contributions on Packet Classification
  Recursive Flow Classification: First proposed algorithm that achieves fast multi-field classification and low storage requirements, by deliberately exploiting the structure of real-life datasets.

Other Contributions on Packet Classification
  P. Gupta and N. McKeown. Packet classification using hierarchical intelligent cuttings, Proc. Hot Interconnects VII, August 99. Also in IEEE Micro, pp 34-41, vol. 20, no. 1, Jan/Feb 2000.
  P. Gupta and N. McKeown. Dynamic algorithms with worst-case performance for packet classification, Proc. IFIP Networking, May 2000.

Publications for Algorithms Discussed Here
  P. Gupta, B. Prabhakar, and S. Boyd. Near-optimal routing lookups with bounded worst case performance, Proc. Infocom, vol. 3, pp 1184-92, March 2000.
  P. Gupta, S. Lin, and N. McKeown. Routing lookups in hardware at memory access speeds, Proc. Infocom, vol. 3, pp 1241-8, April 1998.
  P. Gupta and N. McKeown. Packet Classification on Multiple Fields, Proc. Sigcomm, vol. 29, pp 147-60, September 1999.

Unrelated Contributions
  P. Gupta and N. McKeown. Design and implementation of a fast crossbar scheduler, Proc. Hot Interconnects VI, August 98. Also in IEEE Micro, pp 20-28, vol. 19, no. 1, Jan/Feb 1999.
  D. Shah and P. Gupta. Fast updates on ternary CAMs for packet lookups and classification, Proc. Hot Interconnects VIII, August 2000.