Dynamic Pipelining: Making IP-Lookup Truly Scalable

Jahangir Hasan and T. N. Vijaykumar, School of Electrical and Computer Engineering, Purdue University. SIGCOMM '05. Rung-Bo-Su, 10/26/05.

0. Abstract. A truly scalable IP-lookup scheme must address five scalability challenges: routing-table size, lookup throughput, implementation cost, power dissipation, and routing-table update cost.

Outline: 1. Introduction; 2. Background; 3. Pipelined and Scalable IP-Lookup; 4. Brief Review of TCAM-based Schemes; 5. Methodology; 6. Experimental Results; 7. Conclusions.

1. Introduction. Fiber optics are enabling high line rates, which creates two major problems for IP-lookup. First, the time budget shrinks to 2 ns per packet (for a 160 Gbps line rate and a minimum packet size of 40 bytes). Second, the routing table holds a large number of prefixes.
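
The 2 ns figure follows directly from the line rate and the minimum packet size. A small sketch of the arithmetic (the function name `packet_time_ns` is illustrative and not from the talk):

```python
def packet_time_ns(line_rate_gbps: float, packet_bytes: int) -> float:
    """Time one packet occupies the link, in nanoseconds."""
    bits = packet_bytes * 8
    # 1 Gbps is 1 bit per nanosecond, so bits / Gbps is already in ns.
    return bits / line_rate_gbps

# 40-byte packets at 160 Gbps: 320 bits / 160 bits-per-ns
print(packet_time_ns(160, 40))  # 2.0
```

Every lookup scheme discussed later has to complete, or at least start, one lookup within this budget.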

1. Introduction. Key component: the routing-table memory, which is searched to locate the prefix that matches the incoming packet.

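1. Introduction. Five key scaling requirements: (1) memory size: the memory required must scale with the routing-table size; (2) throughput: lookups must keep up with ever-increasing line rates; (3) power: the complexity of heat removal and the cost of cooling must stay reasonable; (4) route-update cost; (5) implementation cost and complexity.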

1. Introduction. Two categories of IP-lookup schemes: trie-based schemes and TCAMs.

1. Introduction. Tries scale well in power, but they do not scale well in throughput unless they are pipelined. Two approaches for pipelining tries are hardware-level pipelining (HLP) and data-structure-level pipelining (DLP). To solve DLP's problems, we propose scalable dynamic pipelining (SDP).

2. Background. Requirements: (1) To avoid denial-of-service attacks and instabilities in the network, the lookup must sustain minimum-sized packets streaming in at full line rate. (2) Provide enough memory to hold the routing table. (3) Choose the prefix with the longest match.

2. Background. Trie-Based IP-Lookup Schemes.
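
As a rough illustration of trie-based longest-prefix matching, the sketch below builds a 1-bit trie and walks it one address bit at a time, remembering the last prefix it passed. The prefixes and next-hop labels are invented for the example, and the names `TrieNode`, `insert`, and `lookup` are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = [None, None]  # index 0 / 1 for the next address bit
        self.next_hop = None          # set only if a prefix ends at this node

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Return the next hop of the longest prefix matching addr_bits."""
    node, best = root, root.next_hop
    for b in addr_bits:
        node = node.children[int(b)]
        if node is None:
            break                     # fell off the trie; keep the best so far
        if node.next_hop is not None:
            best = node.next_hop      # a longer matching prefix
    return best

root = TrieNode()
for prefix, hop in [("", "default"), ("0", "A"), ("00", "B"), ("1010", "C")]:
    insert(root, prefix, hop)

print(lookup(root, "0011"))  # B
print(lookup(root, "1011"))  # default
```

Each loop iteration corresponds to one memory access, which is why a lookup in an unpipelined trie costs several accesses.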

2. Background. Multiple-Bit Stride Tries (striding).
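
A strided trie inspects several address bits per node, so a prefix whose length is not a multiple of the stride has to be fitted to a stride boundary. One common way to do this is prefix expansion. The slides do not say which construction is used, so the sketch below is an assumption, and the function name `expand` is hypothetical.

```python
from itertools import product

def expand(prefix, stride):
    """Expand a bit-string prefix to the next multiple of `stride` bits."""
    pad = (-len(prefix)) % stride          # bits missing up to the boundary
    # Enumerate every combination of the missing bits.
    return [prefix + "".join(bits) for bits in product("01", repeat=pad)]

print(expand("1", 2))    # ['10', '11']
print(expand("101", 2))  # ['1010', '1011']
print(expand("10", 2))   # ['10']
```

When an expanded entry collides with a longer original prefix (for example, `1*` expanding to `10` while `10*` is also in the table), the longer original prefix has to win so that longest-prefix semantics are preserved.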

2. Background. The Need for Pipelined Tries. One memory access may take longer than the packet inter-arrival time. The problem is aggravated in tries, which perform multiple memory accesses for one lookup.

3. Pipelined and Scalable IP-Lookup. The observation that pipelining can solve the scalability problem of IP-lookup is not new. Two existing approaches are the hardware-level pipelined (HLP) scheme and the data-structure-level pipelined (DLP) scheme.

3. Pipelined and Scalable IP-Lookup. Hardware-Level Pipelining. Let k be the number of levels in the multi-bit trie, d the total delay of one memory access, and t the required lookup period (one lookup every t seconds). HLP pipelines the entire memory holding the trie, at the hardware level, into k*d/t stages.
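
To get a feel for the k*d/t expression, the sketch below plugs in made-up numbers (they are not measurements from the talk) and rounds up to a whole number of stages. The function name `hlp_stages` is hypothetical.

```python
import math

def hlp_stages(k: int, d_ns: float, t_ns: float) -> int:
    # A lookup needs k memory accesses of d ns each; to start a new
    # lookup every t ns, that total latency is split into t-sized stages.
    return math.ceil(k * d_ns / t_ns)

# Assumed values: 4 trie levels, 6 ns per access, one lookup every 2 ns.
print(hlp_stages(k=4, d_ns=6.0, t_ns=2.0))  # 12
```

The stage count grows as t shrinks, so the pipeline gets deeper as line rates increase.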

3. Pipelined and Scalable IP-Lookup. Hardware-Level Pipelining. [Figures: a memory access split into pipeline stages: decoder (X and Y address bits), access to the 2^X × 2^Y memory array, output multiplexer.]

3. Pipelined and Scalable IP-Lookup. Data-Structure-Level Pipelining. DLP places each level of the trie in a different memory, so that each memory is accessed only once per packet lookup. Because it does not rely on expensive memory technologies or deep hardware pipelining, it scales well in power and implementation cost.
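
The idea of one memory per trie level can be sketched in software as one dictionary per stage. This is only an illustration under assumed data: the prefixes, next-hop labels, node numbering, and the name `dlp_lookup` are all invented, and real hardware would overlap many lookups in the pipeline rather than run one at a time.

```python
# One dict per pipeline stage; stage i holds the trie nodes at depth i.
# Each node: (next_hop or None, child id for bit 0, child id for bit 1).
stages = [
    {0: ("default", 0, 1)},                        # root: prefix *
    {0: ("A", 0, None), 1: (None, 1, None)},       # 0* has a route, 1 does not
    {0: ("B", None, None), 1: ("C", None, None)},  # prefixes 00* and 10*
]

def dlp_lookup(addr_bits):
    """Return (best next hop, list of stages touched)."""
    node_id, best, touched = 0, None, []
    for depth, memory in enumerate(stages):
        next_hop, child0, child1 = memory[node_id]
        touched.append(depth)          # the only access to this stage
        if next_hop is not None:
            best = next_hop
        if depth >= len(addr_bits):
            break
        node_id = child1 if addr_bits[depth] == "1" else child0
        if node_id is None:
            break
    return best, touched

print(dlp_lookup("100"))  # ('C', [0, 1, 2])
print(dlp_lookup("01"))   # ('A', [0, 1])
```

Because a lookup visits each stage at most once and always in order, a new lookup can enter stage 0 while earlier ones occupy later stages.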

3. Pipelined and Scalable IP-Lookup. Three remaining challenges: scalability in memory size, in route-update cost, and in lookup throughput.

3. Pipelined and Scalable IP-Lookup. DLP's Scalability Problems in Memory Size. Each memory stage must be large enough for any prefix distribution. For the prefix distribution shown in Figure 4, its worst-case memory size would be no better.

3. Pipelined and Scalable IP-Lookup. DLP's Scalability Problems in Route-Update Cost. Multibit trie: a route update may modify arbitrarily many nodes. Tree Bitmap: almost doubles the size of each trie node.

3. Pipelined and Scalable IP-Lookup. DLP's Non-Scalability in Throughput. Scalable Dynamic Pipelining (SDP). [Figure: example prefixes 00*, 0*, *, 10*, 1*, 000*, 100*, 1010*.]

3. Pipelined and Scalable IP-Lookup. DELETE: [figure].

3. Pipelined and Scalable IP-Lookup. Jump Nodes: a trie node with a stride of k bits must have an array of 2^k pointers. Often there is only one child, and the remaining pointers are null.

3. Pipelined and Scalable IP-Lookup. Jump Nodes: [figure].
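
A node with a single non-null child can, in principle, store the skipped bit string and one pointer instead of a full pointer array. The sketch below shows that idea under an assumed representation. The names `JumpNode` and `follow` are hypothetical, and the slides do not give the exact node layout.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class JumpNode:
    bits: str    # the k skipped address bits, e.g. "0110"
    child: Any   # the single non-null child

def follow(node, addr_bits, pos):
    """Take the jump if the next len(node.bits) address bits match."""
    end = pos + len(node.bits)
    if addr_bits[pos:end] == node.bits:
        return node.child, end   # matched: continue at the child
    return None, pos             # mismatch: no longer prefix on this path

jump = JumpNode(bits="0110", child="leaf")
print(follow(jump, "10110", 1))  # ('leaf', 5)
print(follow(jump, "10111", 1))  # (None, 1)

# Storage for a k = 4 jump: 4 bits plus 1 pointer, versus 2**4 = 16 pointers.
print(2 ** len(jump.bits))       # 16
```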

3. Pipelined and Scalable IP-Lookup. Per-Stage Memory Bound. [Figure: (a) binary search tree with N leaves; (b) memory size of a trie with jump nodes for the worst-case prefix distribution of Figure 4, compared to the size of a 1-bit trie.]

3. Pipelined and Scalable IP-Lookup. Per-Stage Memory Bound. [Figure: (c) the space taken at various levels by a trie with jump nodes, for various prefix distributions.]

3. Pipelined and Scalable IP-Lookup. System Architecture. Shadow trie: a copy of the trie containing all the required auxiliary information. It is accessed only during construction or update of the trie, so it can use slow and cheap memory (DRAM). Modifications access only the shadow trie, and IP-lookups access only the SDP trie.

3. Pipelined and Scalable IP-Lookup. Updates are applied so that no read operation can encounter the data structure in an inconsistent or erroneous state.

3. Pipelined and Scalable IP-Lookup. Optimum-Cost Incremental Route Updates.

3. Pipelined and Scalable IP-Lookup. Memory Management Overhead. Scalability in Lookup Rate.

4. Brief Review of TCAM-based Schemes. Content Addressable Memory (CAM): compares all memory locations against the input key to find matching entries. Ternary Content Addressable Memory (TCAM): supports wildcard bits in the entries and finds the longest matching prefix in one operation.
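
A software model can show what a ternary match computes, though not how the hardware does it. In the sketch below each entry is a value plus a mask whose zero bits act as wildcards. The 8-bit key width, the table contents, and the names `make_entry` and `tcam_lookup` are assumptions made for the example.

```python
def make_entry(prefix, width=8):
    """Turn a bit-string prefix into (value, mask, length)."""
    n = len(prefix)
    value = int(prefix.ljust(width, "0"), 2)   # wildcard positions stored as 0
    mask = ((1 << n) - 1) << (width - n)       # 1 = compare, 0 = wildcard
    return value, mask, n

def tcam_lookup(table, key):
    # A TCAM compares every entry in parallel and a priority encoder picks
    # the winner; this sequential loop only models the result.
    matches = [(n, hop) for (value, mask, n), hop in table
               if (key & mask) == value]
    return max(matches, key=lambda m: m[0])[1] if matches else None

table = [
    (make_entry(""), "default"),
    (make_entry("10"), "A"),
    (make_entry("1010"), "B"),
]

print(tcam_lookup(table, 0b10101111))  # B
print(tcam_lookup(table, 0b10011111))  # A
print(tcam_lookup(table, 0b01000000))  # default
```

The list comprehension touches every entry for every key, which mirrors why a TCAM access activates all memory locations.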

4. Brief Review of TCAM-based Schemes. Because a single access activates all memory locations, as opposed to just one, a TCAM dissipates far more power than RAM. TCAMs are pipelined at the hardware level. TCAMs do not scale well in power and implementation cost at high line rates.

5. Methodology. We use CACTI 3.2, a tool that accurately models SRAM and CAM structures. The modeling is only for 100 nm CMOS technology.

6. Experimental Results. [Figure: (a) worst-case per-stage memory versus trie levels for DLP.]

6. Experimental Results. [Figure: (b) worst-case total memory versus trie levels for HLP.]

6. Experimental Results. [Figure: (c) a comparison of total worst-case memory versus routing-table size for various IP-lookup schemes.]

6. Experimental Results. [Figure: comparison of power dissipation versus line rate for various schemes with table sizes of (a) 250,000, (b) 500,000, and (c) 1 million prefixes.]

6. Experimental Results. [Figure: comparison of chip area versus line rate for various schemes with table sizes of (a) 250,000, (b) 500,000, and (c) 1 million prefixes.]

6. Experimental Results. Summary of Results. HLP does not scale well in total memory size, power dissipation, route-update cost, and implementation cost. DLP does not scale well in total memory size, lookup throughput, and route-update cost. TCAMs do not scale well in implementation cost and power dissipation.

7. Conclusions. We proposed scalable dynamic pipelining (SDP), with three key innovations: (1) a proven worst-case per-stage memory bound that is significantly tighter than those of previous schemes; (2) incremental route updates whose cost is the optimum; (3) scalability at both the data-structure level and the hardware level.

7. Conclusions. SDP naturally scales in power and implementation cost. Detailed hardware simulation shows that SDP is the only scheme that achieves all five scalability requirements.