Switch and Router Design. Packet Processing Examples. Packet Processing Rate 12/14/2011

Transcription:

88 - Switch and Router Design
Dr. David Hay, Ross 8b, dhay@cs.huji.ac.il
Source: Nick McKeown, Isaac Keslassy

Bottlenecks
Memory, memory,

Packet Processing Examples: Address Lookup (IP/Ethernet)
Where to send an incoming packet?
Exact match: use output-port, to send packets to MAC address ::::89:ab.
Longest prefix match: use output-port, to send packets to a destination network.

Packet Processing Examples: Intrusion Detection Schemes
Deep packet inspection (DPI): drop all packets that contain the string EvilWorm anywhere within the packet. Example: the SNORT rule set.

Packet Processing Examples: Firewall, ACL
Which packets to accept or deny? Example: drop all packets from an evil source network on a range of ports.
Usually needs the fields: source address, destination address, source port, destination port, protocol.

Packet Processing Rate
[Table: Year; Line rate; rate of fixed-size packets (Mpkt/s). The values did not survive extraction.]
Lookup mechanism must be simple and easy to implement. (Surprise?)
Memory access time is the long-term bottleneck.

Memory Technology
[Table: Technology (networking DRAM, SRAM, TCAM); single-chip density; $/chip ($/MByte); access speed; Watts/chip. The values did not survive extraction.]
Note: Price, speed and power are manufacturer and market dependent. The numbers are a bit outdated but give the general idea.
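The link between line rate, packet rate, and the memory-access budget can be made concrete with a little arithmetic. The sketch below is only illustrative: the 10 Gb/s line rate, the 40-byte packet size, and the 10 ns memory access time are assumed values (the slide's own table values are not recoverable), framing overhead is ignored, and the function names are hypothetical.

```python
def packets_per_second(line_rate_bps: float, packet_bytes: int) -> float:
    """Back-to-back packets per second on a link, ignoring framing overhead."""
    return line_rate_bps / (packet_bytes * 8)


def time_budget_ns(line_rate_bps: float, packet_bytes: int) -> float:
    """Time available to process one packet, in nanoseconds."""
    return 1e9 / packets_per_second(line_rate_bps, packet_bytes)


def max_memory_accesses(budget_ns: float, access_ns: float) -> int:
    """How many sequential memory accesses fit into the per-packet budget."""
    return int(budget_ns // access_ns)


# Assumed example values, not taken from the slides.
rate = packets_per_second(10e9, 40)         # packets per second
budget = time_budget_ns(10e9, 40)           # nanoseconds per packet
accesses = max_memory_accesses(budget, 10)  # with an assumed 10 ns access time
print(rate / 1e6, "Mpkt/s,", budget, "ns/packet,", accesses, "accesses")
```

Under these assumptions only a handful of sequential memory accesses fit into one packet time, which is why the lookup mechanism has to be simple.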

Simplest Task: Exact Matching
Mostly in bridges. Bridges work at the Ethernet layer: a bridge connects two Ethernet networks.
Wire-speed forwarding: each time a packet arrives at a bridge, forward it according to the destination MAC address. Also store/update the source MAC address (learning). This should be done at wire speed.
[Figure: hosts a and b on one side of a bridge, hosts c and d on the other.]

Solution: Binary Search
MAC addresses have values which can be sorted. Thus, when they are kept sorted, one can perform a binary search on the array and find the right MAC address.
However, each iteration is a memory access, so a lookup costs log N memory accesses. This works fine (even using DRAM) for low speeds and small N (around Mb/s, 8K values) but doesn't scale to larger N or higher speeds (not even for Mb/s, K values).
Using faster hardware (SRAM) won't really solve the problem (and it is more expensive).

Scaling using Hashing
Hashing is much faster than binary search on average, but much slower in the worst case (up to linear time).
However, one can choose (pre-compute) good hash functions, so that the number of collisions is small and bounded.
Precomputation takes a lot of time, but addresses are not added at a rapid rate. Applying the hash functions is done at wire speed.
More sophisticated data structures and hashing techniques can also be applied (e.g. to reduce memory): Bloom filters, fingerprinting, etc.

Example (Gigaswitch, 99)
N = K; binary search takes memory accesses.
For each 48-bit address addr, we first apply h(addr) to get a hashed value. Its least-significant bits are the hash-table entry index (K entries).
Each entry is a balanced binary tree of bounded height, sorted by the remaining most-significant bits.
The hash function should guarantee that no more than 8 addresses are in the same tree, and that we can disambiguate between addresses using the most-significant bits.
Solve corner cases separately (CAM); rehashing.
memory accesses.

IP longest prefix matching
[Figure: a packet (destination address, payload) looked up in an IP forwarding table with columns Prefix, Next Hop, and Interface. The matching entries are labelled OK, better, even better, and best!. The addresses did not survive extraction.]

Longest Prefix Match is Harder than Exact Match
The destination address of an arriving packet does not carry with it the information needed to determine the length of the longest matching prefix.
Hence, one needs to search among the space of all prefix lengths, as well as the space of all prefixes of a given length.
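As a reference point for what longest prefix matching has to compute, here is a minimal linear-scan sketch using Python's standard ipaddress module. It is deliberately naive (one comparison per table entry) and is not a technique from the slides; the table contents, port names, and function name are made up for illustration.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return the next hop of the longest prefix containing `destination`.

    `table` is a list of (prefix, next_hop) pairs, e.g. ("10.1.0.0/16", "port-2").
    Returns None when no prefix matches.
    """
    addr = ipaddress.ip_address(destination)
    best = None  # (network, next_hop) of the longest match seen so far
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]


# Hypothetical forwarding table: three nested prefixes.
TABLE = [
    ("10.0.0.0/8", "port-1"),
    ("10.1.0.0/16", "port-2"),
    ("10.1.2.0/24", "port-3"),
]

print(longest_prefix_match(TABLE, "10.1.2.3"))   # all three match; /24 wins
print(longest_prefix_match(TABLE, "10.9.9.9"))   # only the /8 matches
```

The scan has to consider every prefix length because, as noted above, the address itself does not say how long its best match is.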

Current Practical Data
Caching works poorly in backbone routers because of the number of concurrent flows.
Wire-speed lookup is needed for -byte packets; % are TCP acks.
nsec/packet in Gbs and 8 nsec/packet in Gbs.
Lookup is dominated by memory accesses, so speed is measured by memory accesses.
Prefix length: 8 and longer.
Today, prefixes with growth; million prefixes.
Higher speeds need SRAM, so it is worth minimizing memory.

Problem Definition
[Figure: a small network of routers annotated with prefixes and next hops. The addresses did not survive extraction.]
LPM: find the most specific route, or the longest matching prefix, among all the prefixes matching the destination address of an incoming packet.

LPM in IPv4
Use exact match algorithms for LPM!
We can start with prefix length 8.
[Figure: the network address is exact-matched against the prefixes of each length; a priority encoder picks the longest match and outputs the port.]

Metrics for Lookup Algorithms
Speed (= number of memory accesses)
Storage requirements (= amount of memory)
Low update time
Scalability
  With length of prefix: IPv4 unicast (32b), Ethernet (48b), IP multicast, IPv6 unicast (128b)
  With size of routing table (sweet spot for today's designs = million)
Flexibility in implementation
Low preprocessing time

Our Toy Example
[Prefix list: the bit patterns of the example prefixes did not survive extraction.]

Unibit (= Radix) Tries
[Figure: unibit trie built from the toy-example prefixes; each node holds a prefix field and pointers.]
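The "use exact match algorithms for LPM" idea above can be sketched in software with one hash table per prefix length. This is a minimal IPv4-only sketch under stated assumptions: the slide suggests matching all lengths side by side and letting a priority encoder pick the winner, whereas this version simply probes the tables from longest to shortest; all names and the example routes are hypothetical.

```python
import ipaddress

ADDRESS_BITS = 32  # IPv4 only in this sketch


def build_tables(routes):
    """Group routes into {prefix_length: {prefix_bits_as_int: next_hop}}."""
    tables = {}
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        # Keep only the significant (leading) bits of the prefix as the key.
        key = int(net.network_address) >> (ADDRESS_BITS - net.prefixlen)
        tables.setdefault(net.prefixlen, {})[key] = next_hop
    return tables


def lookup(tables, destination):
    """Exact-match the address against each prefix length, longest first."""
    addr = int(ipaddress.ip_address(destination))
    for length in sorted(tables, reverse=True):
        hit = tables[length].get(addr >> (ADDRESS_BITS - length))
        if hit is not None:
            return hit  # first hit is the longest match
    return None


# Hypothetical routes.
ROUTES = [
    ("10.0.0.0/8", "port-1"),
    ("10.1.0.0/16", "port-2"),
    ("10.1.2.0/24", "port-3"),
]
TABLES = build_tables(ROUTES)
print(lookup(TABLES, "10.1.2.3"))
print(lookup(TABLES, "10.200.0.1"))
```

The cost is one exact-match probe per distinct prefix length in the table, which is why the number of lengths matters for the speed metric listed above.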

Unibit Tries
[Figure sequence: step-by-step lookup in the unibit trie built from the toy-example prefixes. Each step is annotated with Input and Memory; Memory starts as null. The bit patterns did not survive extraction.]

Compacting One-Way Branches (variant of PATRICIA tree)
[Figure: the same trie with its one-way branches compacted.]
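The lookup that these figures walk through can be sketched as a one-bit-at-a-time trie. This is a minimal sketch, not the slides' exact structure: prefixes and addresses are given as strings of '0' and '1', the variable best plays the role that the Memory annotation appears to play in the figures (the last matching prefix seen so far, which is an interpretation), and the prefixes A, B, C are invented for illustration.

```python
class Node:
    """One trie node: two child pointers and an optional next hop."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


class UnibitTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, prefix_bits, next_hop):
        """Insert a prefix given as a bit string such as '101'."""
        node = self.root
        for bit in prefix_bits:
            index = int(bit)
            if node.children[index] is None:
                node.children[index] = Node()
            node = node.children[index]
        node.next_hop = next_hop

    def lookup(self, address_bits):
        """Walk one bit per step, remembering the last prefix passed."""
        node = self.root
        best = node.next_hop  # stays None until some prefix matches
        for bit in address_bits:
            node = node.children[int(bit)]
            if node is None:
                break  # fell off the trie; answer is the remembered prefix
            if node.next_hop is not None:
                best = node.next_hop
        return best


# Hypothetical prefixes, not the slides' toy example.
trie = UnibitTrie()
trie.insert("1", "A")
trie.insert("101", "B")
trie.insert("10100", "C")
print(trie.lookup("10111000"))
print(trie.lookup("10100111"))
```

Each step costs one node visit, which matches the O(W) lookup bound for W-bit addresses.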

Unibit Tries - Analysis
For W-bit prefixes and N prefixes: O(W) lookup, O(NW) storage, and O(W) update complexity.
Patricia: O(N) storage (why?)
Still slow and memory-hungry, but simple and extensible to wider fields.

Multi-bit Tries
Binary trie: depth = W, degree = 2, stride = 1 bit.
Multi-ary trie: depth = W/k, degree = 2^k, stride = k bits.
Principle: trade memory for speed.

Prefix Expansion with Multi-bit Tries
If stride = k bits, prefix lengths that are not a multiple of k need to be expanded.
[Example for a specific k: prefixes and their expanded prefixes. The bit patterns did not survive extraction.]
Maximum number of expanded prefixes corresponding to one non-expanded prefix = 2^(k-1).

Quaternary Trie (k = 2)
[Figure: the toy-example prefixes stored in a quaternary trie, with prefixes expanded into entries such as Pa and Pb.]

Prefix Expansion Increases Storage Consumption
Replication of next-hop pointers.
Greater number of unused (null) pointers in a node.
Time ~ W/k
Storage ~ (NW/k) * 2^(k-1)
Improvement: from fixed-stride tries to variable-stride tries.

Ternary Content-Addressable Memory (TCAM)
TCAM array: each entry is a word in {0, 1, *}^W and represents a rule.
[Figure: a search key is compared against every entry of the TCAM array; the match lines feed an encoder.]
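Prefix expansion as described above can be sketched in a few lines. This is a minimal sketch assuming prefixes are bit strings; the function names and the example routes are hypothetical. When an expanded prefix collides with a longer original prefix, the longer original one is kept, since it is the more specific route.

```python
def expand_prefix(prefix_bits, stride):
    """Expand a prefix to the next length that is a multiple of `stride`."""
    pad = (-len(prefix_bits)) % stride  # number of bits to append
    if pad == 0:
        return [prefix_bits]
    return [prefix_bits + format(i, "0{}b".format(pad)) for i in range(2 ** pad)]


def expanded_table(routes, stride):
    """Map every expanded prefix to a next hop.

    Routes are processed from shortest to longest original prefix, so an
    entry produced by a longer (more specific) prefix overwrites one
    produced by expanding a shorter prefix.
    """
    table = {}
    for prefix_bits, next_hop in sorted(routes, key=lambda r: len(r[0])):
        for expanded in expand_prefix(prefix_bits, stride):
            table[expanded] = next_hop
    return table


# Hypothetical routes with stride k = 2.
print(expand_prefix("101", 2))   # one bit appended: two expanded prefixes
print(expand_prefix("1", 3))     # two bits appended: four expanded prefixes
print(expanded_table([("1", "A"), ("10", "B")], 2))
```

A prefix whose length is one more than a multiple of k needs k-1 extra bits, which gives the 2^(k-1) worst case stated above.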

Example
[Figure: a TCAM array whose match lines feed an encoder.]

TCAM Benefits and Disadvantages
Deterministic search throughput: O(1) search.
Very flexible, and applicable to other problems as well. Next week: multi-field packet classification.
However, relatively costly and energy-consuming: $ for small (Mbit) TCAM. Energy depends on the number of entries.
~ million TCAM devices already deployed.

Typical Dimensions and Speed
K-K rules
- symbols per rule
million searches per second for -bit keys
Suitable even for Gb/s traffic.
IPv4 and IPv6 lookups are trivial with TCAM.
Extra symbols are left in each entry, which can be used to optimize TCAM performance.
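The behaviour of a TCAM lookup can be modelled in software to make the ternary matching concrete. This is only a functional sketch: a real TCAM compares the key against all entries at once and an encoder reports the first matching line, whereas this model loops over the entries in order. The entry format, the function name, and the 8-bit example rules are assumptions made for illustration.

```python
def tcam_search(entries, key):
    """Return the result of the first entry whose pattern matches `key`.

    `entries` is an ordered list of (pattern, result) pairs. A pattern is a
    string over '0', '1', and '*', where '*' matches either bit value.
    """
    for pattern, result in entries:
        if len(pattern) == len(key) and all(
            p in ("*", k) for p, k in zip(pattern, key)
        ):
            return result
    return None


# Hypothetical 8-bit rules. For longest prefix matching the entries are
# stored with longer prefixes first, so the first match is the longest.
ENTRIES = [
    ("1010****", "port-3"),
    ("10******", "port-2"),
    ("********", "default"),
]

print(tcam_search(ENTRIES, "10101111"))
print(tcam_search(ENTRIES, "10011111"))
print(tcam_search(ENTRIES, "01000000"))
```

Storing prefixes in decreasing length order is what makes prefix lookups straightforward on such a device: the priority among match lines does the "longest" part of the match.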