Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA

Transcription:

Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA
  Weirong Jiang, Viktor K. Prasanna (University of Southern California)
  Norio Yamagaki (NEC Corporation)
  September 1, 2010

Outline
  - Background
  - Related Work
  - Our Solution
  - Conclusions
9/2/2010 1

Background
  Packet Forwarding in Network Routers
    - Match packets against a (large) set of forwarding rules
      - Each rule specifies the matching conditions and the resulting actions
    - Take the corresponding actions of the matching rule
      - Forward to a specified interface / drop packets

Traditional Packet Forwarding
  IP Lookup
    - Look up the routing table using the destination IP address of a packet
    - Each route entry: {prefix, next-hop}
      - Prefix: represents a subnet
    - Longest prefix matching
  Example
    IP: 128.123.111.233
    Routing table:
      Prefix         | Next-hop interface
      ---------------+-------------------
      128.123.*.*    | Port 0
      128.123.111.*  | Port 2
  Other Packet Forwarding Problems
    - MAC forwarding
    - Packet classification
    - etc.
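The longest-prefix-matching example on this slide can be sketched in a few lines of Python. This is only an illustrative linear scan, not how a router implements IP lookup; the function name `longest_prefix_match` is an assumption, and the two wildcard prefixes are written as a /16 and a /24 network.

```python
import ipaddress

# Routing table from the slide: 128.123.*.* -> Port 0, 128.123.111.* -> Port 2.
ROUTES = [
    (ipaddress.ip_network("128.123.0.0/16"), "Port 0"),
    (ipaddress.ip_network("128.123.111.0/24"), "Port 2"),
]

def longest_prefix_match(dst, routes=ROUTES):
    """Return the next hop of the longest prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, next_hop in routes:
        # Keep the matching entry with the longest prefix seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]

print(longest_prefix_match("128.123.111.233"))  # Port 2
```

Both prefixes contain 128.123.111.233, and the /24 entry wins because it is the more specific one.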

Flexible Flow Matching
  Motivation
    Router Supporting Network Virtualization
      - Multiple virtual networks employing different types of forwarding rules
    OpenFlow Switching
      - Centralized control
      - Flexible flow table
  [Figure labels: OpenFlow-enabled Switch; Software; Datapath; Secure Channel; Flow Table; Controller PC. Source: OpenFlowSwitch.org]

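Example OpenFlow Rules

  Forwarding type      | Ingress Port | Eth src | Eth dst | Eth type | VLAN ID | VLAN prior | IP src  | IP dst  | IP Prtl | IP ToS | TCP sport | TCP dport | Action
  ---------------------+--------------+---------+---------+----------+---------+------------+---------+---------+---------+--------+-----------+-----------+--------
  Ethernet Switching   | *            | *       | 00:1F   | *        | *       | *          | *       | *       | *       | *      | *         | *         | Port 6
  IP Routing           | *            | *       | *       | *        | *       | *          | *       | 1.2.3.4 | *       | *      | *         | *         | Port 9
  Application Firewall | *            | *       | *       | *        | *       | 5          | *       | *       | *       | *      | *         | 22        | Drop
  Flow Switching       | Port 2       | 00:23   | 00:42   | 0800     | vlan1   | *          | 1.2.3.4 | 5.6.7.8 | 4       | *      | 17434     | 80        | Port 3
  Port + Ethernet + IP | Port 3       | 00:23   | *       | 0800     | *       | 7          | *       | 2.3.4.5 | 4       | 0      | *         | *         | Port 13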

OpenFlow-enabled Switch / Router
  [Figure labels: Network Packet; Flexible Flow Matching; Apply Actions; Send to Secure Channel]
  Flexible Flow Matching
    - 12-tuple packet header fields
    - Exact / wildcard / prefix match
    - Priority
  If a packet matches multiple entries, the one with the highest priority is used.

OpenFlow Switching Consortium
  - Initiated in 2008
    - 60+ universities in the US
    - NEC, Juniper, HP, Cisco, ...
  - Goal: put an open platform in the hands of researchers/students to test new ideas at scale
  - Network Programmability
    - Enabling innovation in campus networks
  - Network Virtualization
    - Enables atomic access to (virtualized) resources
  Source: OpenFlowSwitch.org

Challenges
  - Throughput: e.g. 40 Gbps, i.e. 125 million (40-byte) packets per second
  - Wide rules: 12 fields --> 12-dimensional search
  - Sparse wildcards: difficult for hashing
  - State-of-the-art solution: linear search
  Current OpenFlow Research
  - Focus on functionality
  - Little work to address the performance challenges
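The throughput figure on this slide follows from dividing the line rate by the packet size in bits. A small worked check, assuming minimum-size 40-byte packets sent back to back and ignoring any link-layer framing overhead (the helper name `packets_per_second` is hypothetical):

```python
def packets_per_second(line_rate_bps, packet_bytes):
    """Packets per second needed to sustain a line rate with fixed-size packets."""
    return line_rate_bps / (packet_bytes * 8)

pps = packets_per_second(40e9, 40)   # 40 Gbps, 40-byte packets
ns_per_packet = 1e9 / pps            # time budget per packet in nanoseconds

print(pps / 1e6)       # 125.0 (million packets per second)
print(ns_per_packet)   # 8.0
```

Under these assumptions a single lookup engine has 8 ns per packet, which is why one rule lookup per clock cycle at 125 MHz is enough for 40 Gbps.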

Related Work: TCAM
  Ternary Content Addressable Memory (TCAM)
    - De facto solution for packet forwarding engines
    - Stores an entry in a word
    - Parallel search on all words (brute force)
  Scalable?
    - High power/energy consumption
    - Low clock rate
    - Large area

             | TCAM (18 Mb)   | SRAM (18 Mb)
  -----------+----------------+----------------
  Clock rate | 266 MHz        | 450 MHz
  Power      | 12 ~ 15 Watts  | 1.2 ~ 1.6 Watts
  Cell size  | 16 transistors | 6 transistors
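To make the "store an entry in a word, search all words" idea concrete, here is a software stand-in for a ternary match. It is a sketch under assumptions: each entry is modeled as a (value, mask) pair where a mask bit of 1 means "care", the entries are hypothetical 8-bit patterns, and the sequential loop only imitates what a real TCAM does in parallel with a priority encoder choosing the lowest matching index.

```python
def ternary_match(key, value, mask):
    """True if key equals value on every bit where mask is 1 (0 = don't care)."""
    return (key & mask) == (value & mask)

def tcam_lookup(key, entries):
    """Return the index of the first matching entry, or None.

    Entries are assumed to be stored in priority order, lowest index first.
    """
    for index, (value, mask) in enumerate(entries):
        if ternary_match(key, value, mask):
            return index
    return None

# Hypothetical 8-bit entries, written as value / mask.
ENTRIES = [
    (0b11100000, 0b11110000),  # 1110****
    (0b10000000, 0b11000000),  # 10******
    (0b00000000, 0b00000000),  # ******** (matches everything)
]

print(tcam_lookup(0b10000000, ENTRIES))  # 1
```

The brute-force comparison against every stored word is what costs a TCAM its power and area relative to SRAM.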

Motivation
  Packet Classification and Flexible Flow Matching
    - Packet classification: 5-tuple (IP src/dst, port src/dst, IP protocol)
    - OpenFlow: 12-tuple
  Algorithmic Solutions
    - SRAM-based pipelines
    - Decision-tree-based packet classification algorithm
    - Decision forest: multiple decision trees
  Modern FPGAs
    - Massive parallelism
    - Large amount of dual-port RAMs
    - High clock rate, e.g. 550 MHz (Xilinx Virtex-5)

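Packet Classification Example

  Example Rule Set

  Rule | SA (8-bit) | DA (8-bit) | SP (4-bit) | DP (4-bit) | Prtl (2-bit) | Priority
  -----+------------+------------+------------+------------+--------------+---------
  R1   | 1110*      | 01*        | [1:3]      | [6:7]      | 1            | 1
  R2   | 0000*      | 1101*      | [10:13]    | [2:6]      | 2            | 1
  R3   | 100*       | 00*        | [2:5]      | [6:7]      | 3            | 2
  R4   | 10*        | 0011*      | [0:3]      | [6:7]      | 3            | 3
  R5   | 111*       | 010*       | [2:5]      | [6:7]      | 0            | 3
  R6   | 001*       | 1*         | [1:3]      | [6:7]      | 1            | 4
  R7   | 0*         | 11*        | [1:3]      | [6:7]      | 2            | 5
  R8   | *          | 0110*      | [0:15]     | [0:7]      | 3            | 6

  Packet

  SA (8-bit) | DA (8-bit) | SP (4-bit) | DP (4-bit) | Prtl (2-bit)
  -----------+------------+------------+------------+-------------
  10000000   | 00110100   | 2          | 6          | 3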
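The example can be checked with a plain linear search over the rule set. This is a minimal sketch, not the decision-tree algorithm the talk goes on to discuss; it assumes that a smaller Priority number means higher priority, and the names `matches` and `classify` are illustrative.

```python
RULES = [
    # name, SA prefix, DA prefix, SP range, DP range, protocol, priority
    ("R1", "1110", "01",   (1, 3),   (6, 7), 1, 1),
    ("R2", "0000", "1101", (10, 13), (2, 6), 2, 1),
    ("R3", "100",  "00",   (2, 5),   (6, 7), 3, 2),
    ("R4", "10",   "0011", (0, 3),   (6, 7), 3, 3),
    ("R5", "111",  "010",  (2, 5),   (6, 7), 0, 3),
    ("R6", "001",  "1",    (1, 3),   (6, 7), 1, 4),
    ("R7", "0",    "11",   (1, 3),   (6, 7), 2, 5),
    ("R8", "",     "0110", (0, 15),  (0, 7), 3, 6),  # "" encodes the wildcard *
]

def matches(rule, pkt):
    """Prefix match on SA/DA, range match on SP/DP, exact match on protocol."""
    _, sa, da, (sp_lo, sp_hi), (dp_lo, dp_hi), prtl, _ = rule
    return (pkt["sa"].startswith(sa) and pkt["da"].startswith(da)
            and sp_lo <= pkt["sp"] <= sp_hi
            and dp_lo <= pkt["dp"] <= dp_hi
            and pkt["prtl"] == prtl)

def classify(pkt, rules=RULES):
    """Return the matching rule with the smallest priority number, or None."""
    hits = [r for r in rules if matches(r, pkt)]
    return min(hits, key=lambda r: r[6])[0] if hits else None

packet = {"sa": "10000000", "da": "00110100", "sp": 2, "dp": 6, "prtl": 3}
print([r[0] for r in RULES if matches(r, packet)])  # ['R3', 'R4']
print(classify(packet))                             # R3
```

The packet matches both R3 and R4; under the stated priority assumption R3 is reported.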

Algorithmic Solutions
  Decision-tree-based Algorithm
    - HyperCuts
  [Figure: rules R1 to R8 drawn as regions in the SA x DA plane (both axes 0000 to 1111), with Packets 1 to 6 marked; the exact placement is not recoverable from the transcript]

Drawbacks of Existing Algorithm
  - Memory explosion
    - Due to rule duplication
    - O(N^D)
  - Large tree depth
    - High latency
  - Even worse for flexible flow matching! D = 12
  [Figure: (a) rule set R1 to R5 drawn in the X-Y plane; (b) HyperCuts tree with X: 2 cuts, Y: 2 cuts, whose leaves hold duplicated copies of the rules]
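Rule duplication is easy to reproduce with a toy example. The sketch below cuts a 16 x 16 space into equal cells, in the spirit of one HyperCuts-style node cut, and lists the rules overlapping each cell. The three rectangles A, B and C are hypothetical (the geometry of the figure is not available), and `cut` is an illustrative helper, not the authors' algorithm.

```python
def cut(rules, x_cuts, y_cuts, size=16):
    """Split a size x size space into x_cuts * y_cuts equal cells.

    Returns {(i, j): [names of rules overlapping that cell]}.
    """
    w, h = size // x_cuts, size // y_cuts
    cells = {}
    for i in range(x_cuts):
        for j in range(y_cuts):
            x_lo, x_hi = i * w, (i + 1) * w - 1
            y_lo, y_hi = j * h, (j + 1) * h - 1
            cells[(i, j)] = [
                name for name, (rx_lo, rx_hi), (ry_lo, ry_hi) in rules
                if rx_lo <= x_hi and rx_hi >= x_lo
                and ry_lo <= y_hi and ry_hi >= y_lo
            ]
    return cells

# Hypothetical rules: (name, x range, y range), ranges inclusive.
RULES_2D = [
    ("A", (0, 15), (6, 9)),     # wide rule crossing both cut lines
    ("B", (0, 3), (0, 3)),
    ("C", (12, 15), (12, 15)),
]

cells = cut(RULES_2D, x_cuts=2, y_cuts=2)
stored = sum(len(names) for names in cells.values())
print(cells[(0, 0)])  # ['A', 'B']
print(stored)         # 6
```

Three rules become six stored copies after a single 2 x 2 cut because rule A lands in every cell. Rules with many wildcard fields behave like A, which is why the problem gets worse as the number of dimensions grows.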

Our Idea: Rule Set Partitioning

  Example OpenFlow Table

  Rules | Switch port | MAC src | MAC dst | Eth type | VLAN ID | IP src | IP dst | IP port | TCP sport | TCP dport
  ------+-------------+---------+---------+----------+---------+--------+--------+---------+-----------+----------
  R1    | *           | 2       | 5       | *        | *       | *      | *      | *       | *         | *
  R2    | *           | *       | *       | *        | *       | 111    | 223    | *       | *         | *
  R3    | *           | *       | *       | *        | *       | 62     | 59     | *       | *         | *
  R4    | *           | 6       | 9       | *        | *       | *      | *      | *       | *         | *

  Motivation
    - Partition the flow table: {R1, R4} and {R2, R3}
    - Each subset is built into a decision tree
    - Each tree uses a small number of header fields
    - Theoretically, memory requirement: O(P * (N/P)^D) = O(N^D / P^(D-1))
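The partition shown on this slide can be reproduced with a very simple heuristic: group rules by the set of fields they actually specify. This is a sketch for the four example rules only and is not claimed to be the partitioning algorithm used in the talk; the field names and the helper `partition_by_fields` are assumptions.

```python
from collections import defaultdict

# Only non-wildcard fields are listed; every omitted field is "*".
RULES = {
    "R1": {"mac_src": 2, "mac_dst": 5},
    "R2": {"ip_src": 111, "ip_dst": 223},
    "R3": {"ip_src": 62, "ip_dst": 59},
    "R4": {"mac_src": 6, "mac_dst": 9},
}

def partition_by_fields(rules):
    """Group rule names by the set of header fields they specify."""
    groups = defaultdict(list)
    for name, spec in rules.items():
        groups[frozenset(spec)].append(name)
    return sorted(groups.values())

print(partition_by_fields(RULES))  # [['R1', 'R4'], ['R2', 'R3']]
```

Each resulting subset only needs a tree over two fields (the MAC pair or the IP pair), which is the effect the slide describes as "each tree uses a small number of header fields".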

Architecture: Decision Forest
  Multiple Depth-Bounded Decision Trees
    - Split out trees one by one
  Mapping Each Tree to a Linear Pipeline
    - P trees --> P pipelines
    - Dual-port RAMs --> 2x throughput
  [Figure: example with P = 2]
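Functionally, a forest of P trees answers a query by searching every tree and keeping the best result. The sketch below models that behavior in software with P = 2. Everything concrete in it is an assumption made for illustration: each "tree" is a plain function returning a (priority, rule) pair or None, the priorities are invented, a smaller number is taken to mean higher priority, and the sequential calls stand in for pipelines that would run in parallel in hardware.

```python
def forest_lookup(packet, trees):
    """Query every tree and return the hit with the smallest priority number."""
    results = (tree(packet) for tree in trees)
    hits = [hit for hit in results if hit is not None]
    return min(hits, key=lambda hit: hit[0]) if hits else None

# Two hypothetical trees, one per rule subset; priorities are invented.
def mac_tree(packet):
    table = {(2, 5): (1, "R1"), (6, 9): (4, "R4")}
    return table.get((packet["mac_src"], packet["mac_dst"]))

def ip_tree(packet):
    table = {(111, 223): (2, "R2"), (62, 59): (3, "R3")}
    return table.get((packet["ip_src"], packet["ip_dst"]))

packet = {"mac_src": 6, "mac_dst": 9, "ip_src": 111, "ip_dst": 223}
print(forest_lookup(packet, [mac_tree, ip_tree]))  # (2, 'R2')
```

The packet hits R4 in one tree and R2 in the other, and the final stage keeps R2 under the assumed priority order.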

Tree-to-Pipeline Mapping
  Decision Tree
    - Internal nodes
    - Rule list at each leaf
  Linear Pipeline
    - Memory balancing
  [Figure: a decision tree with numbered nodes (1 to 11) and its mapping onto the stages of a linear pipeline; the node-to-stage layout is not recoverable from the transcript]
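The simplest way to place a tree on a linear pipeline is to put every node in the stage equal to its depth, so a packet visits one node per stage. The sketch below does exactly that and nothing more. It does not implement the memory balancing mentioned on the slide, and the seven-node tree is hypothetical because the topology in the figure cannot be recovered.

```python
from collections import deque

def map_tree_to_pipeline(children, root):
    """Assign each node to the pipeline stage equal to its depth (BFS order)."""
    stage_of = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            # A child always sits one stage after its parent.
            stage_of[child] = stage_of[node] + 1
            queue.append(child)
    stages = {}
    for node, stage in stage_of.items():
        stages.setdefault(stage, []).append(node)
    return [stages[s] for s in sorted(stages)]

# Hypothetical tree: node -> list of children.
tree = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
print(map_tree_to_pipeline(tree, 1))  # [[1], [2, 3], [4, 5, 6, 7]]
```

With this naive mapping the last stage holds most of the nodes, so its memory is much larger than the first stage's. That imbalance is the motivation for a balancing step, which may move nodes to later stages as long as every node stays after its parent.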

Performance Evaluation
  - Using OpenFlow-like 12-tuple rules
    - Randomly generated

Performance Evaluation (2)

FPGA Implementation
  - Xilinx Virtex-5 XC5VFX200T
  - 1K 12-tuple rules
  - 125 MHz
  - 40 Gbps

                      | Available | Used   | Utilization
  --------------------+-----------+--------+------------
  # of Slices         | 30,720    | 11,720 | 38%
  # of 36Kb BlockRAMs | 456       | 256    | 56%
  # of User I/Os      | 960       | 303    | 31%

  Comparison

                                   | # of complex rules | Throughput
  ---------------------------------+--------------------+-----------
  Our design                       | 1000               | 40 Gbps
  TCAM on NetFPGA [Naous: ancs08]  | 32                 | 4 Gbps

Summary
  - First attempt to address the performance challenges for flexible packet forwarding
    - Flexible forwarding itself is evolving
  - Algorithmic Solution + Parallel Architecture
    - Decision Forest
      - Partitioning the rule set
      - Multiple decision trees
      - Reducing overall memory & resource requirement
  - FPGA is an attractive option for implementing network processing engines
  Future Work
  - Optimal partitioning?

Q & A
  Questions?

Our Solution: Algorithms

Our Solution: Algorithms (2)
  - Depth bounding
  - Split-out