Hardware Acceleration in Computer Networks. Jan Kořenek Conference IT4Innovations, Ostrava


Outline
  Motivation for hardware acceleration
  Longest prefix matching using FPGA
  Hardware acceleration of time-critical operations
  Framework and applications
  Contracted research
  Conclusions

Network Traffic Growth
  New services imply network traffic growth
    Sharing files on social networks
    Video-on-demand services
    Internet of Things (IoT): more than 50 billion devices will be connected to the Internet in 2020
  Large data centres use 40 Gb or 100 Gb networks and call for 1 Tb networks

Why hardware acceleration?
  Packet rate and processing speed
  Time to process one packet is defined by the minimal packet size (64 B)

  Number of clock cycles, processor at 3.6 GHz:

  Line rate   Time per packet   CPU clock cycles
  1 Gbps      500 ns            ~1807
  10 Gbps     50 ns             ~181
  40 Gbps     12 ns             ~45
  100 Gbps    5 ns              ~18
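The per-packet budget follows from simple arithmetic. The sketch below is a minimal illustration, assuming a 64-byte minimum frame and ignoring the Ethernet preamble and inter-frame gap; the function names are illustrative only. The slide's figures are rounded, so the computed values differ slightly from them.

```python
def packet_time_ns(line_rate_bps: float, packet_bytes: int = 64) -> float:
    """Time one packet occupies the link, in nanoseconds."""
    return packet_bytes * 8 / line_rate_bps * 1e9


def cycles_per_packet(line_rate_bps: float, cpu_hz: float = 3.6e9,
                      packet_bytes: int = 64) -> float:
    """CPU clock cycles available before the next packet arrives."""
    return packet_time_ns(line_rate_bps, packet_bytes) * 1e-9 * cpu_hz


if __name__ == "__main__":
    for gbps in (1, 10, 40, 100):
        rate = gbps * 1e9
        print(f"{gbps:>3} Gbps: {packet_time_ns(rate):7.2f} ns, "
              f"{cycles_per_packet(rate):7.1f} cycles")
```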

Performance of Processor Core
  Time to process one packet

  Line rate [b/s]   Maximal packet rate [packets/s]   CPU 3 GHz [clock cycles/packet]
  1G                1.5M                              1807
  10G               15M                               181
  40G               60M                               45
  100G              150M                              18

  181 clock cycles per packet even for 10G links

  Processor performance for time-critical operations

  Operation               Throughput
  Protocol Parsing        21 Mp/s
  TCP Stream Reassembly   2.5 Mp/s
  Packet Classification   12 Mp/s
  Pattern Matching        400 Mb/s

  [Figure labels: 1G, 10G, 40G, 100G]

  Results for Intel Core i7, one core at 3.6 GHz
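A rough sketch of how the packet rates in the table relate to the single-core throughputs. Two assumptions here are not stated in the slides: 20 bytes of per-frame Ethernet overhead (preamble plus inter-frame gap), and ideal linear scaling across cores, which real systems rarely achieve. The names are illustrative.

```python
import math

ETH_OVERHEAD_BYTES = 20  # assumed: 8 B preamble/SFD + 12 B inter-frame gap


def max_packet_rate(line_rate_bps: float, frame_bytes: int = 64) -> float:
    """Maximum packets per second for back-to-back frames of frame_bytes."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


def cores_needed(line_rate_bps: float, per_core_pps: float) -> int:
    """Cores required if throughput scaled perfectly linearly (optimistic)."""
    return math.ceil(max_packet_rate(line_rate_bps) / per_core_pps)


if __name__ == "__main__":
    for gbps in (1, 10, 40, 100):
        print(f"{gbps:>3}G: {max_packet_rate(gbps * 1e9) / 1e6:6.2f} Mp/s")
```

With the table's 12 Mp/s for packet classification, `cores_needed(100e9, 12e6)` gives 13 cores under these assumptions, while protocol parsing at 21 Mp/s still fits on one core at 10G.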

Longest Prefix Matching
  Find the longest matching prefix for an IP address
  Core routers have routing tables with more than 300k IP prefixes
  Prefix representation
    Trie: a binary tree built over the IP prefixes
    Levels of the tree (lookup steps): 32 for IPv4, 128 for IPv6
    Multiple bits are processed in a single step to achieve 100 Gbps throughput
  Related algorithms
    TreeBitmap, Shape Shifting Trie, ...
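To make the idea of processing multiple bits per step concrete, here is a minimal software sketch of a fixed-stride multibit trie with prefix expansion. It is a textbook technique shown for illustration, not the encoding used in the talk; the class name, the dictionary node layout and the IPv4-only restriction are all assumptions of this sketch.

```python
import ipaddress


class MultibitTrie:
    """Fixed-stride trie for IPv4 longest prefix matching (illustrative)."""

    def __init__(self, stride: int = 4):
        if 32 % stride != 0:
            raise ValueError("stride must divide 32")
        self.stride = stride
        self.root = self._new_node()

    def _new_node(self) -> dict:
        size = 1 << self.stride
        # Per slot: child node, next hop, and length of the prefix that set it.
        return {"child": [None] * size, "hop": [None] * size,
                "plen": [-1] * size}

    def _chunk(self, addr: int, depth: int) -> int:
        """The stride-wide group of address bits that starts at bit `depth`."""
        return (addr >> (32 - depth - self.stride)) & ((1 << self.stride) - 1)

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        addr, plen = int(net.network_address), net.prefixlen
        node, depth = self.root, 0
        # Descend while the prefix extends beyond the current node.
        while plen - depth > self.stride:
            idx = self._chunk(addr, depth)
            if node["child"][idx] is None:
                node["child"][idx] = self._new_node()
            node = node["child"][idx]
            depth += self.stride
        # Prefix expansion: fill every slot covered by the remaining bits.
        rem = plen - depth
        base = self._chunk(addr, depth)
        for idx in range(base, base + (1 << (self.stride - rem))):
            if node["plen"][idx] <= plen:  # a longer prefix keeps its slot
                node["hop"][idx], node["plen"][idx] = next_hop, plen

    def lookup(self, address: str):
        addr = int(ipaddress.ip_address(address))
        node, depth, best = self.root, 0, None
        while node is not None:
            idx = self._chunk(addr, depth)
            if node["hop"][idx] is not None:
                best = node["hop"][idx]  # deeper matches are always longer
            node = node["child"][idx]
            depth += self.stride
        return best
```

With a stride of 4 an IPv4 lookup takes at most 8 steps instead of 32; a stride of 1 gives the plain binary trie. The price is the extra memory used by expanded slots.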

Encoding Trie to Instructions
  Reducing memory by efficient encoding of the trie to several types of instructions
  Comparison of subtree and TreeBitmap
  Pipelined architecture, 100 Gbps throughput
  Memory allocation to pipeline stages is solved by FPGA reconfiguration
  [Figure: subtrees encoded to instructions]
  [Figure: hardware architecture with deep pipeline; blocks labelled PE1 and DP Mem]
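Why a deep pipeline sustains high throughput can be shown with a small cycle-count model. This is a generic illustration of pipelining, assuming each stage needs exactly one clock cycle and never stalls; it does not model the instruction encoding or the memory layout from the talk, and the function name is hypothetical.

```python
from collections import deque


def simulate_pipeline(num_stages: int, num_lookups: int) -> int:
    """Return the clock cycles needed to push num_lookups through the pipeline."""
    stages = [None] * num_stages           # stages[i]: lookup id held by stage i
    pending = deque(range(num_lookups))    # lookups waiting to enter stage 0
    done, cycles = 0, 0
    while done < num_lookups:
        # Every in-flight lookup advances one stage; a new one is admitted.
        stages = [pending.popleft() if pending else None] + stages[:-1]
        cycles += 1
        # A lookup that has reached the last stage completes in this cycle.
        if stages[-1] is not None:
            done += 1
    return cycles


if __name__ == "__main__":
    print(simulate_pipeline(8, 1000))
```

After the initial fill latency, one lookup completes per clock cycle regardless of pipeline depth. Under that assumption, sustaining about 150 million lookups per second requires a clock of at least roughly 150 MHz.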

Encoding Trie to Instructions
  Analysis of memory utilization for various algorithms
  Significant reduction of memory utilization when the trie is encoded to instructions (new nodes)
  The routing table can be stored in on-chip memory

Hardware Acceleration using FPGA
  Packet header analysis and header field extraction
    Parsing of packet headers and extraction of selected header fields
    Modular pipelined architecture with very low logic utilisation
    Flexible hardware architecture with throughput over 100 Gbps
  Longest prefix match (IP lookup)
    Find the longest prefix for a destination IP address in the routing table (300k+ items)
    Significantly reduced memory requirements; the forwarding table can be stored in on-chip memory
    Processing with a pipelined hardware architecture to achieve high speed
    Throughput over 100 Gbps even for very large routing tables
  Packet classification
    Find the classification (filtering) rule for every received packet
    Perfect hash function with intended collisions; constant-time lookup with low memory requirements
    Throughput over 100 Gbps with only two QDR SRAM memories
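As a software reference for what header field extraction produces, the following minimal sketch parses an untagged Ethernet II frame carrying IPv4 with TCP or UDP. It assumes no VLAN tags, no IPv6 and no fragment handling, and the function name and output dictionary are illustrative; the hardware parser described in the talk is a pipelined design and is not modelled here.

```python
import socket
import struct

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800


def parse_headers(frame: bytes) -> dict:
    """Extract selected fields from an Ethernet II / IPv4 / TCP-or-UDP frame."""
    _dst_mac, _src_mac, ethertype = struct.unpack_from("!6s6sH", frame, 0)
    fields = {"ethertype": ethertype}
    if ethertype != ETHERTYPE_IPV4:        # only IPv4 is handled in this sketch
        return fields
    (ver_ihl,) = struct.unpack_from("!B", frame, ETH_HEADER_LEN)
    ihl = (ver_ihl & 0x0F) * 4             # IPv4 header length in bytes
    (proto,) = struct.unpack_from("!B", frame, ETH_HEADER_LEN + 9)
    src, dst = struct.unpack_from("!4s4s", frame, ETH_HEADER_LEN + 12)
    fields.update(src_ip=socket.inet_ntoa(src),
                  dst_ip=socket.inet_ntoa(dst),
                  protocol=proto)
    if proto in (6, 17):                   # TCP or UDP: ports come first
        sport, dport = struct.unpack_from("!HH", frame, ETH_HEADER_LEN + ihl)
        fields.update(src_port=sport, dst_port=dport)
    return fields
```

The header length is read from the IHL field rather than assumed to be 20 bytes, so packets with IPv4 options still yield the correct port offsets.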

Framework for Rapid Prototyping
  High-performance scalable framework for rapid development of FPGA applications and rapid prototyping
  Wire-speed packet capture, very high-speed DMA transfers over PCI Express
  [Figure: framework block diagram; labels: 10 Gbps, 100 Gbps, Network Interface 0, Network Interface 1, Application Core, Host Interface, PCI Express; caption: PCI Express with FPGA, memories and network interfaces]

High Speed Probe for Lawful Interc.
  Hardware acceleration of packet filtering based on IP addresses, TCP or UDP ports and protocols
  Filtering Engine performs packet classification and LPM
  Designed for 100 Gbps, implemented for two 10 Gbps ports
  [Figure labels: Host Computer, PCI Express x8, 10 Gbps, CPU, Core 0, Core 1, Core n]
  Data export to mediation device or LEA
  The probe was designed for the Ministry of Interior
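The filtering criteria above (IP addresses, TCP or UDP ports, protocol) can be expressed as a simple software reference model. This is only a sketch with a linear rule scan: the `Rule` class, its field names and the packet dictionary layout are assumptions for illustration, and the hardware Filtering Engine uses packet classification and LPM rather than this loop.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    """One filter rule; None or a /0 prefix acts as a wildcard."""
    src: str = "0.0.0.0/0"
    dst: str = "0.0.0.0/0"
    protocol: Optional[int] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: dict) -> bool:
        return (
            ipaddress.ip_address(pkt["src_ip"]) in ipaddress.ip_network(self.src)
            and ipaddress.ip_address(pkt["dst_ip"]) in ipaddress.ip_network(self.dst)
            and self.protocol in (None, pkt.get("protocol"))
            and self.src_port in (None, pkt.get("src_port"))
            and self.dst_port in (None, pkt.get("dst_port"))
        )


def should_capture(pkt: dict, rules: Iterable[Rule]) -> bool:
    """True if any rule matches the packet's extracted header fields."""
    return any(rule.matches(pkt) for rule in rules)
```

A model like this is useful mainly for checking a hardware filter's decisions against a known-good answer; its cost grows with the number of rules, which is what the hardware approach avoids.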

Contracted Research
  CESNET (volume: 550 thousand CZK)
    Contract with the Tools for Monitoring and Configuration department
    Research and development of new hardware architectures for 100 Gb networks
    New anomaly and intrusion detection algorithms and systems for high-speed networks
  CZ.NIC (volume: 158 thousand CZK)
    Cooperation on the Turris project (www.turris.cz)
    Development of a small embedded router
    The goal of the project is to protect the user's home network
  Honeywell (volume: 1950 thousand CZK)
    Contract with the Automation and Control Solutions division
    Design and implementation of an intelligent thermoregulator

Cooperation and Technology Utilization
  Cooperation with academic institutions
    Stanford University
    University of Pisa
    Computer Laboratory
    Czech NREN
  Deployment of technology through the spin-off company INVEA-TECH

Conclusions
  Hardware acceleration in computer networks is necessary for many applications: network security, network monitoring and lawful interception, precise packet generators, etc.
  We focus on time-critical operations and hardware acceleration for 40 and 100 Gbps networks
    Packet header analysis and header field extraction (over 100 Gbps)
    Longest prefix matching with the trie represented by instructions (over 100 Gbps)
    Packet classification with a perfect-hash crossproduct algorithm (over 100 Gbps)
  Most of the technology has been transferred to INVEA-TECH, a spin-off of Brno University of Technology
  We cooperate on various applications with CESNET, CZ.NIC, INVEA-TECH and the Ministry of Interior (Czech Police)
  Submitted project: Cyber Security Competence Centre (CyberSec)

Thank you for your attention!
  Brno University of Technology
  Faculty of Information Technology
  korenek@fit.vutbr.cz