Hardware Acceleration in Computer Networks

Outline
- Motivation for hardware acceleration
- Longest prefix matching using FPGA
- Hardware acceleration of time-critical operations
- Framework and applications
- Contracted research
- Conclusions

Network Traffic Growth
- New services imply network traffic growth
  - Sharing files on social networks
  - Video-on-demand services
  - Internet of Things (IoT): more than 50 billion devices will be connected to the Internet in 2020
- Large data centres use 40 Gb or 100 Gb networks and call for 1 Tb networks

Why hardware acceleration?
- Packet rate and processing speed
- Time to process one packet is defined by the minimal packet size (64 B)

[Figure: packets on a time axis t]

Number of clock cycles, processor at 3.6 GHz

Line rate    Time per packet    CPU clock cycles
1 Gbps       500 ns             ~1807
10 Gbps      50 ns              ~181
40 Gbps      12 ns              ~45
100 Gbps     5 ns               ~18
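
The per-packet budget in the table can be approximated with a few lines of arithmetic. This is a minimal sketch, not the calculation behind the slide: the function name `packet_budget` and its parameters are hypothetical, and it assumes by default that only the 64 B frame itself occupies the link, so the results are close to, but not identical with, the rounded values above. Passing `overhead_bytes=20` (Ethernet preamble plus inter-frame gap) gives a somewhat larger budget.

```python
def packet_budget(line_rate_bps, cpu_hz=3.6e9, frame_bytes=64, overhead_bytes=0):
    """Return (time per packet in ns, CPU clock cycles per packet).

    Assumption: back-to-back minimum-size frames; overhead_bytes models
    extra bytes on the wire per frame (0 by default).
    """
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    seconds = bits_on_wire / line_rate_bps
    return seconds * 1e9, seconds * cpu_hz


if __name__ == "__main__":
    for name, rate in [("1 Gbps", 1e9), ("10 Gbps", 10e9),
                       ("40 Gbps", 40e9), ("100 Gbps", 100e9)]:
        time_ns, cycles = packet_budget(rate)
        print(f"{name:>8}: {time_ns:7.2f} ns, about {cycles:7.1f} cycles")
```

The point of the calculation is that the cycle budget shrinks linearly with line rate, so at 100 Gbps a single core has fewer than twenty cycles per minimum-size packet.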

Performance of Processor Core

Time to process one packet

Line rate [b/s]    Maximal packet rate [packets/s]    CPU 3.6 GHz [clock cycles/packet]
1G                 1.5M                               1807
10G                15M                                181
40G                60M                                45
100G               150M                               18

- Only 181 clock cycles per packet, even for 10G links

Processor performance for time-critical operations

Operation                 Throughput
Protocol parsing          21 Mp/s
TCP stream reassembly     2.5 Mp/s
Packet classification     12 Mp/s
Pattern matching          400 Mb/s

[Figure axis: 1G, 10G, 40G, 100G]

Results for Intel Core i7, one core at 3.6 GHz
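
To see which line rates a single core could keep up with, the measured packet rates can be compared against the worst-case packet rate of each link. The sketch below is illustrative only: the names `max_packet_rate` and `sustained_rates` are hypothetical, and it assumes minimum-size 64 B Ethernet frames plus 20 B of preamble and inter-frame gap per frame, which reproduces the rounded packet rates in the first table.

```python
MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20  # assumption: 8 B preamble/SFD + 12 B inter-frame gap

LINE_RATES = {"1G": 1e9, "10G": 10e9, "40G": 40e9, "100G": 100e9}

# Single-core packet rates taken from the table above (packets per second).
# Pattern matching is given in Mb/s, not packets/s, so it is left out here;
# at 400 Mb/s it is below even the 1G line rate.
CPU_PPS = {
    "Protocol parsing": 21e6,
    "TCP stream reassembly": 2.5e6,
    "Packet classification": 12e6,
}


def max_packet_rate(line_rate_bps):
    """Worst-case packets per second for back-to-back minimum-size frames."""
    return line_rate_bps / ((MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8)


def sustained_rates(pps):
    """Names of the line rates whose worst-case packet rate pps can match."""
    return [name for name, bps in LINE_RATES.items()
            if pps >= max_packet_rate(bps)]


if __name__ == "__main__":
    for operation, pps in CPU_PPS.items():
        print(operation, "->", sustained_rates(pps) or "none")
```

Under these assumptions only protocol parsing reaches 10G on one core, and none of the operations reaches 40G, which is the gap the hardware acceleration targets.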

Longest Prefix Matching
- Find the longest matching prefix for an IP address
- Core routers have routing tables with more than 300k IP prefixes
- Prefix representation
  - Trie: binary tree built over IP prefixes
  - Levels of the tree (steps): 32 for IPv4, 128 for IPv6
- Multiple bits must be processed in a single step to achieve 100 Gbps throughput
- Related algorithms: TreeBitmap, Shape Shifting Trie, ...
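
The basic trie lookup described above can be sketched in software as follows. This is a plain one-bit-per-step binary trie for IPv4, written for illustration; the class names `TrieNode` and `BinaryTrie` and the string next hops are hypothetical and not taken from the presented design.

```python
import ipaddress

ADDRESS_BITS = 32  # IPv4 only in this sketch


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.next_hop = None          # set only if a prefix ends here


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.IPv4Network(prefix)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (addr >> (ADDRESS_BITS - 1 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address):
        addr = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.next_hop  # a /0 default route, if present
        for i in range(ADDRESS_BITS):
            bit = (addr >> (ADDRESS_BITS - 1 - i)) & 1
            node = node.children[bit]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the longest match so far
        return best


if __name__ == "__main__":
    table = BinaryTrie()
    table.insert("0.0.0.0/0", "default")
    table.insert("10.0.0.0/8", "A")
    table.insert("10.1.0.0/16", "B")
    print(table.lookup("10.1.2.3"))
```

A lookup takes up to 32 dependent memory accesses for IPv4, which is why processing several bits per step matters at high line rates.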

Encoding Trie to Instructions
- Reducing memory by efficient encoding of the trie into several types of instructions
- Comparison of subtree and TreeBitmap
- Pipelined architecture, 100 Gbps throughput
- Memory allocation to pipeline stages is solved by FPGA reconfiguration

[Figure: subtrees encoded to instructions]
[Figure: hardware architecture with deep pipeline, showing PE1 blocks and DP Mem blocks]
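
The instruction encoding itself is not specified on the slide, so it is not reproduced here. As a stand-in, the sketch below shows a generic fixed-stride multibit trie with prefix expansion, a well-known way to consume several address bits per step. Each trie level could map to one pipeline stage with its own memory, which is only an analogy to the architecture above. All names (`MultibitTrie`, `STRIDE`, and so on) are hypothetical.

```python
import ipaddress

STRIDE = 8                 # bits consumed per step (assumed value)
LEVELS = 32 // STRIDE      # four levels for IPv4
MASK = (1 << STRIDE) - 1


class Node:
    def __init__(self):
        size = 1 << STRIDE
        self.hop = [None] * size    # next hop stored in each slot
        self.plen = [-1] * size     # length of the prefix that set the slot
        self.child = [None] * size  # pointer to the next level


class MultibitTrie:
    def __init__(self):
        self.root = Node()
        self.default = None

    def insert(self, prefix, next_hop):
        net = ipaddress.IPv4Network(prefix)
        addr, plen = int(net.network_address), net.prefixlen
        if plen == 0:
            self.default = next_hop
            return
        node, level = self.root, 0
        # Descend while more than one full stride of prefix bits remains.
        while plen - level * STRIDE > STRIDE:
            idx = (addr >> (32 - STRIDE * (level + 1))) & MASK
            if node.child[idx] is None:
                node.child[idx] = Node()
            node = node.child[idx]
            level += 1
        rem = plen - level * STRIDE  # 1..STRIDE bits left in this level
        base = (addr >> (32 - STRIDE * (level + 1))) & MASK
        # Prefix expansion: fill every slot the prefix covers, unless a
        # longer prefix already owns the slot.
        for idx in range(base, base + (1 << (STRIDE - rem))):
            if node.plen[idx] <= plen:
                node.hop[idx] = next_hop
                node.plen[idx] = plen

    def lookup(self, address):
        addr = int(ipaddress.IPv4Address(address))
        best, node = self.default, self.root
        for level in range(LEVELS):  # one step per level
            idx = (addr >> (32 - STRIDE * (level + 1))) & MASK
            if node.plen[idx] >= 0:
                best = node.hop[idx]  # deeper levels hold longer prefixes
            node = node.child[idx]
            if node is None:
                break
        return best
```

With a stride of 8 an IPv4 lookup needs at most four steps instead of 32, at the cost of more memory per node; reducing that memory cost is the problem that encodings such as TreeBitmap address.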

Encoding Trie to Instructions
- Analysis of memory utilization for various algorithms
- Significant reduction of memory utilization by encoding the trie to instructions (new nodes)
- Routing table can be stored in on-chip memory

Hardware Acceleration using FPGA
- Packet header analysis and header fields extraction
  - Parsing of packet headers and extraction of selected header fields
  - Modular pipelined architecture with very low logic utilisation
  - Flexible hardware architecture with throughput over 100 Gbps
- Longest prefix match or IP lookup
  - Find the longest prefix for a destination IP address in the routing table (300k+ items)
  - Significantly reduced memory requirements, forwarding table can be stored in on-chip memory
  - Processing with pipelined hardware architecture to achieve high speed
  - Throughput over 100 Gbps even for very large routing tables
- Packet classification
  - Find the classification (filtering) rule for every received packet
  - Perfect hash function with intended collisions, constant-time look-up with low memory requirements
  - Throughput over 100 Gbps with only two QDR SRAM memories
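
What "header fields extraction" produces can be illustrated in software. The sketch below is a minimal example and not the FPGA parser: the function name `parse_headers` and the output field names are hypothetical, and it assumes untagged Ethernet II frames carrying IPv4 with TCP or UDP (no VLAN tags, no IPv6, no tunnels).

```python
import ipaddress
import struct

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
PROTO_TCP, PROTO_UDP = 6, 17


def parse_headers(frame: bytes) -> dict:
    """Extract selected header fields; stops quietly on short or unknown input."""
    fields = {}
    if len(frame) < ETH_HEADER_LEN:
        return fields
    _dst_mac, _src_mac, ethertype = struct.unpack_from("!6s6sH", frame, 0)
    fields["ethertype"] = ethertype
    if ethertype != ETHERTYPE_IPV4 or len(frame) < ETH_HEADER_LEN + 20:
        return fields

    ip = ETH_HEADER_LEN
    ihl_bytes = (frame[ip] & 0x0F) * 4        # IPv4 header length in bytes
    fields["protocol"] = frame[ip + 9]
    src, dst = struct.unpack_from("!4s4s", frame, ip + 12)
    fields["src_ip"] = str(ipaddress.IPv4Address(src))
    fields["dst_ip"] = str(ipaddress.IPv4Address(dst))

    l4 = ip + ihl_bytes
    if fields["protocol"] in (PROTO_TCP, PROTO_UDP) and len(frame) >= l4 + 4:
        # TCP and UDP both start with source port, then destination port.
        fields["src_port"], fields["dst_port"] = struct.unpack_from("!HH", frame, l4)
    return fields
```

The extracted tuple (addresses, ports, protocol) is the typical input to the lookup and classification stages listed on the slide.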

Framework for Rapid Prototyping
- High-performance scalable framework for rapid development of FPGA applications and rapid prototyping
- Wire-speed packet capture, very high speed DMA transfers over PCI Express
- PCI Express with FPGA, memories and network interfaces

[Figure: framework block diagram with Network Interface 0 and Network Interface 1 (10 Gbps, 100 Gbps), Application Core, Host Interface, PCI Express]

High Speed Probe for Lawful Interception
- Hardware acceleration of packet filtering based on IP addresses, TCP or UDP ports and protocols
- Filtering engine performs packet classification and LPM
- Designed for 100 Gbps, implemented for two 10 Gbps ports
- The probe was designed for the Ministry of Interior

[Figure: probe with 10 Gbps ports connected over PCI Express x8 to a host computer (CPU cores 0, 1, ..., n); data export to mediation device or LEA]
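
The kind of filtering described above (IP addresses, ports, protocol) can be sketched as a first-match rule list. This is a software illustration with a linear search, not the probe's filtering engine, which performs classification and LPM in hardware. The `Rule` fields, the `classify` function and the action strings are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    src: str = "0.0.0.0/0"          # source prefix
    dst: str = "0.0.0.0/0"          # destination prefix
    protocol: Optional[int] = None  # None means "any"
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    action: str = "capture"

    def matches(self, pkt: dict) -> bool:
        return (
            ipaddress.ip_address(pkt["src_ip"]) in ipaddress.ip_network(self.src)
            and ipaddress.ip_address(pkt["dst_ip"]) in ipaddress.ip_network(self.dst)
            and self.protocol in (None, pkt.get("protocol"))
            and self.src_port in (None, pkt.get("src_port"))
            and self.dst_port in (None, pkt.get("dst_port"))
        )


def classify(rules, pkt, default="drop"):
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return default


if __name__ == "__main__":
    rules = [Rule(dst="192.168.1.0/24", protocol=6, dst_port=80)]
    pkt = {"src_ip": "10.0.0.1", "dst_ip": "192.168.1.2",
           "protocol": 6, "src_port": 1234, "dst_port": 80}
    print(classify(rules, pkt))
```

A linear scan like this costs time proportional to the number of rules per packet, which is acceptable for illustration but not at 100 Gbps; that is the motivation for constant-time hardware classification.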

Contracted Research
- CESNET (Volume: 550 thousand CZK)
  - Contract with the Tools for Monitoring and Configuration department
  - Research and development of new hardware architectures for 100 Gb networks
  - New anomaly and intrusion detection algorithms and systems for high-speed networks
- CZ.NIC (Volume: 158 thousand CZK)
  - Cooperation on the Turris project (www.turris.cz)
  - Development of a small embedded router
  - The goal of the project is to protect the user's home network
- Honeywell (Volume: 1950 thousand CZK)
  - Contract with the Automation and Control Solutions division
  - Design and implementation of an intelligent thermoregulator

Cooperation and Technology Utilization
- Cooperation with academic institutions
  - Stanford University
  - University of Pisa
  - Computer Laboratory
  - Czech NREN
- Deployment of technology through the spin-off company INVEA-TECH

Conclusions
- Hardware acceleration in computer networks is necessary for many applications
  - Network security, network monitoring and lawful interception, precise packet generators, etc.
- We focus on time-critical operations and hardware acceleration for 40 and 100 Gbps networks
  - Packet header analysis and header fields extraction (over 100 Gbps)
  - Longest prefix matching with representation of the trie by instructions (over 100 Gbps)
  - Packet classification with perfect hash crossproduct algorithm (over 100 Gbps)
- Most of the technology has been transferred to INVEA-TECH, a spin-off company of Brno University of Technology
- We cooperate on various applications with CESNET, CZ.NIC, INVEA-TECH and the Ministry of Interior (Czech Police)
- Submitted Cyber Security Competence Centre project (CyberSec)

Thank you for your attention!
Brno University of Technology, Faculty of Information Technology
korenek@fit.vutbr.cz