The Power of Batching in the Click Modular Router
Joongi Kim, Seonggu Huh, Keon Jang, KyoungSoo Park*, Sue Moon
Computer Science Dept., KAIST; Microsoft Research Cambridge, UK; * Electrical Engineering Dept., KAIST

Software Router: High Flexibility and/or High Performance?
[Figure: flexibility vs. performance (single machine)]
- Click Modular Router (Kohler+ 2000): modularity, 1-3 Gbps
- Next-generation software routers:
  - RouteBricks (Dobrescu+ 2009): 8.7 Gbps
  - PacketShader (Han+ 2010): batching, 40 Gbps

Our Goal
- Make the Click modular router support multi-10G workloads.
- Let's combine Click + PacketShader!

The Click Modular Router
- Basic abstraction of processing: Element := operation(packet)
- Element types:
  - Packet sources and sinks (e.g., RX, TX, Drop)
  - Packet modifiers/validators
  - Routing elements
  - Queues, etc.
[Figure: IP router example configuration]

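The following Python sketch gives a rough illustration of the "Element := operation(packet)" abstraction. Each element applies one operation and pushes the result to the next element. The class names (Element, DecTTL, Sink) and the dict-based packet are hypothetical stand-ins, not Click's C++ API. DecTTL plays the role of a modifier/validator, and Sink plays the role of a TX sink.

```python
from typing import List, Optional

Packet = dict  # hypothetical packet: header fields in a dict


class Element:
    """Click-style element: Element := operation(packet)."""

    def __init__(self) -> None:
        self.next: Optional["Element"] = None  # downstream element (one output)

    def connect(self, nxt: "Element") -> "Element":
        self.next = nxt
        return nxt  # returning nxt allows a.connect(b).connect(c)

    def process(self, pkt: Packet) -> Optional[Packet]:
        return pkt  # default operation: pass the packet through unchanged

    def push(self, pkt: Packet) -> None:
        out = self.process(pkt)
        if out is not None and self.next is not None:
            self.next.push(out)  # one function call per packet per element


class DecTTL(Element):
    """Modifier/validator: drop expired packets, otherwise decrement the TTL."""

    def process(self, pkt: Packet) -> Optional[Packet]:
        if pkt["ttl"] <= 1:
            return None  # returning None drops the packet
        pkt["ttl"] -= 1
        return pkt


class Sink(Element):
    """Sink standing in for TX: records what it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.sent: List[Packet] = []

    def process(self, pkt: Packet) -> Optional[Packet]:
        self.sent.append(pkt)
        return None  # end of the path


source = Element()
sink = Sink()
source.connect(DecTTL()).connect(sink)
for ttl in (64, 1, 10):
    source.push({"ttl": ttl})
print([p["ttl"] for p in sink.sent])  # [63, 9]
```

Note the comment in `push`: every packet costs one call per element. That per-packet call cost is the function call overhead the next slides set out to remove.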
Why is Click slow?
1. Inefficient packet I/O (at both kernel and user levels)
2. Limited multicore scalability
3. Function call overhead
[Figure label: Useful computations]

Our Approach
1. Inefficient packet I/O → IO batching: the Packet IO engine
2. Limited multicore scalability:
   - Multi-queue NIC support → per-queue IO elements
   - SMP threading model → cloned router pipelines
   - NUMA-aware thread affinity → thread pinning & static scheduler
3. Function call overhead → computation batching: conversion of Click elements

1. Packet IO Engine
- Basic I/O unit: a pack of multiple packets (from 1 to 2,048 packets)
- Advantage: bypasses the inefficient OS networking stack
- RX element: receives a pack from a NIC. For compatibility, it can split a pack into individual packets.
- TX element: if given a pack, transmits it immediately; if given a packet, queues it into a buffer pack.

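The RX/TX rules above can be sketched in Python against a simulated NIC. This is a minimal sketch: `Pack`, `RXElement`, `TXElement`, and the fake NIC functions are hypothetical, not the Packet IO engine's real interface. The slide does not say when a partially filled buffer pack is sent. Flushing when the buffer reaches the batch size, plus an explicit `flush()` (for example from a timer), is an assumption.

```python
from typing import Callable, List, Union

MAX_PACK = 2048  # the slide's upper bound on packets per pack


class Pack:
    """A pack: a batch of 1..MAX_PACK packets handled as one I/O unit."""

    def __init__(self, packets: List[bytes]) -> None:
        if not 1 <= len(packets) <= MAX_PACK:
            raise ValueError("a pack holds 1 to 2,048 packets")
        self.packets = list(packets)


class RXElement:
    """Receives one pack per call; optionally splits it for legacy elements."""

    def __init__(self, nic_rx: Callable[[int], List[bytes]],
                 batch: int, split: bool) -> None:
        self.nic_rx, self.batch, self.split = nic_rx, batch, split

    def run_once(self) -> List[Union[Pack, bytes]]:
        pkts = self.nic_rx(self.batch)  # fetch up to `batch` packets at once
        if not pkts:
            return []
        pack = Pack(pkts)
        return list(pack.packets) if self.split else [pack]


class TXElement:
    """Sends packs immediately; accumulates single packets into a buffer pack."""

    def __init__(self, nic_tx: Callable[[List[bytes]], None], batch: int) -> None:
        self.nic_tx, self.batch = nic_tx, batch
        self.buffer: List[bytes] = []

    def push(self, item: Union[Pack, bytes]) -> None:
        if isinstance(item, Pack):
            self.nic_tx(item.packets)      # given a pack: transmit it now
        else:
            self.buffer.append(item)       # given a packet: queue it
            if len(self.buffer) >= self.batch:
                self.flush()               # assumption: flush when full

    def flush(self) -> None:
        if self.buffer:
            self.nic_tx(self.buffer)
            self.buffer = []


# Simulated NIC: a list as the RX ring, and a record of burst sizes on the wire.
rx_ring = [bytes([i]) for i in range(10)]


def fake_nic_rx(n: int) -> List[bytes]:
    taken, rx_ring[:] = rx_ring[:n], rx_ring[n:]
    return taken


wire: List[int] = []
tx = TXElement(lambda pkts: wire.append(len(pkts)), batch=4)
for item in RXElement(fake_nic_rx, batch=8, split=False).run_once():
    tx.push(item)   # one pack of 8 packets, sent as-is
for item in RXElement(fake_nic_rx, batch=8, split=True).run_once():
    tx.push(item)   # 2 leftover packets, split and buffered
tx.flush()          # assumption: a timer would trigger this in practice
print(wire)  # [8, 2]
```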
2. Per-queue IO elements
- Replace Click's per-device IO elements.
- High-speed NICs support multiple queues and RSS (receive-side scaling) for multicore scalability.
- Current Linux and Click do not utilize this.
[Figure: the RX queues of NIC0-NIC2 are hidden behind the Linux per-device abstraction; Click reads each NIC through a single FromDevice element.]
Device-only abstraction is not scalable.

2. Per-queue IO elements (continued)
- Replace Click's per-device IO elements.
- Direct access to NIC queues enables multiple threads to divide the input traffic from a single NIC.
[Figure: the Packet IO Engine exposes the RX queues of NIC0-NIC2 directly; Click reads them through one FromQueue element per queue.]
Contention-free scaling

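RSS is what makes per-queue reading safe. The NIC hashes each packet's flow tuple and uses the result to pick an RX queue, so every packet of a flow lands in the same queue. The Python sketch below implements the Toeplitz hash used by RSS, keyed with the default 40-byte key published in Microsoft's RSS documentation. That key is an assumption; a driver may program a different one. Real NICs typically index an indirection (redirection) table with the low-order hash bits. The plain modulo in `rss_queue` is a simplification, and `rss_queue` itself is a hypothetical helper.

```python
import ipaddress
import struct

# Assumption: the widely published default RSS key; NICs may be programmed otherwise.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcb"
    "ae7b30b477cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """32-bit Toeplitz hash: for each set input bit i (MSB first),
    XOR in the 32-bit window of the key that starts at bit i."""
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_queue(src_ip: str, dst_ip: str, sport: int, dport: int,
              num_queues: int) -> int:
    """Pick an RX queue for an IPv4/TCP flow (simplified: hash modulo queues)."""
    data = (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", sport, dport))
    return toeplitz_hash(data) % num_queues


flow = ("66.9.149.187", "161.142.100.80", 2794, 1766)
queues = {rss_queue(*flow, num_queues=6) for _ in range(3)}
print(len(queues))  # 1: every packet of the flow maps to the same RX queue
```

Because the queue is a pure function of the flow tuple, each FromQueue-style reader owns its queue outright. No two threads contend for the same descriptors.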
3. Converting Click Elements (1/2)
- New elements process multiple packets at a time.
[Figure, original Click elements: timeline of RX → IP Router → TX between an RX pack and a TX pack]
[Figure, new elements: the same timeline, with RX → IP Router → TX operating on whole packs]

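A quick count shows why processing a pack per call matters. The Python sketch below counts element invocations for a linear pipeline. The workload (1M packets, 10 elements, 64-packet packs) is hypothetical. The model ignores the per-packet work done inside each call and only shows how the fixed call overhead is amortized across a pack.

```python
def element_calls(num_packets: int, num_elements: int, pack_size: int) -> int:
    """Element invocations for a linear pipeline when each call handles one pack."""
    packs = (num_packets + pack_size - 1) // pack_size  # integer ceiling division
    return packs * num_elements


# Hypothetical workload: 1M packets through a 10-element pipeline.
per_packet = element_calls(1_000_000, 10, pack_size=1)   # original elements
per_pack = element_calls(1_000_000, 10, pack_size=64)    # batched elements
print(per_packet, per_pack)  # 10000000 156250
```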
3. Converting Click Elements (2/2)
- Case 1: one input and one output. Iterate over the packets in a pack.
- Case 2: one input and N outputs. Split a pack into multiple packs.
[Figure, LookupIPRoute example: input of k packets in 1 pack; prepare N buckets (buffer packs); output of k/N packets (on average) in each of N packs]

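The two conversion cases can be sketched in Python as follows. The routing table, `lookup_port`, and the dict-based packets are hypothetical stand-ins for LookupIPRoute. The longest-prefix match is a linear scan, fine for a sketch but not for a ~300K-entry table. `split_pack` prepares N bucket packs and fills them in one pass over the input pack.

```python
import ipaddress
from typing import Callable, List

Packet = dict
Pack = List[Packet]


def process_pack(pack: Pack, op: Callable[[Packet], Packet]) -> Pack:
    """Case 1 (one input, one output): iterate over the packets in the pack."""
    return [op(p) for p in pack]


def split_pack(pack: Pack, classify: Callable[[Packet], int], n: int) -> List[Pack]:
    """Case 2 (one input, N outputs): sort packets into N bucket packs."""
    buckets: List[Pack] = [[] for _ in range(n)]
    for p in pack:
        buckets[classify(p)].append(p)
    return buckets


# Hypothetical routing table standing in for LookupIPRoute: (prefix, output port).
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), 0),
    (ipaddress.ip_network("10.1.0.0/16"), 1),
    (ipaddress.ip_network("0.0.0.0/0"), 2),
]


def lookup_port(pkt: Packet) -> int:
    """Longest-prefix match by linear scan over ROUTES."""
    dst = ipaddress.ip_address(pkt["dst"])
    _, port = max((r for r in ROUTES if dst in r[0]), key=lambda r: r[0].prefixlen)
    return port


pack = [{"dst": d, "ttl": 64} for d in ("10.1.2.3", "10.9.9.9", "8.8.8.8", "10.1.0.1")]
pack = process_pack(pack, lambda p: {**p, "ttl": p["ttl"] - 1})
outputs = split_pack(pack, lookup_port, n=3)
print([len(b) for b in outputs])  # [1, 2, 1]
```

Each output pack is smaller than the input pack: k/N packets on average. This is the shrinking effect behind the "pack-split problem" listed under future plans.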
Evaluation Setup
[Figure: a packet generator (4x 10GbE ports) offers 40 Gbps of traffic to the Click machine (4x 10GbE ports), which routes it back.]
- All experiments: 64 B packets and an IPv4 router
- Routing table: traces from RouteViews (~300K entries)
- Each machine has:
  - 2x NUMA nodes (Intel Xeon E… GHz quad-core CPUs)
  - 12 GB of DDR3 1333 MHz RAM
  - 4x Intel 82599EB network adapters with SFP+ cables

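To put the 64 B workload in perspective, the Python sketch below computes the Ethernet line rate in packets per second. It assumes the usual 20 bytes of per-frame wire overhead (7 B preamble, 1 B start-of-frame delimiter, 12 B inter-frame gap). Other accounting conventions give slightly different figures.

```python
WIRE_OVERHEAD_BYTES = 20  # preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12)


def line_rate_pps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum frames per second on an Ethernet link for a given frame size."""
    return link_gbps * 1e9 / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


per_port = line_rate_pps(10, 64)
print(f"{per_port / 1e6:.2f} Mpps per 10GbE port")            # 14.88 Mpps per 10GbE port
print(f"{line_rate_pps(40, 64) / 1e6:.2f} Mpps for 4 ports")  # 59.52 Mpps for 4 ports
print(f"{1e9 / per_port:.1f} ns per packet per port")         # 67.2 ns per packet per port
```

A budget of roughly 67 ns per packet per port leaves little room for per-packet system calls or function-call chains. That is why both IO and computation are batched.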
Performance Gain
- Vanilla Click* (best reported before; baseline): 3.4 Gbps
- + IO batching, including per-queue IO elements and cloned router pipelines: 7.5 Gbps
- + IO batching + NUMA support: 16 Gbps
- + IO batching + NUMA support + computation batching: 28.3 Gbps
* Argyraki et al. (2008), forwarding only at the kernel level

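The Python sketch below derives relative speedups from the four throughput figures on this slide. The ratios are indicative only. The baseline is kernel-level, forwarding-only Click, while the other numbers are for the user-level IPv4 router, so the comparison is not like-for-like. `speedups` is a hypothetical helper.

```python
from typing import List, Tuple

RESULTS_GBPS = [
    ("Vanilla Click (baseline)", 3.4),
    ("+ IO batching", 7.5),
    ("+ IO batching + NUMA", 16.0),
    ("+ IO batching + NUMA + computation batching", 28.3),
]


def speedups(results: List[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Return (name, speedup vs. first entry, speedup vs. previous entry)."""
    base = results[0][1]
    out, prev = [], base
    for name, gbps in results:
        out.append((name, gbps / base, gbps / prev))
        prev = gbps
    return out


for name, total, step in speedups(RESULTS_GBPS):
    print(f"{name:<46} {total:4.1f}x total  {step:4.1f}x step")
# totals: 1.0x, 2.2x, 4.7x, 8.3x; steps: 1.0x, 2.2x, 2.1x, 1.8x
```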
Throughput
[Figure: throughput (Gbps) vs. RX/TX batch size (# packets per pack), for IO Batching; IO Batching + NUMA; and IO Batching + NUMA + Computation Batching]

Latency
[Figure: latency (µs) vs. RX/TX batch size (# packets per pack), for IO Batching + NUMA and IO Batching + NUMA + Computation Batching]

Additional Experiment: Multicore Scalability on a 12-core Machine
[Figure: throughput (Gbps) vs. number of CPU cores and queues used; annotation: "IOH bottleneck?"]

Our Contribution
- Question: How much performance could we get with software-based modular packet processors (in this case, the Click modular router)?
- Results:
  - Flexibility and performance can go together.
  - We pushed the performance of Click from ~3 Gbps to 28.3 Gbps.
  - Aggressive batching is essential for high performance: not only IO batching, but also computation batching!

Future Plans
- Better performance:
  - Hybrid polling/interrupt-based IO (like NAPI)
  - Overlapping of IO and computation
- More complex configurations:
  - IPsec encryption/decryption, OpenFlow, etc.
  - Handling of the pack-split problem
  - Automatic conversion of existing Click elements
- GPU integration:
  - More computation power
  - Challenging due to different performance characteristics

Thanks!
Q&A / Discussion

DoubleClick: Backup Slides

Additional Experiment II: Isolated Effect of Computation Batching
- The IO batch size is fixed to …
- It uses RX only, so there is no IO bottleneck.
[Figure: throughput (Gbps) of the IP router (pack processing) vs. pack size for elements (# packets per pack)]

Vanilla-only Performance?
- In our early experiments, vanilla Click showed … Kpps at the user level. This is far behind 14.2 Mpps (10 Gbps).
- [Argyraki et al.] recorded 3.4 Gbps at the kernel level:
  - Used 16 NICs and 8 kernel threads
  - Packet forwarding only (no route lookups)
- We want to keep the router at the user level. (See the "Why user-level?" backup slide for the reason.)

Why Packet IOE instead of XXX?
- High performance reported by existing applications (PacketShader [Han+] and XIA [Han²+])
- Alternatives:
  - Click's netmap integration is not fast yet. (See the next slide.)
  - pcap was not better either.

What about netmap?
- The same purpose: user-level packet processing, batching-oriented
- The next release of Click will ship native netmap support (currently beta):
  FromDevice(METHOD NETMAP, IFACE eth0)
- However, it performs badly: < 1 Gbps with 4 NICs and 4 cores.
- Our suspicion: linear scanning of all NIC queues to preserve per-device IO elements.
- It would be interesting to compare it with psio if netmap were integrated using per-queue IO elements.

4. SMP Threading Model
- All CPU cores run the same cloned router pipeline.
- This eliminates resource contention (e.g., cache).
[Figure: the first CPU runs psio packet RX → Click's IP Router pipeline → psio packet TX for NIC0 and NIC1; the second CPU runs the same pipeline for NIC2 and NIC3.]

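The cloned-pipeline idea can be sketched in Python: every core gets a private deep copy of the same router pipeline and its own RX queue, so the workers share no mutable state and need no locks. `RouterPipeline`, its toy exact-match table, and the queue contents are hypothetical. CPython threads share the GIL, so this shows only the structure, not real parallel speedup.

```python
import copy
import threading
from typing import Dict, List


class RouterPipeline:
    """Hypothetical per-core pipeline: RX -> route lookup -> per-port TX counters."""

    def __init__(self, routes: Dict[str, int], num_ports: int) -> None:
        self.routes = routes               # private copy per core after deepcopy
        self.tx_counts = [0] * num_ports   # per-core state, never shared

    def run(self, rx_queue: List[dict]) -> None:
        for pkt in rx_queue:
            port = self.routes.get(pkt["dst"], 0)  # toy exact-match lookup, default port 0
            self.tx_counts[port] += 1


template = RouterPipeline({"10.0.0.1": 1, "10.0.0.2": 2}, num_ports=3)
# One RX queue per core; in a real system these would be NIC hardware queues.
rx_queues = [[{"dst": "10.0.0.1"}] * 3, [{"dst": "10.0.0.2"}] * 2, [{"dst": "1.2.3.4"}] * 4]
clones = [copy.deepcopy(template) for _ in rx_queues]  # one clone per core

threads = [threading.Thread(target=c.run, args=(q,)) for c, q in zip(clones, rx_queues)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Merge per-core counters only after the workers finish; no locks were needed.
totals = [sum(c.tx_counts[p] for c in clones) for p in range(3)]
print(totals)  # [4, 3, 2]
```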
Per-queue IO + Cloned Router Pipelines
[Figure: Original Click: NIC → FromDevice → push path → intermediate elements → pull path → ToDevice → NIC. DoubleClick: NIC RX queues → FromDevice → push path → intermediate elements → ToDevice → NIC TX queues, with the same pipeline in other CPU cores.]

5. NUMA-aware Thread Affinity
- NUMA: non-uniform memory access
[Figure: Node 0 has CPU0 with local RAM and IOH0 serving NIC0 and NIC1; Node 1 has CPU1 with local RAM and IOH1 serving NIC2 and NIC3. The nodes are linked by QPI, and the NICs attach via PCIe x8 with 10G ports.]
- Thread pinning in Click's static thread scheduler: we can now fix the CPU that executes a set of elements.
- Caveat: the configuration must be tweaked by the user.

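The user-tweaked configuration the caveat refers to amounts to a core-to-queue plan. The Python sketch below builds one for the two-node topology in the figure. It assigns each core one RX queue on every NIC attached to its own node, so no thread polls a NIC behind the remote IOH. The core numbering (0-3 on node 0, 4-7 on node 1) and the "queue q goes to the q-th local core" rule are assumptions, and `numa_plan` is a hypothetical helper. How Click's static scheduler is actually configured is not shown.

```python
from typing import Dict, List, Tuple

# Topology from the slide: NIC0/NIC1 behind IOH0 on node 0, NIC2/NIC3 behind IOH1 on node 1.
NIC_NODE = {"NIC0": 0, "NIC1": 0, "NIC2": 1, "NIC3": 1}
# Assumption: quad-core CPUs with cores 0-3 on node 0 and cores 4-7 on node 1.
NODE_CORES = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}


def numa_plan(nic_node: Dict[str, int],
              node_cores: Dict[int, List[int]]) -> Dict[int, List[Tuple[str, int]]]:
    """Map each core to (nic, queue) pairs, using only NICs on the core's own node."""
    plan: Dict[int, List[Tuple[str, int]]] = {}
    for node, cores in node_cores.items():
        local_nics = sorted(nic for nic, n in nic_node.items() if n == node)
        for q, core in enumerate(cores):
            plan[core] = [(nic, q) for nic in local_nics]  # queue q of every local NIC
    return plan


for core, queues in numa_plan(NIC_NODE, NODE_CORES).items():
    print(core, queues)
```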
Click's Multithreading
- A thread = a processing path divided by queues
- Push path (for processing): the first element in the path initiates the processing. At the end of each step, the current element calls the handler of the next element.
- Pull path (for scheduling): the last element in the path initiates the processing. The first element returns the packet it holds, or NULL otherwise.
- Each path can run on a different processor if the queue is synchronized properly.

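The push/pull split can be sketched in Python as follows. Upstream elements push packets into a queue, and a downstream TX-like element pulls them out, getting None (Click's NULL) when the queue is empty. `Queue`, `Counter`, and `ToDeviceSketch` are simplified, hypothetical stand-ins rather than Click's classes. The sketch is single-threaded. If the two paths ran on different processors, the queue would need synchronization, as the slide notes.

```python
from collections import deque
from typing import List, Optional

Packet = dict


class Queue:
    """Boundary between a push path and a pull path (simplified, single-threaded)."""

    def __init__(self, capacity: int) -> None:
        self.buf: deque = deque()
        self.capacity = capacity
        self.drops = 0

    def push(self, pkt: Packet) -> None:   # invoked by the upstream push path
        if len(self.buf) >= self.capacity:
            self.drops += 1                # tail drop when full
        else:
            self.buf.append(pkt)

    def pull(self) -> Optional[Packet]:    # invoked by the downstream pull path
        return self.buf.popleft() if self.buf else None  # None plays the role of NULL


class Counter:
    """Intermediate push element: counts, then calls the next element's push handler."""

    def __init__(self, nxt: Queue) -> None:
        self.nxt = nxt
        self.count = 0

    def push(self, pkt: Packet) -> None:
        self.count += 1
        self.nxt.push(pkt)


class ToDeviceSketch:
    """Pull-path initiator: the last element asks upstream for packets."""

    def __init__(self, upstream: Queue) -> None:
        self.upstream = upstream
        self.sent: List[Packet] = []

    def run_task(self, budget: int) -> None:
        for _ in range(budget):
            pkt = self.upstream.pull()
            if pkt is None:                # nothing queued: stop this round
                break
            self.sent.append(pkt)


q = Queue(capacity=2)
head = Counter(q)
for i in range(3):
    head.push({"id": i})                   # push path: the third packet is dropped
tx = ToDeviceSketch(q)
tx.run_task(budget=8)                      # pull path: stops when pull() returns None
print(head.count, q.drops, [p["id"] for p in tx.sent])  # 3 1 [0, 1]
```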
Why user-level?
We borrow a slide from PacketShader; the same applies to the Click modular router.
- Packet processing in the kernel is bad:
  - The kernel has higher scheduling priority; an overloaded kernel may starve user-level processes.
  - Some CPU extensions such as MMX and SSE are not available.
  - Buggy kernel code causes irreversible damage to the system.
- Processing in user space is good:
  - Rich, friendly development and debugging environment
  - Seamless integration with 3rd-party libraries such as CUDA or OpenSSL
  - Easy to develop a virtualized data plane
What s New in VMware vsphere 4.1 Performance VMware vsphere 4.1 T E C H N I C A L W H I T E P A P E R Table of Contents Scalability enhancements....................................................................
More informationFairness Issues in Software Virtual Routers
Fairness Issues in Software Virtual Routers Norbert Egi, Adam Greenhalgh, h Mark Handley, Mickael Hoerdt, Felipe Huici, Laurent Mathy Lancaster University PRESTO 2008 Presenter: Munhwan Choi Virtual Router
More informationScaling Acceleration Capacity from 5 to 50 Gbps and Beyond with Intel QuickAssist Technology
SOLUTION BRIEF Intel QuickAssist Technology Scaling Acceleration Capacity from 5 to 5 Gbps and Beyond with Intel QuickAssist Technology Equipment manufacturers can dial in the right capacity by choosing
More informationBe Fast, Cheap and in Control with SwitchKV. Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level
More informationThe Multikernel A new OS architecture for scalable multicore systems
Systems Group Department of Computer Science ETH Zurich SOSP, 12th October 2009 The Multikernel A new OS architecture for scalable multicore systems Andrew Baumann 1 Paul Barham 2 Pierre-Evariste Dagand
More informationA Case Study in Optimizing GNU Radio s ATSC Flowgraph
A Case Study in Optimizing GNU Radio s ATSC Flowgraph Presented by Greg Scallon and Kirby Cartwright GNU Radio Conference 2017 Thursday, September 14 th 10am ATSC FLOWGRAPH LOADING 3% 99% 76% 36% 10% 33%
More informationAn FPGA-Based Optical IOH Architecture for Embedded System
An FPGA-Based Optical IOH Architecture for Embedded System Saravana.S Assistant Professor, Bharath University, Chennai 600073, India Abstract Data traffic has tremendously increased and is still increasing
More informationPEARL. Programmable Virtual Router Platform Enabling Future Internet Innovation
PEARL Programmable Virtual Router Platform Enabling Future Internet Innovation Hongtao Guan Ph.D., Assistant Professor Network Technology Research Center Institute of Computing Technology, Chinese Academy
More informationChronicle: Capture and Analysis of NFS Workloads at Line Rate
Chronicle: Capture and Analysis of NFS Workloads at Line Rate Ardalan Kangarlou, Sandip Shete, and John D. Strunk, NetApp, Inc. https://www.usenix.org/conference/fast15/technical-sessions/presentation/kangarlou
More information