PacketShader as a Future Internet Platform
|
|
- Lesley McBride
- 5 years ago
- Views:
Transcription
1 PacketShader as a Future Internet Platform AsiaFI Summer School Sue Moon in collaboration with: Joongi Kim, Seonggu Huh, Sangjin Han, Keon Jang, KyoungSoo Park Advanced Networking Lab, CS, KAIST Networked and Distributed Computing Systems Lab, EE, KAIST
2 Network Systems Routers are the last bastion of niche super computers Cisco CRS-1 4-slot 320Gbps chassis = $15K on ebay Cisco s CRS-3 Multishelf System = 322Tbps =$15K x 10^3 + a Cost = special-purpose ASIC chips + TCAMs + S/W suites Why? People want Huge Freaking Routers! Why do people go for large routers? Managment O/H Routing protocol scalability Hurdles for innovation Special-purpose ASICs, network processors Proprietary S/W (Cisco s IOS, Juniper s JunOS, Huawei s???) Cost 2
3 Recent Trends New functions keep added Firewalls / IDS / IPS Monitoring and measurement VPN gears VoIP Traffic engineering Switches with minimum L3 functions at the network edge 3
4 Next Generation Networking vs Future Internet Evolution vs revolution Today s problems w/ hacks and patches vs Architectural re-design to attack the problems at the roots Both approaches need to address ever-increasing traffic demand B Internet users 5B Internet users 4.6B mobile subscribers 10B mobile subscribers 1.4B connected machines 40B connected machines 800 x (10^18) bytes of info 53 x (10^21) bytes of info Notable breakthrough A new protocol that could replace IP 4
5 Challenges Ahead Design and demonstrate overarching architecture Routing + security + mobility NSF Future Internet Architecture (FIA) projects XIA NDN Mobility First Nebula NSF GENI projects Testbed building Mostly OpenFlow driven Need a platform to play with Fast but cost-effective Software routers! 5
6 Positioning of 10+ Gbps Routers Commercial competitor? Core routers with 100+ Tbps capacity? No. Edge routers with 100+ Gbps capacity with complex features? Maybe Experimental platforms? NetFGPA ServerSwitch RouteBricks ATCA-based boxes 6
7 PacketShader: A GPU-Accelerated Software Router High-performance Our prototype: 40 Gbps on a single box 7
8 Software Router Despite its name, not limited to IP routing You can implement whatever you want on it. Driven by software Flexible Friendly development environments Based on commodity hardware Cheap Fast evolution 8
9 Now 10 Gigabit NIC is a commodity From $200 $300 per port Great opportunity for software routers 9
10 Achilles Heel of Software Routers Low performance Due to CPU bottleneck Year Ref. H/W IPv4 Throughput 2008 Egi et al. Two quad-core CPUs 3.5 Gbps 2008 Enhanced SR Bolla et al RouteBricks Dobrescu et al. Two quad-core CPUs Two quad-core CPUs (2.8 GHz) 4.2 Gbps 8.7 Gbps Not capable of supporting even a single 10G port 10
11 What is PacketShader? A high-speed software router Presented at SIGCOMM Gbps on a single box Main ideas Optimizing packet I/O Utilizing GPU for packet processing Four applications IPv4, IPv6 forwarding, OpenFlow switch, IPsec gateway 11
12 WHAT IS GPU? 12
13 GPU = Graphics Processing Unit The heart of graphics cards Mainly used for real-time 3D game rendering Massively-parallel processing capacity (Ubisoft s AVARTAR, from 13
14 CPU vs. GPU CPU: Small # of super-fast cores GPU: Large # of small cores 14
15 Silicon Budget in CPU and GPU ALU Xeon X5550: 4 cores 731M transistors GTX480: 480 cores 3,200M transistors 15
16 CPU BOTTLENECK 16
17 Per-Packet CPU Cycles for 10G IPv4 + 1, = 1,800 cycles Cycles needed IPv6 Packet I/O IPv4 lookup + 1,200 1,600 Packet I/O IPv6 lookup = 2,800 IPsec + 1,200 5,400 = 6,600 Packet I/O Encryption and hashing Your budget 1,400 cycles 10G, min-sized packets, dual quad-core 2.66GHz CPUs (in x86, cycle numbers are from RouteBricks [Dobrescu09] and ours) 17
18 PacketShader Part 1: I/O Optimization 1, = 1,800 cycles Packet I/O IPv4 lookup 1, ,600 = 2,800 Packet I/O IPv6 lookup 1,200 Packet I/O + 5,400 = 6,600 Encryption and hashing Packet I/O 1,200 reduced to 200 cycles per packet Main ideas Huge packet buffer Batch processing 18
19 PacketShader Part 2: GPU Offloading Packet I/O Packet I/O Packet I/O + + IPv4 lookup 1,600 IPv6 lookup 5,400 Encryption and hashing GPU Offloading for Memory-intensive or Compute-intensive operations 19
20 OPTIMIZING PACKET I/O ENGINE 20
21 User-space Packet Processing Packet processing in kernel is bad Processing in user-space is good Kernel has higher scheduling priority; overloaded kernel may starve userlevel processes. Some CPU extensions such as MMX and SSE are not available. Buggy kernel code causes irreversible damage to the system. Rich, friendly development and debugging environment Seamless integration with 3 rd party libraries such as CUDA or OpenSSL Easy to develop virtualized data plane. But packet processing in user-space is known to be 3x times slower! Our solution: (1) batching + (2) better core-queue mapping 21
22 Inefficiencies of Linux Network Stack Compact metadata Batch processing Huge packet buffer Software prefetch CPU cycle breakdown in packet RX
23 Huge Packet Buffer eliminates per-packet buffer allocation cost Linux per-packet buffer allocation Our huge packet buffer scheme 23
24 GPU FOR PACKET PROCESSING 24
25 Advantages of GPU for Packet Processing 1. Raw computation power 2. Memory access latency 3. Memory bandwidth Comparison between Intel X5550 CPU NVIDIA GTX480 GPU 25
26 (1/3) Raw Computation Power Compute-intensive operations in software routers Hashing, encryption, pattern matching, network coding, compression, etc. GPU can help! Instructions/sec CPU: = 2.66 (GHz) 4 (# of cores) 4 (4-way superscalar) < GPU: = 1.4 (GHz) 480 (# of cores) 26
27 (2/3) Memory Access Latency Software router lots of cache misses GPU can effectively hide memory latency Cache miss Cache miss GPU core Switch to Thread 2 Switch to Thread 3 27
28 (3/3) Memory Bandwidth CPU s memory bandwidth (theoretical): 32 GB/s 28
29 (3/3) Memory Bandwidth 2. RX: RAM CPU 3. TX: CPU RAM 1. RX: NIC RAM 4. TX: RAM NIC CPU s memory bandwidth (empirical) < 25 GB/s 29
30 (3/3) Memory Bandwidth Your budget for packet processing can be less 10 GB/s 30
31 (3/3) Memory Bandwidth Your budget for packet processing can be less 10 GB/s GPU s memory bandwidth: 174GB/s 31
32 PACKETSHADER DESIGN 32
33 Scaling with Multiple Multi-Core CPUs Shader Device driver Preshader Postshader Device driver Shader 33
34 Throughput (Gbps) Results (w/ 64B packets) CPU-only CPU+GPU IPv4 IPv6 OpenFlow IPsec GPU speedup 1.4x 4.8x 2.1x 3.5x 34
35 Example 1: IPv6 forwarding Longest prefix matching on 128-bit IPv6 addresses Algorithm: binary search on hash tables [Waldvogel97] 7 hashings + 7 memory accesses Prefix length
36 Throughput (Gbps) Example 1: IPv6 forwarding CPU-only CPU+GPU Packet size (bytes) (Routing table was randomly generated with 200K entries) Bounded by motherboard IO capacity 36
37 Results Year Ref. H/W IPv4 Throughput 2008 Egi et al. Two quad-core CPUs 3.5 Gbps 2008 Enhanced SR Bolla et al. Two quad-core CPUs 4.2 Gbps Kernel 2009 RouteBricks Dobrescu et al. Two quad-core CPUs (2.8 GHz) 8.7 Gbps 2010 PacketShader (CPU-only) 2010 PacketShader (CPU+GPU) Two quad-core CPUs (2.66 GHz) Two quad-core CPUs + two GPUs 28.2 Gbps 39.2 Gbps User 37
38 PacketShader 1.0 GPU a great opportunity for fast packet processing PacketShader Optimized packet I/O + GPU acceleration scalable with # of multi-core CPUs, GPUs, and high-speed NICs Current Prototype Supports IPv4, IPv6, OpenFlow, and IPsec 40 Gbps performance on a single PC 38
39 Next Steps? 39
40 PacketShader 2.0 Control plane integration Dynamic routing protocols with Quagga or XORP Opportunistic offloading CPU at low load GPU at high load Multi-functional, modular programming environment Integration with Click? [Kohler99] 40
41 #3 Multi-functional, modular programming environment Cascading NIC NIC NIC Recv? Y N? N Y? Chunk Subchunk Filter Gathering queue Schedulable task? N Module 1 Y Module 2 Module N Send NIC NIC NIC (drop) slow path (e.g. Linux TCP/IP stack)
42 Factors behind PacketShader 1.0 Performance Batching, batching, batching! The IO engine (modified NIC driver) uses continuous huge packet buffers called chunks. The user-level process pipelines multiple chunks. The GPU processes multiple chunks in parallel. Hardware-aware optimizations No NUMA node crossing Minimized cache conflicts among multi-cores
43 Remaining Challenges 100+ Gbps speed Stateful processing Intrusion detection systems / firewalls 43
44 Review of I/O Capacity for 100+ Gbps QuickPath Interconnect (QPI) CPU socket-to-socket link for remote memory IOH-to-IOH link for I/O traffic CPU-to-IOH for CPU to peripheral connections Today s QPI link 12.8 GB/s or Gbps Sandy Bridge On recall at the moment Expected to boost performance to 60 Gbps w/o modification
45 Review of Memory B/W for 100+ Gbps For 100Gbps forwarding we need 400 Gbps in memory bandwidth + bookkeeping Current configuration triple-channel DDR3 1,333 MHz 32 GB/s per core (theoretical) and 17.9GB/s (empirical) On NUMA system More nodes Careful placement...
46 Future Work Consider other architectures AMD s APU Tilera s tiled many-core chips Inte s MIC Become a platform for all new FIA architectures Advantage over NetFPGA, ServerSwitch, ATCA solutions 46
47 For more details QUESTIONS? THANK YOU! 47
PacketShader: A GPU-Accelerated Software Router
PacketShader: A GPU-Accelerated Software Router Sangjin Han In collaboration with: Keon Jang, KyoungSoo Park, Sue Moon Advanced Networking Lab, CS, KAIST Networked and Distributed Computing Systems Lab,
More informationGPGPU introduction and network applications. PacketShaders, SSLShader
GPGPU introduction and network applications PacketShaders, SSLShader Agenda GPGPU Introduction Computer graphics background GPGPUs past, present and future PacketShader A GPU-Accelerated Software Router
More informationThe Power of Batching in the Click Modular Router
The Power of Batching in the Click Modular Router Joongi Kim, Seonggu Huh, Keon Jang, * KyoungSoo Park, Sue Moon Computer Science Dept., KAIST Microsoft Research Cambridge, UK * Electrical Engineering
More informationRecent Advances in Software Router Technologies
Recent Advances in Software Router Technologies KRNET 2013 2013.6.24-25 COEX Sue Moon In collaboration with: Sangjin Han 1, Seungyeop Han 2, Seonggu Huh 3, Keon Jang 4, Joongi Kim, KyoungSoo Park 5 Advanced
More information소프트웨어기반고성능침입탐지시스템설계및구현
소프트웨어기반고성능침입탐지시스템설계및구현 KyoungSoo Park Department of Electrical Engineering, KAIST M. Asim Jamshed *, Jihyung Lee*, Sangwoo Moon*, Insu Yun *, Deokjin Kim, Sungryoul Lee, Yung Yi* Department of Electrical
More informationComputer Networks CS 552
Computer Networks CS 552 Routers Badri Nath Rutgers University badri@cs.rutgers.edu. High Speed Routers 2. Route lookups Cisco 26: 8 Gbps Cisco 246: 32 Gbps Cisco 286: 28 Gbps Power: 4.2 KW Cost: $5K Juniper
More informationFast packet processing in the cloud. Dániel Géhberger Ericsson Research
Fast packet processing in the cloud Dániel Géhberger Ericsson Research Outline Motivation Service chains Hardware related topics, acceleration Virtualization basics Software performance and acceleration
More informationGASPP: A GPU- Accelerated Stateful Packet Processing Framework
GASPP: A GPU- Accelerated Stateful Packet Processing Framework Giorgos Vasiliadis, FORTH- ICS, Greece Lazaros Koromilas, FORTH- ICS, Greece Michalis Polychronakis, Columbia University, USA So5ris Ioannidis,
More informationOpenFlow Software Switch & Intel DPDK. performance analysis
OpenFlow Software Switch & Intel DPDK performance analysis Agenda Background Intel DPDK OpenFlow 1.3 implementation sketch Prototype design and setup Results Future work, optimization ideas OF 1.3 prototype
More informationGPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC
GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of
More informationG-NET: Effective GPU Sharing In NFV Systems
G-NET: Effective Sharing In NFV Systems Kai Zhang*, Bingsheng He^, Jiayu Hu #, Zeke Wang^, Bei Hua #, Jiayi Meng #, Lishan Yang # *Fudan University ^National University of Singapore #University of Science
More informationReliably Scalable Name Prefix Lookup! Haowei Yuan and Patrick Crowley! Washington University in St. Louis!! ANCS 2015! 5/8/2015!
Reliably Scalable Name Prefix Lookup! Haowei Yuan and Patrick Crowley! Washington University in St. Louis!! ANCS 2015! 5/8/2015! ! My Topic for Today! Goal: a reliable longest name prefix lookup performance
More informationForwarding Architecture
Forwarding Architecture Brighten Godfrey CS 538 February 14 2018 slides 2010-2018 by Brighten Godfrey unless otherwise noted Building a fast router Partridge: 50 Gb/sec router A fast IP router well, fast
More informationIntel Workstation Technology
Intel Workstation Technology Turning Imagination Into Reality November, 2008 1 Step up your Game Real Workstations Unleash your Potential 2 Yesterday s Super Computer Today s Workstation = = #1 Super Computer
More informationNetSlices: Scalable Mul/- Core Packet Processing in User- Space
NetSlices: Scalable Mul/- Core Packet Processing in - Space Tudor Marian, Ki Suh Lee, Hakim Weatherspoon Cornell University Presented by Ki Suh Lee Packet Processors Essen/al for evolving networks Sophis/cated
More informationRouteBricks: Exploiting Parallelism To Scale Software Routers
outebricks: Exploiting Parallelism To Scale Software outers Mihai Dobrescu & Norbert Egi, Katerina Argyraki, Byung-Gon Chun, Kevin Fall, Gianluca Iannaccone, Allan Knies, Maziar Manesh, Sylvia atnasamy
More informationA 400Gbps Multi-Core Network Processor
A 400Gbps Multi-Core Network Processor James Markevitch, Srinivasa Malladi Cisco Systems August 22, 2017 Legal THE INFORMATION HEREIN IS PROVIDED ON AN AS IS BASIS, WITHOUT ANY WARRANTIES OR REPRESENTATIONS,
More information100 Gbps Open-Source Software Router? It's Here. Jim Thompson, CTO, Netgate
100 Gbps Open-Source Software Router? It's Here. Jim Thompson, CTO, Netgate @gonzopancho Agenda Edge Router Use Cases Need for Speed Cost, Flexibility, Control, Evolution The Engineering Challenge Solution
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationAdvanced Computer Networks. End Host Optimization
Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct
More informationDPDK Summit China 2017
Summit China 2017 Embedded Network Architecture Optimization Based on Lin Hao T1 Networks Agenda Our History What is an embedded network device Challenge to us Requirements for device today Our solution
More informationNBA (Network Balancing Act): A High-performance Packet Processing Framework for Heterogeneous Processors
NBA (Network Balancing Act): A High-performance Packet Processing Framework for Heterogeneous Processors Joongi Kim Keon Jang Keunhong Lee Sangwook Ma Junhyun Shim Sue Moon KAIST {joongi, keonjang, keunhong,
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationCisco Ultra Packet Core High Performance AND Features. Aeneas Dodd-Noble, Principal Engineer Daniel Walton, Director of Engineering October 18, 2018
Cisco Ultra Packet Core High Performance AND Features Aeneas Dodd-Noble, Principal Engineer Daniel Walton, Director of Engineering October 18, 2018 The World s Top Networks Rely On Cisco Ultra 90+ 300M
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationGPU > CPU. FOR HIGH PERFORMANCE COMPUTING PRESENTATION BY - SADIQ PASHA CHETHANA DILIP
GPU > CPU. FOR HIGH PERFORMANCE COMPUTING PRESENTATION BY - SADIQ PASHA CHETHANA DILIP INTRODUCTION or With the exponential increase in computational power of todays hardware, the complexity of the problem
More informationThe rcuda middleware and applications
The rcuda middleware and applications Will my application work with rcuda? rcuda currently provides binary compatibility with CUDA 5.0, virtualizing the entire Runtime API except for the graphics functions,
More informationNext Generation Enterprise Solutions from ARM
Next Generation Enterprise Solutions from ARM Ian Forsyth Director Product Marketing Enterprise and Infrastructure Applications Processor Product Line Ian.forsyth@arm.com 1 Enterprise Trends IT is the
More informationLecture 1: Gentle Introduction to GPUs
CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 1: Gentle Introduction to GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Who Am I? Mohamed
More informationToward a unified architecture for LAN/WAN/WLAN/SAN switches and routers
Toward a unified architecture for LAN/WAN/WLAN/SAN switches and routers Silvano Gai 1 The sellable HPSR Seamless LAN/WLAN/SAN/WAN Network as a platform System-wide network intelligence as platform for
More informationIP Forwarding. CSU CS557, Spring 2018 Instructor: Lorenzo De Carli
IP Forwarding CSU CS557, Spring 2018 Instructor: Lorenzo De Carli 1 Sources George Varghese, Network Algorithmics, Morgan Kauffmann, December 2004 L. De Carli, Y. Pan, A. Kumar, C. Estan, K. Sankaralingam,
More informationMiAMI: Multi-Core Aware Processor Affinity for TCP/IP over Multiple Network Interfaces
MiAMI: Multi-Core Aware Processor Affinity for TCP/IP over Multiple Network Interfaces Hye-Churn Jang Hyun-Wook (Jin) Jin Department of Computer Science and Engineering Konkuk University Seoul, Korea {comfact,
More informationLegUp: Accelerating Memcached on Cloud FPGAs
0 LegUp: Accelerating Memcached on Cloud FPGAs Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. 1 COMPUTE IS BECOMING SPECIALIZED 1 GPU Nvidia graphics cards are
More informationReducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet
Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet Pilar González-Férez and Angelos Bilas 31 th International Conference on Massive Storage Systems
More informationP51: High Performance Networking
P51: High Performance Networking Lecture 6: Programmable network devices Dr Noa Zilberman noa.zilberman@cl.cam.ac.uk Lent 2017/18 High Throughput Interfaces Performance Limitations So far we discussed
More informationThe Power of Batching in the Click Modular Router
The Power of Batching in the Click Modular Router Joongi Kim Seonggu Huh Keon Jang KyoungSoo Park Sue Moon Department of Computer Science, KAIST, Korea {joongi, seonggu}@an.kaist.ac.kr, sbmoon@kaist.edu
More information40 Gbps IPsec on Commodity Hardware. Jim Thompson Founder & CTO Netgate
40 Gbps IPsec on Commodity Hardware Jim Thompson Founder & CTO Netgate REALITY BEFORE 2017 1024 THE UNBREAKABLE BARRIER Packet Size $$$$ 256 ASIC 1 10 40 Throughput (Gbps) Want small packets and/or high(er)
More informationAccelerating 4G Network Performance
WHITE PAPER Accelerating 4G Network Performance OFFLOADING VIRTUALIZED EPC TRAFFIC ON AN OVS-ENABLED NETRONOME SMARTNIC NETRONOME AGILIO SMARTNICS PROVIDE A 5X INCREASE IN vepc BANDWIDTH ON THE SAME NUMBER
More informationMuch Faster Networking
Much Faster Networking David Riddoch driddoch@solarflare.com Copyright 2016 Solarflare Communications, Inc. All rights reserved. What is kernel bypass? The standard receive path The standard receive path
More informationToday s Paper. Routers forward packets. Networks and routers. EECS 262a Advanced Topics in Computer Systems Lecture 18
EECS 262a Advanced Topics in Computer Systems Lecture 18 Software outers/outebricks March 30 th, 2016 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley Slides
More informationImproving Packet Processing Performance of a Memory- Bounded Application
Improving Packet Processing Performance of a Memory- Bounded Application Jörn Schumacher CERN / University of Paderborn, Germany jorn.schumacher@cern.ch On behalf of the ATLAS FELIX Developer Team LHCb
More informationCisco ASR 1000 Series Routers Embedded Services Processors
Cisco ASR 1000 Series Routers Embedded Services Processors The Cisco ASR 1000 Series embedded services processors are based on the Cisco QuantumFlow Processor (QFP) for next-generation forwarding and queuing.
More informationToday s Paper. Routers forward packets. Networks and routers. EECS 262a Advanced Topics in Computer Systems Lecture 18
EECS 262a Advanced Topics in Computer Systems Lecture 18 Software outers/outebricks October 29 th, 2012 John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences University of
More informationPUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES
PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES Greg Hankins APRICOT 2012 2012 Brocade Communications Systems, Inc. 2012/02/28 Lookup Capacity and Forwarding
More informationMaximum Performance. How to get it and how to avoid pitfalls. Christoph Lameter, PhD
Maximum Performance How to get it and how to avoid pitfalls Christoph Lameter, PhD cl@linux.com Performance Just push a button? Systems are optimized by default for good general performance in all areas.
More informationCOSC 6385 Computer Architecture - Thread Level Parallelism (I)
COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month
More informationIt's kind of fun to do the impossible with DPDK Yoshihiro Nakajima, Hirokazu Takahashi, Kunihiro Ishiguro, Koji Yamazaki NTT Labs
It's kind of fun to do the impossible with DPDK Yoshihiro Nakajima, Hirokazu Takahashi, Kunihiro Ishiguro, Koji Yamazaki NTT Labs 0 Agenda Motivation for fun Fun with Lagopus SDN switch Fun with speed
More informationEvolution of the netmap architecture
L < > T H local Evolution of the netmap architecture Evolution of the netmap architecture -- Page 1/21 Evolution of the netmap architecture Luigi Rizzo, Università di Pisa http://info.iet.unipi.it/~luigi/vale/
More information15-744: Computer Networking. Middleboxes and NFV
15-744: Computer Networking Middleboxes and NFV Middleboxes and NFV Overview of NFV Challenge of middleboxes Middlebox consolidation Outsourcing middlebox functionality Readings: Network Functions Virtualization
More informationVALE: a switched ethernet for virtual machines
L < > T H local VALE VALE -- Page 1/23 VALE: a switched ethernet for virtual machines Luigi Rizzo, Giuseppe Lettieri Università di Pisa http://info.iet.unipi.it/~luigi/vale/ Motivation Make sw packet processing
More informationRe-architecting Virtualization in Heterogeneous Multicore Systems
Re-architecting Virtualization in Heterogeneous Multicore Systems Himanshu Raj, Sanjay Kumar, Vishakha Gupta, Gregory Diamos, Nawaf Alamoosa, Ada Gavrilovska, Karsten Schwan, Sudhakar Yalamanchili College
More informationArrakis: The Operating System is the Control Plane
Arrakis: The Operating System is the Control Plane Simon Peter, Jialin Li, Irene Zhang, Dan Ports, Doug Woos, Arvind Krishnamurthy, Tom Anderson University of Washington Timothy Roscoe ETH Zurich Building
More informationAnand Raghunathan
ECE 695R: SYSTEM-ON-CHIP DESIGN Module 2: HW/SW Partitioning Lecture 2.26: Example: Hardware Architecture Anand Raghunathan raghunathan@purdue.edu ECE 695R: System-on-Chip Design, Fall 2014 Fall 2014,
More informationCisco ASR 1000 Series Aggregation Services Routers: QoS Architecture and Solutions
Cisco ASR 1000 Series Aggregation Services Routers: QoS Architecture and Solutions Introduction Much more bandwidth is available now than during the times of 300-bps modems, but the same business principles
More informationIntel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances
Technology Brief Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances The world
More informationQuickSpecs. HP Z 10GbE Dual Port Module. Models
Overview Models Part Number: 1Ql49AA Introduction The is a 10GBASE-T adapter utilizing the Intel X722 MAC and X557-AT2 PHY pairing to deliver full line-rate performance, utilizing CAT 6A UTP cabling (or
More informationCisco Nexus 9508 Switch Power and Performance
White Paper Cisco Nexus 9508 Switch Power and Performance The Cisco Nexus 9508 brings together data center switching power efficiency and forwarding performance in a high-density 40 Gigabit Ethernet form
More informationP4GPU: A Study of Mapping a P4 Program onto GPU Target
P4GPU: A Study of Mapping a P4 Program onto GPU Target Peilong Li, Tyler Alterio, Swaroop Thool and Yan Luo ACANETS Lab (http://acanets.uml.edu/) University of Massachusetts Lowell 11/18/15 University
More informationCME 213 S PRING Eric Darve
CME 213 S PRING 2017 Eric Darve Summary of previous lectures Pthreads: low-level multi-threaded programming OpenMP: simplified interface based on #pragma, adapted to scientific computing OpenMP for and
More informationCS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology
CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367
More informationECE 571 Advanced Microprocessor-Based Design Lecture 20
ECE 571 Advanced Microprocessor-Based Design Lecture 20 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 12 April 2016 Project/HW Reminder Homework #9 was posted 1 Raspberry Pi
More informationNUMA replicated pagecache for Linux
NUMA replicated pagecache for Linux Nick Piggin SuSE Labs January 27, 2008 0-0 Talk outline I will cover the following areas: Give some NUMA background information Introduce some of Linux s NUMA optimisations
More informationDecision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA
Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA Weirong Jiang, Viktor K. Prasanna University of Southern California Norio Yamagaki NEC Corporation September 1, 2010 Outline
More informationNetFPGA Update at GEC4
NetFPGA Update at GEC4 http://netfpga.org/ NSF GENI Engineering Conference 4 (GEC4) March 31, 2009 John W. Lockwood http://stanford.edu/~jwlockwd/ jwlockwd@stanford.edu NSF GEC4 1 March 2009 What is the
More informationEvaluation of the Chelsio T580-CR iscsi Offload adapter
October 2016 Evaluation of the Chelsio T580-CR iscsi iscsi Offload makes a difference Executive Summary As application processing demands increase and the amount of data continues to grow, getting this
More informationAddressing the Memory Wall
Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the
More informationCSE 123A Computer Networks
CSE 123A Computer Networks Winter 2005 Lecture 8: IP Router Design Many portions courtesy Nick McKeown Overview Router basics Interconnection architecture Input Queuing Output Queuing Virtual output Queuing
More informationTopic & Scope. Content: The course gives
Topic & Scope Content: The course gives an overview of network processor cards (architectures and use) an introduction of how to program Intel IXP network processors some ideas of how to use network processors
More informationA Network-centric TCP for Interactive Video Delivery Networks (VDN)
A Network-centric TCP for Interactive Video Delivery Networks (VDN) MD Iftakharul Islam, Javed I Khan Department of Computer Science Kent State University Kent, OH 1 / 44 Outline 1 Interactive Video Network
More informationAdvanced Parallel Programming I
Advanced Parallel Programming I Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2016 22.09.2016 1 Levels of Parallelism RISC Software GmbH Johannes Kepler University
More informationRobert Jamieson. Robs Techie PP Everything in this presentation is at your own risk!
Robert Jamieson Robs Techie PP Everything in this presentation is at your own risk! PC s Today Basic Setup Hardware pointers PCI Express How will it effect you Basic Machine Setup Set the swap space Min
More informationCellSDN: Software-Defined Cellular Core networks
CellSDN: Software-Defined Cellular Core networks Xin Jin Princeton University Joint work with Li Erran Li, Laurent Vanbever, and Jennifer Rexford Cellular Core Network Architecture Base Station User Equipment
More informationRevisiting router architectures with Zipf
Revisiting router architectures with Zipf Steve Uhlig Deutsche Telekom Laboratories/TU Berlin Nadi Sarrar, Anja Feldmann Deutsche Telekom Laboratories/TU Berlin Rob Sherwood, Xin Huang Deutsche Telekom
More informationAchieve Optimal Network Throughput on the Cisco UCS S3260 Storage Server
White Paper Achieve Optimal Network Throughput on the Cisco UCS S3260 Storage Server Executive Summary This document describes the network I/O performance characteristics of the Cisco UCS S3260 Storage
More informationCS427 Multicore Architecture and Parallel Computing
CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:
More informationMultimedia in Mobile Phones. Architectures and Trends Lund
Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson
More informationPC BUILDING PRESENTED BY
PC BUILDING PRESENTED BY WHAT IS A PC General purpose Personal Computer for individual usage Macintosh 1984 WHAT IS A PC General purpose Personal Computer for individual usage IBM Personal Computer XT
More informationNFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications
NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications Outline RDMA Motivating trends iwarp NFS over RDMA Overview Chelsio T5 support Performance results 2 Adoption Rate of 40GbE Source: Crehan
More informationImproving DPDK Performance
Improving DPDK Performance Data Plane Development Kit (DPDK) was pioneered by Intel as a way to boost the speed of packet API with standard hardware. DPDK-enabled applications typically show four or more
More informationDCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture
DCS-ctrl: A Fast and Flexible ice-control Mechanism for ice-centric Server Architecture Dongup Kwon 1, Jaehyung Ahn 2, Dongju Chae 2, Mohammadamin Ajdari 2, Jaewon Lee 1, Suheon Bae 1, Youngsok Kim 1,
More informationPactron FPGA Accelerated Computing Solutions
Pactron FPGA Accelerated Computing Solutions Intel Xeon + Altera FPGA 2015 Pactron HJPC Corporation 1 Motivation for Accelerators Enhanced Performance: Accelerators compliment CPU cores to meet market
More informationEfficient and Scalable Shading for Many Lights
Efficient and Scalable Shading for Many Lights 1. GPU Overview 2. Shading recap 3. Forward Shading 4. Deferred Shading 5. Tiled Deferred Shading 6. And more! First GPU Shaders Unified Shaders CUDA OpenCL
More informationBig Data Systems on Future Hardware. Bingsheng He NUS Computing
Big Data Systems on Future Hardware Bingsheng He NUS Computing http://www.comp.nus.edu.sg/~hebs/ 1 Outline Challenges for Big Data Systems Why Hardware Matters? Open Challenges Summary 2 3 ANYs in Big
More information46PaQ. Dimitris Miras, Saleem Bhatti, Peter Kirstein Networks Research Group Computer Science UCL. 46PaQ AHM 2005 UKLIGHT Workshop, 19 Sep
46PaQ Dimitris Miras, Saleem Bhatti, Peter Kirstein Networks Research Group Computer Science UCL 46PaQ AHM 2005 UKLIGHT Workshop, 19 Sep 2005 1 Today s talk Overview Current Status and Results Future Work
More informationNetworking at the Speed of Light
Networking at the Speed of Light Dror Goldenberg VP Software Architecture MaRS Workshop April 2017 Cloud The Software Defined Data Center Resource virtualization Efficient services VM, Containers uservices
More informationEvaluating the Suitability of Server Network Cards for Software Routers
Evaluating the Suitability of Server Network Cards for Software Routers Maziar Manesh Katerina Argyraki Mihai Dobrescu Norbert Egi Kevin Fall Gianluca Iannaccone Eddie Kohler Sylvia Ratnasamy EPFL, UCLA,
More informationA Case Study in Optimizing GNU Radio s ATSC Flowgraph
A Case Study in Optimizing GNU Radio s ATSC Flowgraph Presented by Greg Scallon and Kirby Cartwright GNU Radio Conference 2017 Thursday, September 14 th 10am ATSC FLOWGRAPH LOADING 3% 99% 76% 36% 10% 33%
More informationControlling Parallelism in a Multicore Software Router
Controlling Parallelism in a Multicore Software Router Mihai Dobrescu, Katerina Argyraki EPFL, Switzerland Gianluca Iannaccone, Maziar Manesh, Sylvia Ratnasamy Intel Research Labs, Berkeley ABSTRACT Software
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More information100 GBE AND BEYOND. Diagram courtesy of the CFP MSA Brocade Communications Systems, Inc. v /11/21
100 GBE AND BEYOND 2011 Brocade Communications Systems, Inc. Diagram courtesy of the CFP MSA. v1.4 2011/11/21 Current State of the Industry 10 Electrical Fundamental 1 st generation technology constraints
More informationDell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance
Dell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance A Dell Technical White Paper Dell Product Group Armando Acosta and James Pledge THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES
More informationGen-Z Memory-Driven Computing
Gen-Z Memory-Driven Computing Our vision for the future of computing Patrick Demichel Distinguished Technologist Explosive growth of data More Data Need answers FAST! Value of Analyzed Data 2005 0.1ZB
More informationGetting Real Performance from a Virtualized CCAP
Getting Real Performance from a Virtualized CCAP A Technical Paper prepared for SCTE/ISBE by Mark Szczesniak Software Architect Casa Systems, Inc. 100 Old River Road Andover, MA, 01810 978-688-6706 mark.szczesniak@casa-systems.com
More informationLecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Today Non-Uniform
More information10GE network tests with UDP. Janusz Szuba European XFEL
10GE network tests with UDP Janusz Szuba European XFEL Outline 2 Overview of initial DAQ architecture Slice test hardware specification Initial networking test results DAQ software UDP tests Summary 10GE
More informationSilicon Motion s Graphics Display SoCs
WHITE PAPER Silicon Motion s Graphics Display SoCs Enable 4K High Definition and Low Power Power and bandwidth: the twin challenges of implementing a solution for bridging any computer to any high-definition
More informationSoftware and Tools for HPE s The Machine Project
Labs Software and Tools for HPE s The Machine Project Scalable Tools Workshop Aug/1 - Aug/4, 2016 Lake Tahoe Milind Chabbi Traditional Computing Paradigm CPU DRAM CPU DRAM CPU-centric computing 2 CPU-Centric
More informationSAP High-Performance Analytic Appliance on the Cisco Unified Computing System
Solution Overview SAP High-Performance Analytic Appliance on the Cisco Unified Computing System What You Will Learn The SAP High-Performance Analytic Appliance (HANA) is a new non-intrusive hardware and
More informationThemes. The Network 1. Energy in the DC: ~15% network? Energy by Technology
Themes The Network 1 Low Power Computing David Andersen Carnegie Mellon University Last two classes: Saving power by running more slowly and sleeping more. This time: Network intro; saving power by architecting
More informationDistributed Systems. 01. Introduction. Paul Krzyzanowski. Rutgers University. Fall 2013
Distributed Systems 01. Introduction Paul Krzyzanowski Rutgers University Fall 2013 September 9, 2013 CS 417 - Paul Krzyzanowski 1 What can we do now that we could not do before? 2 Technology advances
More information