Network stack specialization for performance
1 Network stack specialization for performance goo.gl/1la2u6 Ilias Marinos, Robert N.M. Watson, Mark Handley* University of Cambridge, * University College London
2-3 Motivation
Providers are scaling out rapidly. Key aspects:
- 1 machine : N functions; N machines : 1 function
- Performance is critical
- Scalability on multicore systems
- Cost & energy concerns
Are general-purpose stacks the right solution for that kind of role?
4 The Problem Conventional stacks are great for bulk transfers, but what about short ones?
5-9 The Problem
[Figure: network throughput (Gbps) and CPU utilization (%) versus HTTP object size (KB). Annotations: "NIC saturation, low CPU usage"; "Throughput/CPU ratio is low".]
Short-lived HTTP flows are a problem!
10-12 Why is this important?
- 95% of the requested HTTP object sizes are ≤ 50KB
- 90% of the requested HTTP object sizes are ≤ 25KB
Distribution based on traces from Yahoo! CDN [Al-Fares et al. 2011]
13 Design Goals
Design a network stack that:
- Allows transparent flow of memory from the NIC to the application and vice versa
- Reduces system costs (e.g., batching, cache locality, lock- and sharing-free design, CPU affinity)
- Exploits application-specific knowledge to reduce repetitive processing costs (e.g., TCP segmentation of web objects, checksums)
14 Sandstorm: specialized webserver stack
Prototyped on top of FreeBSD's netmap framework:
- libnmio: abstracts netmap-related I/O
- libeth: lightweight Ethernet layer
- libtcpip: optimized TCP/IP layer
- application: simple HTTP server that serves static content
[Figure: the webserver calls web_write()/web_recv() and tcpip_write()/tcpip_recv(). In user space, libtcpip.so provides tcpip_output(), tcpip_fsm() and tcpip_input(); libeth.so provides eth_output() and eth_input(); libnmio.so provides netmap_output() and netmap_input(). The output path is labeled "zero copy". netmap ioctls and system calls cross into the kernel, where the device driver's DMA memory is mapped to the user-visible TX and RX buffer rings.]
15 Sandstorm: specialized webserver stack
Key decisions (some of them):
- Application & stack are merged into the same process address space
- Static content is pre-segmented into network packets and loaded into DRAM a priori
- Received packet frames are processed in place on the RX rings, without memory copying or buffering
- RX/TX packet batching greatly amortizes the system call overhead
- Bufferless, synchronous model (no socket layer)
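The pre-segmentation decision can be illustrated with a minimal sketch. All names here (`struct preseg`, `preseg_build`, `SEG_PAYLOAD_MAX`) are hypothetical and the 1448-byte payload size is an assumption (1500-byte MTU minus IP, TCP and timestamp-option headers); this is not Sandstorm's actual code, which also pre-builds the headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; assumed payload per segment (not taken from the slides). */
#define SEG_PAYLOAD_MAX 1448u

struct preseg {
    uint32_t seq_off;                  /* byte offset of this segment in the object */
    uint16_t len;                      /* payload bytes carried by this segment */
    uint8_t  payload[SEG_PAYLOAD_MAX]; /* payload copied once, at load time */
};

/* Number of segments needed for an object of `size` bytes. */
size_t preseg_count(size_t size)
{
    return (size + SEG_PAYLOAD_MAX - 1) / SEG_PAYLOAD_MAX;
}

/*
 * Split `data` into fixed-size segments once, when the content is loaded.
 * Returns a heap array (caller frees) and stores its length in *nseg.
 * Returns NULL if size is 0 or the allocation fails.
 */
struct preseg *preseg_build(const uint8_t *data, size_t size, size_t *nseg)
{
    size_t n = preseg_count(size);
    struct preseg *segs;

    *nseg = 0;
    if (n == 0)
        return NULL;
    segs = calloc(n, sizeof(*segs));
    if (segs == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        size_t off = i * SEG_PAYLOAD_MAX;
        size_t len = size - off < SEG_PAYLOAD_MAX ? size - off : SEG_PAYLOAD_MAX;

        segs[i].seq_off = (uint32_t)off;
        segs[i].len = (uint16_t)len;
        memcpy(segs[i].payload, data + off, len);
    }
    *nseg = n;
    return segs;
}
```

Under these assumptions a 24KB object becomes 17 segments, so serving a request reduces to walking an array rather than segmenting the object again for every connection.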
16-30 Sandstorm Architecture (10,000ft view)
[Animated figure. Layers, top to bottom: app, tcpip, eth, nmio (user space); NIC driver with the ix0:rx and ix0:tx rings (kernel); pre-loaded content. Labels appear in this order as the animation builds: netmap_input(), POLLIN, ether_input(), tcpip_input(), TCP FSM, websrv_accept() / websrv_receive(), tcpip_output(), ether_output(), netmap_output(), POLLOUT.]
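The receive half of that walk-through can be sketched as a process-to-completion loop: after one poll wakeup, every available frame is handled in place, one layer after another, before the ring slots are returned. The ring and slot types below are simplified stand-ins invented for this sketch (real netmap rings are laid out differently), and the `_stub` functions only count frames.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for an RX ring; not the real netmap structures. */
#define RING_SLOTS 8u

struct mock_slot { uint16_t len; const uint8_t *buf; };

struct mock_ring {
    uint32_t head;                     /* next slot the application processes */
    uint32_t tail;                     /* first slot still owned by the kernel */
    struct mock_slot slot[RING_SLOTS];
};

struct rx_stats { unsigned frames, ipv4, dropped; };

/* Stand-in for the TCP/IP layer: only counts the packets it is handed. */
static void tcpip_input_stub(const uint8_t *pkt, uint16_t len, struct rx_stats *st)
{
    (void)pkt;
    (void)len;
    st->ipv4++;
}

/* Ethernet layer: inspect the EtherType in place, pass IPv4 frames upwards. */
static void ether_input_stub(const uint8_t *frame, uint16_t len, struct rx_stats *st)
{
    st->frames++;
    if (len < 14 || frame[12] != 0x08 || frame[13] != 0x00) {
        st->dropped++;
        return;
    }
    tcpip_input_stub(frame + 14, (uint16_t)(len - 14), st);
}

/* Drain the whole batch that is available after one poll wakeup. */
unsigned rx_drain(struct mock_ring *r, struct rx_stats *st)
{
    unsigned n = 0;

    while (r->head != r->tail) {
        const struct mock_slot *s = &r->slot[r->head];

        ether_input_stub(s->buf, s->len, st); /* no copy: buffer stays on the ring */
        r->head = (r->head + 1) % RING_SLOTS;
        n++;
    }
    return n;
}
```

Because the whole batch is consumed per wakeup, the cost of the system call is spread over every frame in it, which is the amortization the key-decisions slide refers to.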
31-33 Evaluation
[Figure: throughput with 6 NICs (Gbps) versus HTTP object size (KB) for nginx+FreeBSD, nginx+Linux and Sandstorm. Annotated Sandstorm speedups: ~9.8x, ~3.6x, ~1.8x.]
The curves start converging for sizes ≥ 256KB.
34 To copy or not to copy?
zerocopy:
    /* Get src and destination slots */
    struct netmap_slot *bf = &ppool->slot[slotindex];
    struct netmap_slot *tx = &txring->slot[cur];

    /* zero-copy packet */
    tx->buf_idx = bf->buf_idx;
    tx->len = bf->len;
    tx->flags = NS_BUF_CHANGED;
OR memcpy:
    /* Get source and destination bufs */
    char *srcp = NETMAP_BUF(ppool, bf->buf_idx);
    char *dstp = NETMAP_BUF(txring, tx->buf_idx);

    /* memcpy packet */
    memcpy(dstp, srcp, bf->len);
    tx->len = bf->len;
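The difference between the two TX paths is easier to see with netmap taken out of the picture. The sketch below uses a made-up `struct mini_slot` and a plain array of buffers; `FLAG_BUF_CHANGED` merely plays the role of `NS_BUF_CHANGED`. It assumes the source and destination buffers are distinct.

```c
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 2048u
#define FLAG_BUF_CHANGED 0x1u /* stand-in for netmap's NS_BUF_CHANGED */

/* Hypothetical slot: an index into a buffer array plus length and flags. */
struct mini_slot { uint32_t buf_idx; uint16_t len; uint16_t flags; };

/* Zero-copy: point the TX slot at the pre-built packet's buffer. */
void tx_zerocopy(struct mini_slot *tx, const struct mini_slot *bf)
{
    tx->buf_idx = bf->buf_idx;
    tx->len = bf->len;
    tx->flags |= FLAG_BUF_CHANGED; /* tell the driver the buffer was replaced */
}

/* Copy: keep the TX slot's own buffer and duplicate the packet bytes into it. */
void tx_memcpy(uint8_t (*bufs)[BUF_SIZE], struct mini_slot *tx,
               const struct mini_slot *bf)
{
    memcpy(bufs[tx->buf_idx], bufs[bf->buf_idx], bf->len);
    tx->len = bf->len;
}
```

The zero-copy path touches only a few slot fields regardless of packet size, while the copy path reads and writes every payload byte; how much that costs depends on where the bytes sit in the memory hierarchy, which is what the following slides examine.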
35-37 To copy or not to copy?
[Figure: throughput (Gbps) of Sandstorm zerocopy versus Sandstorm memcpy when serving a 24KB HTTP object, first on an Intel Core 2 (2006) and then on an Intel Sandy Bridge (2013).]
38-42 CPU microarchitecture ~2006
[Animated figure: four cores (C) with two L2 caches, connected over the FSB to the Memory Controller Hub, which hosts the DMA engine and the PCIe links. Annotations added during the build: "Raise interrupt", "Bottleneck" (at the FSB), "Extra detour to RAM".]
43-49 CPU microarchitecture ~2013
[Animated figure: four cores (C) sharing an LLC, with an on-die memory controller (MC) and PCIe links. Annotations added during the build: "Raise interrupt", "Eventual eviction from LLC".]
- No extra detours to DRAM
- No FSB bottleneck
- Open question: LLC utilization (thrashing?)
50 HW/SW Intersection
Should HW architecture evolution be considered a black box for networked systems development?
[Figure: memory read throughput with 6 NICs (Gbps) versus object size (KB) for Sandstorm "zerocopy" and Sandstorm "memcpy". Lower is better.]
51-52 Generality of Specialization
Natural fit for:
- Web & DNS servers (Sandstorm, Namestorm; check our paper)
- In-memory key-value stores
- RPC-based services
- Rate-adaptive video streaming applications (with MPEG-DASH or Apple HLS)
Limitations:
- Possibly not a good fit for CPU- and/or filesystem-intensive applications
- Blocking in the application layer cannot be tolerated!
53-55 Conclusions
General-purpose stacks:
- Great for bulk transfers, bad for short ones (but the web is dominated by small objects!)
- Picked a lot of generality in favor of flexibility (we don't need it for application-specific clusters)
- Hard to tune/profile/debug
Specialized stacks:
- 2-10x throughput improvement for web, 9x for DNS
- Linear scaling on multicore systems
- Low CPU utilization
Specialized network stacks are not only viable, but necessary!
56 Backup Slides
57 Supported TCP features
Follows RFC 793, with Reno congestion control.
Limitations:
- Supports only the TCP subset required to serve incoming connections (not to initiate them)
- TCP reordering is not supported (not needed with typical HTTP requests)
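For readers unfamiliar with Reno, the window rules it implies can be sketched as below. This is a textbook-style approximation, not Sandstorm's implementation: the `reno_*` names are invented, windows are in bytes, and `cwnd / 2` stands in for the half-of-bytes-in-flight that RFC 5681 actually specifies.

```c
#include <stdint.h>

/* Hypothetical per-connection congestion state; windows are in bytes. */
struct reno { uint32_t cwnd, ssthresh, mss, dupacks; };

void reno_init(struct reno *r, uint32_t mss, uint32_t ssthresh)
{
    r->cwnd = mss;
    r->ssthresh = ssthresh;
    r->mss = mss;
    r->dupacks = 0;
}

/* Halve the window estimate, but never go below two segments. */
static uint32_t reno_half(const struct reno *r)
{
    uint32_t half = r->cwnd / 2; /* approximation of FlightSize / 2 */
    return half > 2 * r->mss ? half : 2 * r->mss;
}

/* An ACK for new data arrived. */
void reno_on_ack(struct reno *r)
{
    if (r->dupacks >= 3) {
        r->cwnd = r->ssthresh;             /* leave fast recovery: deflate */
    } else if (r->cwnd < r->ssthresh) {
        r->cwnd += r->mss;                 /* slow start */
    } else {
        uint32_t inc = r->mss * r->mss / r->cwnd;
        r->cwnd += inc ? inc : 1;          /* congestion avoidance */
    }
    r->dupacks = 0;
}

/* A duplicate ACK arrived; returns 1 when the caller should fast-retransmit. */
int reno_on_dupack(struct reno *r)
{
    r->dupacks++;
    if (r->dupacks == 3) {
        r->ssthresh = reno_half(r);
        r->cwnd = r->ssthresh + 3 * r->mss; /* enter fast recovery */
        return 1;
    }
    if (r->dupacks > 3)
        r->cwnd += r->mss;                  /* inflate during fast recovery */
    return 0;
}

/* The retransmission timer fired. */
void reno_on_timeout(struct reno *r)
{
    r->ssthresh = reno_half(r);
    r->cwnd = r->mss;
    r->dupacks = 0;
}
```

For the short flows this deck targets, most transfers would finish while still in the slow-start branch, which is one reason a small TCP subset can be sufficient for a static-content server.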
58 Latency
[Figure: average latency (μs) versus number of concurrent connections for Sandstorm, Linux+nginx and FreeBSD+nginx, serving a 24KB object.]
59 Overview
Problems with general-purpose stacks:
- System-call overhead
- Shared accept queue, PCB locks
- Cache-unfriendly due to asynchronous design
- Memory-related overhead (e.g., mbuf allocation/copying)
Solutions with specialized stacks:
- Packet batching
- Share- & lock-free design, per-core state
- Process-to-completion, cache-friendly, incremental checksums
- Pre-packetization, no memory copying/buffering
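The "incremental checksums" item can be made concrete with the standard update rule from RFC 1624: when one 16-bit word of an already-checksummed packet changes, the new checksum follows from the old one without re-reading the packet. The slides do not show how Sandstorm applies this, so the sketch below is generic and the function names are invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint32_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return sum;
}

/* Full Internet checksum (RFC 1071) over `len` bytes in network byte order. */
uint16_t cksum_full(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)p[i] << 8 | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8; /* pad the odd trailing byte */
    return (uint16_t)~fold16(sum);
}

/*
 * Incremental update (RFC 1624, eqn. 3): HC' = ~(~HC + ~m + m'),
 * where m is the old 16-bit word and m' the new one.
 */
uint16_t cksum_update16(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~hc;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}
```

With pre-packetized content the payload bytes never change between responses, so only header words that differ per connection would need to be folded in this way.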
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) An Intelligent NIC Design Xin Song School of Electronic and Information Engineering Tianjin Vocational
More informationjverbs: Java/OFED Integration for the Cloud
jverbs: Java/OFED Integration for the Cloud Authors: Bernard Metzler, Patrick Stuedi, Animesh Trivedi. IBM Research Zurich Date: 03/27/12 www.openfabrics.org 1 Motivation The commodity Cloud is Flexible
More informationReceive Livelock. Robert Grimm New York University
Receive Livelock Robert Grimm New York University The Three Questions What is the problem? What is new or different? What are the contributions and limitations? Motivation Interrupts work well when I/O
More informationLight & NOS. Dan Li Tsinghua University
Light & NOS Dan Li Tsinghua University Performance gain The Power of DPDK As claimed: 80 CPU cycles per packet Significant gain compared with Kernel! What we care more How to leverage the performance gain
More informationAccelerating Load Balancing programs using HW- Based Hints in XDP
Accelerating Load Balancing programs using HW- Based Hints in XDP PJ Waskiewicz, Network Software Engineer Neerav Parikh, Software Architect Intel Corp. Agenda Overview express Data path (XDP) Software
More informationSlides on cross- domain call and Remote Procedure Call (RPC)
Slides on cross- domain call and Remote Procedure Call (RPC) This classic paper is a good example of a microbenchmarking study. It also explains the RPC abstraction and serves as a case study of the nuts-and-bolts
More informationA Look at Intel s Dataplane Development Kit
A Look at Intel s Dataplane Development Kit Dominik Scholz Chair for Network Architectures and Services Department for Computer Science Technische Universität München June 13, 2014 Dominik Scholz: A Look
More informationBe Fast, Cheap and in Control with SwitchKV. Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level
More informationLegUp: Accelerating Memcached on Cloud FPGAs
0 LegUp: Accelerating Memcached on Cloud FPGAs Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. 1 COMPUTE IS BECOMING SPECIALIZED 1 GPU Nvidia graphics cards are
More informationIntroduction to Ethernet Latency
Introduction to Ethernet Latency An Explanation of Latency and Latency Measurement The primary difference in the various methods of latency measurement is the point in the software stack at which the latency
More informationCommunication Networks
Communication Networks Spring 2018 Laurent Vanbever nsg.ee.ethz.ch ETH Zürich (D-ITET) April 30 2018 Materials inspired from Scott Shenker & Jennifer Rexford Last week on Communication Networks We started
More informationMemory Management Strategies for Data Serving with RDMA
Memory Management Strategies for Data Serving with RDMA Dennis Dalessandro and Pete Wyckoff (presenting) Ohio Supercomputer Center {dennis,pw}@osc.edu HotI'07 23 August 2007 Motivation Increasing demands
More informationCSCI-1680 Link Layer Wrap-Up Rodrigo Fonseca
CSCI-1680 Link Layer Wrap-Up Rodrigo Fonseca Based partly on lecture notes by David Mazières, Phil Levis, John Jannotti Today: Link Layer (cont.) Framing Reliability Error correction Sliding window Medium
More informationNetworking at the Speed of Light
Networking at the Speed of Light Dror Goldenberg VP Software Architecture MaRS Workshop April 2017 Cloud The Software Defined Data Center Resource virtualization Efficient services VM, Containers uservices
More informationScaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX
Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX Inventing Internet TV Available in more than 190 countries 104+ million subscribers Lots of Streaming == Lots of Traffic
More informationThe Network Stack. Chapter Network stack functions 216 CHAPTER 21. THE NETWORK STACK
216 CHAPTER 21. THE NETWORK STACK 21.1 Network stack functions Chapter 21 The Network Stack In comparison with some other parts of OS design, networking has very little (if any) basis in formalism or algorithms
More informationDistributed Systems 27. Process Migration & Allocation
Distributed Systems 27. Process Migration & Allocation Paul Krzyzanowski pxk@cs.rutgers.edu 12/16/2011 1 Processor allocation Easy with multiprocessor systems Every processor has access to the same memory
More informationCapriccio : Scalable Threads for Internet Services
Capriccio : Scalable Threads for Internet Services - Ron von Behren &et al - University of California, Berkeley. Presented By: Rajesh Subbiah Background Each incoming request is dispatched to a separate
More information