Topic: A Deep Dive into Memory Access. Company: Intel Title: Software Engineer Name: Wang, Zhihong
Transcription
1 Topic: A Deep Dive into Memory Access Company: Intel Title: Software Engineer Name: Wang, Zhihong
2 A Typical NFV Scenario: PVP. What's actually going on? [Diagram: a guest forwarding engine (virtio RX/TX) and a vhost-side forwarding engine (vhost RX/TX, NIC RX/TX) connected through shared memory; data paths are labeled ring ops, memcpy, and DDIO.]
3 Overview of Memory System. [Diagram: two sockets, CPU 0 with cores 0 to N-1 and CPU 1 with cores N to 2N-1, attached to Memory 0 and Memory 1 respectively; MESIF protocol.]
4 Overview of Memory System (cont'd). AVX to maximize bandwidth. [Diagram: a load traced through the two-socket memory system, with numbered labels 0 to 6.] Haswell cache parameters [table garbled in transcription; columns: cache level, capacity (KB), line size (Bytes), fastest latency (cycles), peak bandwidth (Bytes/cycle); rows include L1D and "L2 and L1D in other cores"; surviving values: 64, ~34, Varies, 64 (Load) + 32 (Store)].
5 Let's Do It! Target for our analysis: where the data flows. [Diagram: the PVP topology from slide 2, with the ring ops, memcpy, and DDIO paths between virtio RX/TX, shared memory, vhost RX/TX, and NIC RX/TX.]
6 First Impression. [Diagram: cores 0 to N-1 on CPU 0 with Memory 0, cores N to 2N-1 on CPU 1 with Memory 1. FWD: RX from NIC, TX to vhost, RX from vhost, TX to NIC. VM FWD: RX from virtio, TX to virtio.]
7 Unexpectedly: L1 cross-core copies?? (First try.) Notice: CPU cycle measurement disturbs overall performance.
8 Under The Hood: The guest updates the ring only; it doesn't touch the data. Data locality in cache depends on who operates on the data.
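The point of slide 8, that forwarding can move a packet by touching only ring slots and never the payload, can be illustrated with a minimal sketch. This is not the virtio ring layout: `struct ring`, `ring_enqueue`, and `ring_dequeue` are hypothetical names, and the ring is single-threaded with no memory barriers.

```c
#include <stddef.h>

#define RING_SIZE 8u            /* hypothetical size; must be a power of two */

/* A ring of buffer pointers. Only the slots and indices are read or
 * written here; the packet bytes behind each pointer are never touched. */
struct ring {
    void *slot[RING_SIZE];
    unsigned head;              /* next slot to fill  */
    unsigned tail;              /* next slot to drain */
};

/* Publish a buffer. Returns 0 on success, -1 if the ring is full. */
int ring_enqueue(struct ring *r, void *buf)
{
    if (r->head - r->tail == RING_SIZE)
        return -1;
    r->slot[r->head % RING_SIZE] = buf;
    r->head++;
    return 0;
}

/* Take the oldest buffer, or NULL if the ring is empty. */
void *ring_dequeue(struct ring *r)
{
    void *buf;

    if (r->head == r->tail)
        return NULL;
    buf = r->slot[r->tail % RING_SIZE];
    r->tail++;
    return buf;
}
```

A forwarder that only calls `ring_dequeue` and `ring_enqueue` pulls the ring's cache lines into its own cache, while the payload lines stay wherever the last core that operated on the data left them.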
9 Guest Read The Packet. [Diagram: same two-socket layout; the guest core is marked R. FWD: RX from NIC, TX to vhost, RX from vhost, TX to NIC. VM FWD: RX from virtio, read packet, TX to virtio.]
10 Still Doesn't Feel Right: No change?? (Guest read packet.)
11 Under The Hood: A cache line can be shared when there is no modification. [Diagram: same layout as slide 9, guest core marked R.]
12 Guest Edit The Packet. [Diagram: same two-socket layout; the guest core is marked M. FWD: RX from NIC, TX to vhost, RX from vhost, TX to NIC. VM FWD: RX from virtio, edit packet, TX to virtio.]
13 Write-back: No change? Cross-core copies. (Guest edit packet.)
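The behavior on slides 11 to 13 (read-only lines can be shared, while a modified line has to be handed across cores) can be sketched with a toy model. This is a deliberately simplified MSI model, not the MESIF protocol the deck refers to (MESIF adds Exclusive and Forward states). All names here are hypothetical.

```c
#define NCORES 2

enum line_state { INVALID, SHARED, MODIFIED };

/* One cache line as seen by each core, plus a counter of how often the
 * data had to be pulled out of another core's modified copy. */
struct line {
    enum line_state st[NCORES];
    unsigned transfers;
};

/* A read hits if the core already holds the line. Otherwise any modified
 * copy elsewhere is written back and downgraded to SHARED. */
void core_read(struct line *l, int core)
{
    int i;

    if (l->st[core] != INVALID)
        return;
    for (i = 0; i < NCORES; i++) {
        if (i != core && l->st[i] == MODIFIED) {
            l->st[i] = SHARED;
            l->transfers++;
        }
    }
    l->st[core] = SHARED;
}

/* A write needs exclusive ownership: every other copy is invalidated,
 * and a modified copy elsewhere costs a cross-core transfer first. */
void core_write(struct line *l, int core)
{
    int i;

    if (l->st[core] == MODIFIED)
        return;
    for (i = 0; i < NCORES; i++) {
        if (i == core)
            continue;
        if (l->st[i] == MODIFIED)
            l->transfers++;
        l->st[i] = INVALID;
    }
    l->st[core] = MODIFIED;
}
```

In this model, repeated reads from both cores leave the line SHARED everywhere and cost nothing extra. Once one core writes, the next access from the other core has to fetch the modified data across cores.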
14 Go See Some C Code

    desc_addr = gpa_to_vva(dev, desc->addr);
    rte_prefetch0((void *)(uintptr_t)desc_addr);

    rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset),
               (void *)((uintptr_t)(desc_addr + desc_offset)),
               cpy_len);

Oh I see: S/W prefetching to reduce latency.
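The prefetch-then-copy pattern on slide 14 can be tried outside DPDK with a small sketch. It assumes GCC or Clang and uses `__builtin_prefetch` as a stand-in for `rte_prefetch0`. `struct desc` and `copy_burst` are hypothetical names, not part of the vhost code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: where a packet buffer lives and how long it is. */
struct desc {
    const void *addr;
    uint32_t len;
};

/* Copy a burst of buffers into dst. While buffer i is being copied, a
 * prefetch for buffer i + 1 is already in flight, so its cache miss
 * overlaps with useful work. Returns the number of bytes copied. */
size_t copy_burst(uint8_t *dst, size_t dst_cap,
                  const struct desc *descs, size_t n)
{
    size_t off = 0;
    size_t i;

    if (n > 0)
        __builtin_prefetch(descs[0].addr, 0, 3);   /* read, keep in all levels */

    for (i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(descs[i + 1].addr, 0, 3);
        if (descs[i].len > dst_cap - off)
            break;                                  /* destination is full */
        memcpy(dst + off, descs[i].addr, descs[i].len);
        off += descs[i].len;
    }
    return off;
}
```

A prefetch is only a hint: it changes when the line arrives in the local cache, not what the copy produces. That is why its effect shows up in cache-transfer measurements and not in the output.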
15 Without S/W Prefetching: Now I understand. Bring it back right away! (Guest edit packet; no prefetching.)
16 How About Guest In Another Node? [Diagram: same two-socket layout with the guest moved to a core on the other socket; the guest core is marked M. FWD: RX from NIC, TX to vhost, RX from vhost, TX to NIC. VM FWD: RX from virtio, edit packet, TX to virtio.]
17 Better NOT. Keep related processes on the same node. (Guest edit packet; no prefetching.)
18 rte_memcpy()? Why Even Bother? Warm copy (DPDK's scenario); AVX load/store; alignment handling. (Guest edit packet; guest on the same node.)
19 AVX For Bandwidth

    xmm0 = _mm_loadu_si128(src);
    _mm_storeu_si128(dst, xmm0);

    ymm0 = _mm256_loadu_si256(src);
    _mm256_storeu_si256(dst, ymm0);

2x peak bandwidth. Chart labels: +40%, +53%. AVX512 is coming. (Guest read packet; guest on the same node.)
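The two load/store pairs on slide 19 can be turned into complete copy loops. The sketch below assumes an x86-64 target and GCC or Clang. The function names are hypothetical, and this is not how `rte_memcpy` is implemented, only the 16-byte versus 32-byte step it builds on.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16 bytes per iteration with SSE2 unaligned loads and stores. */
void copy_sse(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;

    for (; i + 16 <= len; i += 16) {
        __m128i xmm0 = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), xmm0);
    }
    memcpy(dst + i, src + i, len - i);      /* tail shorter than 16 bytes */
}

/* 32 bytes per iteration with AVX. The target attribute enables AVX for
 * this one function, so the file builds without -mavx. */
__attribute__((target("avx")))
void copy_avx(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        __m256i ymm0 = _mm256_loadu_si256((const __m256i *)(src + i));
        _mm256_storeu_si256((__m256i *)(dst + i), ymm0);
    }
    memcpy(dst + i, src + i, len - i);      /* tail shorter than 32 bytes */
}

/* Use the wider path only when the running CPU reports AVX support. */
void copy_wide(uint8_t *dst, const uint8_t *src, size_t len)
{
    if (__builtin_cpu_supports("avx"))
        copy_avx(dst, src, len);
    else
        copy_sse(dst, src, len);
}
```

Both loops produce identical bytes. The AVX version moves twice as much data per instruction, which is where the "2x peak bandwidth" note on the slide comes from.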
20 Alignment Matters

    rte_memcpy((void *)((uint8_t *)dst + 1), src, len - 1);
    rte_memcpy(dst, src, len);

Just like coupons: FREE gifts if you use them. Chart labels: +17%, +15%. (Guest read packet; guest on the same node.)
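Slide 20 compares a copy whose destination is offset by one byte with one that is not. A small sketch of how such alignment can be checked in code follows. The helper names are hypothetical, and the alignment argument is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of align (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Number of bytes to skip from p to reach the next align-byte boundary;
 * 0 if p is already aligned. */
size_t bytes_to_align(const void *p, size_t align)
{
    return (size_t)(-(uintptr_t)p & (align - 1));
}
```

For a 64-byte-aligned `dst`, `dst + 1` is 63 bytes short of the next cache-line boundary, so wide stores starting there can straddle two lines. That is the case the first call on the slide sets up.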
21 Takeaways: See actual memory behaviors under the hood (Intel 64 and IA-32 Architectures Optimization Reference Manual). Benefit from new IA technologies: AVX, DDIO.
22
23 Cache Allocation Technology: CMT + CAT. Noisy neighbor: one core is requesting a huge amount of data. What if another HIGH priority core is very latency sensitive? [Diagram: same two-socket layout as slide 3.]