WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES ON BIG AND SMALL SERVER PLATFORMS
1 WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES ON BIG AND SMALL SERVER PLATFORMS
Shuang Chen*, Shay Galon**, Christina Delimitrou*, Srilatha Manne**, and José Martínez*
*Cornell University **Cavium Inc.
2 EXECUTIVE SUMMARY
- How can we achieve low tail latency for interactive cloud services?
  - Tail latency is both more important and more challenging to meet
  - The entire stack, from software to hardware, is involved
- Understand how tail latency reacts to application and system changes
  - See how current designs work
  - Gain insights for future designs
3 MOTIVATION
4 LOW LATENCY
- Tail latency matters, e.g., QoS defined as a 99th-percentile latency of at most 500 µs
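A QoS target of this form can be checked directly from a set of measured request latencies. The sketch below is illustrative only: it assumes the nearest-rank definition of a percentile (other definitions interpolate and can give slightly different values), and the latency samples are hypothetical, not data from the study.

```python
import math

def percentile(samples_us, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples_us:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_us)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def meets_qos(samples_us, target_us=500.0, p=99.0):
    """True if the p-th percentile latency is within target_us."""
    return percentile(samples_us, p) <= target_us

# Hypothetical latencies in microseconds: 98 fast requests and 2 slow ones.
latencies = [120.0] * 98 + [450.0, 900.0]
print(percentile(latencies, 99))  # 450.0
print(meets_qos(latencies))       # True
```

Note that the single 900 µs outlier does not violate a 99th-percentile target here, but it would violate a stricter 100th-percentile (maximum) target.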
5 LOW TAIL LATENCY REQUIREMENTS
The entire stack, from software to hardware, is involved:
- Application: application bottleneck, different use cases, scalability
- Resource manager (virtualization and OS): overhead of virtualization, SW isolation mechanisms, overhead of context switching
- Hardware: HW isolation mechanisms, hyperthreading
6 CATEGORIZE LC (LATENCY-CRITICAL) APPLICATIONS
- By tail-latency requirement:
  - µs: memcached
  - ms: web server, in-memory database
  - s: persistent database
- By statefulness:
  - Stateful: memcached
  - Stateless: web server
7 SELECTED LC WORKLOADS
- NGINX: web server; stateless; 99th-percentile latency in the tens of ms
- Memcached: key-value store; stateful; 99th-percentile latency in the hundreds of µs
(Figure: NGINX and Memcached compared along QoS strictness and statefulness)
8 SERVER ARCHITECTURE
                      Intel Xeon E v4          Cavium ThunderX
Cores                 22, 2 threads/core       48, 1 thread/core
L1 I/D (per core)     32/32 KB                 78/32 KB
L2 (per core)         256 KB                   not listed
LLC                   55 MB, 20-way            16 MB, 16-way
Process               14 nm                    28 nm
Memory                128 GB DDR4              128 GB DDR4
NIC                   10 Gbps                  10 Gbps
Price                 $4,115                   $785
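One way to read the price gap is to normalize it by hardware contexts. The sketch below uses only the core counts, threads per core, and prices listed on the slide; the "price per hardware thread" metric is an illustrative derived figure, not one reported by the authors, and it glosses over the fact that a hyperthread is not equivalent to a full core.

```python
# Core counts, SMT width, and list prices as given on the slide.
platforms = {
    "Intel Xeon E v4": {"cores": 22, "threads_per_core": 2, "price_usd": 4115},
    "Cavium ThunderX": {"cores": 48, "threads_per_core": 1, "price_usd": 785},
}

def price_per_hw_thread(spec):
    """List price divided by the number of hardware threads (illustrative metric)."""
    return spec["price_usd"] / (spec["cores"] * spec["threads_per_core"])

for name, spec in platforms.items():
    print(f"{name}: ${price_per_hw_thread(spec):.2f} per hardware thread")
# Intel Xeon E v4: $93.52 per hardware thread
# Cavium ThunderX: $16.35 per hardware thread

ratio = platforms["Intel Xeon E v4"]["price_usd"] / platforms["Cavium ThunderX"]["price_usd"]
print(f"price ratio: {ratio:.2f}x")  # price ratio: 5.24x
```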
9 STUDIED PARAMETERS
10 INPUT LOAD
(Figure: input load on Xeon vs. ThunderX for Memcached and NGINX; gaps annotated as 5.2x and 5x)
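11 MEMCACHED LATENCY DECOMPOSITION
(Figure: per-request latency on Xeon and ThunderX, split into NIC RX, NIC IRQ, kernel, syscall, and user time on the receive and send paths)
- At 10% of max throughput: little user-space processing; most of the latency is network delay, where ThunderX is 2x slower than Xeon
- At 90% of max throughput: queuing delay becomes a major component (figure labels: 1,290 and one truncated value)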
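Why queuing delay shows up at 90% load but not at 10% can be illustrated with a textbook M/M/1 queue. This is a swapped-in idealized model, not the authors' analysis, and the 10 µs service time below is a hypothetical value. In M/M/1 the response time is exponentially distributed with rate (mu - lambda), so both the mean and every percentile scale with 1 / (1 - utilization).

```python
import math

def mm1_latency(service_time_us, utilization, p=None):
    """Response time of an M/M/1 queue (an idealized model, not measured data).

    Response time is exponentially distributed with rate (mu - lambda), so the
    mean is 1 / (mu - lambda) and the p-th percentile is
    -ln(1 - p/100) / (mu - lambda).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    mu = 1.0 / service_time_us   # service rate, requests per microsecond
    lam = utilization * mu       # arrival rate
    rate = mu - lam
    if p is None:
        return 1.0 / rate
    return -math.log(1.0 - p / 100.0) / rate

# Hypothetical 10 us service time at the two loads shown on the slide.
for load in (0.1, 0.9):
    mean = mm1_latency(10.0, load)
    p99 = mm1_latency(10.0, load, p=99)
    print(f"load {load:.0%}: mean {mean:.1f} us, p99 {p99:.1f} us")
```

Under this model, going from 10% to 90% utilization multiplies both mean and 99th-percentile latency by 9, even though the per-request service time is unchanged.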
12 STUDIED PARAMETERS
13 MEMCACHED VALUE SIZE
(Figure: Xeon vs. ThunderX as the Memcached value size varies)
- Larger values add memory-copy work and network processing and transmission time
- ThunderX is more sensitive to value size
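The transmission part of this cost grows linearly with value size. The sketch below computes only the serialization delay of the payload on the 10 Gbps NIC listed for both servers; it is a lower bound that assumes no header, framing, or protocol overhead and does not model the memory-copy cost the slide also mentions. The value sizes are hypothetical.

```python
def wire_time_us(payload_bytes, link_gbps=10.0):
    """Serialization delay of the payload alone on a link of link_gbps.

    Ignores TCP/IP/Ethernet headers, framing, and protocol processing, so real
    transmission time is somewhat higher.
    """
    bits = payload_bytes * 8
    return bits / (link_gbps * 1e3)  # 1 Gbps = 1e3 bits per microsecond

for size in (100, 1_000, 10_000, 100_000):
    print(f"{size:>7} B -> {wire_time_us(size):7.2f} us")
```

At 100 KB the payload alone occupies the link for 80 µs, a sizable share of a latency budget measured in hundreds of µs.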
14 NUMBER OF MEMCACHED ITEMS
(Figure: Xeon vs. ThunderX as the number of stored items varies)
- Limiting factor: cache capacity
- ThunderX is more sensitive to the number of items
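A rough capacity comparison makes the cache-capacity point concrete. The sketch below uses the LLC sizes from the server-architecture slide (treated as MiB) and a hypothetical 1 KiB footprint per item (key, value, and metadata). It is an optimistic upper bound: code, kernel data, and network buffers also occupy the cache.

```python
MiB = 1024 * 1024

# LLC sizes from the server-architecture slide, treated as MiB here.
LLC_BYTES = {"Xeon": 55 * MiB, "ThunderX": 16 * MiB}

def max_resident_items(llc_bytes, bytes_per_item):
    """Upper bound on items whose data can fit in the LLC at the same time.

    bytes_per_item is a hypothetical per-item footprint (key + value + metadata).
    """
    return llc_bytes // bytes_per_item

for name, size in LLC_BYTES.items():
    print(name, max_resident_items(size, 1024))
```

With these assumptions ThunderX can hold roughly 3.4x fewer items in its LLC, so its working set spills to DRAM at a smaller item count.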
15 STUDIED PARAMETERS
16 SCALABILITY
(Figure: scalability of Memcached and NGINX)
- Limiting factors: interrupt handling, load imbalance, lock contention
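Of the three limiters, lock contention is the easiest to model: work serialized behind a lock does not speed up with more cores. The sketch below applies Amdahl's law, a standard model swapped in here for illustration and not the authors' method; the 5% serialized fraction is hypothetical, and the model does not capture interrupt handling or load imbalance.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: speedup when only parallel_fraction of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0 or n_cores < 1:
        raise ValueError("need parallel_fraction in [0, 1] and n_cores >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Hypothetical: 5% of each request's work is serialized behind a lock.
# 22 and 48 are the per-socket core counts of the two platforms.
for cores in (1, 8, 22, 48):
    print(cores, round(amdahl_speedup(0.95, cores), 1))
```

Even with only 5% serialized work, 48 cores yield a speedup of about 14x rather than 48x.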
17 STUDIED PARAMETERS
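18 CONTEXT SWITCHING
(Figure: Memcached on Xeon and Memcached on ThunderX)
- Statically spawned threads vs. dynamically allocated cores: when fewer cores are allocated than there are threads, threads time-share cores and pay context-switch overhead
- ThunderX is more sensitive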
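Memcached fixes its worker thread count at startup (its `-t` option), while a resource manager may shrink or grow the set of cores it runs on. The Linux-only sketch below uses Python's `os.sched_setaffinity` to stand in for that core allocation and reports the resulting threads-per-core ratio. The helper names are hypothetical, and a real resource manager would more likely use cgroup cpusets or `taskset`.

```python
import os

def restrict_to_cpus(cpus):
    """Pin the calling process to `cpus` (Linux only) and return the new mask.

    Stands in for a resource manager shrinking a service's core allocation.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control requires Linux")
    os.sched_setaffinity(0, cpus)  # pid 0 = the calling process
    return os.sched_getaffinity(0)

def threads_per_core(n_threads, allowed_cpus):
    """Oversubscription ratio: a value above 1 means threads must time-share cores."""
    return n_threads / len(allowed_cpus)

if hasattr(os, "sched_getaffinity"):
    original = os.sched_getaffinity(0)
    one_cpu = {min(original)}
    print(restrict_to_cpus(one_cpu))     # e.g. {0}
    print(threads_per_core(4, one_cpu))  # 4 threads on 1 core -> 4.0
    os.sched_setaffinity(0, original)    # restore the original mask
```

Any ratio above 1 means the statically spawned threads must context switch on the allocated cores.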
19 STUDIED PARAMETERS
20 HYPERTHREADING
- Reduce the overhead of context switching: allocate two threads on two hyperthreads
- Make better use of execution units: co-locate different applications
  - Memcached and NGINX on the same hyperthreads
  - Memcached and NGINX on different hyperthreads
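Placing two threads or two applications on sibling hyperthreads requires knowing which logical CPUs share a physical core. On Linux, sysfs exposes this in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, using the kernel's cpulist format (for example, a hypothetical "0,22" or "0-1"). The sketch below parses that format; it assumes only comma-separated ids and ranges, which is what this file contains. On a one-thread-per-core part such as ThunderX, each list names a single CPU.

```python
from pathlib import Path

def parse_cpu_list(text):
    """Parse the Linux cpulist format, e.g. '0,22' or '0-3,8', into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def hyperthread_siblings(cpu):
    """Logical CPUs that share a physical core with `cpu` (reads Linux sysfs)."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return parse_cpu_list(path.read_text())
```

For example, `parse_cpu_list("0-3,8")` returns `[0, 1, 2, 3, 8]`. Pinning Memcached and NGINX to CPUs from the same sibling list co-locates them on one core, and pinning them to different lists separates them.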
21 IMPLICATIONS OF THESE STUDIES
- Application: reduce queuing delays, improve elasticity, use lock alternatives, balance load
- Resource manager (virtualization and OS): reduce the overhead of virtualization, avoid context switching, make best use of SW isolation mechanisms
- Hardware: choose between big and small cores, make best use of HW features
QUESTIONS?