No Tradeoff Low Latency + High Efficiency
|
|
- Scott Little
- 5 years ago
- Views:
Transcription
1 No Tradeoff Low Latency + High Efficiency Christos Kozyrakis
2 Latency-critical Applications A growing class of online workloads Search, social networking, software-as-service (SaaS), Occupy 1,000s of servers in datacenters 2
3 Example: Web-search Front end Q Web Servers Back end R Root Q Parent QQ Parent Q Parent Q R R R Leaf R R Leaf Leaf R Leaf R R Leaf Leaf R Leaf R R Leaf Leaf R Metrics: Massive queries distributed per sec datasets, (QPS) multiple a 99 th -% tiers, latency high threshold fan-out 3
4 Characteristics High throughput + low latency (100μsec 10msec) Focus on tail latency (e.g., 99 th percentile SLO) Distributed and (often) stateful Multi-tier with high fan-out Diurnal patterns but load is never 0 4
5 Trends [Adrian Cockcroft 14] Apps as collection of micro-services Loosely-coupled services each with bounded context Even lower tail latency requirements From 100s to just a few μsec 5
6 Conventional Wisdom for LC Apps 6
7 #1 LC apps cannot be power managed 7
8 LC Apps & Power Management Power (DVFS off) [Lo et al 14] Search load Tail latency (DVFS on) 8
9 #2 LC apps must use dedicated servers 9
10 LC Apps & Server Sharing Impact of interference on websearch s latency LLC >300% >300% >300% >300% >300% >300% >300% 264% 123% 300% DRAM >300% >300% >300% >300% >300% >300% >300% 270% 122% HyperThread 110% 107% 114% 115% 105% 117% 120% 136% >300% B A D CPU power 124% 107% 116% 109% 115% 105% 101% 100% 100% Network 36% 36% 37% 37% 39% 42% 48% 55% 64% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 0% O K Load [Lo et al 15] 10
11 LC Apps & Core Sharing Achieved QoS w/ two low-latency workloads Memcached QPS (% Peak) Bad QoS with minor interference! Both meet QoS 1ms RPC fails Memcached fails Both fail to meet QoS QoS requirement = 95 th < 5x low-load 95 th ms RPC-service QPS (% Peak) [Leverich 14] 11
12 #3 LC apps must use local Flash 12
13 Local Vs Remote NVMe Access p95 read latency (us) x latency 75% throughput drop Local Flash iscsi (1 core) libaio+libevent (1core) IOPS (Thousands) 13
14 Sharing NVMe Flash The curve you paid for p95 read latency (us) %read 99%read 95%read 90%read 75%read 50%read Total IOPS (Thousands) The curve you get with sharing Flash performance varies a lot with read/write ratio 14
15 #4 LC apps must bypass the kernel for I/O cannot use TCP cannot use Ethernet must use specialized HW 15
16 LC Apps & Linux/TCP/Ethernet 64-byte TCP echo HW Limit Linux x Gap Millions x Gap 0 Microseconds 0 Requests per Second [Belay et al 15] 16
17 Conventional Wisdom for LC Apps 17
18 Low Latency + High Efficiency Understand the problem Understand the opportunity 18
19 Understanding the Latency Problem Latency (usec) MM1 memcached N=1 MM4 memcached N=4 [Delimitrou 15] Load Latency-critical apps as queuing systems Tail latency affected by Service time, queuing delay, scheduling time, load imbalance 19
20 Understanding the Latency Problem Latency (usecs) th-% Provisioned QPS % 8% 14% 19% 24% 30% 35% 41% 46% 51% 57% 62% 68% 73% 78% 84% 89% 95% 100% Memcached QPS (% of peak) [Leverich 15] 20
21 Understanding the Opportunity Significant latency slack! Tail latency [Lo et al 14] 0% 20% 40% 60% 80% 100% % of maximum cluster load Iso-latency: maintain constant tail latency at all loads Enables power management, HW sharing, adaptive batching, Key point: use tail latency as control signal 21
22 Low Latency + High Efficiency Understand the problem Understand the opportunity End-to-end & latency-aware management 22
23 Pegasus: Iso-latency + Fine-grain DVFS L P L L L [Lo et al 14] Measures latency slack end-to-end Sets uniform latency goal across all servers RAPL as knob for power Power is set by workload specific policy 23
24 Pegasus: Iso-latency + Fine-grain DVFS Achieves dynamic energy proportionality [Lo et al 14] 24
25 Heracles: Iso-latency + QoS Isolation Latency readings Controller Can BE grow? LC workload DRAM BW CPU + Memory LLC CPU CPU power DVFS CPU Power Network HTB Net. BW Internal feedback loops [Lo et al 15] Goal: meet SLO while using idle resources for batch workloads End-to-end latency measurements control isolation mechanisms 25
26 Heracles: Iso-latency + QoS Isolation Cores LLC Core freq. Network BW LC BE LC BE BE Max LC BE L Latency readings LC workload DRAM BW CPU + Memory LLC CPU Controller DVFS CPU power CPU Power Network HTB Can BE grow? Net. BW ü Internal feedback loops [Lo et al 15] 26
27 Heracles: Iso-latency + QoS Isolation Cores LLC Core freq. Network BW LC BE LC BE BE Max LC BE L Latency readings LC workload DRAM BW CPU + Memory LLC CPU Controller DVFS CPU power CPU Power Network HTB Can BE grow? Net. BW L Internal feedback loops [Lo et al 15] 27
28 Iso-latency + QoS Isolation >90% HW utilization No tail latency problems [Lo et al 15] LC apps + best-effort tasks on the same servers HW & SW isolation mechanisms (cores, caches, network, power) Iso-latency based control for resource allocation 28
29 Low Latency + High Efficiency Understand the problem Understand the opportunity End-to-end & latency-aware management Optimize for modern multi-core hardware Specialize the SW 29
30 System SW for Low Latency + High BW App 1 App 2 Userspace Kernelspace OS Kernel TCP/IP C C C C C RX/TX 30
31 System SW for Low Latency + High BW Control plane App 1 App 2 Userspace Kernelspace OS Kernel TCP/IP RX TX RX TX RX TX RX TX C C C C C RX/TX 31
32 System SW for Low Latency + High BW Ring 3 Control plane App 1 App 2 Guest Ring 0 Host Ring 0 OS kernel Dune RX TX RX TX RX TX RX TX C C C C C 32
33 System SW for Low Latency + High BW Ring 3 Control plane Memcached libix HTTPd libix Guest Ring 0 IX TCP/IP Custom Transport Host Ring 0 OS Kernel Dune RX TX RX TX RX TX RX TX C C C C C [Belay et al 14] 33
34 Run-to-Completion Ring 3 event-driven app 3 Event Conditions libix Batched Syscalls Guest Ring 0 TCP/IP 2 4 TCP/IP RX FIFO 1 RX TX 6 5 Timer Improves data cache locality Removes scheduling unpredictably 34
35 Adaptive Batching Ring 3 event-driven app 3 Event Conditions libix Batched Syscalls Guest Ring 0 TCP/IP 2 4 TCP/IP RX FIFO 1 Adaptive Batch Calculation RX TX 6 5 Timer Enabled by ISO-latency Improves instruction cache locality and prefetching 35
36 IX: Low Latency + High Bandwidth Latency (µs) IX tail lower than Linux avg 0 Linux (avg) Linux (99 th pct) IX (avg) IX (99 th pct) 2x Less Tail Latency + TCP + commodity HW + protection + 1M connections 5x More RPS SLA USR: Throughput (RPS x 10 6 ) [Belay et al 14] 36
37 Low Latency + High Bandwidth + High Efficiency MC load MC load MC latency MC latency SLO dynamic Pareto Power Time (seconds) Time (seconds) Time (seconds) BE Performance cached running Figure on IX 7: Consolidation for three load patterns. of memcached with a background batch task Time (seconds) 37
38 ReFlex: IX + QoS Scheduling Ring 3 Control Plane App libix App libix Guest Ring 0 IX ReFlex Host Ring 0 Linux kernel Dune RX TX RX TX SQ CQ Core Core Core 38
39 ReFlex: IX + QoS Scheduling Ring 3 ReFlex Server 3 Event Conditions libix Batched Syscalls Guest Ring 0 NVMe TCP/IP 2 TCP/IP Scheduler NVMe CQ 1 RX TX SQ 4 [Klimovic et al 17] 39
40 ReFlex: IX + QoS Scheduling Ring 3 Event Conditions ReFlex Server libix 7 Batched Syscalls Guest Ring 0 NVMe 6 TCP/IP TCP/IP Scheduler NVMe 8 CQ RX TX SQ 5 [Klimovic et al 17] 40
41 ReFlex: Local Remote Latency p95 Read Latency (us) Linux: 75K IOPS/core ReFlex: 850K IOPS/core Local-1T ReFlex-1T Libaio-1T IOPS (Thousands) 41
42 ReFlex: Local Remote Latency p95 Read Latency (us) Unloaded latency Local Flash 78 µs ReFlex 99 µs Libaio 121 µs Local-1T ReFlex-1T Libaio-1T IOPS (Thousands) 42
43 ReFlex: Local Remote Latency p95 Read Latency (us) ReFlex saturates Flash device Local-1T Local-2T ReFlex-1T ReFlex-2T Libaio-1T Libaio-2T IOPS (Thousands) 43
44 ReFlex: Private Shared QoS Read p95 latency (us) I/O sched disabled I/O sched enabled Latency SLO IOPS (Thousands) I/O sched disabled Tenant A IOPS SLO I/O sched enabled Tenant B IOPS SLO 0 Tenant A Tenant B Tenant C Tenant D 0 Tenant A Tenant B Tenant C Tenant D Tenants A & B: latency-critical; Tenant C + D: best effort Without scheduler: latency and bandwidth QoS for A/B are violated Scheduler rate limits best-effort tenants to enforce SLOs 44
45 ReFlex: Private Shared QoS Read p95 latency (us) I/O sched disabled I/O sched enabled Latency SLO IOPS (Thousands) I/O sched disabled Tenant A IOPS SLO I/O sched enabled Tenant B IOPS SLO 0 Tenant A Tenant B Tenant C Tenant D 0 Tenant A Tenant B Tenant C Tenant D Tenants A & B: latency-critical; Tenant C + D: best effort Without scheduler: latency and bandwidth QoS for A/B are violated Scheduler rate limits best-effort tenants to enforce SLOs 45
46 ReFlex: Private Shared QoS Read p95 latency (us) I/O sched disabled I/O sched enabled Latency SLO IOPS (Thousands) I/O sched disabled Tenant A IOPS SLO I/O sched enabled Tenant B IOPS SLO 0 Tenant A Tenant B Tenant C Tenant D 0 Tenant A Tenant B Tenant C Tenant D Tenants A & B: latency-critical; Tenant C + D: best effort Without scheduler: latency and bandwidth QoS for A/B are violated Scheduler rate limits best-effort tenants to enforce SLOs 46
47 Looking Forward Programming models for low tail latency Especially for multi-tier applications Variability within and across applications Short vs long requests Cluster-level issues Provisioning, scheduling, load-balancing, throttling, μsecond-scale tail latency Can we still get efficiency at ultra low latency? Hardware for latency-critical applications Big vs little cores, isolation mechanisms, accelerators 47
48 Conclusions Conventional wisdom is wrong Low latency is compatible with high efficiency Efficiency: energy proportionality, resource usage, throughput Encouraging initial results Room for improvements throughout the stack [Source:ThinkStock] 48
ReFlex: Remote Flash Local Flash
ReFlex: Remote Flash Local Flash Ana Klimovic Heiner Litz Christos Kozyrakis October 28, 216 IAP Cloud Workshop Flash in Datacenters Flash provides 1 higher throughput and 2 lower latency than disk PCIe
More informationReFlex: Remote Flash Local Flash
ReFlex: Remote Flash Local Flash Ana Klimovic Heiner Litz Christos Kozyrakis NVMW 18 Memorable Paper Award Finalist 1 Flash in Datacenters Flash provides 1000 higher throughput and 100 lower latency than
More informationTowards Energy Proportionality for Large-Scale Latency-Critical Workloads
Towards Energy Proportionality for Large-Scale Latency-Critical Workloads David Lo *, Liqun Cheng *, Rama Govindaraju *, Luiz André Barroso *, Christos Kozyrakis Stanford University * Google Inc. 2012
More informationIX: A Protected Dataplane Operating System for High Throughput and Low Latency
IX: A Protected Dataplane Operating System for High Throughput and Low Latency Adam Belay et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Presented by Han Zhang & Zaina Hamid Challenges
More informationPocket: Elastic Ephemeral Storage for Serverless Analytics
Pocket: Elastic Ephemeral Storage for Serverless Analytics Ana Klimovic*, Yawen Wang*, Patrick Stuedi +, Animesh Trivedi +, Jonas Pfefferle +, Christos Kozyrakis* *Stanford University, + IBM Research 1
More informationWORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES BIG AND SMALL SERVER PLATFORMS
WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES ON BIG AND SMALL SERVER PLATFORMS Shuang Chen*, Shay Galon**, Christina Delimitrou*, Srilatha Manne**, and José Martínez* *Cornell University **Cavium
More informationibench: Quantifying Interference in Datacenter Applications
ibench: Quantifying Interference in Datacenter Applications Christina Delimitrou and Christos Kozyrakis Stanford University IISWC September 23 th 2013 Executive Summary Problem: Increasing utilization
More informationIX: A Protected Dataplane Operating System for High Throughput and Low Latency
IX: A Protected Dataplane Operating System for High Throughput and Low Latency Belay, A. et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Reviewed by Chun-Yu and Xinghao Li Summary In this
More informationFlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs
FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs Jie Zhang, Miryeong Kwon, Donghyun Gouk, Sungjoon Koh, Changlim Lee, Mohammad Alian, Myoungjun Chun,
More informationSEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES
SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, and Christina Delimitrou Cornell
More informationTowards Energy Proportionality for Large-Scale Latency-Critical Workloads
Towards Energy Proportionality for Large-Scale Latency-Critical Workloads David Lo, Liqun Cheng, Rama Govindaraju, Luiz André Barroso and Christos Kozyrakis Stanford University Google, Inc. Abstract Reducing
More informationIX: A Protected Dataplane Operating System for High Throughput and Low Latency
IX: A Protected Dataplane Operating System for High Throughput and Low Latency Adam Belay 1 George Prekas 2 Ana Klimovic 1 Samuel Grossman 1 Christos Kozyrakis 1 1 Stanford University Edouard Bugnion 2
More information2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.
Ethernet Storage Fabrics Using RDMA with Fast NVMe-oF Storage to Reduce Latency and Improve Efficiency Kevin Deierling & Idan Burstein Mellanox Technologies 1 Storage Media Technology Storage Media Access
More informationEnergy Proportionality and Workload Consolidation for Latency-critical Applications
Energy Proportionality and Workload Consolidation for Latency-critical Applications George Prekas 1, Mia Primorac 1, Adam Belay 2, Christos Kozyrakis 2, Edouard Bugnion 1 1 EPFL 2 Stanford Abstract Energy
More informationScaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX
Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX Inventing Internet TV Available in more than 190 countries 104+ million subscribers Lots of Streaming == Lots of Traffic
More informationSELF-LEARNING CACHES IRFAN AHMAD CACHEPHYSICS. Copyright 2017 CachePhysics.
SELF-LEARNING CACHES IRFAN AHMAD CACHEPHYSICS Copyright 217 CachePhysics. ABOUT CachePhysics Irfan Ahmad CachePhysics Cofounder CloudPhysics Cofounder VMware (Kernel, Resource Management), Transmeta, 3+
More informationStorage Systems for Serverless Analytics
Storage Systems for Serverless Analytics Ana Klimovic * Yawen Wang * Christos Kozyrakis * Patrick Stuedi ⱡ Jonas Pfefferle ⱡ Animesh Trivedi ⱡ * ⱡ Serverless: a new cloud computing paradigm Users write
More informationTales of the Tail Hardware, OS, and Application-level Sources of Tail Latency
Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports and Steven D. Gribble February 2, 2015 1 Introduction What is Tail Latency? What
More informationFalcon: Scaling IO Performance in Multi-SSD Volumes. The George Washington University
Falcon: Scaling IO Performance in Multi-SSD Volumes Pradeep Kumar H Howie Huang The George Washington University SSDs in Big Data Applications Recent trends advocate using many SSDs for higher throughput
More informationNext-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads
Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads Liran Zvibel CEO, Co-founder WekaIO @liranzvibel 1 WekaIO Matrix: Full-featured and Flexible Public or Private S3 Compatible
More informationAdvanced Computer Networks. End Host Optimization
Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct
More informationEECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun
EECS750: Advanced Operating Systems 2/24/2014 Heechul Yun 1 Administrative Project Feedback of your proposal will be sent by Wednesday Midterm report due on Apr. 2 3 pages: include intro, related work,
More informationIBM FlashSystem. IBM FLiP Tool Wie viel schneller kann Ihr IBM i Power Server mit IBM FlashSystem 900 / V9000 Storage sein?
FlashSystem Family 2015 IBM FlashSystem IBM FLiP Tool Wie viel schneller kann Ihr IBM i Power Server mit IBM FlashSystem 900 / V9000 Storage sein? PiRT - Power i Round Table 17 Sep. 2015 Daniel Gysin IBM
More informationZiye Yang. NPG, DCG, Intel
Ziye Yang NPG, DCG, Intel Agenda What is SPDK? Accelerated NVMe-oF via SPDK Conclusion 2 Agenda What is SPDK? Accelerated NVMe-oF via SPDK Conclusion 3 Storage Performance Development Kit Scalable and
More informationIO virtualization. Michael Kagan Mellanox Technologies
IO virtualization Michael Kagan Mellanox Technologies IO Virtualization Mission non-stop s to consumers Flexibility assign IO resources to consumer as needed Agility assignment of IO resources to consumer
More informationPlay2SDG: Bridging the Gap between Serving and Analytics in Scalable Web Applications
Play2SDG: Bridging the Gap between Serving and Analytics in Scalable Web Applications Panagiotis Garefalakis M.Res Thesis Presentation, 7 September 2015 Outline Motivation Challenges Scalable web app design
More informationSOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION ACCELA ZHAO, LAYNE PENG
SOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION ACCELA ZHAO, LAYNE PENG 1 WHO ARE THOSE GUYS Accela Zhao, Technologist at EMC OCTO, active Openstack community contributor, experienced in cloud scheduling
More informationWorkloads, Scalability and QoS Considerations in CMP Platforms
Workloads, Scalability and QoS Considerations in CMP Platforms Presenter Don Newell Sr. Principal Engineer Intel Corporation 2007 Intel Corporation Agenda Trends and research context Evolving Workload
More informationRemoving the I/O Bottleneck in Enterprise Storage
Removing the I/O Bottleneck in Enterprise Storage WALTER AMSLER, SENIOR DIRECTOR HITACHI DATA SYSTEMS AUGUST 2013 Enterprise Storage Requirements and Characteristics Reengineering for Flash removing I/O
More informationRUBIK: FAST ANALYTICAL POWER MANAGEMENT
RUBIK: FAST ANALYTICAL POWER MANAGEMENT FOR LATENCY-CRITICAL SYSTEMS HARSHAD KASTURE, DAVIDE BARTOLINI, NATHAN BECKMANN, DANIEL SANCHEZ MICRO 2015 Motivation 2! Low server utilization in today s datacenters
More informationNFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications
NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications Outline RDMA Motivating trends iwarp NFS over RDMA Overview Chelsio T5 support Performance results 2 Adoption Rate of 40GbE Source: Crehan
More informationKernel Bypass. Sujay Jayakar (dsj36) 11/17/2016
Kernel Bypass Sujay Jayakar (dsj36) 11/17/2016 Kernel Bypass Background Why networking? Status quo: Linux Papers Arrakis: The Operating System is the Control Plane. Simon Peter, Jialin Li, Irene Zhang,
More informationOracle Exadata: Strategy and Roadmap
Oracle Exadata: Strategy and Roadmap - New Technologies, Cloud, and On-Premises Juan Loaiza Senior Vice President, Database Systems Technologies, Oracle Safe Harbor Statement The following is intended
More informationData Path acceleration techniques in a NFV world
Data Path acceleration techniques in a NFV world Mohanraj Venkatachalam, Purnendu Ghosh Abstract NFV is a revolutionary approach offering greater flexibility and scalability in the deployment of virtual
More informationSWAP: EFFECTIVE FINE-GRAIN MANAGEMENT
: EFFECTIVE FINE-GRAIN MANAGEMENT OF SHARED LAST-LEVEL CACHES WITH MINIMUM HARDWARE SUPPORT Xiaodong Wang, Shuang Chen, Jeff Setter, and José F. Martínez Computer Systems Lab Cornell University Page 1
More informationMoneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories
Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Adrian M. Caulfield Arup De, Joel Coburn, Todor I. Mollov, Rajesh K. Gupta, Steven Swanson Non-Volatile Systems
More informationAutomated Control for Elastic Storage
Automated Control for Elastic Storage Summarized by Matthew Jablonski George Mason University mjablons@gmu.edu October 26, 2015 Lim, H. C. and Babu, S. and Chase, J. S. (2010) Automated Control for Elastic
More informationApplication Acceleration Beyond Flash Storage
Application Acceleration Beyond Flash Storage Session 303C Mellanox Technologies Flash Memory Summit July 2014 Accelerating Applications, Step-by-Step First Steps Make compute fast Moore s Law Make storage
More informationThe Hardware & Software Implications of Microservices and How Big Data Can Help
The Hardware & Software Implications of Microservices and How Big Data Can Help Christina Delimitrou Cornell University with Yu Gan, Yanqi Zhang, Shuang Chen, Neeraj Kulkarni, Ariana Bruno, Justin Hu,
More informationPRESENTATION TITLE GOES HERE
Performance Basics PRESENTATION TITLE GOES HERE Leah Schoeb, Member of SNIA Technical Council SNIA EmeraldTM Training SNIA Emerald Power Efficiency Measurement Specification, for use in EPA ENERGY STAR
More informationHighly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture
A Cost Effective,, High g Performance,, Highly Scalable, Non-RDMA NVMe Fabric Bob Hansen,, VP System Architecture bob@apeirondata.com Storage Developers Conference, September 2015 Agenda 3 rd Platform
More informationBuilding Adaptive Performance Models for Dynamic Resource Allocation in Cloud Data Centers
Building Adaptive Performance Models for Dynamic Resource Allocation in Cloud Data Centers Jin Chen University of Toronto Joint work with Gokul Soundararajan and Prof. Cristiana Amza. Today s Cloud Pay
More informationTowards Energy-Proportional Datacenter Memory with Mobile DRAM
Towards Energy-Proportional Datacenter Memory with Mobile DRAM Krishna Malladi 1 Frank Nothaft 1 Karthika Periyathambi Benjamin Lee 2 Christos Kozyrakis 1 Mark Horowitz 1 Stanford University 1 Duke University
More informationEnabling NVMe I/O Scale
Enabling NVMe I/O Determinism @ Scale Chris Petersen, Hardware System Technologist Wei Zhang, Software Engineer Alexei Naberezhnov, Software Engineer Facebook Facebook @ Scale 800 Million 1.3 Billion 2.2
More informationDesigning Next Generation FS for NVMe and NVMe-oF
Designing Next Generation FS for NVMe and NVMe-oF Liran Zvibel CTO, Co-founder Weka.IO @liranzvibel Santa Clara, CA 1 Designing Next Generation FS for NVMe and NVMe-oF Liran Zvibel CTO, Co-founder Weka.IO
More informationData Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research
Data Processing at the Speed of 100 Gbps using Apache Crail Patrick Stuedi IBM Research The CRAIL Project: Overview Data Processing Framework (e.g., Spark, TensorFlow, λ Compute) Spark-IO Albis Pocket
More informationSOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS
SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CSAIL IAP MEETING MAY 21, 2013 Research Agenda Lack of technology progress Moore s Law still alive Power
More informationNVMe Direct. Next-Generation Offload Technology. White Paper
NVMe Direct Next-Generation Offload Technology The market introduction of high-speed NVMe SSDs and 25/40/50/100Gb Ethernet creates exciting new opportunities for external storage NVMe Direct enables high-performance
More informationThe Design and Implementation of AQuA: An Adaptive Quality of Service Aware Object-Based Storage Device
The Design and Implementation of AQuA: An Adaptive Quality of Service Aware Object-Based Storage Device Joel Wu and Scott Brandt Department of Computer Science University of California Santa Cruz MSST2006
More informationLoad-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise)
Load-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise) Application vs. Pure Workloads Benchmarks that reproduce application workloads Assist in
More informationW H I T E P A P E R. What s New in VMware vsphere 4: Performance Enhancements
W H I T E P A P E R What s New in VMware vsphere 4: Performance Enhancements Scalability Enhancements...................................................... 3 CPU Enhancements............................................................
More informationExploring System Challenges of Ultra-Low Latency Solid State Drives
Exploring System Challenges of Ultra-Low Latency Solid State Drives Sungjoon Koh Changrim Lee, Miryeong Kwon, and Myoungsoo Jung Computer Architecture and Memory systems Lab Executive Summary Motivation.
More informationStorage Protocol Offload for Virtualized Environments Session 301-F
Storage Protocol Offload for Virtualized Environments Session 301-F Dennis Martin, President August 2016 1 Agenda About Demartek Offloads I/O Virtualization Concepts RDMA Concepts Overlay Networks and
More informationSpeeding up Linux TCP/IP with a Fast Packet I/O Framework
Speeding up Linux TCP/IP with a Fast Packet I/O Framework Michio Honda Advanced Technology Group, NetApp michio@netapp.com With acknowledge to Kenichi Yasukata, Douglas Santry and Lars Eggert 1 Motivation
More informationArachne. Core Aware Thread Management Henry Qin Jacqueline Speiser John Ousterhout
Arachne Core Aware Thread Management Henry Qin Jacqueline Speiser John Ousterhout Granular Computing Platform Zaharia Winstein Levis Applications Kozyrakis Cluster Scheduling Ousterhout Low-Latency RPC
More informationData Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research
Data Processing at the Speed of 100 Gbps using Apache Crail Patrick Stuedi IBM Research The CRAIL Project: Overview Data Processing Framework (e.g., Spark, TensorFlow, λ Compute) Spark-IO FS Albis Streaming
More informationSPDK China Summit Ziye Yang. Senior Software Engineer. Network Platforms Group, Intel Corporation
SPDK China Summit 2018 Ziye Yang Senior Software Engineer Network Platforms Group, Intel Corporation Agenda SPDK programming framework Accelerated NVMe-oF via SPDK Conclusion 2 Agenda SPDK programming
More informationFMS18 Invited Session 101-B1 Hardware Acceleration Techniques for NVMe-over-Fabric
Flash Memory Summit 2018 Santa Clara, CA FMS18 Invited Session 101-B1 Hardware Acceleration Techniques for NVMe-over-Fabric Paper Abstract: The move from direct-attach to Composable Infrastructure is being
More informationRack-scale Data Processing System
Rack-scale Data Processing System Jana Giceva, Darko Makreshanski, Claude Barthels, Alessandro Dovis, Gustavo Alonso Systems Group, Department of Computer Science, ETH Zurich Rack-scale Data Processing
More informationOracle Exadata X7. Uwe Kirchhoff Oracle ACS - Delivery Senior Principal Service Delivery Engineer
Oracle Exadata X7 Uwe Kirchhoff Oracle ACS - Delivery Senior Principal Service Delivery Engineer 05.12.2017 Oracle Engineered Systems ZFS Backup Appliance Zero Data Loss Recovery Appliance Exadata Database
More informationLink Aggregation: A Server Perspective
Link Aggregation: A Server Perspective Shimon Muller Sun Microsystems, Inc. May 2007 Supporters Howard Frazier, Broadcom Corp. Outline The Good, The Bad & The Ugly LAG and Network Virtualization Networking
More informationPerformance Isolation in Multi- Tenant Relational Database-asa-Service. Sudipto Das (Microsoft Research)
Performance Isolation in Multi- Tenant Relational Database-asa-Service Sudipto Das (Microsoft Research) CREATE DATABASE CREATE TABLE SELECT... INSERT UPDATE SELECT * FROM FOO WHERE App1 App2 App3 App1
More informationScaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc
Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC
More informationNested QoS: Providing Flexible Performance in Shared IO Environment
Nested QoS: Providing Flexible Performance in Shared IO Environment Hui Wang Peter Varman Rice University Houston, TX 1 Outline Introduction System model Analysis Evaluation Conclusions and future work
More informationDatabase Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu
Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based
More informationFast packet processing in the cloud. Dániel Géhberger Ericsson Research
Fast packet processing in the cloud Dániel Géhberger Ericsson Research Outline Motivation Service chains Hardware related topics, acceleration Virtualization basics Software performance and acceleration
More informationSharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Enterprise Intranet Collaboration Environment
SharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Enterprise Intranet Collaboration Environment This document is provided as-is. Information and views expressed in this document, including
More informationRAMCloud. Scalable High-Performance Storage Entirely in DRAM. by John Ousterhout et al. Stanford University. presented by Slavik Derevyanko
RAMCloud Scalable High-Performance Storage Entirely in DRAM 2009 by John Ousterhout et al. Stanford University presented by Slavik Derevyanko Outline RAMCloud project overview Motivation for RAMCloud storage:
More informationN V M e o v e r F a b r i c s -
N V M e o v e r F a b r i c s - H i g h p e r f o r m a n c e S S D s n e t w o r k e d f o r c o m p o s a b l e i n f r a s t r u c t u r e Rob Davis, VP Storage Technology, Mellanox OCP Evolution Server
More informationpblk the OCSSD FTL Linux FAST Summit 18 Javier González Copyright 2018 CNEX Labs
pblk the OCSSD FTL Linux FAST Summit 18 Javier González Read Latency Read Latency with 0% Writes Random Read 4K Percentiles 2 Read Latency Read Latency with 20% Writes Random Read 4K + Random Write 4K
More informationDeveloping deterministic networking technology for railway applications using TTEthernet software-based end systems
Developing deterministic networking technology for railway applications using TTEthernet software-based end systems Project n 100021 Astrit Ademaj, TTTech Computertechnik AG Outline GENESYS requirements
More informationSingle Root I/O Virtualization (SR-IOV) and iscsi Uncompromised Performance for Virtual Server Environments Leonid Grossman Exar Corporation
Single Root I/O Virtualization (SR-IOV) and iscsi Uncompromised Performance for Virtual Server Environments Leonid Grossman Exar Corporation Introduction to Exar iscsi project and related datacenter trends
More informationOASIS: Self-tuning Storage for Applications
OASIS: Self-tuning Storage for Applications Kostas Magoutis, Prasenjit Sarkar, Gauri Shah 14 th NASA Goddard- 23 rd IEEE Mass Storage Systems Technologies, College Park, MD, May 17, 2006 Outline Motivation
More informationNVMf based Integration of Non-volatile Memory in a Distributed System - Lessons learned
14th ANNUAL WORKSHOP 2018 NVMf based Integration of Non-volatile Memory in a Distributed System - Lessons learned Jonas Pfefferle, Bernard Metzler, Patrick Stuedi, Animesh Trivedi and Adrian Schuepbach
More informationLow-Latency Datacenters. John Ousterhout
Low-Latency Datacenters John Ousterhout The Datacenter Revolution Phase 1: Scale How to use 10,000 servers for a single application? New storage systems: Bigtable, HDFS,... New models of computation: MapReduce,
More informationImprove Web Application Performance with Zend Platform
Improve Web Application Performance with Zend Platform Shahar Evron Zend Sr. PHP Specialist Copyright 2007, Zend Technologies Inc. Agenda Benchmark Setup Comprehensive Performance Multilayered Caching
More informationSelf-driving Datacenter: Analytics
Self-driving Datacenter: Analytics George Boulescu Consulting Systems Engineer 19/10/2016 Alvin Toffler is a former associate editor of Fortune magazine, known for his works discussing the digital revolution,
More informationBalancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation
Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation Hui Wang, Peter Varman Rice University FAST 14, Feb 2014 Tiered Storage Tiered storage: HDs and SSDs q Advantages:
More informationReference Design: NVMe-oF JBOF
Reference Design: NVMe-oF JBOF 1 Composable Infrastructure Two Target Architectures RNIC RNIC Driver NVMe Driver NVMe Fabric Driver NVMe SSDs Application NVMe Driver NVMe Fabric Driver RNIC Driver RNIC
More informationThe NE010 iwarp Adapter
The NE010 iwarp Adapter Gary Montry Senior Scientist +1-512-493-3241 GMontry@NetEffect.com Today s Data Center Users Applications networking adapter LAN Ethernet NAS block storage clustering adapter adapter
More informationEfficient QoS for Multi-Tiered Storage Systems
Efficient QoS for Multi-Tiered Storage Systems Ahmed Elnably Hui Wang Peter Varman Rice University Ajay Gulati VMware Inc Tiered Storage Architecture Client Multi-Tiered Array Client 2 Scheduler SSDs...
More informationPreserving I/O Prioritization in Virtualized OSes
Preserving I/O Prioritization in Virtualized OSes Kun Suo 1, Yong Zhao 1, Jia Rao 1, Luwei Cheng 2, Xiaobo Zhou 3, Francis C. M. Lau 4 The University of Texas at Arlington 1, Facebook 2, University of
More informationEnergy'Proportionality'and'Worload' Consolidation'for'Latency6critical' Applications
Energy'Proportionality'and'Worload' Consolidation'for'Latency6critical' Applications George'Prekas,'Mia'Primorac,' Adam'Belay,'Christos'Kozyrakis, Edouard'Bugnion Latency(critical,applications,[OSDI14]
More informationSoftRDMA: Rekindling High Performance Software RDMA over Commodity Ethernet
SoftRDMA: Rekindling High Performance Software RDMA over Commodity Ethernet Mao Miao, Fengyuan Ren, Xiaohui Luo, Jing Xie, Qingkai Meng, Wenxue Cheng Dept. of Computer Science and Technology, Tsinghua
More informationSun N1: Storage Virtualization and Oracle
OracleWorld 2003 Session 36707 - Sun N1: Storage Virtualization and Oracle Glenn Colaco Performance Engineer Sun Microsystems Performance and Availability Engineering September 9, 2003 Background PAE works
More informationLegUp: Accelerating Memcached on Cloud FPGAs
0 LegUp: Accelerating Memcached on Cloud FPGAs Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. 1 COMPUTE IS BECOMING SPECIALIZED 1 GPU Nvidia graphics cards are
More informationBe Fast, Cheap and in Control with SwitchKV. Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level
More informationD E N A L I S T O R A G E I N T E R F A C E. Laura Caulfield Senior Software Engineer. Arie van der Hoeven Principal Program Manager
1 T HE D E N A L I N E X T - G E N E R A T I O N H I G H - D E N S I T Y S T O R A G E I N T E R F A C E Laura Caulfield Senior Software Engineer Arie van der Hoeven Principal Program Manager Outline Technology
More informationDeployment Planning and Optimization for Big Data & Cloud Storage Systems
Deployment Planning and Optimization for Big Data & Cloud Storage Systems Bianny Bian Intel Corporation Outline System Planning Challenges Storage System Modeling w/ Intel CoFluent Studio Simulation Methodology
More informationMQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices
MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices Arash Tavakkol, Juan Gómez-Luna, Mohammad Sadrosadati, Saugata Ghose, Onur Mutlu February 13, 2018 Executive Summary
More informationNetworking at the Speed of Light
Networking at the Speed of Light Dror Goldenberg VP Software Architecture MaRS Workshop April 2017 Cloud The Software Defined Data Center Resource virtualization Efficient services VM, Containers uservices
More informationDeadline Guaranteed Service for Multi- Tenant Cloud Storage Guoxin Liu and Haiying Shen
Deadline Guaranteed Service for Multi- Tenant Cloud Storage Guoxin Liu and Haiying Shen Presenter: Haiying Shen Associate professor *Department of Electrical and Computer Engineering, Clemson University,
More informationCloud & Datacenter EGA
Cloud & Datacenter EGA The Stock Exchange of Thailand Materials excerpt from SET internal presentation and virtualization vendor e.g. vmware For Educational purpose and Internal Use Only SET Virtualization/Cloud
More informationMulticore ARM Processors for Safety Critical Avionics
Multicore ARM Processors for Safety Critical Avionics Gary Gilliland DDC-I Technical Marketing Manger This is a non-itar presentation, for public release and reproduction from FSW website. 1 Gary Gilliland
More informationA Case for Core-Assisted Bottleneck Acceleration in GPUs Enabling Flexible Data Compression with Assist Warps
A Case for Core-Assisted Bottleneck Acceleration in GPUs Enabling Flexible Data Compression with Assist Warps Nandita Vijaykumar Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarangnirun,
More informationBuilding a High IOPS Flash Array: A Software-Defined Approach
Building a High IOPS Flash Array: A Software-Defined Approach Weafon Tsao Ph.D. VP of R&D Division, AccelStor, Inc. Santa Clara, CA Clarification Myth 1: S High-IOPS SSDs = High-IOPS All-Flash Array SSDs
More informationAnalysis and Optimization. Carl Waldspurger Irfan Ahmad CloudPhysics, Inc.
PRESENTATION Practical Online TITLE GOES Cache HERE Analysis and Optimization Carl Waldspurger Irfan Ahmad CloudPhysics, Inc. SNIA Legal Notice The material contained in this tutorial is copyrighted by
More informationVEXATA FOR ORACLE. Digital Business Demands Performance and Scale. Solution Brief
Digital Business Demands Performance and Scale As enterprises shift to online and softwaredriven business models, Oracle infrastructure is being pushed to run at exponentially higher scale and performance.
More informationAkaros. Barret Rhoden, Kevin Klues, Andrew Waterman, Ron Minnich, David Zhu, Eric Brewer UC Berkeley and Google
Akaros Barret Rhoden, Kevin Klues, Andrew Waterman, Ron Minnich, David Zhu, Eric Brewer UC Berkeley and Google High Level View Research OS, made for single nodes Large-scale SMP / many-core architectures
More informationAbstract. Testing Parameters. Introduction. Hardware Platform. Native System
Abstract In this paper, we address the latency issue in RT- XEN virtual machines that are available in Xen 4.5. Despite the advantages of applying virtualization to systems, the default credit scheduler
More information