Farewell to Servers: Resource Disaggregation
|
|
- Douglas Welch
- 5 years ago
- Views:
Transcription
1 Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Yiying Zhang
2 2
3 Monolithic Computer OS / Hypervisor 3
4 Can monolithic Application Hardware servers continue to Heterogeneity meet datacenter needs? Flexibility Perf / $
5 TPU GPU FPGA HBM ASIC NVM NVMe DNA Storage 5
6 Making new hardware work with existing servers is like fitting puzzles 6
7 Can monolithic Application Hardware servers continue to Heterogeneity meet datacenter needs? Flexibility Perf / $
8 Poor Hardware Elasticity Hard to change hardware components - Add (hotplug), remove, reconfigure, restart No fine-grained failure handling - The failure of one device can crash a whole machine 8
9 Can monolithic Application Hardware servers continue to Heterogeneity meet datacenter needs? Flexibility Perf / $
10 Poor Resource Utilization Whole VM/container has to run on one physical machine - Move current applications to make room for new ones wasted! cpu mem Server 1 Server 2 Job 12 Available Space Required Space 10
11 Resource Utilization in Production Clusters * Google Production Cluster Trace Data. * Alibaba Production Cluster Trace Data. Unused Resource + Waiting/Killed Jobs Because of Physical-Node Constraints 11
12 Can monolithic Application Hardware servers continue to Heterogeneity meet datacenter needs? Flexibility Perf / $
13 How to achieve better heterogeneity, flexibility, and perf/$? Go beyond physical node boundary 13
14 Resource Disaggregation: Breaking monolithic servers into networkattached, independent hardware components 14
15 15
16 Application Flexibility Heterogeneity Hardware Perf / $ Network 16
17 Why Possible Now? Network is faster InfiniBand (200Gbps, 600ns) Optical Fabric (400Gbps, 100ns) Berkeley Firebox More processing power at device SmartNIC, SmartSSD, PIM Network interface closer to device Omni-Path, Innova-2 HP The Machine Intel Rack-Scale System IBM Composable System 17
18 Disaggregated Datacenter End-to-End Solution Unmodified Application Dist Sys Performance Heterogeneity OS Flexibility Network Reliability Hardware $ Cost
19 Disaggregated Datacenter End-to-End Solution Physically Disaggregated Resources Disaggregated Operating System (OSDI 18) New Processor and Memory Architecture Networking for Disaggregated Resources Kernel-Level RDMA Virtualization (SOSP 17) RDMA Network
20 20
21 Can Existing Kernels Fit? monolithic kernel CPU microkernel CPU Kern Core Kern GPU Kern P-NIC mem NIC mem NIC Disk Server Disk network across servers Server Monolithic/Micro-kernel (e.g., Linux, L4) Shared Main Memory Disk NIC Monolithic Server Multikernel (e.g., Barrelfish, Helios, fos) 21
22 Existing Kernels Don t Fit Access remote resources Network Distributed resource mgmt Fine-grained failure handling 22
23 When hardware is disaggregated The OS should be also 23
24 OS Virtual File & Process Memory Storage Mgmt System System Network 24
25 Network Process Mgmt Network Virtual Memory System Network File & Storage System Network File & Storage System Network 25
26 The Splitkernel Architecture Split OS functions into monitors Run each monitor at h/w device Process Monitor Processor (CPU) GPU Minitor Processor (GPU) XPU Manager New h/w (XPU) Network messaging across non-coherent components Distributed resource mgmt and failure handling network messaging across non-coherent components Memory Monitor NVM Monitor HDD Monitor SSD Monitor Memory NVM Hard Disk SSD 26
27 LegoOS The First Disaggregated OS Memory Processor Storage NVM 27
28 How Should LegoOS Appear to Users? As a set of hardware devices? As a giant machine? Our answer: as a set of virtual Nodes (vnodes) - Similar semantics to virtual machines - Unique vid, vip, storage mount point - Can run on multiple processor, memory, and storage components 28
29 Abstraction - vnode vnode1 Process Monitor Processor (CPU) GPU Minitor Processor (GPU) XPU Manager New h/w (XPU) vnode2 network messaging across non-coherent components Memory Monitor NVM Monitor HDD Monitor SSD Monitor Memory NVM Hard Disk SSD One vnode can run multiple hardware components One hardware component can run multiple vnodes 29
30 Abstraction Appear as vnodes to users Linux ABI compatible Support unmodified Linux system call interface (common ones) A level of indirection to translate Linux interface to LegoOS interface 30
31 LegoOS Design 1. Clean separation of OS and hardware functionalities 2. Build monitor with hardware constraints 3. RDMA-based message passing for both kernel and applications 4. Two-level distributed resource management 5. Memory failure tolerance through replication 31
32 Separate Processor and Memory CPU Processor $ CPU $ Last-Level TLB MMU DRAM PT 32
33 Separate Processor and Memory CPU Processor $ CPU $ Last-Level Separate and move hardware units Network to memory component Memory TLB MMU DRAM PT Memory 33
34 Separate Processor and Memory Virtual Memory CPU Processor $ CPU $ Last-Level Separate and move hardware units Network to memory component Memory TLB MMU DRAM PT Memory 34
35 Separate Processor and Memory CPU Processor $ CPU $ Last-Level Separate and move virtual memory system Network Memory to memory component TLB MMU Virtual Memory DRAM PT Memory 35
36 Separate Processor and Memory Virtual Address Virtual Address CPU Virtual Address Virtual Address Processor $ CPU $ Last-Level Network Processor components only see virtual memory addresses All levels of cache are virtual cache Memory TLB MMU Virtual Memory DRAM PT Memory Memory components manage virtual and physical memory 36
37 Challenge: network is 2x-4x slower than memory bus 37
38 Add Extended Cache at Processor CPU Processor $ CPU $ Last-Level Network Memory TLB MMU Virtual Memory DRAM PT Memory 38
39 Add Extended Cache at Processor Processor CPU $ CPU $ Last-Level DRAM Add small DRAM/HBM at processor Use it as Extended Cache, or ExCache TLB DRAM Network MMU Virtual Memory PT Memory Software and hardware comanaged Memory Inclusive Virtual cache 39
40 LegoOS Design 1. Clean separation of OS and hardware functionalities 2. Build monitor with hardware constraints 3. RDMA-based message passing for both kernel and applications 4. Two-level distributed resource management 5. Memory failure tolerance through replication 40
41 Distributed Resource Management Process Monitor Processor (CPU) GPU Minitor Processor (GPU) Global Resource Mgmt Global Process Manager (GPM) Global Memory Manager (GMM) network messaging across non-coherent components Global Storage Manager (GSM) Memory Monitor NVM Monitor HDD Monitor SSD Monitor 1. Coarse-grain allocation Memory NVM Hard Disk SSD 2. Load-balancing 3. Failure handling 41
42 Implementation and Emulation Process Monitor Processor Processor CPU CPU LLC CPU CPU Disk Reserve DRAM as ExCache (4KB page as cache line) h/w only on hit path, s/w managed miss path ExCache Indirection layer to store states for 113 Linux syscalls RDMA Network Memory Monitor Linux Kernel Module CPU CPU CPU CPU CPU LLC Disk LLC Disk DRAM DRAM Memory Storage Memory Limit number of cores, kernel-space only Storage/Global Resource Monitors Implemented as kernel module on Linux Network RDMA RPC stack based on LITE [SOSP 17] 42
43 Slowdown Performance Evaluation ExCache/Memory Size (MB) LegoOS Config: 1P, 1M, 1S Linux swap SSD Linux swap ramdisk InfiniSwap LegoOS Unmodified TensorFlow, running CIFAR-10 Working set: 0.9G 4 threads Systems in comparison Baseline: Linux with unlimited memory Swap to SSD, and ramdisk Only 1.3x to 1.7x slowdown when InfiniSwap [NSDI 17] disaggregating devices with LegoOS To gain better resource packing, 43 elasticity, and fault tolerance!
44 LegoOS Summary Resource disaggregation calls for new system LegoOS: a new OS designed and built from scratch for datacenter resource disaggregation Split OS into distributed micro-os services, running at device Many challenges and many potentials 44
45 Disaggregated Datacenter flexible, heterogeneous, elastic, perf/$, resilient, scalable, easy-to-use Physically Disaggregated Resources Disaggregated Operating System (OSDI 18) New Processor and Memory Architecture Networking for Disaggregated Resources Kernel-Level RDMA Virtualization (SOSP 17) RDMA Network
46 Network Requirements for Resource Disaggregation Low latency High bandwidth Scale RDMA Reliable 46
47 RDMA (Remote Direct Memory Access) CPU User Kernel Memory NIC CPU User Kernel Memory NIC Directly read/write remote memory Bypass kernel Socket over Ethernet Memory zero copy CPU User CPU User Benefits: Kernel Kernel Low latency Memory NIC Memory NIC High throughput Low CPU utilization RDMA 47
48 Things have worked well in HPC Special hardware Few applications Cheaper developer 48
49 RDMA-Based Datacenter Applications Pilaf [ATC 13] HERD-RPC [ATC 16] Cell [ATC 16] FaRM [NSDI 14] Wukong [OSDI 16] FaSST [OSDI 16] HERD [SIGCOMM 14] Hotpot [SoCC 17] NAM-DB [VLDB 17] RSI [VLDB 16] DrTM [SOSP 15] APUS [SoCC 17] Octopus [ATC 17] DrTM+R [EuroSys 16] FaRM+Xact Mojim [SOSP 15] [ASPLOS 15] 49
50 Things have worked well in HPC Special hardware Few applications Cheaper developer What about datacenters? Commodity, cheaper hardware Many (changing) applications Resource sharing and isolation 50
51 Native RDMA User Space Conn Mgmt User-Level RDMA App send recv Connections Queues Keys node, lkey, rkey addr Mem Mgmt Memory space Library Kernel Space OS Kernel Bypassing Hardware RNIC Permission check Address mapping Cached PTEs lkey 1 lkey n rkey 1 rkey n 51
52 Userspace Fat applications No resource sharing Hardware Low-level Difficult to use Difficult to share RDMA Socket Developers want High-level Easy to use Resource share Isolation Abstraction Mismatch 52
53 Things have worked well in HPC Special hardware Few applications Cheaper developer What about datacenters? Commodity, cheaper hardware Many (changing) applications Resource sharing and isolation 53
54 Userspace Native RDMA Hardware User Space Conn Mgmt User-Level RDMA App send recv Connections Queues Keys node, lkey, rkey addr Mem Mgmt Memory space Library Kernel Space OS Kernel Bypassing Hardware RNIC Permission check Address mapping Cached PTEs lkey 1 lkey n rkey 1 rkey n 54
55 Userspace On-NIC SRAM stores and caches metadata Hardware Requests /us Expensive, Total Size (MB) Write-64B Write-1K unscalable hardware 55
56 Things have worked well in HPC Special hardware Few applications Cheaper developer What about datacenters? Commodity, cheaper hardware Many (changing) applications Resource sharing and isolation 56
57 Fat applications No resource sharing Expensive, unscalable hardware Are we removing too much from kernel? 57
58 LITE - Without Local Indirection Kernel TiEr High-level High-level abstraction abstraction Resource Resource sharing sharing Protection Performance isolation Performance isolation Protection 58
59 User Space Conn Mgmt User-Level RDMA App send recv Connections Queues Keys node, lkey, rkey addr Mem Mgmt Memory space Library Hardware RNIC Permission check Address mapping Cached PTEs lkey 1 lkey n rkey 1 rkey n 59
60 Simpler applications User-Level RDMA App User Space Conn Mgmt send recv node, lkey, rkey addr LITE APIs Memory RPC/Msg APIs Sync APIs Mem Mgmt Kernel Space LITE Connections Queues Keys Memory space Hardware RNIC Permission check Address mapping Cached PTEs lkey 1 lkey n rkey 1 rkey n 60
61 Simpler applications User-Level RDMA App User Space Conn Mgmt send recv node, lkey, rkey addr Mem Mgmt LITE APIs Memory RPC/Msg APIs Sync APIs Kernel Space LITE Connections Queues Keys Permission check Address mapping Global lkey Memory space Global rkey Hardware RNIC Global lkey Global rkey Cheaper hardware Scalable performance 61
62 Implementing Remote memset Native RDMA LITE 62
63 Main Challenge: How to preserve the performance benefit of RDMA? 63
64 LITE Design Principles 1.Indirection only at local node 2.Avoid hardware-level indirection 3.Hide kernel-space crossing cost except for the problem of too many layers of indirection David Wheeler Great Performance and Scalability 64
65 LITE RDMA:Size of MR Scalability Requests /us Write-64B LITE_write-64B Write-1K LITE_write-1K 0 LITE 1 scales 4 much 16 better 64 than 256 native 1024 RDMA wrt MR Total size Size (MB) and numbers 65
66 LITE Application Effort Application LOC LOC using LITE Student Days LITE-Log LITE-MapReduce 600* 49 4 LITE-Graph LITE-Kernel-DSM Simple to use Needs no expert knowledge Flexible, powerful abstraction Easy to achieve optimized performance * LITE-MapReduce ports from the 3000-LOC Phoenix with 600 lines of change or addition 66
67 MapReduce Results LITE-MapReduce adapted from Phoenix [1] Runtime (sec) Hadoop Phoenix LITE LITE-MapReduce 2 outperforms Hadoop by 4.3x to 5.3x 0 Phoenix 2-node 4-node 8-node [1]: Ranger etal., Evaluating MapReduce for Multi-core and Multiprocessor Systems. (HPCA 07) 67
68 LITE Summary Virtualizes RDMA into flexible, easy-to-use abstraction Preserves RDMA s performance benefits Indirection not always degrade performance! Division across user space, kernel, and hardware 68
69 Disaggregated Datacenter flexible, heterogeneous, elastic, perf/$, resilient, scalable, easy-to-use Physically Disaggregated Resources Disaggregated Operating System (OSDI 18) New Processor and Memory Architecture Networking for Disaggregated Resources Kernel-Level RDMA Virtualization (SOSP 17) RDMA Network
70 Disaggregated Datacenter flexible, heterogeneous, elastic, perf/$, resilient, scalable, easy-to-use Physically Disaggregated Resources Virtually Disaggregated Resources Disaggregated OS (OSDI 18) New Processor and Memory Architecture Disaggregated Persistent Memory Network-Attached NVM Distributed Shared Persistent Memory (SoCC 17) Distributed Non-Volatile Memory Networking for Disaggregated Resources Kernel-Level RDMA Virtualization (SOSP 17) RDMA Network New Network Topology, Routing, Congestion-Ctrl InfiniBand
71 Conclusion New hardware and software trends point to resource disaggregation My research pioneers in building an end-to-end solution for disaggregated datacenter Opens up new research opportunities in hardware, software, networking, security, and programming language 71
72 Thank you Questions? wuklab.io
Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation
Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Yiying Zhang Datacenter 3 Monolithic Computer OS / Hypervisor 4 Can monolithic Application Hardware
More informationA Disseminated Distributed OS for Hardware Resource Disaggregation Yizhou Shan
LegoOS A Disseminated Distributed OS for Hardware Resource Disaggregation Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang Y 4 1 2 Monolithic Server OS / Hypervisor 3 Problems? 4 cpu mem Resource
More informationLITE Kernel RDMA. Support for Datacenter Applications. Shin-Yeh Tsai, Yiying Zhang
LITE Kernel RDMA Support for Datacenter Applications Shin-Yeh Tsai, Yiying Zhang Time 2 Berkeley Socket Userspace Kernel Hardware Time 1983 2 Berkeley Socket TCP Offload engine Arrakis & mtcp IX Userspace
More informationInfiniswap. Efficient Memory Disaggregation. Mosharaf Chowdhury. with Juncheng Gu, Youngmoon Lee, Yiwen Zhang, and Kang G. Shin
Infiniswap Efficient Memory Disaggregation Mosharaf Chowdhury with Juncheng Gu, Youngmoon Lee, Yiwen Zhang, and Kang G. Shin Rack-Scale Computing Datacenter-Scale Computing Geo-Distributed Computing Coflow
More informationDistributed Shared Persistent Memory
Distributed Shared Persistent Memory (SoCC 17) Yizhou Shan, Yiying Zhang Persistent Memory (PM/NVM) Byte Addressable Persistent CPU Cache Low Latency Capacity Cost effective PM DRAM 2 Many PM Work, but
More informationEfficient Memory Disaggregation with Infiniswap. Juncheng Gu, Youngmoon Lee, Yiwen Zhang, MosharafChowdhury, Kang G. Shin
Efficient Memory Disaggregation with Juncheng Gu, Youngmoon Lee, Yiwen Zhang, MosharafChowdhury, Kang G. Shin Agenda Motivation and related work Design and system overview Implementation and evaluation
More informationData Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research
Data Processing at the Speed of 100 Gbps using Apache Crail Patrick Stuedi IBM Research The CRAIL Project: Overview Data Processing Framework (e.g., Spark, TensorFlow, λ Compute) Spark-IO Albis Pocket
More informationRDMA and Hardware Support
RDMA and Hardware Support SIGCOMM Topic Preview 2018 Yibo Zhu Microsoft Research 1 The (Traditional) Journey of Data How app developers see the network Under the hood This architecture had been working
More informationPocket: Elastic Ephemeral Storage for Serverless Analytics
Pocket: Elastic Ephemeral Storage for Serverless Analytics Ana Klimovic*, Yawen Wang*, Patrick Stuedi +, Animesh Trivedi +, Jonas Pfefferle +, Christos Kozyrakis* *Stanford University, + IBM Research 1
More informationNear- Data Computa.on: It s Not (Just) About Performance
Near- Data Computa.on: It s Not (Just) About Performance Steven Swanson Non- Vola0le Systems Laboratory Computer Science and Engineering University of California, San Diego 1 Solid State Memories NAND
More informationSmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center
SmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center Jeff Defilippi Senior Product Manager Arm #Arm Tech Symposia The Cloud to Edge Infrastructure Foundation for a World of 1T Intelligent
More informationCSC501 Operating Systems Principles. OS Structure
CSC501 Operating Systems Principles OS Structure 1 Announcements q TA s office hour has changed Q Thursday 1:30pm 3:00pm, MRC-409C Q Or email: awang@ncsu.edu q From department: No audit allowed 2 Last
More informationIO virtualization. Michael Kagan Mellanox Technologies
IO virtualization Michael Kagan Mellanox Technologies IO Virtualization Mission non-stop s to consumers Flexibility assign IO resources to consumer as needed Agility assignment of IO resources to consumer
More informationDeconstructing RDMA-enabled Distributed Transaction Processing: Hybrid is Better!
Deconstructing RDMA-enabled Distributed Transaction Processing: Hybrid is Better! Xingda Wei, Zhiyuan Dong, Rong Chen, Haibo Chen Institute of Parallel and Distributed Systems (IPADS) Shanghai Jiao Tong
More informationNetwork Requirements for Resource Disaggregation
Network Requirements for Resource Disaggregation Peter Gao (Berkeley), Akshay Narayan (MIT), Sagar Karandikar (Berkeley), Joao Carreira (Berkeley), Sangjin Han (Berkeley), Rachit Agarwal (Cornell), Sylvia
More informationToward a Memory-centric Architecture
Toward a Memory-centric Architecture Martin Fink EVP & Chief Technology Officer Western Digital Corporation August 8, 2017 1 SAFE HARBOR DISCLAIMERS Forward-Looking Statements This presentation contains
More informationDesigning Next Generation FS for NVMe and NVMe-oF
Designing Next Generation FS for NVMe and NVMe-oF Liran Zvibel CTO, Co-founder Weka.IO @liranzvibel Santa Clara, CA 1 Designing Next Generation FS for NVMe and NVMe-oF Liran Zvibel CTO, Co-founder Weka.IO
More informationFlavors of Memory supported by Linux, their use and benefit. Christoph Lameter, Ph.D,
Flavors of Memory supported by Linux, their use and benefit Christoph Lameter, Ph.D, Twitter: @qant Flavors Of Memory The term computer memory is a simple term but there are numerous nuances
More informationMoneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010
Moneta: A High-performance Storage Array Architecture for Nextgeneration, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed
More information2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.
Ethernet Storage Fabrics Using RDMA with Fast NVMe-oF Storage to Reduce Latency and Improve Efficiency Kevin Deierling & Idan Burstein Mellanox Technologies 1 Storage Media Technology Storage Media Access
More informationN V M e o v e r F a b r i c s -
N V M e o v e r F a b r i c s - H i g h p e r f o r m a n c e S S D s n e t w o r k e d f o r c o m p o s a b l e i n f r a s t r u c t u r e Rob Davis, VP Storage Technology, Mellanox OCP Evolution Server
More informationStrata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson
A Cross Media File System Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson 1 Let s build a fast server NoSQL store, Database, File server, Mail server Requirements
More informationMessaging Overview. Introduction. Gen-Z Messaging
Page 1 of 6 Messaging Overview Introduction Gen-Z is a new data access technology that not only enhances memory and data storage solutions, but also provides a framework for both optimized and traditional
More informationSolros: A Data-Centric Operating System Architecture for Heterogeneous Computing
Solros: A Data-Centric Operating System Architecture for Heterogeneous Computing Changwoo Min, Woonhak Kang, Mohan Kumar, Sanidhya Kashyap, Steffen Maass, Heeseung Jo, Taesoo Kim Virginia Tech, ebay, Georgia
More informationCS 261 Fall Mike Lam, Professor. Virtual Memory
CS 261 Fall 2016 Mike Lam, Professor Virtual Memory Topics Operating systems Address spaces Virtual memory Address translation Memory allocation Lingering questions What happens when you call malloc()?
More informationEnhancing NVMe-oF Capabilities Using Storage Abstraction
Enhancing NVMe-oF Capabilities Using Storage Abstraction SNIA Storage Developer Conference Yaron Klein and Verly Gafni-Hoek February 2018 2018 Toshiba Memory America, Inc. Outline NVMe SSD Overview NVMe-oF
More informationSoftware and Tools for HPE s The Machine Project
Labs Software and Tools for HPE s The Machine Project Scalable Tools Workshop Aug/1 - Aug/4, 2016 Lake Tahoe Milind Chabbi Traditional Computing Paradigm CPU DRAM CPU DRAM CPU-centric computing 2 CPU-Centric
More informationData Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research
Data Processing at the Speed of 100 Gbps using Apache Crail Patrick Stuedi IBM Research The CRAIL Project: Overview Data Processing Framework (e.g., Spark, TensorFlow, λ Compute) Spark-IO FS Albis Streaming
More informationHighly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture
A Cost Effective,, High g Performance,, Highly Scalable, Non-RDMA NVMe Fabric Bob Hansen,, VP System Architecture bob@apeirondata.com Storage Developers Conference, September 2015 Agenda 3 rd Platform
More informationgenzconsortium.org Gen-Z Technology: Enabling Memory Centric Architecture
Gen-Z Technology: Enabling Memory Centric Architecture Why Gen-Z? Gen-Z Consortium 2017 2 Why Gen-Z? Gen-Z Consortium 2017 3 Why Gen-Z? Businesses Need to Monetize Data Big Data AI Machine Learning Deep
More informationLeveraging Flash in HPC Systems
Leveraging Flash in HPC Systems IEEE MSST June 3, 2015 This work was performed under the auspices of the U.S. Department of Energy by under Contract DE-AC52-07NA27344. Lawrence Livermore National Security,
More informationSCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH
Faculty of Computer Science Institute of Systems Architecture, Operating Systems Group SCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH LAYER CAKE Application Runtime OS Kernel ISA Physical RAM 2 COMMODITY
More informationRAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University
RAMCloud and the Low- Latency Datacenter John Ousterhout Stanford University Most important driver for innovation in computer systems: Rise of the datacenter Phase 1: large scale Phase 2: low latency Introduction
More informationAdvanced Computer Networks. End Host Optimization
Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct
More informationFaSST: Fast, Scalable, and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs
FaSST: Fast, Scalable, and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs Anuj Kalia (CMU), Michael Kaminsky (Intel Labs), David Andersen (CMU) RDMA RDMA is a network feature that
More informationWindows Support for PM. Tom Talpey, Microsoft
Windows Support for PM Tom Talpey, Microsoft Agenda Windows and Windows Server PM Industry Standards Support PMDK Support Hyper-V PM Support SQL Server PM Support Storage Spaces Direct PM Support SMB3
More informationAccessing NVM Locally and over RDMA Challenges and Opportunities
Accessing NVM Locally and over RDMA Challenges and Opportunities Wendy Elsasser Megan Grodowitz William Wang MSST - May 2018 Emerging NVM A wide variety of technologies with varied characteristics Address
More informationWindows Support for PM. Tom Talpey, Microsoft
Windows Support for PM Tom Talpey, Microsoft Agenda Industry Standards Support PMDK Open Source Support Hyper-V Support SQL Server Support Storage Spaces Direct Support SMB3 and RDMA Support 2 Windows
More informationApplication Acceleration Beyond Flash Storage
Application Acceleration Beyond Flash Storage Session 303C Mellanox Technologies Flash Memory Summit July 2014 Accelerating Applications, Step-by-Step First Steps Make compute fast Moore s Law Make storage
More informationIsoStack Highly Efficient Network Processing on Dedicated Cores
IsoStack Highly Efficient Network Processing on Dedicated Cores Leah Shalev Eran Borovik, Julian Satran, Muli Ben-Yehuda Outline Motivation IsoStack architecture Prototype TCP/IP over 10GE on a single
More informationPASTE: A Network Programming Interface for Non-Volatile Main Memory
PASTE: A Network Programming Interface for Non-Volatile Main Memory Michio Honda (NEC Laboratories Europe) Giuseppe Lettieri (Università di Pisa) Lars Eggert and Douglas Santry (NetApp) USENIX NSDI 2018
More informationThe SNIA NVM Programming Model. #OFADevWorkshop
The SNIA NVM Programming Model #OFADevWorkshop Opportunities with Next Generation NVM NVMe & STA SNIA 2 NVM Express/SCSI Express: Optimized storage interconnect & driver SNIA NVM Programming TWG: Optimized
More informationChelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING
Meeting Today s Datacenter Challenges Produced by Tabor Custom Publishing in conjunction with: 1 Introduction In this era of Big Data, today s HPC systems are faced with unprecedented growth in the complexity
More informationExtending RDMA for Persistent Memory over Fabrics. Live Webcast October 25, 2018
Extending RDMA for Persistent Memory over Fabrics Live Webcast October 25, 2018 Today s Presenters John Kim SNIA NSF Chair Mellanox Tony Hurson Intel Rob Davis Mellanox SNIA-At-A-Glance 3 SNIA Legal Notice
More informationLoad-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise)
Load-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise) Application vs. Pure Workloads Benchmarks that reproduce application workloads Assist in
More informationUnevenly Distributed. Adrian
Unevenly Distributed Adrian Colyer @adriancolyer blog.acolyer.org 350 Foundations Frontiers 5 Reasons to
More informationDesigning High-Performance Non-Volatile Memory-aware RDMA Communication Protocols for Big Data Processing
Designing High-Performance Non-Volatile Memory-aware RDMA Communication Protocols for Big Data Processing Talk at Storage Developer Conference SNIA 2018 by Xiaoyi Lu The Ohio State University E-mail: luxi@cse.ohio-state.edu
More informationFlexible Architecture Research Machine (FARM)
Flexible Architecture Research Machine (FARM) RAMP Retreat June 25, 2009 Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan Bronson Christos Kozyrakis, Kunle Olukotun Motivation Why CPUs + FPGAs make sense
More informationPreemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization
Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization Wei Chen, Jia Rao*, and Xiaobo Zhou University of Colorado, Colorado Springs * University of Texas at Arlington Data Center
More informationEfficient Memory Mapped File I/O for In-Memory File Systems. Jungsik Choi, Jiwon Kim, Hwansoo Han
Efficient Memory Mapped File I/O for In-Memory File Systems Jungsik Choi, Jiwon Kim, Hwansoo Han Operations Per Second Storage Latency Close to DRAM SATA/SAS Flash SSD (~00μs) PCIe Flash SSD (~60 μs) D-XPoint
More informationProtoFlex: FPGA Accelerated Full System MP Simulation
ProtoFlex: FPGA Accelerated Full System MP Simulation Eric S. Chung, Eriko Nurvitadhi, James C. Hoe, Babak Falsafi, Ken Mai Computer Architecture Lab at Our work in this area has been supported in part
More informationDisclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme
FUT3040BU Storage at Memory Speed: Finally, Nonvolatile Memory Is Here Rajesh Venkatasubramanian, VMware, Inc Richard A Brunner, VMware, Inc #VMworld #FUT3040BU Disclaimer This presentation may contain
More informationTailwind: Fast and Atomic RDMA-based Replication. Yacine Taleb, Ryan Stutsman, Gabriel Antoniu, Toni Cortes
Tailwind: Fast and Atomic RDMA-based Replication Yacine Taleb, Ryan Stutsman, Gabriel Antoniu, Toni Cortes In-Memory Key-Value Stores General purpose in-memory key-value stores are widely used nowadays
More informationFaRM: Fast Remote Memory
FaRM: Fast Remote Memory Problem Context DRAM prices have decreased significantly Cost effective to build commodity servers w/hundreds of GBs E.g. - cluster with 100 machines can hold tens of TBs of main
More informationCS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 23
CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 205 Lecture 23 LAST TIME: VIRTUAL MEMORY! Began to focus on how to virtualize memory! Instead of directly addressing physical memory, introduce a level of
More informationVirtual Memory. CS61, Lecture 15. Prof. Stephen Chong October 20, 2011
Virtual Memory CS6, Lecture 5 Prof. Stephen Chong October 2, 2 Announcements Midterm review session: Monday Oct 24 5:3pm to 7pm, 6 Oxford St. room 33 Large and small group interaction 2 Wall of Flame Rob
More informationNVthreads: Practical Persistence for Multi-threaded Applications
NVthreads: Practical Persistence for Multi-threaded Applications Terry Hsu*, Purdue University Helge Brügner*, TU München Indrajit Roy*, Google Inc. Kimberly Keeton, Hewlett Packard Labs Patrick Eugster,
More informationMemory Management. Disclaimer: some slides are adopted from book authors slides with permission 1
Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 CPU management Roadmap Process, thread, synchronization, scheduling Memory management Virtual memory Disk
More informationNo Tradeoff Low Latency + High Efficiency
No Tradeoff Low Latency + High Efficiency Christos Kozyrakis http://mast.stanford.edu Latency-critical Applications A growing class of online workloads Search, social networking, software-as-service (SaaS),
More informationPractical Near-Data Processing for In-Memory Analytics Frameworks
Practical Near-Data Processing for In-Memory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard
More informationPASTE: A Networking API for Non-Volatile Main Memory
PASTE: A Networking API for Non-Volatile Main Memory Michio Honda (NEC Laboratories Europe) Lars Eggert (NetApp) Douglas Santry (NetApp) TSVAREA@IETF 99, Prague May 22th 2017 More details at our HotNets
More informationCS 350 Winter 2011 Current Topics: Virtual Machines + Solid State Drives
CS 350 Winter 2011 Current Topics: Virtual Machines + Solid State Drives Virtual Machines Resource Virtualization Separating the abstract view of computing resources from the implementation of these resources
More informationDatabase Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu
Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based
More information2 nd Half. Memory management Disk management Network and Security Virtual machine
Final Review 1 2 nd Half Memory management Disk management Network and Security Virtual machine 2 Abstraction Virtual Memory (VM) 4GB (32bit) linear address space for each process Reality 1GB of actual
More informationVirtual Memory. CS 351: Systems Programming Michael Saelee
Virtual Memory CS 351: Systems Programming Michael Saelee registers cache (SRAM) main memory (DRAM) local hard disk drive (HDD/SSD) remote storage (networked drive / cloud) previously: SRAM
More informationCSE 120 Principles of Operating Systems
CSE 120 Principles of Operating Systems Spring 2018 Lecture 15: Multicore Geoffrey M. Voelker Multicore Operating Systems We have generally discussed operating systems concepts independent of the number
More informationCSE 560 Computer Systems Architecture
This Unit: CSE 560 Computer Systems Architecture App App App System software Mem I/O The operating system () A super-application Hardware support for an Page tables and address translation s and hierarchy
More informationSpark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies
Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay 1 Apache Spark - Intro Spark within the Big Data ecosystem Data Sources Data Acquisition / ETL Data Storage Data Analysis / ML Serving 3 Apache
More informationvirtual memory Page 1 CSE 361S Disk Disk
CSE 36S Motivations for Use DRAM a for the Address space of a process can exceed physical memory size Sum of address spaces of multiple processes can exceed physical memory Simplify Management 2 Multiple
More informationSOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS
SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CSAIL IAP MEETING MAY 21, 2013 Research Agenda Lack of technology progress Moore s Law still alive Power
More informationVirtual Memory Nov 9, 2009"
Virtual Memory Nov 9, 2009" Administrivia" 2! 3! Motivations for Virtual Memory" Motivation #1: DRAM a Cache for Disk" SRAM" DRAM" Disk" 4! Levels in Memory Hierarchy" cache! virtual memory! CPU" regs"
More informationRDMA Requirements for High Availability in the NVM Programming Model
RDMA Requirements for High Availability in the NVM Programming Model Doug Voigt HP Agenda NVM Programming Model Motivation NVM Programming Model Overview Remote Access for High Availability RDMA Requirements
More informationSurvey of ETSI NFV standardization documents BY ABHISHEK GUPTA FRIDAY GROUP MEETING FEBRUARY 26, 2016
Survey of ETSI NFV standardization documents BY ABHISHEK GUPTA FRIDAY GROUP MEETING FEBRUARY 26, 2016 VNFaaS (Virtual Network Function as a Service) In our present work, we consider the VNFaaS use-case
More informationIX: A Protected Dataplane Operating System for High Throughput and Low Latency
IX: A Protected Dataplane Operating System for High Throughput and Low Latency Adam Belay et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Presented by Han Zhang & Zaina Hamid Challenges
More informationDesigning Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services. Presented by: Jitong Chen
Designing Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services Presented by: Jitong Chen Outline Architecture of Web-based Data Center Three-Stage framework to benefit
More informationVirtual Memory 2: demand paging
Virtual Memory : demand paging also: anatomy of a process Guillaume Salagnac Insa-Lyon IST Semester Fall 8 Reminder: OS duties CPU CPU cache (SRAM) main memory (DRAM) fast storage (SSD) large storage (disk)
More informationCS6453. Data-Intensive Systems: Rachit Agarwal. Technology trends, Emerging challenges & opportuni=es
CS6453 Data-Intensive Systems: Technology trends, Emerging challenges & opportuni=es Rachit Agarwal Slides based on: many many discussions with Ion Stoica, his class, and many industry folks Servers Typical
More informationA Distributed Hash Table for Shared Memory
A Distributed Hash Table for Shared Memory Wytse Oortwijn Formal Methods and Tools, University of Twente August 31, 2015 Wytse Oortwijn (Formal Methods and Tools, AUniversity Distributed of Twente) Hash
More informationVirtual Memory. Kevin Webb Swarthmore College March 8, 2018
irtual Memory Kevin Webb Swarthmore College March 8, 2018 Today s Goals Describe the mechanisms behind address translation. Analyze the performance of address translation alternatives. Explore page replacement
More information24-vm.txt Mon Nov 21 22:13: Notes on Virtual Machines , Fall 2011 Carnegie Mellon University Randal E. Bryant.
24-vm.txt Mon Nov 21 22:13:36 2011 1 Notes on Virtual Machines 15-440, Fall 2011 Carnegie Mellon University Randal E. Bryant References: Tannenbaum, 3.2 Barham, et al., "Xen and the art of virtualization,"
More informationHow Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC
How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC Three Consortia Formed in Oct 2016 Gen-Z Open CAPI CCIX complex to rack scale memory fabric Cache coherent accelerator
More informationNOW and the Killer Network David E. Culler
NOW and the Killer Network David E. Culler culler@cs http://now.cs.berkeley.edu NOW 1 Remember the Killer Micro 100,000,000 10,000,000 R10000 Pentium Transistors 1,000,000 100,000 i80286 i80386 R3000 R2000
More informationPerformance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference
The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee
More informationWORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES BIG AND SMALL SERVER PLATFORMS
WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES ON BIG AND SMALL SERVER PLATFORMS Shuang Chen*, Shay Galon**, Christina Delimitrou*, Srilatha Manne**, and José Martínez* *Cornell University **Cavium
More informationMaximum Performance. How to get it and how to avoid pitfalls. Christoph Lameter, PhD
Maximum Performance How to get it and how to avoid pitfalls Christoph Lameter, PhD cl@linux.com Performance Just push a button? Systems are optimized by default for good general performance in all areas.
More informationDistributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju
Distributed Data Infrastructures, Fall 2017, Chapter 2 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note: Term Warehouse-scale
More informationMemory Management Strategies for Data Serving with RDMA
Memory Management Strategies for Data Serving with RDMA Dennis Dalessandro and Pete Wyckoff (presenting) Ohio Supercomputer Center {dennis,pw}@osc.edu HotI'07 23 August 2007 Motivation Increasing demands
More informationWhy AI Frameworks Need (not only) RDMA?
Why AI Frameworks Need (not only) RDMA? With Design and Implementation Experience of Networking Support on TensorFlow GDR, Apache MXNet, WeChat Amber, and Tencent Angel Bairen Yi (byi@connect.ust.hk) Jingrong
More informationM7: Next Generation SPARC. Hotchips 26 August 12, Stephen Phillips Senior Director, SPARC Architecture Oracle
M7: Next Generation SPARC Hotchips 26 August 12, 2014 Stephen Phillips Senior Director, SPARC Architecture Oracle Safe Harbor Statement The following is intended to outline our general product direction.
More informationCS 152 Computer Architecture and Engineering
CS 152 Computer Architecture and Engineering Lecture 12 -- Virtual Memory 2014-2-27 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: CS 152 L12: Virtual
More informationDON T CRY OVER SPILLED RECORDS Memory elasticity of data-parallel applications and its application to cluster scheduling
DON T CRY OVER SPILLED RECORDS Memory elasticity of data-parallel applications and its application to cluster scheduling Călin Iorgulescu (EPFL), Florin Dinu (EPFL), Aunn Raza (NUST Pakistan), Wajih Ul
More informationVirtual Memory I. CSE 351 Spring Instructor: Ruth Anderson
Virtual Memory I CSE 35 Spring 27 Instructor: Ruth Anderson Teaching Assistants: Dylan Johnson Kevin Bi Linxing Preston Jiang Cody Ohlsen Yufang Sun Joshua Curtis Administrivia Midterms Graded If you did
More informationECE 598-MS: Advanced Memory and Storage Systems Lecture 7: Unified Address Translation with FlashMap
ECE 598-MS: Advanced Memory and Storage Systems Lecture 7: Unified Address Translation with Map Jian Huang Use As Non-Volatile Memory DRAM (nanoseconds) Application Memory Component SSD (microseconds)
More informationColin Cunningham, Intel Kumaran Siva, Intel Sandeep Mahajan, Oracle 03-Oct :45 p.m. - 5:30 p.m. Moscone West - Room 3020
Colin Cunningham, Intel Kumaran Siva, Intel Sandeep Mahajan, Oracle 03-Oct-2017 4:45 p.m. - 5:30 p.m. Moscone West - Room 3020 Big Data Talk Exploring New SSD Usage Models to Accelerate Cloud Performance
More informationMoneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories
Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Adrian M. Caulfield Arup De, Joel Coburn, Todor I. Mollov, Rajesh K. Gupta, Steven Swanson Non-Volatile Systems
More informationMultifunction Networking Adapters
Ethernet s Extreme Makeover: Multifunction Networking Adapters Chuck Hudson Manager, ProLiant Networking Technology Hewlett-Packard 2004 Hewlett-Packard Development Company, L.P. The information contained
More informationRealizing the Next Generation of Exabyte-scale Persistent Memory-Centric Architectures and Memory Fabrics
Realizing the Next Generation of Exabyte-scale Persistent Memory-Centric Architectures and Memory Fabrics Zvonimir Z. Bandic, Sr. Director, Next Generation Platform Technologies Western Digital Corporation
More informationNVMe over Fabrics. High Performance SSDs networked over Ethernet. Rob Davis Vice President Storage Technology, Mellanox
NVMe over Fabrics High Performance SSDs networked over Ethernet Rob Davis Vice President Storage Technology, Mellanox Ilker Cebeli Senior Director of Product Planning, Samsung May 3, 2017 Storage Performance
More informationMemory Management. Disclaimer: some slides are adopted from book authors slides with permission 1
Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 Recap Paged MMU: Two main Issues Translation speed can be slow TLB Table size is big Multi-level page table
More informationNext-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads
Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads Liran Zvibel CEO, Co-founder WekaIO @liranzvibel 1 WekaIO Matrix: Full-featured and Flexible Public or Private S3 Compatible
More information