Bare Metal Library: Abstractions for modern hardware
Cyprien Noel
Plan
- Modern hardware?
- New challenges & opportunities
- Three use cases
- Current solutions
- Leveraging hardware
- Simple abstraction
About me
- High-performance trading systems: lock-free algorithms, distributed systems
- H2O: distributed CPU machine learning, async SGD
- Flickr: scaling deep learning on GPU; multi-GPU Caffe; RDMA, multicast, distributed Hogwild; CaffeOnSpark
- UC Berkeley: NCCL Caffe, GPU cluster tooling, Bare Metal
Modern Hardware?
Device-to-device networks
Moving from ms software to µs hardware
- Number crunching → GPU
- FS, block I/O, virtual memory → Pmem
- Network stack → RDMA
- RAID, replication → Erasure codes
- Device memory, coherent fabrics
- And more: video, crypto, etc.
OS abstractions replaced by:
- CUDA, OFED, libpmem, DPDK, SPDK, libfabric, UCX, VMA
- More every week...
More powerful, but also more complex and non-interoperable.
Summary So Far
- Big changes coming! At least for high-performance applications
- The CPU should orchestrate, not sit in the critical path
- Device-to-device networks
- Retrofitting existing architectures is difficult:
  - CPU-centric abstractions
  - ms software on µs hardware (e.g. 100s of instructions per packet)
  - OK in some cases, e.g. VMA (kernel-bypass sockets), but with much lower acceleration, and most features are inexpressible
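A back-of-the-envelope sketch of the "100s of instructions per packet" point: at line rate, one core gets only a few hundred cycles per packet. The link speed, packet size, and clock rate are illustrative assumptions, not numbers from the talk, and Ethernet framing overhead (preamble, inter-frame gap) is ignored.

```python
from typing import Tuple


def packet_budget(link_gbps: float, packet_bytes: int, cpu_ghz: float) -> Tuple[float, float]:
    """Return (nanoseconds, CPU cycles) available per packet at line rate.

    Ignores Ethernet preamble and inter-frame gap, so the true budget is a
    little larger than computed here.
    """
    ns = packet_bytes * 8 / link_gbps  # bits / (Gbit/s) = nanoseconds
    cycles = ns * cpu_ghz              # ns * (cycles per ns)
    return ns, cycles


# Assumed: 100 Gb/s link, 1500-byte packets, one 3 GHz core.
print(packet_budget(100, 1500, 3.0))  # (120.0, 360.0)
```

With 64-byte packets the same arithmetic leaves roughly 15 cycles per packet, which suggests why a conventional per-packet software path struggles to keep up without offload or kernel bypass.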
What do we do?
- Start from scratch? E.g. Google Fuchsia: no FS, block I/O, network, etc.
  - Very interesting, but future work
- Use already accelerated frameworks? E.g. PyTorch, BeeGFS
  - Not general purpose, no interop, not device-to-device
- Work incrementally from use cases
  - Look for the simplest hardware solution
  - Hopefully, useful abstractions will emerge
Use cases
- Build datasets: add and update elements; apply functions to sets, map-reduce; data versioning
- Training & inference: compute graphs, pipelines
- Deployment: model versioning
Datasets
Typical solution:
- Protobuf messages
- KV store
- Distributed file system
Limitations:
- Serialization granularity
- Copies: KV log, kernel 1, replication, kernel 2, FS
- Remote CPU involved, stragglers
- Cannot place data in device memory
Datasets
Simplest hardware implementation:
- Write protobuf in an arena, like FlatBuffers
- Pick an offset on disks, e.g. a namespace
- Call ibv_exp_ec_encode_async
Comments:
- Management, coordination, crash resiliency
- Thin wrapper over HW: line-rate performance
User abstraction?
- Simple, familiar
- Efficient, device friendly
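ibv_exp_ec_encode_async is part of Mellanox's experimental verbs API for offloading erasure coding to the NIC, so it cannot be reproduced here. As a CPU-only stand-in, the sketch below uses the simplest erasure code, a single XOR parity shard (RAID-5 style), to show what "encode into shards, survive a lost shard" means. The function names are hypothetical, and production systems usually rely on codes such as Reed-Solomon that tolerate several lost shards.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal data shards plus one XOR parity shard.

    The input is zero-padded so that its length is a multiple of k.
    """
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (given as None) from the others.

    The XOR of all k + 1 shards is zero, so a missing shard equals the XOR
    of every shard that is still present.
    """
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost shard")
    present = [s for s in shards if s is not None]
    if missing:
        shards = list(shards)
        shards[missing[0]] = reduce(xor_bytes, present)
    return shards


shards = encode(b"hello erasure coding", k=4)
shards[2] = None  # simulate losing one data shard
restored = recover(shards)
print(b"".join(restored[:4]).rstrip(b"\0"))  # b'hello erasure coding'
```

This layout stores (k + 1)/k bytes per byte of data and survives one lost shard; Reed-Solomon generalizes it to m parity shards.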
mmap
- Extension to classic mmap
  - Distributed
  - Typed: Protobuf, other formats planned
- Protobuf is amazing
  - Forward and backward compatible
  - Lattice
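mmap
C++:
```cpp
// Typed, read-only view of the object stored at "/test"
const Test& test = mmap<Test>("/test");
int i = test.field();
```
Python:
```python
test = Test()
bm.mmap("/test", test)
i = test.field  # Protobuf Python exposes fields as attributes
```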
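Since the library was still to be open sourced, the following is only a local, single-machine analogy for the C++ line above: Python's standard mmap module maps a file, and ctypes.Structure.from_buffer overlays a typed record on the mapped bytes without copying them. The Test record with one int32 field is hypothetical. A fixed C layout stands in for Protobuf here because Protobuf's wire format must be parsed before a field can be read, which is what the FlatBuffers-style arena layout on the Datasets slide avoids.

```python
import ctypes
import mmap
import os
import tempfile


class Test(ctypes.Structure):
    """Hypothetical fixed-layout record standing in for the Test message."""
    _fields_ = [("field", ctypes.c_int32)]


def write_record(path: str, value: int) -> None:
    """Create a backing file that holds one Test record."""
    with open(path, "wb") as f:
        f.write(bytes(Test(field=value)))


def read_field(path: str) -> int:
    """Map the file and read Test.field through a zero-copy typed view."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), ctypes.sizeof(Test)) as mm:
            test = Test.from_buffer(mm)  # typed view over the mapped pages
            value = test.field
            del test  # drop the view so the mapping can be closed
    return value


path = os.path.join(tempfile.mkdtemp(), "test")
write_record(path, 12)
print(read_field(path))  # 12
```

Unlike the slide's mmap, nothing here is distributed, replicated, or device-resident; it only illustrates reading a typed field straight out of mapped memory.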
mmap, recap
- Simple abstraction for data storage
- Fully accelerated, mechanically friendly
- Thin wrapper over HW, device-to-device, zero copy
- ~1.5x replication factor
- Network automatically balanced
- Solves the straggler problem
- No memory pinning or TLB thrashing, NUMA aware
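The ~1.5x figure is presumably the storage overhead of the erasure coding from the Datasets slide rather than of full copies. The slides do not give the code parameters; as a worked example, assume k data shards and m parity shards:

```latex
\text{storage factor} = \frac{k + m}{k},
\qquad k = 8,\; m = 4:\quad \frac{8 + 4}{8} = 1.5
```

Three-way replication, by comparison, has a factor of 3. A maximum-distance-separable code such as Reed-Solomon with these parameters still tolerates the loss of any 4 of the 12 shards.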
Use cases: Compute
- Map-reduce, compute graphs, pipelines
Typical setup:
- Spark, DL frameworks
- Distribution using Akka, gRPC, MPI
- Kubernetes or SLURM scheduling
Limitations:
- No interop
- Difficult placement
- Inefficient resource allocation
Compute
Simplest hardware implementation:
- Define a task, e.g. image resize, CUDA kernel, PyTorch graph
- Place tasks in a queue
- Work stealing: RDMA atomics
- Device-to-device chaining: GPUDirect Async
User abstraction?
task
```python
def compute(x, y):
    return x * y

# Runs locally
compute(1, 2)

# Might be rebalanced on cluster
data = bm.list()
bm.mmap("/data", data)
compute(data, 2)
```
task, recap
- Simple abstraction for CPU and device kernels
- Work stealing instead of an explicit schedule
  - No GPU hoarding
  - Better work balancing
  - Dynamic placement, HA
- Device-to-device chaining
  - Data placed directly in device memory
  - Efficient pipelines, even for very short tasks
  - E.g. model parallelism, low-latency inference
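The slides implement work stealing with RDMA atomics on shared queues across machines. The sketch below shows only the scheduling idea, inside one Python process: one deque per worker thread, each worker popping from the tail of its own deque and, once it runs dry, stealing from the head of a peer's deque. run_with_work_stealing is a hypothetical name, and CPython's thread-safe deque.pop/popleft stand in for the atomic operations.

```python
import collections
import functools
import random
import threading
from typing import Callable, List


def run_with_work_stealing(tasks: List[Callable[[], int]], n_workers: int = 4) -> List[int]:
    """Run zero-argument tasks on n_workers threads using work stealing.

    Tasks are dealt round-robin into per-worker deques up front and none are
    added later, so a worker can exit once every deque is empty.
    """
    queues = [collections.deque() for _ in range(n_workers)]
    for i, task in enumerate(tasks):
        queues[i % n_workers].append((i, task))
    results: List[int] = [0] * len(tasks)

    def worker(me: int) -> None:
        while True:
            try:
                i, task = queues[me].pop()  # own work, from the tail
            except IndexError:
                victims = [q for j, q in enumerate(queues) if j != me and q]
                if not victims:
                    return  # no work left anywhere
                try:
                    i, task = random.choice(victims).popleft()  # steal from the head
                except IndexError:
                    continue  # victim emptied concurrently; look again
            results[i] = task()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def compute(x, y):  # same function as on the task slide
    return x * y


tasks = [functools.partial(compute, x, 2) for x in range(10)]
print(run_with_work_stealing(tasks))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Popping one's own work from the tail and stealing from the head keeps owners and thieves at opposite ends of each deque, which is the usual reason work-stealing schedulers use this split.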
Use cases: Versioning
- Track datasets and models
- Deploy / roll back models
Typical setup:
- Copy before update
- Symlinks to data as versions
- Split staging / production environments
Versioning
Simplest hardware implementation:
- Keep multiple write-ahead logs
  - mmap updates
  - task queues
User abstraction?
branch
- Like a git branch, but for data of any size
- Simplifies collaboration, experimentation
- Generalized staging / production split
- Simplifies HA
  - File system fsync, msync (very hard! Rajimwale et al., DSN '11)
  - Replaces transactions, e.g. queues, persistent memory
  - Allows merging duplicate work
branch
C++:
```cpp
Test* test = mutable_mmap<Test>("/test");
branch b;
// Only visible in current branch
test->set_field(12);
```
Similar in Python.
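The versioning slides propose keeping multiple write-ahead logs and exposing them as branches. A minimal in-memory sketch of that idea, with assumptions made purely for illustration: a shared base dictionary, one append-only log per branch, and a context manager playing the role of the C++ `branch b;` scope, so writes made inside `with store.branch(...)` are visible only there. Store, branch, and merge are hypothetical names, not the library's API, and there is no persistence, fsync, or concurrency control.

```python
import contextlib
from typing import Any, Dict, Iterator, List, Optional, Tuple


class Store:
    """Base key-value state plus one append-only write-ahead log per branch."""

    def __init__(self) -> None:
        self.base: Dict[str, Any] = {}
        self.logs: Dict[str, List[Tuple[str, Any]]] = {}
        self.current: Optional[str] = None  # active branch; None means base

    @contextlib.contextmanager
    def branch(self, name: str) -> Iterator[None]:
        """Make `name` the active branch for the duration of a with-block."""
        previous, self.current = self.current, name
        try:
            yield
        finally:
            self.current = previous

    def set(self, key: str, value: Any) -> None:
        if self.current is None:
            self.base[key] = value
        else:  # append to the branch log; the base state is untouched
            self.logs.setdefault(self.current, []).append((key, value))

    def get(self, key: str) -> Any:
        value = self.base.get(key)
        if self.current is not None:
            for k, v in self.logs.get(self.current, []):  # replay the log
                if k == key:
                    value = v
        return value

    def merge(self, name: str) -> None:
        """Apply a branch's log to the base state, then drop the log."""
        for key, value in self.logs.pop(name, []):
            self.base[key] = value


store = Store()
store.set("field", 1)
with store.branch("b"):
    store.set("field", 12)  # only visible in branch "b"
    print(store.get("field"))  # 12
print(store.get("field"))  # 1
store.merge("b")
print(store.get("field"))  # 12
```

Branches do not stack in this sketch: a branch sees only the base state plus its own log, and merge applies the log without any conflict detection.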
Summary
- mmap, task, and branch simplify hardware acceleration
- They help build pipelines, manage cluster resources, etc.
- Early microbenchmarks suggest very high performance
Thank You!
- Will be open sourced (BSD)
- Contact me if interested: cyprien.noel@berkeley.edu
- Thanks to our sponsor
More information