Clipper A Low-Latency Online Prediction Serving System
|
|
- Marcia Underwood
- 5 years ago
- Views:
Transcription
1 Clipper A Low-Latency Online Prediction Serving System Daniel Crankshaw Xin Wang, Giulio Zhou, Michael Franklin, Joseph Gonzalez, Ion Stoica NSDI 2017 March 29, 2017
2 Learning Big Data Training Complex Model
3 Learning Big Data Training Complex Model
4 Learning Produces a Trained Model Query Decision CAT Model
5 Learning Serving Query Big Data Training Decision? Model Application
6 Serving Learning Big Data Training Query Decision Model Application Prediction-Serving for interactive applications Timescale: ~10s of milliseconds
7 Prediction-Serving Raises New Challenges
8 Prediction-Serving Challenges??? Create Caffe 8 VW Support low-latency, highthroughput serving workloads Large and growing ecosystem of ML models and frameworks
9 Support low-latency, high-throughput serving workloads Models getting more complex Ø 10s of GFLOPs [1] Deployed on critical path Ø Maintain SLOs under heavy load [1] Deep Residual Learning for Image Recognition. He et al. CVPR Using specialized hardware for predictions
10 Google Translate Serving 140 billion words a day 1 Invented New Hardware! Tensor Processing Unit (TPU) 82,000 GPUs running 24/7 [1]
11 Big Companies Build One-Off Systems Problems: Ø Expensive to build and maintain Ø Highly specialized and require ML and systems expertise Ø Tightly-coupled model and application Ø Difficult to change or update model Ø Only supports single ML framework
12 Prediction-Serving Challenges??? Create Caffe 12 VW Support low-latency, highthroughput serving workloads Large and growing ecosystem of ML models and frameworks
13 Large and growing ecosystem of ML models and frameworks Fraud Detection Content Rec. Personal Asst. Robotic Control Machine Translation??? Create VW Caffe 13
14 Large and growing ecosystem of ML models and frameworks Fraud Detection Content Rec. Personal Asst. Difficult to deploy and Robotic Control Machine Translation brittle to manage Create Caffe??? Varying physical resource requirements VW
15 But most companies can t build new serving systems
16 Use existing systems: Offline Scoring Big Data Training Model Batch Analytics
17 Use existing systems: Offline Scoring Datastore Big Data Training Scoring X Y Model Batch Analytics
18 Use existing systems: Offline Scoring Look up decision in datastore X Y Query Decision Application Low-Latency Serving
19 Use existing systems: Offline Scoring Look up decision in datastore Problems: X Y Query Ø Requires full set of queries ahead of time Ø Small and bounded input domain Ø Wasted computation and space Decision Ø Can render and store unneeded predictions Ø Costly to update Ø Re-run batch job Application Low-Latency Serving
20 Prediction-Serving Challenges??? Create Caffe 20 VW Support low-latency, highthroughput serving workloads Large and growing ecosystem of ML models and frameworks
21 How does Clipper address these challenges?
22 Clipper Solutions q Simplifies deployment through layered architecture q Serves many models across ML frameworks concurrently q Employs caching, batching, scale-out for high-performance serving
23 Clipper Decouples Applications and Models Applications Predict RPC/REST Interface Feedback Clipper RPC RPC RPC RPC Model Container (MC) MC MC MC Caffe
24 Clipper Architecture Applications Predict RPC/REST Interface Clipper Observe Improve accuracy through bandit methods and ensembles, online learning, and personalization Provide a common interface to models while bounding latency and maximizing throughput. Model Selection Layer Model Abstraction Layer RPC RPC RPC RPC Model Container (MC) MC MC MC
25 Clipper Architecture Applications Predict RPC/REST Interface Clipper Observe Selection Policy Caching Adaptive Batching Model Selection Layer Model Abstraction Layer RPC RPC RPC RPC Model Container (MC) MC MC MC
26 Clipper Implementation Applications Predict RPC/REST Interface Clipper Observe Core system: 5000 lines of Rust Caching RPC: Model Abstraction Layer Adaptive Batching 100 lines of Python RPC 250 lines of Rust RPC RPC RPC 200 lines of C++ Model Container (MC) ust MC MC MC
27 Caching Adaptive Batching Model Abstraction Layer RPC Model Container (MC) RPC RPC RPC MC MC MC Caffe
28 Caching Adaptive Batching Model Abstraction Layer RPC RPC RPC RPC Model Container (MC) MC MC MC Caffe Common Interface à Simplifies Deployment: Ø Evaluate models using original code & systems
29 Container-based Model Deployment Implement Model API: class ModelContainer: def init (model_data) def predict_batch(inputs)
30 Container-based Model Deployment Implement Model API: class ModelContainer: def init (model_data) def predict_batch(inputs) Ø Implemented in many languages Ø Ø Ø Python Java C/C++
31 Container-based Model Deployment Model implementation packaged in container Model Container (MC) class ModelContainer: def init (model_data) def predict_batch(inputs)
32 Container-based Model Deployment Clipper RPC RPC RPC RPC Model Container (MC) MC MC MC Caffe
33 Caching Adaptive Batching Model Abstraction Layer RPC RPC RPC RPC Model Container (MC) MC MC MC Caffe Common Interface à Simplifies Deployment: Ø Evaluate models using original code & systems Ø Models run in separate processes as Docker containers Ø Resource isolation
34 Caching Adaptive Batching Model Abstraction Layer RPC RPC RPC RPC RPC RPC l Container (MC) MC MC MC MC MC Caffe Common Interface à Simplifies Deployment: Ø Evaluate models using original code & systems Ø Models run in separate processes as Docker containers Ø Resource isolation Ø Scale-out Problem: frameworks optimized for batch processing not latency
35 Batching to Improve Throughput Ø Why batching helps: A single page load may generate many queries Ø Optimal batch depends on: Ø hardware configuration Ø model and framework Ø system load Hardware Acceleration Helps amortize system overhead
36 Adaptive Batching to Improve Throughput Ø Why batching helps: A single page load may generate many queries Ø Optimal batch depends on: Ø hardware configuration Ø model and framework Ø system load Clipper Solution: Hardware Acceleration Helps amortize system overhead Adaptively tradeoff latency and throughput Ø Inc. batch size until the latency objective is exceeded (Additive Increase) Ø If latency exceeds SLO cut batch size by a fraction (Multiplicative Decrease)
37 Tensor Flow Conv. Net (GPU) Better Throughput (Queries Per Second) Batch Size
38 Tensor Flow Conv. Net (GPU) Better Throughput (Queries Per Second) Latency (ms) Better Batch Size
39 Tensor Flow Conv. Net (GPU) Better Throughput (Queries Per Second) Optimal Batch Size Latency (ms) Better Batch Size
40 Better 1R BatFKLng Throughput (QPS) R-2S RandRP )RreVt (6KOearn) LLnear 690 (3y6SarN) LLnear 690 (6KLearn) KerneO 690 (6KLearn) 1921 LRg RegreVVLRn (6KLearn)
41 Better AdaStLve 1R BatFKLng Throughput (QPS) R-2S andRP )RreVt (6KOearn) LLnear 690 (3y6SarN) LLnear 690 (6KLearn) KerneO 690 (6KLearn) LRg 5egreVVLRn (6KLearn)
42 Better AdaStLve 1R BatFKLng Throughput (QPS) P99 Latency (ms) 20 ms is Fast Enough Better R-2S andRP )RreVt (6KOearn) 20 0 LLnear 690 (3y6SarN) 20 0 LLnear 690 (6KLearn) 28 5 KerneO 690 (6KLearn) 20 0 LRg 5egreVVLRn (6KLearn)
43 AdaStLve 1R BatFKLng Throughput (QPS) P99 Latency (ms) Batch Size R-2S andRP )RreVt (6KOearn) LLnear 690 (3y6SarN) 1318 LLnear 690 (6KLearn) 3 KerneO 690 (6KLearn) 1224 LRg 5egreVVLRn (6KLearn)
44 Overhead of decoupled architecture Applications Predict RPC/REST Interface Feedback Clipper RPC RPC RPC RPC MC MC MC MC Caffe
45 Overhead of decoupled architecture Applications Predict RPC/REST Interface Clipper RPC RPC RPC RPC MC MC MC MC Caffe Feedback Applications Predict RPC Interface TensorFlow- Serving
46 Overhead of decoupled architecture Applications Applications Predict RPC/REST Interface Feedback Predict RPC Interface Clipper RPC RPC RPC RPC Model Container MC MC MC Caffe TensorFlow- Serving
47 Overhead of decoupled architecture Model: AlexNet trained on CIFAR-10 Better P99 Latency (ms) Throughput (QPS) Better
48 Clipper Architecture Applications Predict RPC/REST Interface Clipper Observe Improve accuracy through bandit methods and ensembles, online learning, and personalization Provide a common interface to models while bounding latency and maximizing throughput. Model Selection Layer Model Abstraction Layer RPC RPC RPC RPC Model Container (MC) MC MC MC
49 Clipper Improve accuracy through bandit methods and ensembles, online learning, and personalization Model Selection Layer Version 1 Version 2 Version 3 Periodic retraining Caffe Experiment with new models and frameworks
50 Selection Policy: Estimate confidence Version 2 Version 3 CAT CAT CAT CAT Policy CAT CONFIDENT Caffe
51 Selection Policy: Estimate confidence Caffe CAT MAN CAT SPACE Policy CAT UNSURE
52 Selection Policy: Estimate confidence Better 7Rp-5 ErrRr 5ate Image1et crniident unsure ensemele 4-agree 5-agree
53 Selection Policy: Estimate confidence Better 7Rp-5 ErrRr 5ate Image1et crniident unsure ensemele 4-agree 5-agree width is percentage of query workloads
54 Clipper Selection Policy Model Selection Layer Selection policies supported by Clipper ØExploit multiple models to estimate confidence Ø Use multi-armed bandit algorithms to learn optimal model-selection online Ø Online personalization across ML frameworks *See paper for details
55 Conclusion Ø Prediction-serving is an important and challenging area for systems research Ø Support low-latency, high-throughput serving workloads Ø Serve large and growing ecosystem of ML frameworks Ø Clipper is a first step towards addressing these challenges Ø Simplifies deployment through layered architecture Ø Serves many models across ML frameworks concurrently Ø Employs caching, adaptive batching, container scale-out to meet interactive serving workload demands Ø Beyond academic prototype to build a real, open-source system crankshaw@cs.berkeley.edu
56 GPU Cluster Scaling TKrRugKput (10K Tps) Latency (Ps) Agg 10Gbps Agg 1Gbps 0ean 10Gbps 0ean 1Gbps 0ean 10Gbps 0ean 1Gbps Gbps 399 1Gbps uPber Rf 5eplLcas
Prediction Systems. Dan Crankshaw UCB RISE Lab Seminar 10/3/2015
Prediction Systems Dan Crankshaw UCB RISE Lab Seminar 10/3/2015 Learning Big Data Training Big Model Timescale: minutes to days Systems: offline and batch optimized Heavily studied... major focus of the
More informationIntelligent Services Serving Machine Learning
Intelligent Services Serving Machine Learning Joseph E. Gonzalez jegonzal@cs.berkeley.edu; Assistant Professor @ UC Berkeley joseph@dato.com; Co-Founder @ Dato Inc. Contemporary Learning Systems Big Data
More informationSCALABLE, LOW LATENCY MODEL SERVING AND MANAGEMENT WITH VELOX
THE MISSING PIECE IN COMPLEX ANALYTICS: SCALABLE, LOW LATENCY MODEL SERVING AND MANAGEMENT WITH VELOX Daniel Crankshaw, Peter Bailis, Joseph Gonzalez, Haoyuan Li, Zhao Zhang, Ali Ghodsi, Michael Franklin,
More informationLecture 12: Model Serving. CSE599W: Spring 2018
Lecture 12: Model Serving CSE599W: Spring 2018 Deep Learning Applications That drink will get you to 2800 calories for today I last saw your keys in the store room Remind Tom of the party You re on page
More informationTensorFlow: A System for Learning-Scale Machine Learning. Google Brain
TensorFlow: A System for Learning-Scale Machine Learning Google Brain The Problem Machine learning is everywhere This is in large part due to: 1. Invention of more sophisticated machine learning models
More informationEFFICIENT INFERENCE WITH TENSORRT. Han Vanholder
EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60
More informationOPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS
OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS 1 Why GPUs? A Tale of Numbers 100x Performance Increase Infrastructure Cost Savings Performance 100x gains over traditional
More informationDeep Learning Inference as a Service
Deep Learning Inference as a Service Mohammad Babaeizadeh Hadi Hashemi Chris Cai Advisor: Prof Roy H. Campbell Use case 1: Model Developer Use case 1: Model Developer Inference Service Use case
More informationDeploying Machine Learning Models in Practice
Deploying Machine Learning Models in Practice Nick Pentreath Principal Engineer @MLnick About @MLnick on Twitter & Github Principal Engineer, IBM CODAIT - Center for Open-Source Data & AI Technologies
More informationNetezza The Analytics Appliance
Software 2011 Netezza The Analytics Appliance Michael Eden Information Management Brand Executive Central & Eastern Europe Vilnius 18 October 2011 Information Management 2011IBM Corporation Thought for
More informationCharacterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager
Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance
More informationAn introduction to Machine Learning silicon
An introduction to Machine Learning silicon November 28 2017 Insight for Technology Investors AI/ML terminology Artificial Intelligence Machine Learning Deep Learning Algorithms: CNNs, RNNs, etc. Additional
More informationHPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov
HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads Natalia Vassilieva, Sergey Serebryakov Deep learning ecosystem today Software Hardware 2 HPE s portfolio for deep learning Government,
More informationDeploying Deep Learning Networks to Embedded GPUs and CPUs
Deploying Deep Learning Networks to Embedded GPUs and CPUs Rishu Gupta, PhD Senior Application Engineer, Computer Vision 2015 The MathWorks, Inc. 1 MATLAB Deep Learning Framework Access Data Design + Train
More informationData 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp.
17-18 March, 2018 Beijing Data 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020 Today, 80% of organizations
More informationPoseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters Hao Zhang Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jianliang Wei, Pengtao Xie,
More informationAnalytic Cloud with. Shelly Garion. IBM Research -- Haifa IBM Corporation
Analytic Cloud with Shelly Garion IBM Research -- Haifa 2014 IBM Corporation Why Spark? Apache Spark is a fast and general open-source cluster computing engine for big data processing Speed: Spark is capable
More informationAccelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru
More informationGPU ACCELERATED DATABASE MANAGEMENT SYSTEMS
CIS 601 - Graduate Seminar Presentation 1 GPU ACCELERATED DATABASE MANAGEMENT SYSTEMS PRESENTED BY HARINATH AMASA CSU ID: 2697292 What we will talk about.. Current problems GPU What are GPU Databases GPU
More informationDeep Learning Frameworks with Spark and GPUs
Deep Learning Frameworks with Spark and GPUs Abstract Spark is a powerful, scalable, real-time data analytics engine that is fast becoming the de facto hub for data science and big data. However, in parallel,
More informationDeep Learning Inference on Openshift with GPUs
Deep Learning Inference on Openshift with GPUs OpenShift Commons, Seattle, Dec 10 2018 Tripti Singhal Product Manager, NVIDIA Deep Learning Software Tushar Katarki Product Manager, AI on OpenShift AGENDA
More informationCloud, Big Data & Linear Algebra
Cloud, Big Data & Linear Algebra Shelly Garion IBM Research -- Haifa 2014 IBM Corporation What is Big Data? 2 Global Data Volume in Exabytes What is Big Data? 2005 2012 2017 3 Global Data Volume in Exabytes
More informationNo Tradeoff Low Latency + High Efficiency
No Tradeoff Low Latency + High Efficiency Christos Kozyrakis http://mast.stanford.edu Latency-critical Applications A growing class of online workloads Search, social networking, software-as-service (SaaS),
More informationBringing Data to Life
Bringing Data to Life Data management and Visualization Techniques Benika Hall Rob Harrison Corporate Model Risk March 16, 2018 Introduction Benika Hall Analytic Consultant Wells Fargo - Corporate Model
More informationReal-Time Decisions Using ML on the Google Cloud Platform. Przemysław Pastuszka & Carlos Garcia QCon London 7th March 2018
Real-Time Decisions Using ML on the Google Cloud Platform Przemysław Pastuszka & Carlos Garcia QCon London 7th March 2018 How many of you are interested in machine learning? but how many of you are running
More informationWhy AI Frameworks Need (not only) RDMA?
Why AI Frameworks Need (not only) RDMA? With Design and Implementation Experience of Networking Support on TensorFlow GDR, Apache MXNet, WeChat Amber, and Tencent Angel Bairen Yi (byi@connect.ust.hk) Jingrong
More informationDeep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations, and Hardware Implications
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations, and Hardware Implications Jongsoo Park Facebook AI System SW/HW Co-design Team Sep-21 2018 Team Introduction
More informationSharing High-Performance Devices Across Multiple Virtual Machines
Sharing High-Performance Devices Across Multiple Virtual Machines Preamble What does sharing devices across multiple virtual machines in our title mean? How is it different from virtual networking / NSX,
More informationUnderstanding the latent value in all content
Understanding the latent value in all content John F. Kennedy (JFK) November 22, 1963 INGEST ENRICH EXPLORE Cognitive skills Data in any format, any Azure store Search Annotations Data Cloud Intelligence
More informationHow Real Time Are Your Analytics?
How Real Time Are Your Analytics? Min Xiao Solutions Architect, VoltDB Table of Contents Your Big Data Analytics.... 1 Turning Analytics into Real Time Decisions....2 Bridging the Gap...3 How VoltDB Helps....4
More informationHigh Performance Computing
High Performance Computing 9th Lecture 2016/10/28 YUKI ITO 1 Selected Paper: vdnn: Virtualized Deep Neural Networks for Scalable, MemoryEfficient Neural Network Design Minsoo Rhu, Natalia Gimelshein, Jason
More informationCNN optimization. Rassadin A
CNN optimization Rassadin A. 01.2017-02.2017 What to optimize? Training stage time consumption (CPU / GPU) Inference stage time consumption (CPU / GPU) Training stage memory consumption Inference stage
More informationTowards High-Performance Prediction Serving Systems
Towards High-Performance Prediction Serving Systems Yunseong Lee Seoul National University Seoul, South Korea yunseong@snu.ac.kr Alberto Scolari Politecnico di Milano Milano, Italy alberto.scolari@polimi.it
More informationWhen, Where & Why to Use NoSQL?
When, Where & Why to Use NoSQL? 1 Big data is becoming a big challenge for enterprises. Many organizations have built environments for transactional data with Relational Database Management Systems (RDBMS),
More informationOpen Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc May 2018
Copyright Khronos Group 2018 - Page 1 Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc peter.mcguinness@gobrach.com May 2018 Khronos Mission E.g. OpenGL ES provides 3D
More informationGUNREAL: GPU-accelerated UNsupervised REinforcement and Auxiliary Learning
GUNREAL: GPU-accelerated UNsupervised REinforcement and Auxiliary Learning Koichi Shirahata, Youri Coppens, Takuya Fukagai, Yasumoto Tomita, and Atsushi Ike FUJITSU LABORATORIES LTD. March 27, 2018 0 Deep
More informationDeep Learning Accelerators
Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction
More informationInference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA
Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos
More informationServing deep learning models in a serverless platform
Serving deep learning models in a serverless platform Vatche Ishakian vishakian@bentley.edu Bentley University Vinod Muthusamy vmuthus@us.ibm.com IBM T.J. Watson Research Center Aleksander Slominski aslom@us.ibm.com
More informationSpark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies
Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay 1 Apache Spark - Intro Spark within the Big Data ecosystem Data Sources Data Acquisition / ETL Data Storage Data Analysis / ML Serving 3 Apache
More informationData 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp.
Data 101 Which DB, When Joe Yong (joeyong@microsoft.com) Azure SQL Data Warehouse, Program Management Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020
More informationYiqi Yan. May 10, 2017
Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field
More informationShrinath Shanbhag Senior Software Engineer Microsoft Corporation
Accelerating GPU inferencing with DirectML and DirectX 12 Shrinath Shanbhag Senior Software Engineer Microsoft Corporation Machine Learning Machine learning has become immensely popular over the last decade
More informationWorld s most advanced data center accelerator for PCIe-based servers
NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying
More informationScalable Distributed Training with Parameter Hub: a whirlwind tour
Scalable Distributed Training with Parameter Hub: a whirlwind tour TVM Stack Optimization High-Level Differentiable IR Tensor Expression IR AutoTVM LLVM, CUDA, Metal VTA AutoVTA Edge FPGA Cloud FPGA ASIC
More informationMachine Learning In A Snap. Thomas Parnell Research Staff Member IBM Research - Zurich
Machine Learning In A Snap Thomas Parnell Research Staff Member IBM Research - Zurich What are GLMs? Ridge Regression Support Vector Machines Regression Generalized Linear Models Classification Lasso Regression
More informationIn partnership with. VelocityAI REFERENCE ARCHITECTURE WHITE PAPER
In partnership with VelocityAI REFERENCE JULY // 2018 Contents Introduction 01 Challenges with Existing AI/ML/DL Solutions 01 Accelerate AI/ML/DL Workloads with Vexata VelocityAI 02 VelocityAI Reference
More informationThe Path to GPU as a Service in Kubernetes Renaud Gaubert Lead Kubernetes Engineer
The Path to GPU as a Service in Kubernetes Renaud Gaubert , Lead Kubernetes Engineer May 03, 2018 RUNNING A GPU APPLICATION Customers using DL DL Application RHEL 7.3 CUDA 8.0 Driver 375
More informationIntroduction to MATLAB application deployment
Introduction to application deployment Antti Löytynoja, Application Engineer 2015 The MathWorks, Inc. 1 Technical Computing with Products Access Explore & Create Share Options: Files Data Software Data
More informationWITH INTEL TECHNOLOGIES
WITH INTEL TECHNOLOGIES Commitment Is to Enable The Best Democratize technologies Advance solutions Unleash innovations Intel Xeon Scalable Processor Family Delivers Ideal Enterprise Solutions NEW Intel
More informationLeveraging AI on the Cloud to transform your business. Florida Business Analytics Forum 2018 at University of South Florida
Leveraging AI on the Cloud to transform your business Florida Business Analytics Forum 2018 at University of South Florida 1 My (unusual) path to Google Neural networks at NOAA 2 DNNs solved image analysis
More informationRecurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks
Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and
More informationThe Future of Analytics or The New SQL
The Future of Analytics or The New SQL Gerhard Otterbach, Sales Manager Teradata Germany Hanau, Feb. 28th, 2018 Teradata At A Glance: 39 Years Ago Teradata was big data before there was big data Donald
More informationMassive Scalability With InterSystems IRIS Data Platform
Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special
More informationBespoKV: Application Tailored Scale-Out Key-Value Stores
BespoKV: Application Tailored Scale-Out Key-Value Stores Ali Anwar, Yue Cheng, Hai Huang, Jingoo Han, Hyogi Sim, Dongyoon Lee, Fred Douglis, and Ali R. Butt BespoKV Role of Distributed KV stores in HPC
More informationMATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2
1 Senior Application Engineer The MathWorks Korea 2017 The MathWorks, Inc. 2 Data Analytics Workflow Business Systems Smart Connected Systems Data Acquisition Engineering, Scientific, and Field Business
More informationSnapdragon NPE Overview
March 2018 Linaro Connect Hong Kong Snapdragon NPE Overview Mark Charlebois Director, Engineering Qualcomm Technologies, Inc. Caffe2 Snapdragon Neural Processing Engine Efficient execution on Snapdragon
More informationBuilding the Most Efficient Machine Learning System
Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide
More informationPRETZEL:Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL:Opening the Black Box of Machine Learning Prediction Serving Systems Presented by Qinyuan Sun Slides are modified from first author Yunseong Lee s slides 2 Outline Prediction Serving Systems Limitations
More informationIBM Db2 Event Store Simplifying and Accelerating Storage and Analysis of Fast Data. IBM Db2 Event Store
IBM Db2 Event Store Simplifying and Accelerating Storage and Analysis of Fast Data IBM Db2 Event Store Disclaimer The information contained in this presentation is provided for informational purposes only.
More informationSEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES
SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, and Christina Delimitrou Cornell
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training
More informationBig Data Systems on Future Hardware. Bingsheng He NUS Computing
Big Data Systems on Future Hardware Bingsheng He NUS Computing http://www.comp.nus.edu.sg/~hebs/ 1 Outline Challenges for Big Data Systems Why Hardware Matters? Open Challenges Summary 2 3 ANYs in Big
More informationDRIZZLE: FAST AND Adaptable STREAM PROCESSING AT SCALE
DRIZZLE: FAST AND Adaptable STREAM PROCESSING AT SCALE Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout, Michael Armbrust, Ali Ghodsi, Michael Franklin, Benjamin Recht, Ion Stoica STREAMING WORKLOADS
More informationTales of the Tail Hardware, OS, and Application-level Sources of Tail Latency
Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports and Steven D. Gribble February 2, 2015 1 Introduction What is Tail Latency? What
More informationBig Orange Bramble. August 09, 2016
Big Orange Bramble August 09, 2016 Overview HPL SPH PiBrot Numeric Integration Parallel Pi Monte Carlo FDS DANNA HPL High Performance Linpack is a benchmark for clusters Created here at the University
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just
More informationSelf Programming Networks
Self Programming Networks Is it possible for to Learn the control planes of networks and applications? Operators specify what they want, and the system learns how to deliver CAN WE LEARN THE CONTROL PLANE
More informationAerospike Scales with Google Cloud Platform
Aerospike Scales with Google Cloud Platform PERFORMANCE TEST SHOW AEROSPIKE SCALES ON GOOGLE CLOUD Aerospike is an In-Memory NoSQL database and a fast Key Value Store commonly used for caching and by real-time
More informationPocket: Elastic Ephemeral Storage for Serverless Analytics
Pocket: Elastic Ephemeral Storage for Serverless Analytics Ana Klimovic*, Yawen Wang*, Patrick Stuedi +, Animesh Trivedi +, Jonas Pfefferle +, Christos Kozyrakis* *Stanford University, + IBM Research 1
More informationDeep Learning mit PowerAI - Ein Überblick
Stephen Lutz Deep Learning mit PowerAI - Open Group Master Certified IT Specialist Technical Sales IBM Cognitive Infrastructure IBM Germany Ein Überblick Stephen.Lutz@de.ibm.com What s that? and what s
More informationOnto Petaflops with Kubernetes
Onto Petaflops with Kubernetes Vishnu Kannan Google Inc. vishh@google.com Key Takeaways Kubernetes can manage hardware accelerators at Scale Kubernetes provides a playground for ML ML journey with Kubernetes
More informationVMware vsphere Clusters in Security Zones
SOLUTION OVERVIEW VMware vsan VMware vsphere Clusters in Security Zones A security zone, also referred to as a DMZ," is a sub-network that is designed to provide tightly controlled connectivity to an organization
More informationOPTIMIZING, PROFILING, AND TUNING TENSORFLOW + GPUS
OPTIMIZING, PROFILING, AND TUNING TENSORFLOW + GPUS NVIDIA GPU TECH CONF MUNICH, GERMANY OCTOBER 11, 2017 CHRIS FREGLY, FOUNDER @ PIPELINE.AI INTRODUCTIONS: ME Chris Fregly, Research Engineer @ Formerly
More informationCloud Computing with FPGA-based NVMe SSDs
Cloud Computing with FPGA-based NVMe SSDs Bharadwaj Pudipeddi, CTO NVXL Santa Clara, CA 1 Choice of NVMe Controllers ASIC NVMe: Fully off-loaded, consistent performance, M.2 or U.2 form factor ASIC OpenChannel:
More informationSpark, Shark and Spark Streaming Introduction
Spark, Shark and Spark Streaming Introduction Tushar Kale tusharkale@in.ibm.com June 2015 This Talk Introduction to Shark, Spark and Spark Streaming Architecture Deployment Methodology Performance References
More informationIntelligent System for AI. 清大資工周志遠 AII Workshop
Intelligent System for AI 清大資工周志遠 2018/5/19 @ AII Workshop 周志遠 (Jerry Chou) Email: jchou@cs.nthu.edu.tw Large-scaled System Architecture (LSA) Lab 經歷 清華大學資工系 副教授 2016~現今 清華大學資工系 助理教授 2011~2016 美國勞倫斯國家實驗室
More informationvsan Security Zone Deployment First Published On: Last Updated On:
First Published On: 06-14-2017 Last Updated On: 11-20-2017 1 1. vsan Security Zone Deployment 1.1.Solution Overview Table of Contents 2 1. vsan Security Zone Deployment 3 1.1 Solution Overview VMware vsphere
More informationSQL Server Machine Learning Marek Chmel & Vladimir Muzny
SQL Server Machine Learning Marek Chmel & Vladimir Muzny @VladimirMuzny & @MarekChmel MCTs, MVPs, MCSEs Data Enthusiasts! vladimir@datascienceteam.cz marek@datascienceteam.cz Session Agenda Machine learning
More informationBest Practices for Setting BIOS Parameters for Performance
White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page
More informationScaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research
Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)
More informationToward a Memory-centric Architecture
Toward a Memory-centric Architecture Martin Fink EVP & Chief Technology Officer Western Digital Corporation August 8, 2017 1 SAFE HARBOR DISCLAIMERS Forward-Looking Statements This presentation contains
More informationNew Approach to Unstructured Data
Innovations in All-Flash Storage Deliver a New Approach to Unstructured Data Table of Contents Developing a new approach to unstructured data...2 Designing a new storage architecture...2 Understanding
More informationvsan Remote Office Deployment January 09, 2018
January 09, 2018 1 1. vsan Remote Office Deployment 1.1.Solution Overview Table of Contents 2 1. vsan Remote Office Deployment 3 1.1 Solution Overview Native vsphere Storage for Remote and Branch Offices
More informationAI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management
AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management @SC Asia 2018 Rangan Sukumar, PhD Office of the CTO, Cray Inc. Safe Harbor Statement This presentation
More informationAnalytics Platform for ATLAS Computing Services
Analytics Platform for ATLAS Computing Services Ilija Vukotic for the ATLAS collaboration ICHEP 2016, Chicago, USA Getting the most from distributed resources What we want To understand the system To understand
More informationDeep Learning: Transforming Engineering and Science The MathWorks, Inc.
Deep Learning: Transforming Engineering and Science 1 2015 The MathWorks, Inc. DEEP LEARNING: TRANSFORMING ENGINEERING AND SCIENCE A THE NEW RISE ERA OF OF GPU COMPUTING 3 NVIDIA A IS NEW THE WORLD S ERA
More informationOptimizing Efficiency of Deep Learning Workloads through GPU Virtualization
Optimizing Efficiency of Deep Learning Workloads through GPU Virtualization Presenters: Tim Kaldewey Performance Architect, Watson Group Michael Gschwind Chief Engineer ML & DL, Systems Group David K.
More informationAdaptable Computing The Future of FPGA Acceleration. Dan Gibbons, VP Software Development June 6, 2018
Adaptable Computing The Future of FPGA Acceleration Dan Gibbons, VP Software Development June 6, 2018 Adaptable Accelerated Computing Page 2 Three Big Trends The Evolution of Computing Trend to Heterogeneous
More informationVarys. Efficient Coflow Scheduling. Mosharaf Chowdhury, Yuan Zhong, Ion Stoica. UC Berkeley
Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley Communication is Crucial Performance Facebook analytics jobs spend 33% of their runtime in communication 1 As in-memory
More informationDelivering Deep Learning to Mobile Devices via Offloading
Delivering Deep Learning to Mobile Devices via Offloading Xukan Ran*, Haoliang Chen*, Zhenming Liu 1, Jiasi Chen* *University of California, Riverside 1 College of William and Mary Deep learning on mobile
More informationParallel Systems. Part 7: Evaluation of Computers and Programs. foils by Yang-Suk Kee, X. Sun, T. Fahringer
Parallel Systems Part 7: Evaluation of Computers and Programs foils by Yang-Suk Kee, X. Sun, T. Fahringer How To Evaluate Computers and Programs? Learning objectives: Predict performance of parallel programs
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 Arm K.K. Senior FAE Ryuji Tanaka Overview Today we will talk about applied machine learning (ML) on Arm. My aim for
More informationCreating a Recommender System. An Elasticsearch & Apache Spark approach
Creating a Recommender System An Elasticsearch & Apache Spark approach My Profile SKILLS Álvaro Santos Andrés Big Data & Analytics Solution Architect in Ericsson with more than 12 years of experience focused
More informationSpatial Localization and Detection. Lecture 8-1
Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday
More informationUNIFY DATA AT MEMORY SPEED. Haoyuan (HY) Li, Alluxio Inc. VAULT Conference 2017
UNIFY DATA AT MEMORY SPEED Haoyuan (HY) Li, CEO @ Alluxio Inc. VAULT Conference 2017 March 2017 HISTORY Started at UC Berkeley AMPLab In Summer 2012 Originally named as Tachyon Rebranded to Alluxio in
More informationNative vsphere Storage for Remote and Branch Offices
SOLUTION OVERVIEW VMware vsan Remote Office Deployment Native vsphere Storage for Remote and Branch Offices VMware vsan is the industry-leading software powering Hyper-Converged Infrastructure (HCI) solutions.
More informationIBM POWER SYSTEMS: YOUR UNFAIR ADVANTAGE
IBM POWER SYSTEMS: YOUR UNFAIR ADVANTAGE Choosing IT infrastructure is a crucial decision, and the right choice will position your organization for success. IBM Power Systems provides an innovative platform
More informationAzure SQL Database for Gaming Industry Workloads Technical Whitepaper
Azure SQL Database for Gaming Industry Workloads Technical Whitepaper Author: Pankaj Arora, Senior Software Engineer, Microsoft Contents 1 Introduction... 2 2 Proven Platform... 2 2.1 Azure SQL Database
More information