Designing Hybrid Data Processing Systems for Heterogeneous Servers
|
|
- Archibald Spencer
- 5 years ago
- Views:
Transcription
1 Designing Hybrid Data Processing Systems for Heterogeneous Servers Peter Pietzuch Large-Scale Distributed Systems (LSDS) Group Imperial College London University of Cambridge Cambridge, United Kingdom November
2 Data is the New Oil Many new sources of data become available Most data is produced continuously Internet services, web sites Social feeds IoT devices Cameras RFID tags Mobile devices Scientific instruments Data repositories Data powers plethora of new and personalised services Peter Pietzuch - Imperial College London 2
3 Data-Intensive Systems Data analytics over web click streams How to maximise user experience with relevant content? How to analyse click paths to trace most common user routes? Uniquely Identified Visitors Unique Visitors Visits Page Views Hits Volume of Available Data Machine learning models for online prediction E.g. serving adverts on search engines Solution: AdPredictor Bayesian learning algorithm ranks adverts according to click probabilities f n update f 1 predict y E { 1,1} Peter Pietzuch - Imperial College London 3
4 Throughout and Result Freshness Matter High-throughput processing Data-intensive system Low-latency results Facebook Insights: Aggregates 9 GB/s < 10 sec latency Feedzai: 40K credit card transactions/s < 25 ms latency Google Zeitgeist: 40K user queries/s < 1 ms latency NovaSparks: 150M trade options/s < 1 ms latency Peter Pietzuch - Imperial College London 4
5 Design Space for Data-Intensive Systems Data amount TBs GBs MBs Hard for machine learning algorithms Easy for most algorithms Hard for all algorithms Tension between performance and algorithmic complexity 10s 1s 100ms 10ms 1ms Result latency Peter Pietzuch - Imperial College London 5
6 Algorithmic Complexity Increases T1 T2 T3 T1(a, b, c) T2(c, d, e) T3(g, i, h) highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed Pre-process Share state Parallelize Aggregate Iterate Topicbased filtering Contentbased filtering Complex pattern matching Stream queries Online machine learning, data mining Publish/Subscribe Complex Event Processing (CEP) Stream processing Peter Pietzuch - Imperial College London 6
7 Scale Out Model in Data Centres Peter Pietzuch - Imperial College London 7
8 Task Parallelism vs. Data Parallelism Input data... Servers in data centre Results select distinct W.cid From select distinct Payments W.cid [range 300 seconds] as W, From select distinct Payments Payments W.cid [partition-by [range seconds] row] as Las W, From select highway, where W.cid Payments Payments segment, = L.cid[partition-by [range direction, 300 and W.region 1 seconds] AVG(speed)!= row] L.region as Las W, from Vehicles[range 5 seconds slide 1 second] where W.cid Payments = L.cid[partition-by and W.region 1!= row] L.region as L group by highway, segment, direction where W.cid = L.cid and W.region!= L.region having avg < 40 select highway, segment, direction, AVG(speed) from Vehicles[range 5 seconds slide 1 second] group by highway, segment, direction having avg < 40 Task parallelism: Multiple data processing jobs Data parallelism: Single data processing job Peter Pietzuch - Imperial College London 8
9 Distributed Dataflow Systems Idea: Execute data-parallel tasks on cluster nodes parallelism degree 2 Tasks organised as dataflow graph parallelism degree 3 Almost all big data systems do this: Apache Hadoop, Apache Spark, Apache Storm, Apache Flink, Google TensorFlow,... Peter Pietzuch - Imperial College London 9
10 Nobody Ever Got Fired For Using Hadoop/Spark 2012 study of MapReduce workloads Microsoft: median job size < 14 GB Yahoo: median job size < 12.5 GB Facebook: 90% of jobs < 100 GB (A. Rowstron, D. Narayanan, A. Donnely, G. O Shea, A. Douglas, HotCDP 12) Many data-intensive jobs easily fit into memory One server cheaper/more efficient than compute cluster Peter Pietzuch - Imperial College London 10
11 Parallelism of Heterogeneous Servers Servers have many parallel CPU cores Heterogeneous servers with GPUs common PCIe Bus Command Queue SMX 1... SMX N 10s of CPU cores Socket 1 C 1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 Socket 2 C 1 C 2 C 3 C 4 C 5 C 6 C 7 C s of GPU cores L3 L3 L2 Cache DRAM DMA DRAM New types of compute accelerators: Xeon Phi, Google's TPUs, FPGAs,... Peter Pietzuch - Imperial College London 11
12 Servers Are Becoming Increasingly Heterogeneous Slide courtesy of Torsten Hoefler (Systems Group, ETH Zürich) E How can Data-Intensive Systems Exploit Heterogeneous Hardware? Peter Pietzuch - Imperial College London 12
13 Roadmap SABER: Hybrid stream processing engine for heterogeneous servers [SIGMOD 16] (1) How to parallelise computation on modern hardware? (2) How to utilise heterogeneous servers? (3) Experimental performance results Peter Pietzuch - Imperial College London 13
14 Analytics with Window-based Stream Queries Real-time analytics over data streams Windows define finite data amount for processing highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed highway segment direction speed window now Time-based window with size τ at current time t [t - τ : t] Vehicles[Range τ seconds] Count-based window with size n: last n tuples Vehicles[Rows n] Peter Pietzuch - Imperial College London 14
15 Defining Stream Query Semantics Windows convert data streams to dynamic relations (database table) Window specification Streams Relations Any relational query (select, project, join, group by, etc) Stream operators: Istream, Dstream, Rstream Peter Pietzuch - Imperial College London 15
16 SQL Stream Queries SQL provides well-defined declarative semantics for queries Based on relational algebra (select, project, join, ) Example: Identify slow moving traffic on highway Input stream: Vehicles(highway, segment, direction, speed) Find highway segments with average speed below 40 km/h Input data select highway, segment, direction, AVG(speed) as avg from Vehicles[range 5 sec slide 1 sec] group by highway, segment, direction having avg < 40 Output Operators Peter Pietzuch - Imperial College London 16
17 (1) How to Parallelise Computation? Perform query evaluation across sliding windows in parallel Exploit data parallelism across stream size: slide: 4 sec 1 sec w 1 w 2 w 3 w 4 Peter Pietzuch - Imperial College London 17
18 How to use GPUs with Stream Queries? Naive strategy parallelises computation along window boundaries size: 4 sec slide: 1 sec Task T 1 Combine partial results Task T 2 E Window-based parallelism results in redundant computation Peter Pietzuch - Imperial College London 18
19 How to use GPUs with Stream Queries? Parallel processing of non-overlapping window data? size: 4 sec slide: 1 sec wt 1 wt 2 Combine partial results T w 3 3 T 5 T 4 w 4 E Slide-based parallelism limits degree of parallelism Peter Pietzuch - Imperial College London 19
20 Apache Spark: Small Slides à Low Throughput Throughput (10 6 tuples/s) select AVG(S.1) from S [rows 1024 slide x] Window slide (10 6 tuples) Spark relates window slide to micro-batch size used for parallelisation E Avoid coupling system parameters with query definition Peter Pietzuch - Imperial College London 20
21 SABER: Parallel Window Processing Idea: Parallelise using task size that is best for hardware T 3 T 2 T 1 5 tuples/task w 5 w 4 w 3 w 2 w 1 size: 7 tuples slide: 2 tuples Task contains one or more window fragments Peter Pietzuch - Imperial College London 21
22 SABER: Window Fragment Processing Process window fragments in parallel Reassemble partial results to obtain overall result Worker A: T 1 w 1 w 2 w 3 w 1 w 2 result Empty Empty result w 5 w 4 w 1 w 2 w 3 Slot 2 Slot 1 Result stage Output result circular buffer Worker B: T 2 Partial result reassembly must also be done in parallel Peter Pietzuch - Imperial College London 22
23 API for Operator Implementation T 2 T 1 5 tuples/task Fragment function f f Processes window fragments w 2 w 1 size: 7 tuples slide: 2 tuples f f f f f f f f f b Assembly function f a Merges partial window results w 2 results w 1 results f a f a output Batch function f b Composes fragment functions within task Allows incremental processing Peter Pietzuch - Imperial College London 23
24 SABER: Performance of Window-based Queries 8 select AVG(S.1) from S [rows 1024 slide x] 0.2 Throughput (GB/s) Latency (sec) Window slide (tuples) 0 E Performance of window-based queries remains predictable Peter Pietzuch - Imperial College London 24
25 How to Pick the Task Size? Problem: Small data transfers over PCIe bus costly Example: select * from S where p1 [rows 1 slide 1] 8 Throughput (GB/s) CPU-only processing GPU-only processing Limited by dispatcher and thread contention Limited by data movement Task Size (KB) Peter Pietzuch - Imperial College London 25
26 Roadmap SABER: Hybrid stream processing engine for heterogeneous servers [SIGMOD 16] (1) How to parallelise computation on modern hardware? E Avoid coupling system parameters with processing semantics (2) How to utilise heterogeneous servers? Peter Pietzuch - Imperial College London 26
27 (2) How to Utilise Heterogeneous Servers? Hard to decide acceleration potential of heterogeneous processors Depends on operator semantics, window definition, data distribution, GPU is faster CPU is faster Throughput (GB/s) CPU execution GPU execution Aggregation Group-by θ-join E Don t leave decision about heterogeneous processors to users Peter Pietzuch - Imperial College London 27
28 SABER: Hybrid Execution Model Idea: Execute tasks on all heterogeneous processors (CPUs, GPUs,...) Task queue CPU worker T 10 T 9 T 8 T 7 T 6 T 5 T 4 T 3 T 2 T 1 GPU worker Q A Q A Q A Q A CPU T 3 T 2 T 7 T 10 T 8 T 9 GPU T 1 T 2 T 4 T 5 T 6 Fully utilise all hardware parallelism available in dedicated servers Peter Pietzuch - Imperial College London 28
29 Static Task Scheduling using Cost Model? Profile tasks to obtain cost model Assign tasks to processor with shortest execution time CPU GPU Task queue CPU workers Q A 3 ms 2 ms 3 ms 1 ms T 10 Q A T 9 Q A T 8 T 7 Q A T 6 T 5 T 4 T 3 T 2 T 1 Q A GPU worker Static: Q A on CPU and on GPU CPU GPU T 1 T 2 T 3 T 4 T 5 T 6 T 8 T 7 T 9 T 10 E Static scheduling under-utilises processors Peter Pietzuch - Imperial College London 29
30 First-Come First-Serve Task Scheduling? Assign tasks to processors first-come, first-serve CPU/GPU execute both Q A and tasks CPU GPU Q A 3 ms 2 ms T 10 T 9 T 8 T 7 T 6 T 5 T 4 T 3 Task queue T 2 T 1 CPU workers GPU worker 3 ms 1 ms Q A Q A Q A Q A First-Come First-Served CPU GPU T 1 T 4 T 8 T 2 T 3 T 5 T 6 T 7 T 9 T 10 E FCFS ignores effectiveness of processors for given task Peter Pietzuch - Imperial College London 30
31 Heterogeneous Lookahead Scheduling (HLS) Idea: Scheduler assigns tasks to idle processors dynamically Skips tasks that could be executed faster by another processor CPU GPU Q A 3 ms 2 ms T 10 T 9 T 8 T 7 T 6 T 5 T 4 T 3 Task queue T 2 T 1 CPU workers GPU worker 3 ms 1 ms Q A Q A Q A Q A HLS CPU T 7 T 10 GPU T 1 T 3 T 2 T 4 T 5 T 6 T 8 T 9 TFinally Skip 12 3 is slower T 4 skip, T 5 and T on 8 and CPU: T 6 and T 9 but Skip and select already select T 7 have T 10 3 ms of work for GPU E HLS achieves aggregate throughput of all heterogeneous processors Peter Pietzuch - Imperial College London 31
32 SABER Hybrid Stream Processing Engine Java 15K LOC C & OpenCL 4K LOC T 1 T 2 op CPU T 1 T 2 T 2 T 1 GPU op α Dispatching stage Dispatch fixed-size tasks Scheduling & execution stage Dequeue tasks based on HLS Result stage Merge & forward partial window results Peter Pietzuch - Imperial College London 32
33 Roadmap SABER: Hybrid stream processing engine for heterogeneous servers [SIGMOD 16] (1) How to parallelise computation on modern hardware? E Avoid coupling system parameters with processing semantics (2) How to utilise heterogeneous servers? E Hybrid execution utilises all heterogeneous processors (3) Experimental performance results Peter Pietzuch - Imperial College London 33
34 Experimental Evaluation PCIe 3.0 x16 10 Gbps NIC Intel Xeon 2.6 GHz 16 cores 64 GB RAM NVIDIA Quadro K5200 2,304 cores 8 GB RAM Ubuntu Linux NVIDIA driver Google Cluster Data 144M jobs events from Google infrastructure SmartGrid Measurements 974M plug measurements from houses Linear Road Benchmark 11M car positions and speed on highway Peter Pietzuch - Imperial College London 34
35 What is SABER s Performance? select group-by avg group-by cnt group-by avg aggr avg group-by avg select group-by cnt Throughput (10 6 tuples/s) CM2 SG1 SG2 LRB3 LRB4 SABER (CPU contrib.) Intel Xeon 2.6 GHz 16 cores SABER (GPU contrib.) NVIDIA Quadro K5200 2,304 cores Cluster Mgmt. Smart Grid LRB E SABER exploits both CPUs and GPUs effectively for different queries Peter Pietzuch - Imperial College London 35
36 Is Hybrid Throughput Additive? GPU is faster CPU is faster Not additive due to queue contention Throughput (GB/s) SABER (CPU only) SABER (GPU only) 0.2 SABER 0.1 Aggregation Group-by θ-join 0 E Aggregate throughput of CPU + GPU always highest Peter Pietzuch - Imperial College London 36
37 What is the Trade-Off between CPUs and GPUs? Hybrid processing model benefits from GPU's ability to process complex predicates fast SABER (CPU only) SABER (GPU only) SABER Throughput (GB/s) Dispatch bound # selection predicates # join predicates [rows 1024, slide 1024] [rows 1024, slide 1024] Peter Pietzuch - Imperial College London 37
38 Does SABER Adapt to Workload Changes? HLS periodically uses idle, non-preferred processor to run tasks to update query task throughput matrix Throughput (GB/s) Selectivity Series1 SABER SABER Series2 (GPU contribution) Time (seconds) E Higher selectivity à more predicates evaluated à GPU preferred Peter Pietzuch - Imperial College London 38
39 Summary Heterogeneous servers have huge impact on data-intensive systems Shift from scale out to scale up model Need new general-purpose system designs for heterogeneous servers SABER: Hybrid Stream Processing Engine for CPUs & GPUs (1) Parallelise computation to fit hardware capabilities E Decouple hardware/system parameters from processing semantics (2) Fully utilise all heterogeneous processors independently of workload E Hybrid processing model to achieve aggregate CPU/GPU throughput Peter Pietzuch - Imperial College London 39
40 Acknowledgement: LSDS Group at Imperial College London We're Hiring! Post-docs, PhDs Thank you! Any Questions? Peter Pietzuch - Imperial College London Peter Pietzuch <prp@imperial.ac.uk> 40
Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation
Large-Scale Data & Systems Group Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation Georgios Theodorakis, Alexandros Koliousis, Peter Pietzuch, Holger Pirk Large-Scale Data & Systems (LSDS)
More informationPebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees
PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees Pandian Raju 1, Rohan Kadekodi 1, Vijay Chidambaram 1,2, Ittai Abraham 2 1 The University of Texas at Austin 2 VMware Research
More informationStreamBox: Modern Stream Processing on a Multicore Machine
StreamBox: Modern Stream Processing on a Multicore Machine Hongyu Miao and Heejin Park, Purdue ECE; Myeongjae Jeon and Gennady Pekhimenko, Microsoft Research; Kathryn S. McKinley, Google; Felix Xiaozhu
More informationSoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research
SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research 1 The world s most valuable resource Data is everywhere! May. 2017 Values from Data! Need infrastructures for
More informationFlash Storage Complementing a Data Lake for Real-Time Insight
Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training
More informationProgramming Systems for Big Data
Programming Systems for Big Data CS315B Lecture 17 Including material from Kunle Olukotun Prof. Aiken CS 315B Lecture 17 1 Big Data We ve focused on parallel programming for computational science There
More informationSpark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies
Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay 1 Apache Spark - Intro Spark within the Big Data ecosystem Data Sources Data Acquisition / ETL Data Storage Data Analysis / ML Serving 3 Apache
More informationDATABASES AND THE CLOUD. Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland
DATABASES AND THE CLOUD Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland AVALOQ Conference Zürich June 2011 Systems Group www.systems.ethz.ch Enterprise Computing Center
More informationApache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context
1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes
More informationApache Spark Graph Performance with Memory1. February Page 1 of 13
Apache Spark Graph Performance with Memory1 February 2017 Page 1 of 13 Abstract Apache Spark is a powerful open source distributed computing platform focused on high speed, large scale data processing
More informationPerformance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference
The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee
More informationAddressing the Memory Wall
Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the
More informationWarehouse- Scale Computing and the BDAS Stack
Warehouse- Scale Computing and the BDAS Stack Ion Stoica UC Berkeley UC BERKELEY Overview Workloads Hardware trends and implications in modern datacenters BDAS stack What is Big Data used For? Reports,
More informationCamdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa
Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file
More informationOracle Big Data Connectors
Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process
More informationReal-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments
Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments Nikos Zacheilas, Vana Kalogeraki Department of Informatics Athens University of Economics and Business 1 Big Data era has arrived!
More informationPractical Big Data Processing An Overview of Apache Flink
Practical Big Data Processing An Overview of Apache Flink Tilmann Rabl Berlin Big Data Center www.dima.tu-berlin.de bbdc.berlin rabl@tu-berlin.de With slides from Volker Markl and data artisans 1 2013
More informationA Cost Model for Data Stream Processing on Modern Hardware Constantin Pohl, Philipp Götze, Kai-Uwe Sattler
Processing on Modern Hardware Constantin Pohl, Philipp Götze, Kai-Uwe Sattler 31.08.17 Motivation and Introduction Main goals on Data Stream Processing Queries: High throughput & low latency Responsibility:
More informationTales of the Tail Hardware, OS, and Application-level Sources of Tail Latency
Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports and Steven D. Gribble February 2, 2015 1 Introduction What is Tail Latency? What
More informationBig Data Systems on Future Hardware. Bingsheng He NUS Computing
Big Data Systems on Future Hardware Bingsheng He NUS Computing http://www.comp.nus.edu.sg/~hebs/ 1 Outline Challenges for Big Data Systems Why Hardware Matters? Open Challenges Summary 2 3 ANYs in Big
More informationBIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE
BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE BRETT WENINGER, MANAGING DIRECTOR 10/21/2014 ADURANT APPROACH TO BIG DATA Align to Un/Semi-structured Data Instead of Big Scale out will become Big Greatest
More informationRe-architecting Virtualization in Heterogeneous Multicore Systems
Re-architecting Virtualization in Heterogeneous Multicore Systems Himanshu Raj, Sanjay Kumar, Vishakha Gupta, Gregory Diamos, Nawaf Alamoosa, Ada Gavrilovska, Karsten Schwan, Sudhakar Yalamanchili College
More informationGraph Data Management
Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of
More informationBlueDBM: An Appliance for Big Data Analytics*
BlueDBM: An Appliance for Big Data Analytics* Arvind *[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King(Quanta) BigData@CSAIL Annual Meeting
More informationUnderstanding and Improving the Cost of Scaling Distributed Event Processing
Understanding and Improving the Cost of Scaling Distributed Event Processing Shoaib Akram, Manolis Marazakis, and Angelos Bilas shbakram@ics.forth.gr Foundation for Research and Technology Hellas (FORTH)
More informationBig Data on AWS. Peter-Mark Verwoerd Solutions Architect
Big Data on AWS Peter-Mark Verwoerd Solutions Architect What to get out of this talk Non-technical: Big Data processing stages: ingest, store, process, visualize Hot vs. Cold data Low latency processing
More informationAccelerating Analytical Workloads
Accelerating Analytical Workloads Thomas Neumann Technische Universität München April 15, 2014 Scale Out in Big Data Analytics Big Data usually means data is distributed Scale out to process very large
More informationPutting it together. Data-Parallel Computation. Ex: Word count using partial aggregation. Big Data Processing. COS 418: Distributed Systems Lecture 21
Big Processing -Parallel Computation COS 418: Distributed Systems Lecture 21 Michael Freedman 2 Ex: Word count using partial aggregation Putting it together 1. Compute word counts from individual files
More informationOracle Exadata: Strategy and Roadmap
Oracle Exadata: Strategy and Roadmap - New Technologies, Cloud, and On-Premises Juan Loaiza Senior Vice President, Database Systems Technologies, Oracle Safe Harbor Statement The following is intended
More informationEfficient On-Demand Operations in Distributed Infrastructures
Efficient On-Demand Operations in Distributed Infrastructures Steve Ko and Indranil Gupta Distributed Protocols Research Group University of Illinois at Urbana-Champaign 2 One-Line Summary We need to design
More informationRack-scale Data Processing System
Rack-scale Data Processing System Jana Giceva, Darko Makreshanski, Claude Barthels, Alessandro Dovis, Gustavo Alonso Systems Group, Department of Computer Science, ETH Zurich Rack-scale Data Processing
More informationData Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions
Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing
More informationSoftware and Tools for HPE s The Machine Project
Labs Software and Tools for HPE s The Machine Project Scalable Tools Workshop Aug/1 - Aug/4, 2016 Lake Tahoe Milind Chabbi Traditional Computing Paradigm CPU DRAM CPU DRAM CPU-centric computing 2 CPU-Centric
More informationPREDICTIVE DATACENTER ANALYTICS WITH STRYMON
PREDICTIVE DATACENTER ANALYTICS WITH STRYMON Vasia Kalavri kalavriv@inf.ethz.ch QCon San Francisco 14 November 2017 Support: ABOUT ME Postdoc at ETH Zürich Systems Group: https://www.systems.ethz.ch/ PMC
More informationBuilding an Internet-Scale Publish/Subscribe System
Building an Internet-Scale Publish/Subscribe System Ian Rose Mema Roussopoulos Peter Pietzuch Rohan Murty Matt Welsh Jonathan Ledlie Imperial College London Peter R. Pietzuch prp@doc.ic.ac.uk Harvard University
More informationClearSpeed Visual Profiler
ClearSpeed Visual Profiler Copyright 2007 ClearSpeed Technology plc. All rights reserved. 12 November 2007 www.clearspeed.com 1 Profiling Application Code Why use a profiler? Program analysis tools are
More informationData Processing on Emerging Hardware
Data Processing on Emerging Hardware Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland 3 rd International Summer School on Big Data, Munich, Germany, 2017 www.systems.ethz.ch
More informationThe Evolution of Big Data Platforms and Data Science
IBM Analytics The Evolution of Big Data Platforms and Data Science ECC Conference 2016 Brandon MacKenzie June 13, 2016 2016 IBM Corporation Hello, I m Brandon MacKenzie. I work at IBM. Data Science - Offering
More informationA Fast and High Throughput SQL Query System for Big Data
A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190
More informationDistributed computing: index building and use
Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput
More informationStaged Memory Scheduling
Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:
More informationAlternative GPU friendly assignment algorithms. Paul Richmond and Peter Heywood Department of Computer Science The University of Sheffield
Alternative GPU friendly assignment algorithms Paul Richmond and Peter Heywood Department of Computer Science The University of Sheffield Graphics Processing Units (GPUs) Context: GPU Performance Accelerated
More informationNowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?
Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/
More informationRed Fox: An Execution Environment for Relational Query Processing on GPUs
Red Fox: An Execution Environment for Relational Query Processing on GPUs Haicheng Wu 1, Gregory Diamos 2, Tim Sheard 3, Molham Aref 4, Sean Baxter 2, Michael Garland 2, Sudhakar Yalamanchili 1 1. Georgia
More informationHYRISE In-Memory Storage Engine
HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University
More informationBuilding NVLink for Developers
Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized
More informationNext-Generation Cloud Platform
Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology
More informationWorkload Characterization and Optimization of TPC-H Queries on Apache Spark
Workload Characterization and Optimization of TPC-H Queries on Apache Spark Tatsuhiro Chiba and Tamiya Onodera IBM Research - Tokyo April. 17-19, 216 IEEE ISPASS 216 @ Uppsala, Sweden Overview IBM Research
More informationData Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research
Data Processing at the Speed of 100 Gbps using Apache Crail Patrick Stuedi IBM Research The CRAIL Project: Overview Data Processing Framework (e.g., Spark, TensorFlow, λ Compute) Spark-IO Albis Pocket
More informationDatabase Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu
Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based
More informationAdaptive Parallel Compressed Event Matching
Adaptive Parallel Compressed Event Matching Mohammad Sadoghi 1,2 Hans-Arno Jacobsen 2 1 IBM T.J. Watson Research Center 2 Middleware Systems Research Group, University of Toronto April 2014 Mohammad Sadoghi
More informationHybrid Implementation of 3D Kirchhoff Migration
Hybrid Implementation of 3D Kirchhoff Migration Max Grossman, Mauricio Araya-Polo, Gladys Gonzalez GTC, San Jose March 19, 2013 Agenda 1. Motivation 2. The Problem at Hand 3. Solution Strategy 4. GPU Implementation
More informationKartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18
Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation
More informationPart XVII. Staircase Join Tree-Aware Relational (X)Query Processing. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440
Part XVII Staircase Join Tree-Aware Relational (X)Query Processing Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440 Outline of this part 1 XPath Accelerator Tree aware relational
More informationNFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC
Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data
More informationS-Store: Streaming Meets Transaction Processing
S-Store: Streaming Meets Transaction Processing H-Store is an experimental database management system (DBMS) designed for online transaction processing applications Manasa Vallamkondu Motivation Reducing
More informationR-Storm: A Resource-Aware Scheduler for STORM. Mohammad Hosseini Boyang Peng Zhihao Hong Reza Farivar Roy Campbell
R-Storm: A Resource-Aware Scheduler for STORM Mohammad Hosseini Boyang Peng Zhihao Hong Reza Farivar Roy Campbell Introduction STORM is an open source distributed real-time data stream processing system
More informationUnifying Big Data Workloads in Apache Spark
Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache
More informationMapReduce. Stony Brook University CSE545, Fall 2016
MapReduce Stony Brook University CSE545, Fall 2016 Classical Data Mining CPU Memory Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationRecovering Disk Storage Metrics from low level Trace events
Recovering Disk Storage Metrics from low level Trace events Progress Report Meeting May 05, 2016 Houssem Daoud Michel Dagenais École Polytechnique de Montréal Laboratoire DORSAL Agenda Introduction and
More informationPlay2SDG: Bridging the Gap between Serving and Analytics in Scalable Web Applications
Play2SDG: Bridging the Gap between Serving and Analytics in Scalable Web Applications Panagiotis Garefalakis M.Res Thesis Presentation, 7 September 2015 Outline Motivation Challenges Scalable web app design
More informationRed Fox: An Execution Environment for Relational Query Processing on GPUs
Red Fox: An Execution Environment for Relational Query Processing on GPUs Georgia Institute of Technology: Haicheng Wu, Ifrah Saeed, Sudhakar Yalamanchili LogicBlox Inc.: Daniel Zinn, Martin Bravenboer,
More informationData Analytics with HPC. Data Streaming
Data Analytics with HPC Data Streaming Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationSDA: Software-Defined Accelerator for Large- Scale DNN Systems
SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, Yong Wang, Bo Yu, Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A dominant
More informationWhen MPPDB Meets GPU:
When MPPDB Meets GPU: An Extendible Framework for Acceleration Laura Chen, Le Cai, Yongyan Wang Background: Heterogeneous Computing Hardware Trend stops growing with Moore s Law Fast development of GPU
More informationSDA: Software-Defined Accelerator for general-purpose big data analysis system
SDA: Software-Defined Accelerator for general-purpose big data analysis system Jian Ouyang(ouyangjian@baidu.com), Wei Qi, Yong Wang, Yichen Tu, Jing Wang, Bowen Jia Baidu is beyond a search engine Search
More informationChapter 5. The MapReduce Programming Model and Implementation
Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing
More informationSub-millisecond Stateful Stream Querying over Fast-evolving Linked Data
Sub-millisecond Stateful Stream Querying over Fast-evolving Linked Data Yunhao Zhang, Rong Chen, Haibo Chen Institute of Parallel and Distributed Systems (IPADS) Shanghai Jiao Tong University Stream Query
More information15-740/ Computer Architecture Lecture 20: Main Memory II. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 20: Main Memory II Prof. Onur Mutlu Carnegie Mellon University Today SRAM vs. DRAM Interleaving/Banking DRAM Microarchitecture Memory controller Memory buses
More informationApache Flink. Alessandro Margara
Apache Flink Alessandro Margara alessandro.margara@polimi.it http://home.deib.polimi.it/margara Recap: scenario Big Data Volume and velocity Process large volumes of data possibly produced at high rate
More informationQuery Compilation of Dataflow Programs for Heterogeneous Platforms. Felix Beier & Kai-Uwe Sattler Ilmenau
Programs for Heterogeneous Platforms Felix Beier & Kai-Uwe Sattler DBIS@TU Ilmenau www.tu-ilmenau.de/dbis Outline 1. Requirements of Engineering Applications 2. Dataflow Programming in PipeFlow 3. Query
More informationVisual Analytics Sandbox: A big data platform for processing network traffic
Visual Analytics Sandbox: A big data platform for processing network traffic Raju Gottumukkala, Ph.D. Director of Research, Informatics Research Institute Site Director, NSF Center for Visual and Decision
More information2/26/2017. Originally developed at the University of California - Berkeley's AMPLab
Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second
More informationClash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics
Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics Presented by: Dishant Mittal Authors: Juwei Shi, Yunjie Qiu, Umar Firooq Minhas, Lemei Jiao, Chen Wang, Berthold Reinwald and Fatma
More informationCascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching
Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value
More informationdavidklee.net gplus.to/kleegeek linked.com/a/davidaklee
@kleegeek davidklee.net gplus.to/kleegeek linked.com/a/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture
More informationReal-time data processing with Apache Flink
Real-time data processing with Apache Flink Gyula Fóra gyfora@apache.org Flink committer Swedish ICT Stream processing Data stream: Infinite sequence of data arriving in a continuous fashion. Stream processing:
More informationData Stream Processing in the Cloud
Department of Computing Data Stream Processing in the Cloud Evangelia Kalyvianaki ekalyv@imperial.ac.uk joint work with Raul Castro Fernandez, Marco Fiscato, Matteo Migliavacca and Peter Pietzuch Peter
More informationToward a Memory-centric Architecture
Toward a Memory-centric Architecture Martin Fink EVP & Chief Technology Officer Western Digital Corporation August 8, 2017 1 SAFE HARBOR DISCLAIMERS Forward-Looking Statements This presentation contains
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationMap Reduce. Yerevan.
Map Reduce Erasmus+ @ Yerevan dacosta@irit.fr Divide and conquer at PaaS 100 % // Typical problem Iterate over a large number of records Extract something of interest from each Shuffle and sort intermediate
More informationGPU Architecture. Alan Gray EPCC The University of Edinburgh
GPU Architecture Alan Gray EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? Architectural reasons for accelerator performance advantages Latest GPU Products From
More informationYCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores
YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores Swapnil Patil Milo Polte, Wittawat Tantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs *, Billie
More informationResource and Performance Distribution Prediction for Large Scale Analytics Queries
Resource and Performance Distribution Prediction for Large Scale Analytics Queries Prof. Rajiv Ranjan, SMIEEE School of Computing Science, Newcastle University, UK Visiting Scientist, Data61, CSIRO, Australia
More informationMixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp
MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp Hadoop Pig, Hive Hadoop + Enterprise storage?! Shared storage
More informationService Oriented Performance Analysis
Service Oriented Performance Analysis Da Qi Ren and Masood Mortazavi US R&D Center Santa Clara, CA, USA www.huawei.com Performance Model for Service in Data Center and Cloud 1. Service Oriented (end to
More informationIntroduction to Hadoop. Owen O Malley Yahoo!, Grid Team
Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since
More informationLecture 11 Hadoop & Spark
Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem
More informationLarge-Scale GPU programming
Large-Scale GPU programming Tim Kaldewey Research Staff Member Database Technologies IBM Almaden Research Center tkaldew@us.ibm.com Assistant Adjunct Professor Computer and Information Science Dept. University
More informationCrescando: Predictable Performance for Unpredictable Workloads
Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing
More informationTable 1 The Elastic Stack use cases Use case Industry or vertical market Operational log analytics: Gain real-time operational insight, reduce Mean Ti
Solution Overview Cisco UCS Integrated Infrastructure for Big Data with the Elastic Stack Cisco and Elastic deliver a powerful, scalable, and programmable IT operations and security analytics platform
More informationDATA SCIENCE USING SPARK: AN INTRODUCTION
DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data
More informationSnailTrail Generalizing Critical Paths for Online Analysis of Distributed Dataflows
SnailTrail Generalizing Critical Paths for Online Analysis of Distributed Dataflows Moritz Hoffmann, Andrea Lattuada, John Liagouris, Vasiliki Kalavri, Desislava Dimitrova, Sebastian Wicki, Zaheer Chothia,
More informationWhere We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 22: MapReduce We are talking about parallel query processing There exist two main types of engines: Parallel DBMSs (last lecture + quick review)
More informationFusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic
WHITE PAPER Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive
More informationShark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker
Shark Hive on Spark Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Agenda Intro to Spark Apache Hive Shark Shark s Improvements over Hive Demo Alpha
More informationRay Tracing. Computer Graphics CMU /15-662, Fall 2016
Ray Tracing Computer Graphics CMU 15-462/15-662, Fall 2016 Primitive-partitioning vs. space-partitioning acceleration structures Primitive partitioning (bounding volume hierarchy): partitions node s primitives
More informationContainer 2.0. Container: check! But what about persistent data, big data or fast data?!
@unterstein @joerg_schad @dcos @jaxdevops Container 2.0 Container: check! But what about persistent data, big data or fast data?! 1 Jörg Schad Distributed Systems Engineer @joerg_schad Johannes Unterstein
More information