Bohr: Similarity Aware Geo-distributed Data Analytics. Hangyu Li, Hong Xu, Sarana Nutanong City University of Hong Kong
|
|
- Everett Cook
- 5 years ago
- Views:
Transcription
1 Bohr: Similarity Aware Geo-distributed Data Analytics Hangyu Li, Hong Xu, Sarana Nutanong City University of Hong Kong 1
2 Big Data Analytics Analysis Generate 2
3 Data are geo-distributed Frankfurt US Oregon Tokyo US California US Virginia Singapore Sydney San Paulo 3
4 Centralizing data is infeasible Moving data through wide area networks(wans) can be extremely slow. Data sovereignty laws. Frankfurt US Oregon US Virginia Tokyo US California 1. Vulimiri et al., NSDI Pu et al., SIGCOMM Viswanathan et al., OSDI Hsieh et al., NSDI 17 Singapore Sydney San Paulo 4
5 Processing queries in-place Bottleneck site exists. Frankfurt US Oregon Tokyo US California US Virginia Singapore Sydney San Paulo 5
6 Processing queries in-place System Workload Task / Data Placement Optimize aspect JetStream streaming no WAN bandwidth usage Geode batch data WAN bandwidth usage Iridium batch both query response time 6
7 Processing queries in-place Iridium (Pu et al. SIGCOMM 15) Recurring queries, abundant computation resource. Transfer data out from the bottleneck site. Re-schedule reducer tasks. 7
8 Iridium can be improved similarity oblivious data reduction ratio is constant Tokyo Input: Map2 (with combiner) Reduce2 Intermedia: UrlB,2 Intermed Site -iate record Tokyo 3 Input: Map1 (with combiner) Intermedia: UrlA,3 Input: Map3 (with combiner) Intermedia: UrlC,2 Oregon 1 Seoul 2 Reduce1 Reduce3 Total 6 Oregon Seoul 8
9 Iridium can be improved similarity oblivious data reduction ratio is constant Tokyo () Oregon Input: Map2 (with combiner) Reduce2 Intermedia: () Tokyo to Seoul () Tokyo to Oregon () Seoul Intermed Site -iate record Tokyo 2 Input: Map1 (with combiner) Reduce1 Intermedia: UrlA,3 Input: Map3 (with combiner) Reduce3 Intermedia: UrlC,2 Oregon 2 Seoul 3 Total 7 Oregon Seoul 9
10 Iridium can be improved similarity oblivious data reduction ratio is constant Tokyo () Oregon Input: Map2 (with combiner) Reduce2 Intermedia: () Tokyo to Seoul () Tokyo to Oregon () Seoul Intermed Site -iate record Tokyo 2 Input: Map1 (with combiner) Reduce1 Intermedia: UrlA,4 Input: Map3 (with combiner) Reduce3 Intermedia: UrlC,3 Oregon 1 Seoul 2 Total 5 Oregon Seoul 10
11 Bohr Design Key ideas: Recurring queries, abundant computation resource. Similarity aware data transfer. Measuring and predicting the data reduction ratio. 11
12 Bohr Design Site1 OLAP Cube Generation OLAP Cube1 Map1 Combine enabled probe similar dataset Site2 OLAP Cube Generation OLAP Cube2 Map2 Combine enabled Reduce1 Reduce2 final result query tasks Global Manager Job Queue 12
13 Why using OLAP Cube? Sorting dataset according to the attribute. Using record level similarity score. 13
14 Why using OLAP Cube? Advantages: Format the unformatted raw data. Multiple dimension cube for one dataset. 14
15 Similarity Search & Check Search: OLAP instruction(eg. dice, slice, roll up) to retrieve the needed attributes. site 1 site 2 Check: Cross site check: simple probes. The probe components. OLAP Cube1 Probe value(1) value(2) value(k) OLAP Cube2 15
16 Reduce tasks scheduling t : data transfer time. I : input data size. R: data reduction ratio. U: uplink bandwidth. D: downlink bandwidth. r : reduce tasks percentage. Goal : Minimize the data transfer time Time to upload data from site i. Time to download data from other sites. 16
17 System Implementation Based on Spark v2.1.0 and Iridium. Utilize Apache Kylin OLAP data cube to deal with all the OLAP operation. Use Gurobi Optimizer for optimize the LP function. 17
18 Evaluation Methodology Workloads: TPC-DS. Amplab big data benchmark. Hardware platform: A real EC2 deployment with 10 sites(singapore,tokyo ). 18
19 Evaluation Methodology Baseline: Iridium(Pu et al. Sigcomm 15). Performance metrics: Query completion time. Data reduction ratio. 19
20 Data Reduction Ratio Data reduction compared to in-place (%) Singapore Tokyo Oregon Virginia Ohio FrankFurt Seoul Sydney Iridium Bohr London Ireland Bohr can increase the data reductio ratio in all sites. 20
21 Query Completion Time Query completion time (secs) Iridium Bohr 0 Big data (scan) Big data (UDF) Big data (aggr) TPC-DS Bohr saves 30% QCT compare to Iridium. 21
22 Conclusion We propose Bohr: Similarity aware data transfer. Predicting and measuring data reduction ratio for each site. Bohr is 30% faster than Iridium. 22
23 Thank you! 23
24 Future work 1. Generalize the basic idea to more workloads. 2. Dealing with the overhead bring by operate OLAP cubes. 3. Break recurring batch workload condition 4. Multiple types of queries accessing one dataset. 24
Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds
Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds Kevin Hsieh Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R. Ganger, Phillip B. Gibbons, Onur Mutlu Machine Learning and Big
More informationMarkLogic Cloud Service Pricing & Billing Effective: October 1, 2018
MarkLogic Cloud Service Pricing & Billing Effective: October 1, 2018 MARKLOGIC DATA HUB SERVICE PRICING COMPUTE AND QUERY CAPACITY MarkLogic Data Hub Service capacity is measured in MarkLogic Capacity
More informationHotCloud 17. Lube: Mitigating Bottlenecks in Wide Area Data Analytics. Hao Wang* Baochun Li
HotCloud 17 Lube: Hao Wang* Baochun Li Mitigating Bottlenecks in Wide Area Data Analytics iqua Wide Area Data Analytics DC Master Namenode Workers Datanodes 2 Wide Area Data Analytics Why wide area data
More informationApache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context
1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes
More informationTo Relay or Not to Relay for Inter-Cloud Transfers? Fan Lai, Mosharaf Chowdhury, Harsha Madhyastha
To Relay or Not to Relay for Inter-Cloud Transfers? Fan Lai, Mosharaf Chowdhury, Harsha Madhyastha Background Over 40 Data Centers (DCs) on EC2, Azure, Google Cloud A geographically denser set of DCs across
More informationSparrow. Distributed Low-Latency Spark Scheduling. Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
Sparrow Distributed Low-Latency Spark Scheduling Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica Outline The Spark scheduling bottleneck Sparrow s fully distributed, fault-tolerant technique
More informationA Hierarchical Synchronous Parallel Model for Wide-Area Graph Analytics
A Hierarchical Synchronous Parallel Model for Wide-Area Graph Analytics Shuhao Liu*, Li Chen, Baochun Li, Aiden Carnegie University of Toronto April 17, 2018 Graph Analytics What is Graph Analytics? 2
More information2/26/2017. Originally developed at the University of California - Berkeley's AMPLab
Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second
More informationCloud and Storage. Transforming IT with AWS and Zadara. Doug Cliche, Storage Solutions Architect June 5, 2018
Cloud and Storage Transforming IT with AWS and Zadara Doug Cliche, Storage Solutions Architect June 5, 2018 What sets AWS apart? Security Fine-grained control Service Breadth & Depth; pace of innovation
More informationJanus. Consolidating Concurrency Control and Consensus for Commits under Conflicts. Shuai Mu, Lamont Nelson, Wyatt Lloyd, Jinyang Li
Janus Consolidating Concurrency Control and Consensus for Commits under Conflicts Shuai Mu, Lamont Nelson, Wyatt Lloyd, Jinyang Li New York University, University of Southern California State of the Art
More informationAmazon Web Services and Feb 28 outage. Overview presented by Divya
Amazon Web Services and Feb 28 outage Overview presented by Divya Amazon S3 Amazon S3 : store and retrieve any amount of data, at any time, from anywhere on web. Amazon S3 service: Create Buckets Create
More informationExpeditus: Congestion-Aware Load Balancing in Clos Data Center Networks
Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks Peng Wang, Hong Xu, Zhixiong Niu, Dongsu Han, Yongqiang Xiong ACM SoCC 2016, Oct 5-7, Santa Clara Motivation Datacenter networks
More informationAmazon Web Services. Foundational Services for Research Computing. April Mike Kuentz, WWPS Solutions Architect
Amazon Web Services Foundational Services for Research Computing Mike Kuentz, WWPS Solutions Architect April 2017 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Global Infrastructure
More informationDisclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme
LHC2384BU VMware Cloud on AWS A Technical Deep Dive Ray Budavari @rbudavari Frank Denneman - @frankdenneman #VMworld #LHC2384BU Disclaimer This presentation may contain product features that are currently
More informationSecurity & Compliance in the AWS Cloud. Amazon Web Services
Security & Compliance in the AWS Cloud Amazon Web Services Our Culture Simple Security Controls Job Zero AWS Pace of Innovation AWS has been continually expanding its services to support virtually any
More informationGaia: Geo-Distributed Machine Learning Approaching LAN Speeds
Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds Kevin Hsieh Aaron Harlap Nandita Vijaykumar Dimitris Konomis Gregory R. Ganger Phillip B. Gibbons Onur Mutlu Carnegie Mellon University EH
More informationMPROXY. Bloomberg's Proprietary Multicast Server Solution. 20 February 2004 Version: 1.5
MPROXY Bloomberg's Proprietary Multicast Server Solution 20 February 2004 Version: 1.5 1 Introduction... 2 Terminology... 2 How does MProxy Work?...3 Requirements...4 Installation and Configuration...
More informationSecurity & Compliance in the AWS Cloud. Vijay Rangarajan Senior Cloud Architect, ASEAN Amazon Web
Security & Compliance in the AWS Cloud Vijay Rangarajan Senior Cloud Architect, ASEAN Amazon Web Services @awscloud www.cloudsec.com #CLOUDSEC Security & Compliance in the AWS Cloud TECHNICAL & BUSINESS
More informationSecurity: Michael South Americas Regional Leader, Public Sector Security & Compliance Business Acceleration
Security: A Driving Force Behind Moving to the Cloud Michael South Americas Regional Leader, Public Sector Security & Compliance Business Acceleration 2017, Amazon Web Services, Inc. or its affiliates.
More informationDocAve Online 3. Release Notes
DocAve Online 3 Release Notes Service Pack 16, Cumulative Update 1 Issued May 2017 New Features and Improvements Added support for new storage regions in Amazon S3 type physical devices, including: US
More informationMicron and Hortonworks Power Advanced Big Data Solutions
Micron and Hortonworks Power Advanced Big Data Solutions Flash Energizes Your Analytics Overview Competitive businesses rely on the big data analytics provided by platforms like open-source Apache Hadoop
More informationGlobal Analytics in the Face of Bandwidth and Regulatory Constraints
Global Analytics in the Face of Bandwidth and Regulatory Constraints Ashish Vulimiri, Carlo Curino, Brighten Godfrey, Thomas Jungblut, Jitu Padhye, George Varghese NSDI 15 Presenter: Sarthak Grover Motivation
More informationLecture: Benchmarks, Pipelining Intro. Topics: Performance equations wrap-up, Intro to pipelining
Lecture: Benchmarks, Pipelining Intro Topics: Performance equations wrap-up, Intro to pipelining 1 Measuring Performance Two primary metrics: wall clock time (response time for a program) and throughput
More informationSDA: Software-Defined Accelerator for general-purpose big data analysis system
SDA: Software-Defined Accelerator for general-purpose big data analysis system Jian Ouyang(ouyangjian@baidu.com), Wei Qi, Yong Wang, Yichen Tu, Jing Wang, Bowen Jia Baidu is beyond a search engine Search
More informationYuval Bachar. Principal Engineer, Global Infrastructure & Strategy at President, Open19 Foundation
Yuval Bachar Principal Engineer, Global Infrastructure & Strategy at President, Open19 Foundation Open19 Foundation Thank you! Open19 Foundation Our mission: A community based organization for open source
More informationInfiniswap. Efficient Memory Disaggregation. Mosharaf Chowdhury. with Juncheng Gu, Youngmoon Lee, Yiwen Zhang, and Kang G. Shin
Infiniswap Efficient Memory Disaggregation Mosharaf Chowdhury with Juncheng Gu, Youngmoon Lee, Yiwen Zhang, and Kang G. Shin Rack-Scale Computing Datacenter-Scale Computing Geo-Distributed Computing Coflow
More informationQLogic/Lenovo 16Gb Gen 5 Fibre Channel for Database and Business Analytics
QLogic/ Gen 5 Fibre Channel for Database Assessment for Database and Business Analytics Using the information from databases and business analytics helps business-line managers to understand their customer
More informationUNIFY DATA AT MEMORY SPEED. Haoyuan (HY) Li, Alluxio Inc. VAULT Conference 2017
UNIFY DATA AT MEMORY SPEED Haoyuan (HY) Li, CEO @ Alluxio Inc. VAULT Conference 2017 March 2017 HISTORY Started at UC Berkeley AMPLab In Summer 2012 Originally named as Tachyon Rebranded to Alluxio in
More informationCoflow. Recent Advances and What s Next? Mosharaf Chowdhury. University of Michigan
Coflow Recent Advances and What s Next? Mosharaf Chowdhury University of Michigan Rack-Scale Computing Datacenter-Scale Computing Geo-Distributed Computing Coflow Networking Open Source Apache Spark Open
More informationYOU SUN JEONG DATA ANALYTICS WITH DRUID
YOU SUN JEONG DATA ANALYTICS WITH DRUID 2 WHO AM I? Senior Software Engineer of SK Telecom Commercial Products Big Data Discovery Solution (~ 16) Hadoop DW (~ 15) PaaS(CloudFoundry) (~ 13) Iaas (OpenStack)
More informationAggregation and Degradation in JetStream: Streaming analytics in the wide area
NSDI 2014 Aggregation and Degradation in JetStream: Streaming analytics in the wide area Ariel Rabkin, Matvey Arye, Siddhartha Sen, Vivek S. Pai, and Michael J. Freedman Princeton University 报告人 : 申毅杰
More informationUsing Alluxio to Improve the Performance and Consistency of HDFS Clusters
ARTICLE Using Alluxio to Improve the Performance and Consistency of HDFS Clusters Calvin Jia Software Engineer at Alluxio Learn how Alluxio is used in clusters with co-located compute and storage to improve
More informationDatabase Acceleration Solution Using FPGAs and Integrated Flash Storage
Database Acceleration Solution Using FPGAs and Integrated Flash Storage HK Verma, Xilinx Inc. August 2017 1 FPGA Analytics in Flash Storage System In-memory or Flash storage based DB reduce disk access
More informationQLogic 16Gb Gen 5 Fibre Channel for Database and Business Analytics
QLogic 16Gb Gen 5 Fibre Channel for Database Assessment for Database and Business Analytics Using the information from databases and business analytics helps business-line managers to understand their
More informationExpected Learning Outcomes Introduction To AWS
Introduction To AWS Expected Learning Outcomes Introduction To AWS Understand What Cloud Computing Is Discover Why Companies Are Adopting AWS Understand How AWS Can Help Your Explore AWS Services Apply
More informationEssentials of Database Management
Essentials of Database Management Jeffrey A. Hoffer University of Dayton Heikki Topi Bentley University V. Ramesh Indiana University PEARSON Boston Columbus Indianapolis New York San Francisco Upper Saddle
More informationTPCX-BB (BigBench) Big Data Analytics Benchmark
TPCX-BB (BigBench) Big Data Analytics Benchmark Bhaskar D Gowda Senior Staff Engineer Analytics & AI Solutions Group Intel Corporation bhaskar.gowda@intel.com 1 Agenda Big Data Analytics & Benchmarks Industry
More informationBuilding modern media services with Windows Azure. Karl Ots, Symbio
t Building modern media services with Windows Azure Karl Ots, Symbio About the the presenter Karl Ots, Technical Consultant at Symbio Windows Phone 8 and Windows 8 trainer Windows Azure Insider Co-founder
More informationGlobal analy)cs in the face of bandwidth and regulatory constraints
Global analy)cs in the face of bandwidth and regulatory constraints Ashish Vulimiri u Carlo Curino m Brighten Godfrey u Thomas Jungblut m Jitu Padhye m George Varghese m u UIUC m MicrosoA Massive data
More informationOptimizing Apache Spark with Memory1. July Page 1 of 14
Optimizing Apache Spark with Memory1 July 2016 Page 1 of 14 Abstract The prevalence of Big Data is driving increasing demand for real -time analysis and insight. Big data processing platforms, like Apache
More informationThe Value of Urban Data Petr Suska, Fraunhofer IAO, 30th OCT, 2017 New Delhi
The Value of Urban Data Petr Suska, Fraunhofer IAO, 30th OCT, 2017 New Delhi Fraunhofer 1 Profile of the Fraunhofer Society Founded: 1949 about 24,000 staff 66 institutes and research units Fraunhofer
More informationWorkload Characterization and Optimization of TPC-H Queries on Apache Spark
Workload Characterization and Optimization of TPC-H Queries on Apache Spark Tatsuhiro Chiba and Tamiya Onodera IBM Research - Tokyo April. 17-19, 216 IEEE ISPASS 216 @ Uppsala, Sweden Overview IBM Research
More informationSiphon: Expediting Inter-Datacenter Coflows in Wide-Area Data Analytics. Shuhao Liu, Li Chen, Baochun Li University of Toronto July 12, 2018
Siphon: Expediting Inter-Datacenter Coflows in Wide-Area Data Analytics Shuhao Liu, Li Chen, Baochun Li University of Toronto July 12, 2018 What is a Coflow? One stage in a data analytic job Map 1 Reduce
More informationData Modeling and Databases Ch 7: Schemas. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 7: Schemas Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database schema A Database Schema captures: The concepts represented Their attributes
More informationGlobal Analytics in the Face of Bandwidth and Regulatory Constraints
Global Analytics in the Face of Bandwidth and Regulatory Constraints Ashish Vulimiri, University of Illinois at Urbana-Champaign; Carlo Curino, Microsoft; P. Brighten Godfrey, University of Illinois at
More informationHiTune. Dataflow-Based Performance Analysis for Big Data Cloud
HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241
More informationEfficient Process Mapping in Geo-Distributed Cloud Data Centers
Efficient Process Mapping in Geo-Distributed Cloud Data Centers ABSTRACT Amelie Chi Zhou * Shenzhen University Bingsheng He National University of Singapore Recently, various applications including data
More informationResearch in Middleware Systems For In-Situ Data Analytics and Instrument Data Analysis
Research in Middleware Systems For In-Situ Data Analytics and Instrument Data Analysis Gagan Agrawal The Ohio State University (Joint work with Yi Wang, Yu Su, Tekin Bicer and others) Outline Middleware
More informationOptimizing Shuffle in Wide-Area Data Analytics
Optimizing Shuffle in Wide-Area Data Analytics Shuhao Liu, Hao Wang, Baochun Li Department of Electrical and Computer Engineering University of Toronto Toronto, Canada {shuhao, haowang, bli}@ece.toronto.edu
More informationApigee Edge Cloud. Supported browsers:
Apigee Edge Cloud Description Apigee Edge Cloud is an API management platform to securely deliver and manage all APIs. Apigee Edge Cloud manages the API lifecycle with capabilities that include, but are
More informationBLOOMBERG VAULT FOR FILES. Administrator s Guide
BLOOMBERG VAULT FOR FILES Administrator s Guide INTRODUCTION 01 Introduction 02 Package Installation 02 Pre-Installation Requirement 02 Installation Steps 06 Initial (One-Time) Configuration 06 Bloomberg
More informationPNDA.io: when BGP meets Big-Data
PNDA.io: when BGP meets Big-Data Let s go back in time 26 th April 2017 The Internet is very much alive Millions of BGP events occurring every day 15 Routers Monitored 410 active peers (both IPv4 and IPv6)
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationony Gaddis Haywood Community College STARTING OUT WITH PEARSON Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto
STARTING OUT WITH J^"* 1 Ti * ony Gaddis Haywood Community College PEARSON Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris
More informationIBM Education Assistance for z/os V2R2
IBM Education Assistance for z/os V2R2 Item: RSM Scalability Element/Component: Real Storage Manager Material current as of May 2015 IBM Presentation Template Full Version Agenda Trademarks Presentation
More informationTraffic Steering using RUM DNS. Abhijeet Rastogi Senior Linkedin (Edge Performance & Traffic)
Traffic Steering using RUM DNS Abhijeet Rastogi Senior SRE @ Linkedin (Edge Performance & Traffic) My Team Public DNS CDN Operations Load balancer & reverse proxy at PoPs and DCs Largest Professional Network
More informationData Transformation and Migration in Polystores
Data Transformation and Migration in Polystores Adam Dziedzic, Aaron Elmore & Michael Stonebraker September 15th, 2016 Agenda Data Migration for Polystores: What & Why? How? Acceleration of physical data
More informationEngage with ESRI in the AWS Cloud. Teresa Carlson, VP of Global Public Sector
Engage with ESRI in the AWS Cloud Teresa Carlson, VP of Global Public Sector On Premise Infrastructure is Costly & Complex Large Capital Expenditures Patching Software Scaling down as needed Contract negotiation
More informationMDCC MULTI DATA CENTER CONSISTENCY. amplab. Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, Alan Fekete
MDCC MULTI DATA CENTER CONSISTENCY Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, Alan Fekete gpang@cs.berkeley.edu amplab MOTIVATION 2 3 June 2, 200: Rackspace power outage of approximately 0
More informationA Network-aware Scheduler in Data-parallel Clusters for High Performance
A Network-aware Scheduler in Data-parallel Clusters for High Performance Zhuozhao Li, Haiying Shen and Ankur Sarker Department of Computer Science University of Virginia May, 2018 1/61 Data-parallel clusters
More informationApigee Edge Cloud - Bundles Spec Sheets
Apigee Edge Cloud - Bundles Spec Sheets Description Apigee Edge Cloud is an API management platform to securely deliver and manage all APIs. Apigee Edge Cloud manages the API lifecycle with capabilities
More informationGetting started with AWS security
Getting started with AWS security Take a prescriptive approach Stella Lee Manager, Enterprise Business Development $ 2 0 B + R E V E N U E R U N R A T E (Annualized from Q4 2017) 4 5 % Y / Y G R O W T
More informationDATABASES AND THE CLOUD. Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland
DATABASES AND THE CLOUD Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland AVALOQ Conference Zürich June 2011 Systems Group www.systems.ethz.ch Enterprise Computing Center
More informationDisclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme
LHC2673BU Clearing Cloud Confusion Nick King and Neal Elinski #VMworld #LHC2673BU Disclaimer This presentation may contain product features that are currently under development. This overview of new technology
More informationBuilt for Speed: Comparing Panoply and Amazon Redshift Rendering Performance Utilizing Tableau Visualizations
Built for Speed: Comparing Panoply and Amazon Redshift Rendering Performance Utilizing Tableau Visualizations Table of contents Faster Visualizations from Data Warehouses 3 The Plan 4 The Criteria 4 Learning
More informationMAINVIEW Batch Optimizer. Data Accelerator Andy Andrews
MAINVIEW Batch Optimizer Data Accelerator Andy Andrews Can I push more workload through my existing hardware configuration? Batch window problems can often be reduced down to two basic problems:! Increasing
More informationVisual C# Tony Gaddis. Haywood Community College STARTING OUT WITH. Piyali Sengupta. Third Edition. Global Edition contributions by.
STARTING OUT WITH Visual C# 2012 Third Edition Global Edition Tony Gaddis Haywood Community College Global Edition contributions by Piyali Sengupta PEARSON Boston Columbus Indianapolis New York San Francisco
More informationThe Red Belly Blockchain Experiments
The Red Belly Blockchain Experiments Concurrent Systems Research Group, University of Sydney, Data-CSIRO Abstract In this paper, we present the largest experiment of a blockchain system to date. To achieve
More informationWide-Area Spark Streaming: Automated Routing and Batch Sizing
Wide-Area Spark Streaming: Automated Routing and Batch Sizing Wenxin Li, Di Niu, Yinan Liu, Shuhao Liu, Baochun Li University of Toronto & Dalian University of Technology University of Alberta University
More informationCS15-319: Cloud Computing. Lecture 3 Course Project and Amazon AWS Majd Sakr and Mohammad Hammoud
CS15-319: Cloud Computing Lecture 3 Course Project and Amazon AWS Majd Sakr and Mohammad Hammoud Lecture Outline Discussion On Course Project Amazon Web Services 2 Course Project Course Project Phase I-A
More informationVMware Cloud on AWS Adoption in the Enterprise
HYP3920BUS VMware Cloud on AWS Adoption in the Enterprise Kit Colbert, VMware, Inc. Sandy Carter, Amazon Web Services Chuck Hoppenrath, Discovery Communications #vmworld #HYP3920BUS Disclaimer This presentation
More informationDynamic Resource Allocation for Distributed Dataflows. Lauritz Thamsen Technische Universität Berlin
Dynamic Resource Allocation for Distributed Dataflows Lauritz Thamsen Technische Universität Berlin 04.05.2018 Distributed Dataflows E.g. MapReduce, SCOPE, Spark, and Flink Used for scalable processing
More informationApache Kylin. OLAP on Hadoop
Apache Kylin OLAP on Hadoop Agenda What s Apache Kylin? Tech Highlights Performance Roadmap Q & A http://kylin.io What s Kylin kylin / ˈkiːˈlɪn / 麒麟 --n. (in Chinese art) a mythical animal of composite
More informationBig Data Using Hadoop
IEEE 2016-17 PROJECT LIST(JAVA) Big Data Using Hadoop 17ANSP-BD-001 17ANSP-BD-002 Hadoop Performance Modeling for JobEstimation and Resource Provisioning MapReduce has become a major computing model for
More informationExploiting Inter-Flow Relationship for Coflow Placement in Data Centers. Xin Sunny Huang, T. S. Eugene Ng Rice University
Exploiting Inter-Flow Relationship for Coflow Placement in Data Centers Xin Sunny Huang, T S Eugene g Rice University This Work Optimizing Coflow performance has many benefits such as avoiding application
More informationIs Explicit Congestion Notification usable with UDP? Stephen McQuistin and Colin Perkins
Is Explicit Congestion Notification usable with UDP? Stephen McQuistin and Colin Perkins Introduction Explicit congestion notification (ECN) ECN with TCP/IP long established: RFC 3168 Extends IP to mark
More informationStatistics Driven Workload Modeling for the Cloud
UC Berkeley Statistics Driven Workload Modeling for the Cloud Archana Ganapathi, Yanpei Chen Armando Fox, Randy Katz, David Patterson SMDB 2010 Data analytics are moving to the cloud Cloud computing economy
More informationComplete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City
The Complete Reference Christopher Adamson Mc Grauu LlLIJBB New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto Contents Acknowledgments
More informationDatabase Concepts. David M. Kroenke UNIVERSITATSBIBLIOTHEK HANNOVER
Database Concepts Fifth Edition David M. Kroenke David J. Auer ^111 I ii i.111 111 n.n jiiim^ TECHNISCHE INFORMATIOMSBiBLIOTHEK UNIVERSITATSBIBLIOTHEK HANNOVER j TIB/UB Hannover Prentice Hall Boston Columbus
More informationIT has now become commonly accepted that the volume of
1 Time- and Cost- Efficient Task Scheduling Across Geo-Distributed Data Centers Zhiming Hu, Member, IEEE, Baochun Li, Fellow, IEEE, and Jun Luo, Member, IEEE Abstract Typically called big data processing,
More informationPopularity Prediction of Facebook Videos for Higher Quality Streaming
Popularity Prediction of Facebook Videos for Higher Quality Streaming Linpeng Tang Qi Huang, Amit Puntambekar Ymir Vigfusson, Wyatt Lloyd, Kai Li 1 Videos are Central to Facebook 8 billion views per day
More informationConcurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration
Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Berni Schiefer schiefer@ca.ibm.com Tuesday March 17th at 12:00
More informationMULTI-THREADED QUERIES
15-721 Project 3 Final Presentation MULTI-THREADED QUERIES Wendong Li (wendongl) Lu Zhang (lzhang3) Rui Wang (ruiw1) Project Objective Intra-operator parallelism Use multiple threads in a single executor
More informationTPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage
TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage Performance Study of Microsoft SQL Server 2016 Dell Engineering February 2017 Table of contents
More informationScalable Tools - Part I Introduction to Scalable Tools
Scalable Tools - Part I Introduction to Scalable Tools Adisak Sukul, Ph.D., Lecturer, Department of Computer Science, adisak@iastate.edu http://web.cs.iastate.edu/~adisak/mbds2018/ Scalable Tools session
More informationPocket: Elastic Ephemeral Storage for Serverless Analytics
Pocket: Elastic Ephemeral Storage for Serverless Analytics Ana Klimovic*, Yawen Wang*, Patrick Stuedi +, Animesh Trivedi +, Jonas Pfefferle +, Christos Kozyrakis* *Stanford University, + IBM Research 1
More informationHyper scale Infrastructure is the enabler
Hyper scale Infrastructure is the enabler 100+ Datacenters across 34 Regions Worldwide US DoD West TBD US Gov Iowa West US California Central US Iowa South Central US Texas North Central US Illinois Canada
More informationAccelerate Big Data Insights
Accelerate Big Data Insights Executive Summary An abundance of information isn t always helpful when time is of the essence. In the world of big data, the ability to accelerate time-to-insight can not
More informationV Conclusions. V.1 Related work
V Conclusions V.1 Related work Even though MapReduce appears to be constructed specifically for performing group-by aggregations, there are also many interesting research work being done on studying critical
More informationUnit 7: Basics in MS Power BI for Excel 2013 M7-5: OLAP
Unit 7: Basics in MS Power BI for Excel M7-5: OLAP Outline: Introduction Learning Objectives Content Exercise What is an OLAP Table Operations: Drill Down Operations: Roll Up Operations: Slice Operations:
More informationCloud Spanner. Rohit Gupta, Solutions
Cloud Spanner Rohit Gupta, Solutions Engineer @rohitforcloud Today s goals Provide a brief history of Spanner at Google Provide an explanation of Cloud Spanner Do a demo! Built on the same infrastructure
More informationReal-Time Systems and Programming Languages
Real-Time Systems and Programming Languages Ada, Real-Time Java and C/Real-Time POSIX Fourth Edition Alan Burns and Andy Wellings University of York * ADDISON-WESLEY An imprint of Pearson Education Harlow,
More informationBuilding Interconnection 2017 Steps Taken & 2018 Plans
Building Interconnection 2017 Steps Taken & 2018 Plans 2017 Equinix Inc. 2017 Key Highlights Expansion - new markets Launch - Flexible DataCentre Hyperscaler edge Rollout - IXEverywhere - SaaS, IoT & Ecosystems
More informationData Collection, Preprocessing and Implementation
Chapter 6 Data Collection, Preprocessing and Implementation 6.1 Introduction Data collection is the loosely controlled method of gathering the data. Such data are mostly out of range, impossible data combinations,
More informationOptimizing Grouped Aggregation in Geo- Distributed Streaming Analytics
Optimizing Grouped Aggregation in Geo- Distributed Streaming Analytics Benjamin Heintz, Abhishek Chandra University of Minnesota Ramesh K. Sitaraman UMass Amherst & Akamai Technologies Wide- Area Streaming
More informationImproving the MapReduce Big Data Processing Framework
Improving the MapReduce Big Data Processing Framework Gistau, Reza Akbarinia, Patrick Valduriez INRIA & LIRMM, Montpellier, France In collaboration with Divyakant Agrawal, UCSB Esther Pacitti, UM2, LIRMM
More informationPROBLEM SOLVING USING JAVA WITH DATA STRUCTURES. A Multimedia Approach. Mark Guzdial and Barbara Ericson PEARSON. College of Computing
PROBLEM SOLVING WITH DATA STRUCTURES USING JAVA A Multimedia Approach Mark Guzdial and Barbara Ericson College of Computing Georgia Institute of Technology PEARSON Boston Columbus Indianapolis New York
More informationIntroduction to Amazon Lumberyard and GameLift
Introduction to Amazon Lumberyard and GameLift Peter Chapman, Solutions Architect chappete@amazon.com 3/7/2017 A Free AAA Game Engine Deeply Integrated with AWS and Twitch Lumberyard Vision A free, AAA
More informationOverview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::
Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
More informationCloud Transformation and Significance of Security
Cloud Transformation and Significance of Security Mohit Sharma, Chief Architect & Cloud Evangelist @onlinesince2009 www.cloudsec.com Datacenter Management Change Management Policy Physical Network Management
More information