History-Based Harvesting of Spare Cycles and Storage in Large-Scale Datacenters
1 History-Based Harvesting of Spare Cycles and Storage in Large-Scale Datacenters. Authors: Yunqi Zhang, George Prekas, Giovanni Matteo Fumarola, Marcus Fontoura, Íñigo Goiri, Ricardo Bianchini
2 Datacenters are underutilized: Datacenters are massive. Resources are overprovisioned: low tail latency requirements, provisioning for peak load, unexpected load spikes and failures. Underutilization wastes money. [Figure: server utilization distribution of a Google cluster.]
3 Harvesting spare resources: Co-locate interactive services with low-priority batch tasks. Cluster level: find safe co-locations. Server level: performance isolation.
5 Challenges: Interactive services own the servers. Resource availability dynamics: task killing. Data storage co-location: data unavailable, data loss. Distributed data analytics across servers. [Figure: utilization over time.]
6 Goals: Improve efficiency without sacrificing QoS. Minimize the probability of killing batch tasks. Maximize data availability and durability.
7 Batch task scheduling: Can we learn anything from history? [Figure: utilization over time.]
8 Batch task scheduling: Utilization patterns identified with a Fourier transform: periodic (daily pattern), constant, and unpredictable.
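A minimal sketch of how a utilization history could be sorted into the three patterns named on this slide. The slide only mentions a Fourier transform; the function names, the thresholds `flat_cv` and `peak_share`, and the rule "most spectral power sits at one cycle per day" are illustrative assumptions, not the authors' method.

```python
import cmath
import math
from statistics import mean, pstdev


def dft_power(samples):
    """Power at frequencies 1..n/2 of the mean-centered trace (naive DFT)."""
    n = len(samples)
    mu = mean(samples)
    centered = [x - mu for x in samples]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2)
    return power  # power[i] belongs to frequency index i + 1


def classify_pattern(samples, samples_per_day, flat_cv=0.05, peak_share=0.5):
    """Label a utilization trace as constant, periodic, or unpredictable.

    flat_cv and peak_share are hypothetical thresholds chosen for this sketch.
    """
    mu = mean(samples)
    # A nearly flat trace (small coefficient of variation) counts as constant.
    if mu == 0 or pstdev(samples) / mu < flat_cv:
        return "constant"
    power = dft_power(samples)
    total = sum(power)
    # Frequency index that corresponds to one cycle per day.
    daily_k = round(len(samples) / samples_per_day)
    if 1 <= daily_k <= len(power) and power[daily_k - 1] / total >= peak_share:
        return "periodic"
    return "unpredictable"
```

With hourly samples, `classify_pattern(trace, samples_per_day=24)` would be called once per server or tenant; a production version would more likely use an FFT library than this quadratic-time DFT.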
9 History-based task scheduling: Long jobs (constant pattern): headroom = 1 - MAX(Peak, Current). Medium jobs (periodic pattern): headroom = 1 - MAX(Average, Current). Short jobs (unpredictable pattern): headroom = 1 - Current. [Figures: utilization over time for each case.]
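The three headroom rules on this slide can be sketched as a single function. The `ServerHistory` type and the string job labels are hypothetical names introduced for illustration; utilization values are assumed to be fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class ServerHistory:
    """Utilization summary for one server, as fractions of capacity."""
    peak: float      # historical peak utilization
    average: float   # historical average utilization
    current: float   # utilization right now


def headroom(job_length, server):
    """Spare capacity a batch job of the given length may count on."""
    if job_length == "long":
        # Long jobs must survive the worst case seen in history.
        return 1.0 - max(server.peak, server.current)
    if job_length == "medium":
        # Medium jobs plan against the historical average.
        return 1.0 - max(server.average, server.current)
    if job_length == "short":
        # Short jobs only need the capacity that is free right now.
        return 1.0 - server.current
    raise ValueError(f"unknown job length: {job_length!r}")
```

The longer a job runs, the more conservative the estimate: a long job sees the least headroom on the same server, which lowers the chance that it is killed when the interactive service's load rises.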
10 Data storage co-location: Data availability: servers are diverse in utilization pattern. Data durability: servers are diverse in reimaging pattern.
11 History-based replica placement: [Figure: servers arranged by peak utilization (data availability) and disk reimage rate (data durability).]
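A minimal sketch of diversity-aware replica placement over the two axes shown on these slides. It assumes each server has already been assigned a peak-utilization class and a disk-reimage-rate class, and that replicas should not share either class; that rule, the function name, and the greedy strategy are assumptions made for illustration, since the transcript only preserves the axis labels.

```python
import random


def place_replicas(servers, num_replicas=3, rng=None):
    """Pick servers so that no two replicas share a class on either axis.

    servers maps a server name to (utilization_class, reimage_class).
    Greedy over a shuffled list: simple, but it can fail on sparse inputs
    even when a valid placement exists.
    """
    rng = rng or random.Random()
    candidates = list(servers.items())
    rng.shuffle(candidates)

    chosen = []
    used_util, used_reimage = set(), set()
    for name, (util_class, reimage_class) in candidates:
        if util_class in used_util or reimage_class in used_reimage:
            continue  # would put two replicas at risk from the same pattern
        chosen.append(name)
        used_util.add(util_class)
        used_reimage.add(reimage_class)
        if len(chosen) == num_replicas:
            return chosen
    raise RuntimeError("not enough diversity to place all replicas")
```

Spreading replicas across utilization classes aims at availability (not all replicas sit on busy servers at once), while spreading across reimage-rate classes aims at durability (not all replicas are wiped together).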
14 System implementation: Clustering service: extracts utilization and reimaging patterns. YARN-H: protects interactive services by killing batch tasks. Tez-H: history-based batch task scheduling. HDFS-H: history-based replica placement; protects interactive services by denying accesses.
15 Evaluation: Real-system deployment: 102-server cluster; interactive service: Lucene with utilization trace; batch tasks: TPC-DS queries on Hive. Large-scale simulation: traces from 10 production datacenters at Microsoft; full datacenters for one month. Production environment deployment: data replica placement.
16 Batch task scheduling (real system): Degrading interactive service.
17 Batch task scheduling (real system): Kill batch tasks.
18 Batch task scheduling (real system): 21% improvement on average.
19 Batch task scheduling (simulation): Up to 90% improvement; 32% improvement on average.
20 Replica placement (durability): More than 2 orders of magnitude improvement. Higher durability with fewer replicas. Deployed to thousands of production servers for almost a year. Eliminated data losses, except those due to minor bugs and insufficient diversity.
21 Lessons learned from deployment: Placement diversity and disk space utilization. Synchronous operations and unavailability. Simplicity is critical in production systems. More lessons in the paper.
22 Conclusion: History-based resource harvesting: resource utilization dynamics; data storage co-location; complex data analytics distributed across servers. Significantly improves datacenter efficiency. Deployed in production datacenters. Contributed to the open-source community.