Mesos: A Pla+orm for Fine- Grained Resource Sharing in the Data Center
|
|
- Piers Mathews
- 5 years ago
- Views:
Transcription
1 Mesos: A Pla+orm for Fine- Grained Resource Sharing in the Data Center Ion Stoica, UC Berkeley Joint work with: A. Ghodsi, B. Hindman, A. Joseph, R. Katz, A. Konwinski, S. Shenker, and M. Zaharia
2 Challenge Rapid innovaeon in cloud compueng Dryad Hypertable Cassandra Pregel No single framework opemal for all applicaeons Running each framework on its dedicated cluster Expensive Hard to share data Need to run multiple frameworks on same cluster 2
3 Computa?on Model Each framework (e.g., Hadoop, Torque) manages and runs one or more jobs A job consists of one or more tasks A task (e.g., map, reduce) consists of one or more processes running on same machine Today s, a framework assume it controls the cluster Implement sophisecated schedulers, e.g., Quincy, LATE, Fair sharing, Maui, 3
4 Computa?on Environment Typical workload Task granularity Fine grain scheduling Cloud computing HPC Data intensive Computation intensive Small (median < 1min) Large Grid Computation intensive Large Hardware Commodity Specialized Commodity Data Distributed across nodes (e.g., 10TB per node) Need data locality Partitioned (usually, comp. & storage different) Partitioned 4
5 Where We Want to Go uniprogrammed Today: static partitioning multiprogrammed Mesos: dynamic sharing Hadoop Pregel MPI Shared cluster
6 Solu?on Mesos is a common resource sharing layer over which diverse frameworks can run Hadoop MPI Hadoop MPI Mesos Node Node Node Node Node Node Node Node 6
7 Mesos Goals High u?liza?on of resources Support diverse frameworks (current & future) Scalability to 10,000 s of nodes Reliability in face of failures Resulting design: Small microkernel-like core that pushes scheduling logic to frameworks
8 Mesos Provides Fine grained sharing of resource across frameworks Unit of scheduling: task ExecuEon environment for tasks IsolaEon Focus of this talk: resource management and scheduling 8
9 Approach: Global Scheduler Organization policies Resource availability Fwk/Job requirements Job/tasks estimated running times Global Scheduler Task schedule Advantages: can achieve opemal schedule Disadvantages: Complexity hard to scale and ensure resilience Hard to anecipate future s frameworks requirements Need to refactor exiseng frameworks 9
10 Our Approach: Distributed Scheduler Organization policies Resource availability Master (Global Scheduler) Fwk schedule Framework Framework scheduler scheduler Task schedule Advantages: Simple easier to scale and make resilient Easy to port exiseng frameworks, support new ones Disadvantages: Distributed scheduling decision not opemal 10
11 Resource Offers Unit of allocaeon: resource offer Vector of available resources on a node E.g., m1: <1 cpu, 1GB>, m2: <4 cpu, 16GB> Master sends resource offers to frameworks Frameworks select which offers to accept and which tasks to run 11
12 Mesos Architecture Slaves conenuously send status updates about resources Slave 1 task 1 Hadoop Executor task MPI 1 executor Slave 2 task 2 Hadoop Executor Slave 3 Status Framework executors launch tasks and may persist across tasks Launch Hadoop task 2 Mesos Master Resource offers AllocaEon Module Pluggable Scheduler to pick framework to send an offer to Framework Scheduler selects resources and provides tasks Framework Scheduler Hadoop JobTracker Framework Scheduler MPI Scheduler 12
13 Outline Global scheduler: Dominant Resource Fairness AnalyEc evaluaeon of distributed scheduler efficiency Experimental results 13
14 Single Resource: Fair Sharing n users want to share a resource (e.g. CPU) SoluEon: allocate each 1/n of the shared resource Generalized by max- min fairness Handles if a user wants less than its fair share E.g. user 1 wants no more than 20% Generalized by weighted max- min fairness Give weights to users according to importance User 1 gets weight 1, user 2 weight 2 100% 50% 0% 100% 50% 0% 100% 50% 0% CPU 33% 33% 33% 20% 40% 40% 33% 66% 14
15 Why Max- Min Fairness? Weighted Fair Sharing / Propor>onal Shares User 1 gets weight 2, user 2 weight 1 Priori>es Give user 1 weight 1000, user 2 weight 1 Reverva>ons Ensure user 1 gets 10% of a resource Give user 1 weight 10, sum weights 100 Isola>on Policy Users cannot affect others beyond their fair share Widely used: OS: RR, prop. sharing, logery, Linux s cfs, Networking: wfq, wf2q, sfq, drr, csfq,... Datacenters: Hadoop s fair sched, capacity sched, Qunincy 15
16 Proper?es of Max- Min Fairness Share guarantee Each user gets at least 1/n of the resource But will get less if her demand is less Strategy- proof Users are not beger off by asking for more than they need Users have no reason to lie Max- min fairness is the only reasonable mechanism with these two properees 16
17 Why is Max- Min Fairness not Enough? Job scheduling in datacenters is not only about a single resource Jobs consume CPU, memory, disk, and I/O What are task demands today? 17
18 Heterogeneous Resource Demands Some tasks are CPU- intensive Most task need ~ <2 CPU, 2 GB RAM> Some tasks are memory- intensive 2000-node Hadoop Cluster at Facebook (Oct 2010)! 18
19 Problem 100% Single resource example 50% 1 resource: CPU 50% User 1 wants <1 CPU> per task User 2 wants <3 CPU> per task Mul;- resource example 2 resources: CPUs & mem 100% 0% 50% CPU User 1 wants <1 CPU, 4 GB> per task User 2 wants <3 CPU, 1 GB> per task What s a fair alloca>on? 50%?? 0% CPU 19 mem
20 A Natural Policy Asset Fairness Equalize each user s sum of resource shares Problem: Cluster violate with 70 share CPUs, guarantee 70 GB RAM User 1 Uhas 1 needs < 50% <2 of CPU, both 2 GB CPUs RAM> and per RAM task Be_er Uoff 2 needs in a separate <1 CPU, 2 cluster GB RAM> with per 50% task of the resources Asset fairness yields U 1 : 15 tasks: U 2 : 20 tasks: 30 CPUs, 30 GB ( =60) 20 CPUs, 40 GB ( =60) 100% 50% 0% User 1 User 2 43% 43% 57% 28% CPU RAM
21 Challenge Can we find a fair sharing policy that provides Share guarantee Strategy- proofness Max- min fairness for a single resource had these properees Can we generalize max- min fairness to muleple resources? 21
22 Chea?ng the Scheduler Users willing to game the system to get more resources Real- life examples A cloud provider had quotas on map and reduce slots Some users found out that the map- quota was low Users implemented maps in the reduce slots! A search company provided dedicated machines to users that could ensure certain level of uelizaeon (e.g. 80%) Users used busy- loops to inflate u?liza?on 22
23 Dominant Resource Fairness A user s dominant resource is the resource user has the biggest share of Example: Total resources: <10 CPU, 4 GB> User 1 s allocaeon: <2 CPU, 1 GB> Dominant resource is memory as 1/4 > 2/10 (1/5) A user s dominant share is the fraceon of the dominant resource she is allocated User 1 s dominant share is 25% (1/4) 23
24 Dominant Resource Fairness (2) Apply max- min fairness to dominant shares Equalize the dominant share of the users Example: Total resources: <9 CPU, 18 GB> User 1 demand: <1 CPU, 4 GB> dom res: mem User 2 demand: <3 CPU, 1 GB> dom res: CPU 100% 3 CPUs 12 GB User 1 50% 66% 66% User 2 0% 6 CPUs 2 GB CPU (9 total) mem (18 total) 24
25 Online DRF Scheduler Whenever there are available resources and tasks to run: Schedule a task to the user with smallest dominant share O(log n) Eme per decision using binary heaps 25
26 Why not Use Pricing? Approach Set prices for each good Let users buy what they want Problem How do we determine the right prices for different goods? 26
27 How Would an Economist Solve It? Let the market determine the prices Compe;;ve Equilibrium from Equal Incomes (CEEI) Give each user 1/n of every resource Let users trade in a perfectly compeeeve market Not strategy- proof! 27
28 DRF vs CEEI User 1: <1 CPU, 4 GB> User 2: <3 CPU, 1 GB> DRF more fair, CEEI better utilization 100% Dominant Resource Fairness 100% Compe??ve Equilibrium from Equal Incomes 100% Dominant Resource Fairness 100% Compe??ve Equilibrium from Equal Incomes 50% 66% 66% 50% 55% 91% user 1 user 2 50% 66% 66% 50% 60% 80% 0% 0% 0% 0% CPU mem CPU mem CPU mem CPU mem User 1: <1 CPU, 4 GB> User 2: <3 CPU, 2 GB> User 2 increased her share of both CPU and memory by lying! 28
29 Gaming U?liza?on- Op?mal Schedulers Cluster with <100 CPU, 100 GB> 2 users, each demanding <1 CPU, 2 GB> per task 100% 100% 50% User 1 User 2 50% 50% 95% 50% 0% CPU mem 0% CPU mem User 1 lies and demands <2 CPU, 2 GB> U?liza?on- Op?mal scheduler prefers user 1 29
30 Proper?es of Policies Property Asset CEEI DRF Share guarantee Strategy-proofness Pareto efficiency Assuming non-zero demands, DRF is the only allocation that is strategy proof and provides sharing incentive (Eric Friedman, Cornell) Envy-freeness Single resource fairness Bottleneck res. fairness Population monotonicity Resource monotonicity 30
31 Outline Global scheduler: Dominant Resource Fairness AnalyEc evaluaeon of distributed scheduler efficiency Experimental results 31
32 How well does Distributed Scheduler Work? Organization policies Resource availability Master (scheduler) Fwk schedule Framework Framework scheduler scheduler Task schedule Simplified model for workload Evaluate main performance metrics Job waieng Eme CompleEon Eme UElizaEon Fwk/Job requirements Job/tasks estimated running times 32
33 Workload Characteris?cs Elas?c vs. rigid jobs: Elas?c: can dynamically adjust allocaeon during execueon (e.g., Hadoop, Spark) Rigid: cannot run before they acquire all resources (e.g., MPI) Picky vs. non- picky jobs Picky: can only run on a subset of resources (e.g., need GPU, public IP address) Homogeneous vs. heterogeneous task duraeons 33
34 Metrics Job ramp- up?me: Eme it takes a new framework to achieve its target allocaeon (e.g., fair share); Job comple?on?me: Eme it takes a job to complete, assuming one job per framework; Cluster u?liza?on 34
35 Simplified System Model One job per framework AllocaEon granularity: slot Each framework has a target alloca?on, determined by allocaeon policy E.g., fair sharing, reservaeon, priority 35
36 Ideal System Job gets its target allocaeon instantaneously k = target allocaeon (slots) T = average task duraeon m = number of tasks in a job Job compleeon Eme (m/k) T target allocation (m/k) T # of slots in system k time job ready/starts job ends 36
37 Elas?c Framework Can scale allocaeon up and down during job execueon Longer compleeon Eme: take Eme to ramp- up to target allocaeon High uelizaeon: can use a slot as soon as it becomes available # slots in system target allocation time job ready/starts job ends 37
38 Rigid Framework Cannot start before achieving target allocaeon Higher compleeon Eme Lower uelizaeon: cannot use slots as soon as it acquire them # slots in system target allocation wasted resources time job ready/starts job ends 38
39 Analysis Result Summary T mean task duraeon k target allocaeon of job (in slots) m number of job s tasks Good performance β T Within job compleeon T of ideal Eme in ideal only system, for larger where jobs, β m/k completion time and only if β >> ln k Ramp-up time Completion time Elastic Framework Rigid Framework Const. dist Exp. dist Const. dist Exp. dist T (ln k) T T (ln k) T (1/2 + β) T (1 + β) T (1 + β) T (ln k + β) T 39 Utilization 1 1 β / (1/2 + β) β /(ln k + β-1)
40 Modeling Picky Jobs k number of slots with property P requested by job n number of slots in system with property P Examples of property P: A slot running on a node storing a job s input block A slot running on a node with GPU Job pickiness, p = k/n 40
41 Picky Jobs: Wai?ng Time Assume only one framework request slots with property P E.g., all other frameworks have achieved target allocaeon Property: waiting time of a job with pickiness factor p p-percentile of task duration distribution Example: if p = 0.9, waiting time is equal to 90-th percentile of T 41
42 Summary Elastic Picky Homogeneous Performance Examples Y N Y Best MapReduce, Spark Y N N Very Good Web framework Y Y Y Very Good GPU-based ML frameworks Y Y N Good N N Y Good storage services (e.g., memcached, dist. file systems) N N N Good MPI N Y Y Good long running services with special needs (e.g., DNS) N Y N Worst 42
43 Outline Global scheduler: Dominant Resource Fairness AnalyEc evaluaeon of distributed scheduler efficiency Experimental results 43
44 Implementa?on Stats 20,000 lines of C++ Master failover using ZooKeeper Frameworks ported: Hadoop, MPI, Torque New specialized framework: Spark, for iteraeve jobs (up to 20 faster than Hadoop) Open source in Apache Incubator
45 Users Twi_er uses Mesos on > 100 nodes to run ~12 produceon services (mostly stream processing) Berkeley machine learning researchers are running several algorithms at scale on Spark Conviva is using Spark for data analyecs UCSF medical researchers are using Mesos to run Hadoop and eventually non- Hadoop apps 45
46 Dynamic Resource Sharing 100 node cluster 46
47 Mesos vs Static Partitioning Compared performance with staecally pareeoned cluster where each framework gets 25% of nodes Framework Speedup on Mesos Facebook Hadoop Mix 1.14 Large Hadoop Mix 2.10 Spark 1.26 Torque / MPI
48 Scalability 50,000 simulated nodes using 200 VMs 200 frameworks with 30 second tasks Measured: Eme to launch + get status update Task Launch Overhead (seconds) Number of Nodes 48
49 Related Work HPC schedulers (e.g. Torque, LSF, Sun Grid Engine) Coarse- grained sharing for rigid jobs (e.g. MPI) Virtual machine clouds Coarse- grained sharing similar to HPC Condor Centralized scheduler based on matchmaking Next- GeneraEon Hadoop Redesign of Hadoop to have per- applicaeon masters Also aims to support non- MapReduce jobs Based on resource request language with locality prefs 49
50 Conclusion Mesos shares clusters efficiently among diverse frameworks thanks to two design elements Fine- grained sharing at the level of tasks Resource offers, a scalable mechanism for applicaeon- controlled scheduling Enable co- existence of current frameworks and development of new specialized ones In use at Twiger, UC Berkeley, Conviva and UCSF h_p:// 50
Mesos: Mul)programing for Datacenters
Mesos: Mul)programing for Datacenters Ion Stoica Joint work with: Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, ScoC Shenker, UC BERKELEY Mo)va)on Rapid innovaeon
More informationA Platform for Fine-Grained Resource Sharing in the Data Center
Mesos A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony Joseph, Randy Katz, Scott Shenker, Ion Stoica University of California,
More informationKey aspects of cloud computing. Towards fuller utilization. Two main sources of resource demand. Cluster Scheduling
Key aspects of cloud computing Cluster Scheduling 1. Illusion of infinite computing resources available on demand, eliminating need for up-front provisioning. The elimination of an up-front commitment
More informationKey aspects of cloud computing. Towards fuller utilization. Two main sources of resource demand. Cluster Scheduling
Key aspects of cloud computing Cluster Scheduling 1. Illusion of infinite computing resources available on demand, eliminating need for up-front provisioning. The elimination of an up-front commitment
More informationAli Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion Stoica. University of California, Berkeley nsdi 11
Dominant Resource Fairness: Fair Allocation of Multiple Resource Types Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion Stoica University of California, Berkeley nsdi 11
More informationToday s Papers. Composability is Essential. The Future is Parallel Software. EECS 262a Advanced Topics in Computer Systems Lecture 13
EECS 262a Advanced Topics in Computer Systems Lecture 13 Resource allocation: Lithe/DRF October 16 th, 2012 Today s Papers Composing Parallel Software Efficiently with Lithe Heidi Pan, Benjamin Hindman,
More informationwhat is cloud computing?
what is cloud computing? (Private) Cloud Computing with Mesos at Twi9er Benjamin Hindman @benh scalable virtualized self-service utility managed elastic economic pay-as-you-go what is cloud computing?
More informationMesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman Andrew Konwinski Matei Zaharia Ali Ghodsi Anthony D. Joseph Randy H. Katz Scott Shenker Ion Stoica Electrical Engineering
More informationMesos: A Platform for Fine-Grained Resource Sharing in the Data Center
: A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica University of California,
More informationThe Datacenter Needs an Operating System
UC BERKELEY The Datacenter Needs an Operating System Anthony D. Joseph LASER Summer School September 2013 My Talks at LASER 2013 1. AMP Lab introduction 2. The Datacenter Needs an Operating System 3. Mesos,
More informationFairRide: Near-Optimal Fair Cache Sharing
UC BERKELEY FairRide: Near-Optimal Fair Cache Sharing Qifan Pu, Haoyuan Li, Matei Zaharia, Ali Ghodsi, Ion Stoica 1 Caches are crucial 2 Caches are crucial 2 Caches are crucial 2 Caches are crucial 2 Cache
More informationPage 1. Goals for Today" Background of Cloud Computing" Sources Driving Big Data" CS162 Operating Systems and Systems Programming Lecture 24
Goals for Today" CS162 Operating Systems and Systems Programming Lecture 24 Capstone: Cloud Computing" Distributed systems Cloud Computing programming paradigms Cloud Computing OS December 2, 2013 Anthony
More informationResilient Distributed Datasets
Resilient Distributed Datasets A Fault- Tolerant Abstraction for In- Memory Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin,
More informationBuilding a Data-Friendly Platform for a Data- Driven Future
Building a Data-Friendly Platform for a Data- Driven Future Benjamin Hindman - @benh 2016 Mesosphere, Inc. All Rights Reserved. INTRO $ whoami BENJAMIN HINDMAN Co-founder and Chief Architect of Mesosphere,
More informationLecture 11 Hadoop & Spark
Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem
More informationScale your Docker containers with Mesos
Scale your Docker containers with Mesos Timothy Chen tim@mesosphere.io About me: - Distributed Systems Architect @ Mesosphere - Lead Containerization engineering - Apache Mesos, Drill PMC / Committer
More informationSparrow. Distributed Low-Latency Spark Scheduling. Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
Sparrow Distributed Low-Latency Spark Scheduling Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica Outline The Spark scheduling bottleneck Sparrow s fully distributed, fault-tolerant technique
More informationCS Advanced Operating Systems Structures and Implementation Lecture 18. Dominant Resource Fairness Two-Level Scheduling.
Goals for Today CS194-24 Advanced Operating Systems Structures and Implementation Lecture 18 Dominant Resource Fairness Two-Level Scheduling April 7 th, 2014 Prof. John Kubiatowicz http://inst.eecs.berkeley.edu/~cs194-24
More informationA Whirlwind Tour of Apache Mesos
A Whirlwind Tour of Apache Mesos About Herdy Senior Software Engineer at Citadel Technology Solutions (Singapore) The eternal student Find me on the internet: _hhandoko hhandoko hhandoko https://au.linkedin.com/in/herdyhandoko
More informationMultidimensional Scheduling (Polytope Scheduling Problem) Competitive Algorithms from Competitive Equilibria
Multidimensional Scheduling (Polytope Scheduling Problem) Competitive Algorithms from Competitive Equilibria Sungjin Im University of California, Merced (UC Merced) Janardhan Kulkarni (MSR) Kamesh Munagala
More informationNexus: A Common Substrate for Cluster Computing
Nexus: A Common Substrate for Cluster Computing Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Scott Shenker, Ion Stoica University of California, Berkeley {benh,andyk,matei,alig,adj,shenker,stoica}@cs.berkeley.edu
More informationSAMPLE CHAPTER IN ACTION. Roger Ignazio. FOREWORD BY Florian Leibert MANNING
SAMPLE CHAPTER IN ACTION Roger Ignazio FOREWORD BY Florian Leibert MANNING Mesos in Action by Roger Ignazio Chapter 1 Copyright 2016 Manning Publications brief contents PART 1 HELLO, MESOS...1 1 Introducing
More informationChoosy: Max-Min Fair Sharing for Datacenter Jobs with Constraints
Choosy: Max-Min Fair Sharing for Datacenter Jobs with Constraints Ali Ghodsi Matei Zaharia Scott Shenker Ion Stoica UC Berkeley {alig,matei,shenker,istoica}@cs.berkeley.edu Abstract Max-Min Fairness is
More informationSCALING LIKE TWITTER WITH APACHE MESOS
Philip Norman & Sunil Shah SCALING LIKE TWITTER WITH APACHE MESOS 1 MODERN INFRASTRUCTURE Dan the Datacenter Operator Alice the Application Developer Doesn t sleep very well Loves automation Wants to control
More informationShark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker
Shark Hive on Spark Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Agenda Intro to Spark Apache Hive Shark Shark s Improvements over Hive Demo Alpha
More informationPractical Considerations for Multi- Level Schedulers. Benjamin
Practical Considerations for Multi- Level Schedulers Benjamin Hindman @benh agenda 1 multi- level scheduling (scheduler activations) 2 intra- process multi- level scheduling (Lithe) 3 distributed multi-
More informationHierarchical Scheduling for Diverse Datacenter Workloads
Hierarchical Scheduling for Diverse Datacenter Workloads Arka A. Bhattacharya 1, David Culler 1, Eric Friedman 2, Ali Ghodsi 1, Scott Shenker 1, and Ion Stoica 1 1 University of California, Berkeley 2
More informationCoflow. Recent Advances and What s Next? Mosharaf Chowdhury. University of Michigan
Coflow Recent Advances and What s Next? Mosharaf Chowdhury University of Michigan Rack-Scale Computing Datacenter-Scale Computing Geo-Distributed Computing Coflow Networking Open Source Apache Spark Open
More informationExploiting Heterogeneity in the Public Cloud for Cost-Effective Data Analytics
Exploiting Heterogeneity in the Public Cloud for Cost-Effective Data Analytics Gunho Lee, Byung-Gon Chun, Randy H. Katz University of California, Berkeley, Intel Labs Berkeley Abstract Data analytics are
More informationChapter 5. The MapReduce Programming Model and Implementation
Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing
More informationCloud Computing CS
Cloud Computing CS 15-319 Programming Models- Part III Lecture 6, Feb 1, 2012 Majd F. Sakr and Mohammad Hammoud 1 Today Last session Programming Models- Part II Today s session Programming Models Part
More informationLocality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng
Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng Virtual Clusters on Cloud } Private cluster on public cloud } Distributed
More informationElastic Efficient Execution of Varied Containers. Sharma Podila Nov 7th 2016, QCon San Francisco
Elastic Efficient Execution of Varied Containers Sharma Podila Nov 7th 2016, QCon San Francisco In other words... How do we efficiently run heterogeneous workloads on an elastic pool of heterogeneous resources,
More informationPreemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization
Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization Wei Chen, Jia Rao*, and Xiaobo Zhou University of Colorado, Colorado Springs * University of Texas at Arlington Data Center
More informationNext-Generation Cloud Platform
Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology
More informationAchieving Horizontal Scalability. Alain Houf Sales Engineer
Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches
More informationa Spark in the cloud iterative and interactive cluster computing
a Spark in the cloud iterative and interactive cluster computing Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, Ion Stoica UC Berkeley Background MapReduce and Dryad raised level of
More informationBuilding/Running Distributed Systems with Apache Mesos
Building/Running Distributed Systems with Apache Mesos Philly ETE April 8, 2015 Benjamin Hindman @benh $ whoami 2007-2012 2009-2010 - 2014 my other computer is a datacenter my other computer is a datacenter
More informationWarehouse- Scale Computing and the BDAS Stack
Warehouse- Scale Computing and the BDAS Stack Ion Stoica UC Berkeley UC BERKELEY Overview Workloads Hardware trends and implications in modern datacenters BDAS stack What is Big Data used For? Reports,
More informationSpark. In- Memory Cluster Computing for Iterative and Interactive Applications
Spark In- Memory Cluster Computing for Iterative and Interactive Applications Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker,
More informationAnalytic Cloud with. Shelly Garion. IBM Research -- Haifa IBM Corporation
Analytic Cloud with Shelly Garion IBM Research -- Haifa 2014 IBM Corporation Why Spark? Apache Spark is a fast and general open-source cluster computing engine for big data processing Speed: Spark is capable
More information@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS
@unterstein @dcos @bedcon #bedcon Operating microservices with Apache Mesos and DC/OS 1 Johannes Unterstein Software Engineer @Mesosphere @unterstein @unterstein.mesosphere 2017 Mesosphere, Inc. All Rights
More informationAn Enhanced Approach for Resource Management Optimization in Hadoop
An Enhanced Approach for Resource Management Optimization in Hadoop R. Sandeep Raj 1, G. Prabhakar Raju 2 1 MTech Student, Department of CSE, Anurag Group of Institutions, India 2 Associate Professor,
More informationCloud Computing. What is cloud computing. CS 537 Fall 2017
Cloud Computing CS 537 Fall 2017 What is cloud computing Illusion of infinite computing resources available on demand Scale-up for most apps Elimination of up-front commitment Small initial investment,
More informationPaaS and Hadoop. Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University
PaaS and Hadoop Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University laiping@tju.edu.cn 1 Outline PaaS Hadoop: HDFS and Mapreduce YARN Single-Processor Scheduling Hadoop Scheduling
More informationBig Data 7. Resource Management
Ghislain Fourny Big Data 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage
More informationPerformance evaluation of job schedulers on Hadoop YARN
Performance evaluation of job schedulers on Hadoop YARN Jia-Chun Lin Department of Informatics, University of Oslo Gaustadallèen 23 B, Oslo, N-0373, Norway kellylin@ifi.uio.no Ming-Chang Lee Department
More informationPOWERING THE INTERNET WITH APACHE MESOS
Neil Conway, Niklas Nielsen, Greg Mann & Sunil Shah POWERING THE INTERNET WITH APACHE MESOS 1 MESOS: ORIGINS 2 THE BIRTH OF MESOS TWITTER TECH TALK APACHE INCUBATION The grad students working on Mesos
More informationProcessing of big data with Apache Spark
Processing of big data with Apache Spark JavaSkop 18 Aleksandar Donevski AGENDA What is Apache Spark? Spark vs Hadoop MapReduce Application Requirements Example Architecture Application Challenges 2 WHAT
More informationMapReduce. Cloud Computing COMP / ECPE 293A
Cloud Computing COMP / ECPE 293A MapReduce Jeffrey Dean and Sanjay Ghemawat, MapReduce: simplified data processing on large clusters, In Proceedings of the 6th conference on Symposium on Opera7ng Systems
More informationMassive Online Analysis - Storm,Spark
Massive Online Analysis - Storm,Spark presentation by R. Kishore Kumar Research Scholar Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Kharagpur-721302, India (R
More information6.888 Lecture 8: Networking for Data Analy9cs
6.888 Lecture 8: Networking for Data Analy9cs Mohammad Alizadeh ² Many thanks to Mosharaf Chowdhury (Michigan) and Kay Ousterhout (Berkeley) Spring 2016 1 Big Data Huge amounts of data being collected
More informationDistributed Computation Models
Distributed Computation Models SWE 622, Spring 2017 Distributed Software Engineering Some slides ack: Jeff Dean HW4 Recap https://b.socrative.com/ Class: SWE622 2 Review Replicating state machines Case
More informationToday s content. Resilient Distributed Datasets(RDDs) Spark and its data model
Today s content Resilient Distributed Datasets(RDDs) ------ Spark and its data model Resilient Distributed Datasets: A Fault- Tolerant Abstraction for In-Memory Cluster Computing -- Spark By Matei Zaharia,
More informationRe- op&mizing Data Parallel Compu&ng
Re- op&mizing Data Parallel Compu&ng Sameer Agarwal Srikanth Kandula, Nicolas Bruno, Ming- Chuan Wu, Ion Stoica, Jingren Zhou UC Berkeley A Data Parallel Job can be a collec/on of maps, A Data Parallel
More informationMATE-EC2: A Middleware for Processing Data with Amazon Web Services
MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering
More informationPROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP
ISSN: 0976-2876 (Print) ISSN: 2250-0138 (Online) PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP T. S. NISHA a1 AND K. SATYANARAYAN REDDY b a Department of CSE, Cambridge
More informationCONTINUOUS DELIVERY WITH MESOS, DC/OS AND JENKINS
APACHE MESOS NYC MEETUP SEPTEMBER 22, 2016 CONTINUOUS DELIVERY WITH MESOS, DC/OS AND JENKINS WHO WE ARE ROGER IGNAZIO SUNIL SHAH Tech Lead at Mesosphere @rogerignazio Product Manager at Mesosphere @ssk2
More informationDATA SCIENCE USING SPARK: AN INTRODUCTION
DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data
More informationJumbo: Beyond MapReduce for Workload Balancing
Jumbo: Beyond Reduce for Workload Balancing Sven Groot Supervised by Masaru Kitsuregawa Institute of Industrial Science, The University of Tokyo 4-6-1 Komaba Meguro-ku, Tokyo 153-8505, Japan sgroot@tkl.iis.u-tokyo.ac.jp
More information@joerg_schad Nightmares of a Container Orchestration System
@joerg_schad Nightmares of a Container Orchestration System 2017 Mesosphere, Inc. All Rights Reserved. 1 Jörg Schad Distributed Systems Engineer @joerg_schad Jan Repnak Support Engineer/ Solution Architect
More information2/4/2019 Week 3- A Sangmi Lee Pallickara
Week 3-A-0 2/4/2019 Colorado State University, Spring 2019 Week 3-A-1 CS535 BIG DATA FAQs PART A. BIG DATA TECHNOLOGY 3. DISTRIBUTED COMPUTING MODELS FOR SCALABLE BATCH COMPUTING SECTION 1: MAPREDUCE PA1
More informationResearch on Load Balancing in Task Allocation Process in Heterogeneous Hadoop Cluster
2017 2 nd International Conference on Artificial Intelligence and Engineering Applications (AIEA 2017) ISBN: 978-1-60595-485-1 Research on Load Balancing in Task Allocation Process in Heterogeneous Hadoop
More informationA Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System
A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System Ilkay Al(ntas and Daniel Crawl San Diego Supercomputer Center UC San Diego Jianwu Wang UMBC WorDS.sdsc.edu Computa3onal
More informationEhsan Totoni Babak Behzad Swapnil Ghike Josep Torrellas
Ehsan Totoni Babak Behzad Swapnil Ghike Josep Torrellas 2 Increasing number of transistors on chip Power and energy limited Single- thread performance limited => parallelism Many opeons: heavy mulecore,
More informationAlgorithms, Games, and Networks March 28, Lecture 18
Algorithms, Games, and Networks March 28, 2013 Lecturer: Ariel Procaccia Lecture 18 Scribe: Hanzhang Hu 1 Strategyproof Cake Cutting All cake cutting algorithms we have discussed in previous lectures are
More informationSpark. In- Memory Cluster Computing for Iterative and Interactive Applications
Spark In- Memory Cluster Computing for Iterative and Interactive Applications Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker,
More informationUsing DC/OS for Continuous Delivery
Using DC/OS for Continuous Delivery DevPulseCon 2017 Elizabeth K. Joseph, @pleia2 Mesosphere 1 Elizabeth K. Joseph, Developer Advocate, Mesosphere 15+ years working in open source communities 10+ years
More informationNetwork Requirements for Resource Disaggregation
Network Requirements for Resource Disaggregation Peter Gao (Berkeley), Akshay Narayan (MIT), Sagar Karandikar (Berkeley), Joao Carreira (Berkeley), Sangjin Han (Berkeley), Rachit Agarwal (Cornell), Sylvia
More informationPrincipal Software Engineer Red Hat Emerging Technology June 24, 2015
USING APACHE SPARK FOR ANALYTICS IN THE CLOUD William C. Benton Principal Software Engineer Red Hat Emerging Technology June 24, 2015 ABOUT ME Distributed systems and data science in Red Hat's Emerging
More informationDatabase Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu
Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based
More informationReal-Time Internet of Things
Real-Time Internet of Things Chenyang Lu Cyber-Physical Systems Laboratory h7p://www.cse.wustl.edu/~lu/ Internet of Things Ø Convergence of q Miniaturized devices: integrate processor, sensors and radios.
More informationTransparent Throughput Elas0city for IaaS Cloud Storage Using Guest- Side Block- Level Caching
Transparent Throughput Elas0city for IaaS Cloud Storage Using Guest- Side Block- Level Caching Bogdan Nicolae (IBM Research, Ireland) Pierre Riteau (University of Chicago, USA) Kate Keahey (Argonne National
More informationHADOOP 3.0 is here! Dr. Sandeep Deshmukh Sadepach Labs Pvt. Ltd. - Let us grow together!
HADOOP 3.0 is here! Dr. Sandeep Deshmukh sandeep@sadepach.com Sadepach Labs Pvt. Ltd. - Let us grow together! About me BE from VNIT Nagpur, MTech+PhD from IIT Bombay Worked with Persistent Systems - Life
More informationBatch Processing Basic architecture
Batch Processing Basic architecture in big data systems COS 518: Distributed Systems Lecture 10 Andrew Or, Mike Freedman 2 1 2 64GB RAM 32 cores 64GB RAM 32 cores 64GB RAM 32 cores 64GB RAM 32 cores 3
More informationAdvanced Continuous Delivery Strategies for Containerized Applications Using DC/OS
Advanced Continuous Delivery Strategies for Containerized Applications Using DC/OS ContainerCon @ Open Source Summit North America 2017 Elizabeth K. Joseph @pleia2 1 Elizabeth K. Joseph, Developer Advocate
More informationMOHA: Many-Task Computing Framework on Hadoop
Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction
More informationMixApart: Decoupled Analytics for Shared Storage Systems
MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto, NetApp Abstract Data analytics and enterprise applications have very
More informationBeyond Beyond Dominant Resource Fairness : Indivisible Resource Allocation In Clusters
Beyond Beyond Dominant Resource Fairness : Indivisible Resource Allocation In Clusters Abstract Christos-Alexandros Psomas alexpsomi@gmail.com Jarett Schwartz jarett@cs.berkeley.edu Resource allocation
More informationLecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter
Lecture Topics Today: Advanced Scheduling (Stallings, chapter 10.1-10.4) Next: Deadlock (Stallings, chapter 6.1-6.6) 1 Announcements Exam #2 returned today Self-Study Exercise #10 Project #8 (due 11/16)
More informationAGILE DEVELOPMENT AND PAAS USING THE MESOSPHERE DCOS
Sunil Shah AGILE DEVELOPMENT AND PAAS USING THE MESOSPHERE DCOS 1 THE DATACENTER OPERATING SYSTEM (DCOS) 2 DCOS INTRODUCTION The Mesosphere Datacenter Operating System (DCOS) is a distributed operating
More informationTowards Makespan Minimization Task Allocation in Data Centers
Towards Makespan Minimization Task Allocation in Data Centers Kangkang Li, Ziqi Wan, Jie Wu, and Adam Blaisse Department of Computer and Information Sciences Temple University Philadelphia, Pennsylvania,
More informationWhy Study Multimedia? Operating Systems. Multimedia Resource Requirements. Continuous Media. Influences on Quality. An End-To-End Problem
Why Study Multimedia? Operating Systems Operating System Support for Multimedia Improvements: Telecommunications Environments Communication Fun Outgrowth from industry telecommunications consumer electronics
More informationIX: A Protected Dataplane Operating System for High Throughput and Low Latency
IX: A Protected Dataplane Operating System for High Throughput and Low Latency Belay, A. et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Reviewed by Chun-Yu and Xinghao Li Summary In this
More informationWebinar Series TMIP VISION
Webinar Series TMIP VISION TMIP provides technical support and promotes knowledge and information exchange in the transportation planning and modeling community. Today s Goals To Consider: Parallel Processing
More informationOutline. CS-562 Introduction to data analysis using Apache Spark
Outline Data flow vs. traditional network programming What is Apache Spark? Core things of Apache Spark RDD CS-562 Introduction to data analysis using Apache Spark Instructor: Vassilis Christophides T.A.:
More informationContainer Orchestration on Amazon Web Services. Arun
Container Orchestration on Amazon Web Services Arun Gupta, @arungupta Docker Workflow Development using Docker Docker Community Edition Docker for Mac/Windows/Linux Monthly edge and quarterly stable
More informationPredictable Time-Sharing for DryadLINQ Cluster. Sang-Min Park and Marty Humphrey Dept. of Computer Science University of Virginia
Predictable Time-Sharing for DryadLINQ Cluster Sang-Min Park and Marty Humphrey Dept. of Computer Science University of Virginia 1 DryadLINQ What is DryadLINQ? LINQ: Data processing language and run-time
More informationNowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?
Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/
More informationBig data systems 12/8/17
Big data systems 12/8/17 Today Basic architecture Two levels of scheduling Spark overview Basic architecture Cluster Manager Cluster Cluster Manager 64GB RAM 32 cores 64GB RAM 32 cores 64GB RAM 32 cores
More informationCopyright 2014 Splunk Inc. Splunk for VMware. Architecture & Design. Michael Donnelly, Sr. Sales Engineer
Copyright 2014 Splunk Inc. Splunk for VMware Architecture & Design Michael Donnelly, Sr. Sales Engineer Disclaimer During the course of this presentaeon, we may make forward looking statements regarding
More informationFair Multi-Resource Allocation with External Resource for Mobile Edge Computing
Fair Multi-Resource Allocation with External Resource for Mobile Edge Computing Erfan Meskar and Ben Liang Department of Electrical and Computer Engineering, University of Toronto emeskar, liang@ece.utoronto.ca
More informationOperating System Support for Multimedia. Slides courtesy of Tay Vaughan Making Multimedia Work
Operating System Support for Multimedia Slides courtesy of Tay Vaughan Making Multimedia Work Why Study Multimedia? Improvements: Telecommunications Environments Communication Fun Outgrowth from industry
More informationMixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp
MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp Hadoop Pig, Hive Hadoop + Enterprise storage?! Shared storage
More informationBig Data and IoT. Baris Aksanli 02/10/2016
Big Data and IoT Baris Aksanli 02/10/2016 Why is there big data? Number of devices increasing exponeneally They conenuously generate data For example, on average, 72 hours of videos are uploaded to YouTube
More informationSurvey on Incremental MapReduce for Data Mining
Survey on Incremental MapReduce for Data Mining Trupti M. Shinde 1, Prof.S.V.Chobe 2 1 Research Scholar, Computer Engineering Dept., Dr. D. Y. Patil Institute of Engineering &Technology, 2 Associate Professor,
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationCONTINUOUS DELIVERY WITH DC/OS AND JENKINS
SOFTWARE ARCHITECTURE NOVEMBER 15, 2016 CONTINUOUS DELIVERY WITH DC/OS AND JENKINS AGENDA Presentation Introduction to Apache Mesos and DC/OS Components that make up modern infrastructure Running Jenkins
More informationComputations with Bounded Errors and Response Times on Very Large Data
Computations with Bounded Errors and Response Times on Very Large Data Ion Stoica UC Berkeley (joint work with: Sameer Agarwal, Ariel Kleiner, Henry Milner, Barzan Mozafari, Ameet Talwalkar, Purnamrita
More informationBig Data for Engineers Spring Resource Management
Ghislain Fourny Big Data for Engineers Spring 2018 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models
More information