The Datacenter Needs an Operating System


1 UC Berkeley
The Datacenter Needs an Operating System
Anthony D. Joseph
LASER Summer School, September 2013

2 My Talks at LASER
1. AMP Lab introduction
2. The Datacenter Needs an Operating System
3. Mesos, part one
4. Dominant Resource Fairness
5. Mesos, part two
6. Spark

3 Collaborators
Matei Zaharia, Benjamin Hindman, Andy Konwinski, Ali Ghodsi, Randy Katz, Scott Shenker, Ion Stoica

4 Machines: Background
Clusters of commodity servers have become a major computing platform in industry and academia (100s to 10,000s of machines)
Driven by data volumes outpacing the processing capabilities of single machines (big data and science)
Democratized by cloud computing

5 Machines: Background
Some have declared that the datacenter is the new computer
Our claim: this new computer increasingly needs an operating system
Not necessarily a new host OS, but a common software layer that manages resources and provides shared services for the whole datacenter, like an OS does for one host

7 Why Datacenters Need an OS
Growing diversity of applications
  » Computing frameworks: MapReduce, Dryad, Pregel, Percolator, Dremel, MR Online, Spark
  » Storage systems: GFS, BigTable, Dynamo, SCADS
  » Web apps and supporting services
Growing diversity of users
  » 200+ Hive users at Facebook, running near-interactive ad hoc queries
Same reasons computers needed one!

9 What Operating Systems Provide
Resource sharing: time-sharing, virtual memory, …
Data sharing: files, pipes, IPC, …
Programming abstractions: libraries, languages, …
Debugging & monitoring: ptrace, DTrace, top, …
Most importantly: enables a highly interoperable software ecosystem that we now take for granted

10 Example
A scientist analyzing data on one machine can pipe it through a variety of tools, write new tools that interface with these through standard APIs, and trace across the stack
In the future, the scientist should be able to launch a cluster on EC2 and do the same things:
  » Mix and combine a variety of apps & programming models
  » Write new parallel programs that talk to these
  » Get a unified interface for managing the cluster
  » Debug and trace across all these components

11 Today's Datacenter OS
Hadoop MapReduce as a common execution and resource sharing platform
  » Means jobs have to compile to MapReduce
  » Inter-user resource sharing, but only at the level of MR jobs
Hadoop InputFormat API for data sharing
What happens with the next hot platform after Hadoop?

13 Today's Datacenter OS
Abstractions for productivity programmers, but not for system builders
Difficult to debug, especially across layers
Other examples:
  » Amazon/Azure services
  » Google internal stack and Google Compute Engine
  » Hadoop YARN
The problems motivating a datacenter OS are well recognized, but solutions are narrowly targeted
Can researchers take a longer-term view?

14 Tomorrow's Datacenter OS
Resource sharing: time-sharing, virtual memory, …
Data sharing: files, pipes, IPC, …
Programming abstractions: libraries, languages, …
Debugging & monitoring: ptrace, DTrace, top, …

15 Resource Sharing
"To solve these interaction problems we would like to have a computer made simultaneously available to many users in a manner somewhat like a telephone exchange. Each user would be able to use a console at his own pace and without concern for the activity of others using the system."
Fernando J. Corbató, …

16 Today's Resource Sharing
Today, cluster apps are built to run independently and assume they own a fixed set of nodes
Result: inefficient static partitioning
What's the right interface for dynamic sharing?
[Chart: utilization (%) of App 1, App 2, and App 3, each on its own static partition]
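
To make the cost of static partitioning concrete, here is a small Python sketch. The setup is an assumption: three apps share a hypothetical 30-slot cluster, each gets a fixed third of it, and the per-step demand numbers are invented for illustration, not taken from the slide's chart. The sketch compares slots in use under static partitioning with an idealized dynamic scheme in which any idle slot can go to any app that still has demand.

```python
from typing import Dict, List

def static_usage(demands: Dict[str, List[int]], share: int) -> List[int]:
    """Slots used per time step when each app is capped at its fixed partition."""
    steps = len(next(iter(demands.values())))
    return [sum(min(d[t], share) for d in demands.values()) for t in range(steps)]

def dynamic_usage(demands: Dict[str, List[int]], total: int) -> List[int]:
    """Slots used per time step when any idle slot can go to any app with demand
    (an idealized, work-conserving allocator)."""
    steps = len(next(iter(demands.values())))
    return [min(sum(d[t] for d in demands.values()), total) for t in range(steps)]

# Hypothetical per-step demand (slots wanted), invented for illustration.
demands = {
    "App 1": [20, 5, 0, 10],
    "App 2": [0, 25, 5, 10],
    "App 3": [5, 0, 30, 10],
}
TOTAL_SLOTS = 30
SHARE = TOTAL_SLOTS // len(demands)  # static partitioning: 10 slots per app

static = static_usage(demands, SHARE)
dynamic = dynamic_usage(demands, TOTAL_SLOTS)
print(static)   # [15, 15, 15, 30]
print(dynamic)  # [25, 30, 30, 30]
```

Over the four steps, static partitioning uses 75 of 120 slot-steps and the idealized dynamic allocator uses 115. The whole gap comes from idle slots in one app's partition that no other app can use. A real dynamic-sharing layer also has to deal with fairness and task placement, which this toy ignores.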

17 Tomorrow's Datacenter OS
Resource sharing:
  » Lower-level interfaces for fine-grained sharing (Mesos and Hadoop YARN are first steps in this direction)
  » Optimization for a variety of metrics (e.g., energy)
  » Integration with network scheduling mechanisms (e.g., Seawall [NSDI '11], NOX, Orchestra)
  » Others: Azure Fabric Controller

19 Tomorrow's Datacenter OS
Persistent data sharing: many design issues addressed
  » Placement/locality
  » Reliability
  » Availability
  » Consistency
  » Bandwidth/latency
  » Software versioning

20 Tomorrow's Datacenter OS
Persistent data sharing:
  » Standard interfaces for cluster file systems, key-value stores, etc.
  » Lineage instead of replication for reliability (Spark RDDs)
  » Application frameworks self-manage versioning
Many possibilities:
  » Amazon Elastic Block Store and S3
  » HDFS
  » Azure storage services
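
"Lineage instead of replication" means that a lost piece of derived data can be rebuilt by replaying the operations that produced it, so no extra copies have to be kept. The following toy Python sketch illustrates the idea. It is not Spark's API: the `Dataset` class, its `lose` method, and the single element-wise `map` transformation are all hypothetical. Each dataset records its parent and transformation, and it recomputes a missing partition from the parent when that partition is next requested.

```python
from typing import Callable, Dict, List, Optional

class Dataset:
    """Toy partitioned dataset that remembers how it was derived (its lineage)."""

    def __init__(self, parent: Optional["Dataset"] = None,
                 fn: Optional[Callable[[int], int]] = None,
                 source: Optional[List[List[int]]] = None):
        self.parent = parent      # dataset this one was derived from
        self.fn = fn              # element-wise transformation applied to the parent
        self.source = source      # base data (root dataset only), e.g. from stable storage
        self.cache: Dict[int, List[int]] = {}
        self.recomputations = 0   # how many partitions were (re)built from lineage

    def num_partitions(self) -> int:
        if self.source is not None:
            return len(self.source)
        return self.parent.num_partitions()

    def map(self, fn: Callable[[int], int]) -> "Dataset":
        return Dataset(parent=self, fn=fn)

    def partition(self, i: int) -> List[int]:
        if self.source is not None:          # root: read the base data directly
            return self.source[i]
        if i not in self.cache:              # never built, or lost: rebuild from parent
            self.recomputations += 1
            self.cache[i] = [self.fn(x) for x in self.parent.partition(i)]
        return self.cache[i]

    def lose(self, i: int) -> None:
        """Simulate a node failure dropping a cached partition (no replica exists)."""
        self.cache.pop(i, None)

base = Dataset(source=[[1, 2], [3, 4]])
squares = base.map(lambda x: x * x)
print([squares.partition(i) for i in range(squares.num_partitions())])  # [[1, 4], [9, 16]]
squares.lose(1)
print(squares.partition(1))    # [9, 16], rebuilt from base via the recorded map
print(squares.recomputations)  # 3
```

The trade-off this illustrates is recomputation on failure versus storing replicas all the time. It is cheap when transformations are coarse-grained and deterministic, as this toy assumes.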

21 Tomorrow's Datacenter OS
Transient data sharing: many design issues addressed
  » Failures on either side
  » Consistency
  » Timeliness

22 Tomorrow's Datacenter OS
Transient data sharing:
  » In-memory data sharing (e.g., Spark, DFS cache), and a unified system to manage this memory; a DFS cache for a MapReduce cluster could serve 90% of jobs at Facebook (HotOS '11)
  » Streaming data abstractions (analogous to pipes)
Many possibilities:
  » Amazon/Azure message queues
  » Percolator
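
Streaming data abstractions "analogous to pipes" can be pictured with a single-process Python sketch. Each stage is a generator that consumes the previous stage's output as it arrives, the way `cat | grep | wc -l` chains processes. The stage names and the log records are hypothetical, and nothing here is distributed. The sketch only shows the composition model that a cluster-wide streaming abstraction would generalize.

```python
from typing import Iterable, Iterator

def read_events(lines: Iterable[str]) -> Iterator[str]:
    """Source stage: emit raw records one at a time."""
    for line in lines:
        yield line.strip()

def grep(pattern: str, records: Iterable[str]) -> Iterator[str]:
    """Filter stage: pass through only records containing the pattern."""
    for r in records:
        if pattern in r:
            yield r

def count(records: Iterable[str]) -> int:
    """Sink stage: drain the stream and report how many records reached it."""
    return sum(1 for _ in records)

log = ["GET /a\n", "POST /b\n", "GET /c\n"]   # hypothetical input records
n = count(grep("GET", read_events(log)))       # like: cat log | grep GET | wc -l
print(n)  # 2
```

As with Unix pipes, each stage depends only on the shape of the records flowing through it. Independently written stages can therefore be recombined without knowing about one another.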

24 Tomorrow's Datacenter OS
Programming abstractions:
  » Many new distributed application programming models, abstractions, and languages
  » Tools for programming distributed coordination and fault tolerance (e.g., Apache ZooKeeper)
  » New tools that can be used to build the next MapReduce / BigTable in a week (e.g., BOOM)
  » Efficient implementations of communication primitives (e.g., shuffle, broadcast)
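
Broadcast shows why communication primitives deserve efficient shared implementations. If one source sends a value to n-1 nodes one at a time, it needs n-1 sequential sends. In a binomial-tree broadcast, every node that already holds the data forwards it in the next round, so coverage doubles each round and the broadcast finishes in ceil(log2 n) rounds. The Python sketch below computes only the round-by-round send schedule, with no networking. Numbering nodes 0..n-1 and using node 0 as the source are assumptions made for illustration.

```python
from typing import List, Tuple

def binomial_broadcast_rounds(n: int) -> List[List[Tuple[int, int]]]:
    """Per-round (sender, receiver) pairs for a binomial-tree broadcast from node 0."""
    rounds = []
    have = 1  # nodes 0..have-1 already hold the data
    while have < n:
        # Every node that has the data sends it to the node `have` positions away.
        sends = [(src, src + have) for src in range(have) if src + have < n]
        rounds.append(sends)
        have *= 2
    return rounds

schedule = binomial_broadcast_rounds(6)
for r, sends in enumerate(schedule):
    print(r, sends)
# 0 [(0, 1)]
# 1 [(0, 2), (1, 3)]
# 2 [(0, 4), (1, 5)]
```

For 6 nodes the tree needs 3 rounds, compared with 5 sequential sends from the source. The advantage grows with cluster size, which is one reason to implement such primitives once, in a shared layer.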

26 Tomorrow's Datacenter OS
Debugging and monitoring facilities:
  » Tracing and debugging tools that work across the cluster software stack (e.g., X-Trace, Dapper, Magpie, Hystrix)
  » Replay debugging that takes advantage of limited languages / computational models
  » Unified monitoring infrastructure and APIs (e.g., Hystrix)
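
Cross-stack tracers such as X-Trace and Dapper rely on a shared mechanism. Every request carries a trace ID, and each component that handles it records a span naming its parent, so the full path can be reassembled afterwards. The following minimal in-process Python sketch shows that propagation. The `Span` record, the component names, and the in-memory `COLLECTOR` list are hypothetical stand-ins and do not reflect either system's API or data format.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]
    component: str

_ids = itertools.count(1)
COLLECTOR: List[Span] = []  # stand-in for a cluster-wide trace store

def record(trace_id: int, parent_id: Optional[int], component: str) -> int:
    """Record a span for this component; return its ID so callees can name it as parent."""
    span_id = next(_ids)
    COLLECTOR.append(Span(trace_id, span_id, parent_id, component))
    return span_id

# Hypothetical request path: a query service calls a compute framework, which reads storage.
def storage_read(trace_id: int, parent: int) -> None:
    record(trace_id, parent, "storage")

def framework_job(trace_id: int, parent: int) -> None:
    me = record(trace_id, parent, "framework")
    storage_read(trace_id, me)
    storage_read(trace_id, me)

def query(trace_id: int) -> None:
    me = record(trace_id, None, "query")
    framework_job(trace_id, me)

def render(trace_id: int) -> List[str]:
    """Rebuild the call tree of one trace from the collected spans."""
    children: Dict[Optional[int], List[Span]] = {}
    for s in COLLECTOR:
        if s.trace_id == trace_id:
            children.setdefault(s.parent_id, []).append(s)
    out: List[str] = []

    def walk(parent: Optional[int], depth: int) -> None:
        for s in children.get(parent, []):
            out.append("  " * depth + s.component)
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return out

query(trace_id=42)
print("\n".join(render(42)))
# query
#   framework
#     storage
#     storage
```

In a real cluster the trace ID and parent span ID travel in RPC metadata between machines, and spans go to a shared collector instead of a local list. This cross-machine propagation is what lets a user follow one request across layers without logging into individual nodes.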

27 Putting It Together
A successful datacenter OS might let users:
  » Build a Hadoop-like software stack in a week using the OS's APIs, while gaining other benefits (e.g., cross-stack replay debugging)
  » Share data efficiently between independently written apps and programming frameworks
  » Understand cluster behavior without having to log into individual nodes
  » Dynamically share the cluster with other users

28 How Researchers Can Help
Focus on paradigms, not performance
  » Industry is tackling performance but lacks the luxury of taking a long-term view of abstractions
Explore clean-slate approaches
  » More likely to have impact here than in a real OS, because datacenter software changes quickly!
Bring cluster computing to non-experts
  » Most impactful (datacenter as the new workstation)
  » Much harder, and more rewarding, than serving big users

29 Berkeley Data Analytics Stack (BDAS)
Layers, from top to bottom:
  » Shark (SQL), BlinkDB, Spark Streaming, GraphX, MLBase
  » Apache Spark
  » HDFS / Hadoop Storage, Tachyon
  » Apache Mesos / YARN Resource Manager

30 Apache Mesos
Cluster operating system: efficiently shares resources among diverse parallel applications
[Diagram: Hadoop, Dryad, and MPI schedulers; a Mesos master; Mesos slaves running Hadoop, Dryad, and MPI executors and their tasks]
[Chart: share of cluster (%) over time (s) for MPI, Hadoop, and Spark]
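
Mesos shares the cluster through resource offers. The master offers free resources on its slaves to each framework's scheduler, and the framework decides which offered resources to accept and which tasks to launch on them. The Python below is a toy, single-process model of that two-level split and is not the Mesos API. `Offer`, `Framework`, `master_round`, the single CPU-count resource, one CPU per task, and the fixed framework order are all simplifying assumptions. The real master decides whom to offer resources to through an allocation policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Offer:
    slave: str
    cpus: int

@dataclass
class Framework:
    name: str
    pending_tasks: int  # tasks waiting to run; each needs 1 CPU (assumption)
    launched: List[str] = field(default_factory=list)  # slave chosen for each task

    def resource_offer(self, offer: Offer) -> int:
        """Framework-level scheduling: accept only as many CPUs as it has tasks for."""
        used = min(offer.cpus, self.pending_tasks)
        self.launched.extend([offer.slave] * used)
        self.pending_tasks -= used
        return used  # unused CPUs stay free for the next framework

def master_round(free: Dict[str, int], frameworks: List[Framework]) -> None:
    """Master-level allocation: offer each slave's free CPUs to frameworks in list order."""
    for slave in free:
        for fw in frameworks:
            if free[slave] == 0:
                break
            free[slave] -= fw.resource_offer(Offer(slave, free[slave]))

free_cpus = {"slave1": 4, "slave2": 4}
hadoop = Framework("hadoop", pending_tasks=5)
mpi = Framework("mpi", pending_tasks=2)
master_round(free_cpus, [hadoop, mpi])
print(hadoop.launched)  # ['slave1', 'slave1', 'slave1', 'slave1', 'slave2']
print(mpi.launched)     # ['slave2', 'slave2']
print(free_cpus)        # {'slave1': 0, 'slave2': 1}
```

The design point is that the master never needs to understand Hadoop's or MPI's placement logic. It only decides who is offered what, and each framework keeps its own scheduling. Here list order gives Hadoop priority, which is the part a fair allocation policy would replace.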

31 Machines: Make the datacenter a real computer!
Share the datacenter between multiple cluster computing apps
Provide new abstractions and services
[Diagram: AMP stack and existing stack on a datacenter OS (e.g., Apache Mesos), running over node OSes (e.g., Linux, Windows)]

32 Machines: Make the datacenter a real computer!
Support existing cluster computing apps
[Diagram: existing stack (Hive, Hadoop, MPI, Hypertable, Cassandra) and AMP stack on a datacenter OS (e.g., Apache Mesos), running over node OSes (e.g., Linux, Windows)]

33 Machines: Make the datacenter a real computer!
Support interactive and iterative data analysis (e.g., ML algorithms)
[Diagram: existing stack (Hive, Hadoop, MPI, Hypertable, Cassandra) and AMP stack (Spark; SCADS, a consistency-adjustable data store; PIQL, a predictive & insightful query language) on a datacenter OS (e.g., Apache Mesos), running over node OSes (e.g., Linux, Windows)]

34 Machines: Make the datacenter a real computer!
Applications, tools
[Diagram: existing stack (Hive, Hadoop, MPI, Hypertable, Cassandra) and AMP stack (Spark, SCADS, PIQL, plus advanced ML algorithms, interactive data mining, and collaborative visualization) on a datacenter OS (e.g., Apache Mesos), running over node OSes (e.g., Linux, Windows)]

35 Milestones
2010: Mesos in Apache Incubator
2010: Spark open sourced
2012: Shark (SQL) open sourced
Feb 2013: Spark Streaming alpha open sourced
Mar 2013: Tachyon alpha open sourced
Jun 2013: Spark entered Apache Incubator
Aug 2013: Machine learning library for Spark

36 BDAS Users (partial list)

37 BDAS Buzz

38 Big Data Landscape: Our Corner

39 MLbase Meetup at Twitter (13 Aug 2013)

40 BDAS Contributors
70+ public contributors on GitHub
  » US, China, India, UK, Canada, Vietnam
  » Startups and large multinationals: Intel, Yahoo, Ooyala, Quantifind, ClearStory, Palantir, Foursquare, Groupon

41 Researchers Using BDAS
UC Berkeley, IBM Almaden, Cornell, Duke, Tsinghua, Purdue

42 What Is Fueling the Traction?
Superior technologies
  » Fast and expressive
  » It works!
Integration with existing Hadoop ecosystem
  » HDFS
  » HBase
  » Hive

43 BDAS Future Directions
Future data analytics need to support:
  » Fast SQL
  » Approximate queries
  » Machine learning
  » GraphX
  » Streaming
  » Crowdsourcing!!!
Mix and match all of the above

44 Conclusion
Datacenters need an OS-like software stack for the same reasons as single computers: manageability, efficiency, programmability, and a thriving software ecosystem
Multiple datacenter OSes are already emerging in ad hoc ways
Researchers can help by taking a long-term systems view of these problems
