The Stratosphere Platform for Big Data Analytics Hongyao Ma Franco Solleza April 20, 2015
Stratosphere
Big Data Analytics BIG Data Heterogeneous datasets: structured / unstructured / semi-structured Users have different needs for declarativity and expressivity
What we have covered so far Polybase Shark MLBase SharedDB BlinkDB
The Promises Declarative, high-level language In situ data analysis Richer set of primitives than MapReduce Treat UDFs as first-class citizens Automated parallelization and optimization Support for iterative programs Includes external memory query processing algorithms to support arbitrarily long programs
Outline Meteor & Sopremo PACT Nephele Experiment Results Future work & Discussions
Sopremo
Meteor Script Declarative interface High level script
Meteor Translates To Sopremo Output Group Join Compute Revenue Filter Lineitem Supplier
Sopremo Modular and extensible Composable
Sopremo compiled to PACT Output Group Join Compute Revenue Filter Lineitem Supplier
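The operator plan on the slides above (Lineitem, Filter, Compute Revenue, Join with Supplier, Group, Output) can be read as a chain of composable operators. The following is a minimal Python sketch of that idea, not Meteor or Sopremo syntax. The record fields, the filter predicate, the revenue formula, and the operator order are assumptions made for illustration.

```python
# Hypothetical toy records; field names are assumptions, not the real schema.
lineitem = [
    {"suppkey": 1, "price": 100.0, "discount": 0.25, "shipyear": 1996},
    {"suppkey": 1, "price": 50.0, "discount": 0.0, "shipyear": 1996},
    {"suppkey": 2, "price": 80.0, "discount": 0.5, "shipyear": 1996},
    {"suppkey": 2, "price": 10.0, "discount": 0.0, "shipyear": 1995},
]
supplier = [{"suppkey": 1, "name": "A"}, {"suppkey": 2, "name": "B"}]


def filter_op(rows, pred):
    """Keep only the rows that satisfy the predicate."""
    return [r for r in rows if pred(r)]


def compute_op(rows, name, fn):
    """Add a derived field to every row."""
    return [{**r, name: fn(r)} for r in rows]


def join_op(left, right, key):
    """Equi-join two inputs on a shared key field."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def group_op(rows, key, value):
    """Sum a value field per group key."""
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0.0) + r[value]
    return totals


# Each operator consumes the output of the previous one.
filtered = filter_op(lineitem, lambda r: r["shipyear"] == 1996)
with_revenue = compute_op(
    filtered, "revenue", lambda r: r["price"] * (1 - r["discount"])
)
joined = join_op(with_revenue, supplier, "suppkey")
result = group_op(joined, "name", "revenue")
print(result)
```

Because every operator takes rows in and hands rows out, operators can be reordered or swapped independently, which is the property that makes such a plan amenable to optimization.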
PACT
PACT Programmer makes a pact with system Uses one of 5 functions Map Reduce Match Cross Co-group
What's a PACT? Data and a function Specifies how data are partitioned across the system An atomic(?) operation on all specified data
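To make the five second-order functions concrete, here is a minimal single-machine Python sketch of how each one decides which records reach a call of the user-defined function (UDF). The `pact_*` names and the `(key, value)` tuple layout are assumptions for illustration and are not the Stratosphere API. A real system would also run these calls in parallel.

```python
from collections import defaultdict
from itertools import product


def pact_map(udf, records):
    """Map: the UDF is called once per record, independently."""
    return [out for rec in records for out in udf(rec)]


def pact_reduce(udf, records):
    """Reduce: the UDF is called once per group of values sharing a key."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return [out for key, values in groups.items() for out in udf(key, values)]


def pact_match(udf, left, right):
    """Match: the UDF is called per pair of records with equal keys."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [
        out
        for key, lval in left
        for rval in index.get(key, [])
        for out in udf(key, lval, rval)
    ]


def pact_cross(udf, left, right):
    """Cross: the UDF is called for every pair in the Cartesian product."""
    return [out for l, r in product(left, right) for out in udf(l, r)]


def pact_cogroup(udf, left, right):
    """CoGroup: the UDF gets both inputs' groups for each key at once."""
    lgroups, rgroups = defaultdict(list), defaultdict(list)
    for key, value in left:
        lgroups[key].append(value)
    for key, value in right:
        rgroups[key].append(value)
    keys = sorted(set(lgroups) | set(rgroups))
    return [
        out
        for key in keys
        for out in udf(key, lgroups.get(key, []), rgroups.get(key, []))
    ]


# Usage: word count as Map followed by Reduce.
pairs = pact_map(lambda line: [(w, 1) for w in line.split()], ["a b a", "b c"])
counts = dict(pact_reduce(lambda k, vs: [(k, sum(vs))], pairs))
```

In each case the UDF itself stays a black box. Only the second-order function tells the system how the input may be partitioned.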
Iterative PACT Programs Implicitly, iteration mutates state How to do iteration without explicit mutation of state?
Iterative PACT Programs Bulk iteration
Iterative PACT Programs Bulk iteration Starts with a solution set
Iterative PACT Programs Bulk iteration Sends group by label to neighbors
Iterative PACT Programs Bulk iteration Find minimum among those neighbors
Iterative PACT Programs Bulk iteration Outputs an incremental solution set
Iterative PACT Programs Bulk iteration Incremental solution set becomes input to next iteration
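The bulk iteration steps above can be sketched for connected components by label propagation. This is a minimal single-machine Python sketch under stated assumptions: the function name `bulk_cc`, the `max_iters` cap, and the stop-when-unchanged test are illustrative choices, not taken from the slides. Each pass recomputes the entire solution set without mutating the previous one.

```python
def bulk_cc(vertices, edges, max_iters=100):
    """Connected components by bulk iteration over the full solution set."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Initial solution set: every vertex is labeled with its own id.
    labels = {v: v for v in vertices}

    for _ in range(max_iters):
        # Every vertex receives its neighbors' labels and keeps the minimum.
        new_labels = {
            v: min([labels[v]] + [labels[n] for n in neighbors[v]])
            for v in vertices
        }
        if new_labels == labels:  # fixpoint reached, nothing changed
            break
        # The new solution set becomes the input of the next iteration.
        labels = new_labels
    return labels


components = bulk_cc([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(components)
```

Note that every vertex is recomputed in every pass, even in parts of the graph that have already converged.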
Iterative PACT Programs Incremental iteration
Iterative PACT Programs Incremental iteration Starts with a work set, and a solution set
Iterative PACT Programs Incremental iteration Calculates the min for a group
Iterative PACT Programs Incremental iteration Merges work set with solution set and checks if label changed
Iterative PACT Programs Incremental iteration If the label is new, it becomes part of the delta set..
Iterative PACT Programs Incremental iteration Which gets sent back to the next iteration
Iterative PACT Programs Incremental iteration If changed, also gets matched to the neighbors...
Iterative PACT Programs Incremental iteration And those matches become the new workset
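The incremental iteration steps above can be sketched the same way. In this minimal Python sketch the workset holds `(vertex, candidate_label)` pairs, and only vertices whose label changed produce work for the next round. The function name `incremental_cc` and the data layout are assumptions for illustration.

```python
def incremental_cc(vertices, edges):
    """Connected components using a workset and a solution set."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Solution set: current label of every vertex.
    solution = {v: v for v in vertices}
    # Initial workset: every vertex offers its label to each neighbor.
    workset = [(n, v) for v in vertices for n in neighbors[v]]

    while workset:
        # Calculate the minimum candidate label per vertex.
        candidates = {}
        for v, label in workset:
            candidates[v] = min(label, candidates.get(v, label))

        # Compare with the solution set; improved labels form the delta set.
        delta = {v: l for v, l in candidates.items() if l < solution[v]}
        solution.update(delta)

        # Changed vertices are matched to their neighbors: the new workset.
        workset = [(n, l) for v, l in delta.items() for n in neighbors[v]]
    return solution


components_inc = incremental_cc([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(components_inc)
```

The loop ends when the workset is empty, so converged regions of the graph stop generating work.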
PACT Optimization
Nephele
Nephele Execution
Nephele Execution Tasks, channels, scheduling
Nephele Execution Tasks, channels, scheduling Tasks, with all local pipelines associated with each task, are pushed to slaves
Nephele Execution Tasks, channels, scheduling Tasks can request to send data over network (only when necessary or ready)
Nephele Execution Fault tolerance
Nephele Execution Fault tolerance Conceptually, follows the same concept as lineage (RDDs) but...
Nephele Execution Fault tolerance Intermediate Blocking operator model
Nephele Execution Fault tolerance Intermediate Non-blocking operator model
Nephele Execution Runtime operators
Does it deliver?
Does it deliver? Maybe: what do the experiments say? What's old? A lot of things What's new? Second-order functions that abstract parallelization; optimization in a UDF-heavy environment; integrated iterative processing; an extensible query language and underlying operator model
Experimental Evaluation
Experimental Setup Setup: 1 master + 25 slave machines; 16 cores @ 2.0 GHz with 32 GB of RAM (29 GB of operating memory); 80 TB HDFS in plain ASCII; 4 SATA drives at 500 MB/s read/write per node; 8 parallel tasks per slave, total DOP 40-200 Comparison with Hadoop: vanilla MapReduce engine, Apache Hive, Apache Giraph
Summary of Results Stratosphere achieves linear speedup and performance similar to Hadoop on simple tasks (TeraSort, Word Count). Stratosphere is 5x faster than Hive and Hadoop on complex tasks such as TPC-H and triangle enumeration, though increasing the DOP brings no further gain. Stratosphere performs worse than Giraph on Connected Components, due to Giraph's better-tuned implementation. Checkpointing adds little overhead and saves much time when a failure occurs.
TeraSort: Stratosphere vs. Hadoop Stratosphere achieves performance similar to Hadoop and linear speedup
Word Count: Stratosphere vs. Hadoop Stratosphere is 20% faster than Hadoop and achieves linear speedup
Triangle Enumeration: Reducer 1
Triangle Enumeration: Reducer 2
Triangle Enumeration: PACT
Triangle Enumeration Stratosphere is 5x faster than Hadoop, though parallelism does not help
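The slides name the two reducers and the PACT version without showing their logic. One common two-step formulation, assumed here and not confirmed by the slides, first groups edges by vertex to build open triads and then closes each triad by looking up its missing edge. The function name `enumerate_triangles` and the edge normalization are illustrative choices in this minimal Python sketch.

```python
from collections import defaultdict
from itertools import combinations


def enumerate_triangles(edges):
    """Enumerate triangles in an undirected graph in two steps."""
    # Normalize each edge so the smaller vertex id comes first.
    edge_set = {(min(a, b), max(a, b)) for a, b in edges if a != b}

    # Step 1 (grouping): collect the larger-id neighbors of every vertex
    # and emit one open triad (v, x, y) per pair of such neighbors.
    by_vertex = defaultdict(list)
    for a, b in edge_set:
        by_vertex[a].append(b)
    triads = [
        (v, x, y)
        for v, ns in by_vertex.items()
        for x, y in combinations(sorted(ns), 2)
    ]

    # Step 2 (closing): a triad is a triangle if the edge (x, y) exists.
    return sorted(t for t in triads if (t[1], t[2]) in edge_set)


triangles = enumerate_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)])
print(triangles)
```

In a dataflow setting, step 1 corresponds to a grouping on the vertex and step 2 to an equi-join between triads and edges. Because each triangle is anchored at its smallest vertex, it is reported exactly once.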
TPC-H Query
TPC-H: Stratosphere vs. Hive Parallelism does not seem to help; however, Stratosphere is 5x faster
Connected Components Giraph is faster, due to its better-tuned implementation
CC: Execution time per superstep
Fault Tolerance Checkpointing adds little overhead and saves much time when failure occurs
What Else Do We Want to See? For presented experiments: Breakdown of execution time to distinguish bottlenecks; What happens with even smaller DOP? What happens with more/fewer tasks on each core? Further: What happens with even larger data? The current data size fits into RAM; Comparison with MPP or split query processing systems like Polybase or Shark, given the size of the tested data
The Promises? Declarative, high-level language In situ data analysis Richer set of primitives than MapReduce Treat UDFs as first-class citizens Automated parallelization and optimization Support for iterative programs Includes external memory query processing algorithms to support arbitrarily long programs
Ongoing and Future Work One-pass optimizer unifying PACT and Sopremo layers Strengthening fault-tolerance capabilities Improving scalability and efficiency of Nephele Design, compilation and optimization of higher-level languages Scalable, efficient, and adaptive algorithms and architecture Stateful systems for fast ingestion and low-latency data analysis
Discussions and Questions Declarativity - expressiveness tradeoff More declarative -> less expressive, but easier to optimize Run-time optimization is the way to go? Skewed data distribution may become a bottleneck for such systems Detecting performance bottleneck on the fly
QED THANKS!