Sparrow: Distributed, Low-Latency Spark Scheduling
Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
Outline
The Spark scheduling bottleneck
Sparrow's fully distributed, fault-tolerant technique
Sparrow's near-optimal performance
Spark Today User 1 User 2 User 3 Spark Context Query Compilation Storage Scheduling
Job Latencies Rapidly Decreasing
2004: MapReduce batch job · 2009: Hive query · 2010: Dremel query, in-memory Spark query · 2012: Impala query · 2013: Spark Streaming
Latencies falling from ~10 min. to ~10 sec. to ~100 ms to ~1 ms
Job latencies rapidly decreasing + Spark deployments growing in size Scheduling bottleneck!
Spark scheduler throughput: 1,500 tasks / second

Task duration    Cluster size (# 16-core machines)
10 seconds       1000
1 second         100
100 ms           10
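The cluster sizes in the table follow from back-of-the-envelope arithmetic (a sketch; `saturating_machines` is an illustrative helper, not part of Spark): each core completes 1/duration tasks per second, so a 1,500-tasks/second scheduler saturates once cluster cores ÷ task duration reaches its throughput.

```python
def saturating_machines(sched_tasks_per_sec, task_duration_sec, cores_per_machine=16):
    """Rough cluster size at which a centralized scheduler saturates.

    Each core finishes 1/task_duration_sec tasks per second, so the
    scheduler keeps up only while
    machines * cores_per_machine / task_duration_sec <= its throughput.
    """
    return sched_tasks_per_sec * task_duration_sec / cores_per_machine

for duration in (10, 1, 0.1):  # 10 s, 1 s, 100 ms tasks
    print(duration, "->", saturating_machines(1500, duration), "machines")
```

Before rounding this gives 937.5, 93.75, and 9.375 machines, matching the table's rough 1000 / 100 / 10 figures.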
Optimizing the Spark Scheduler
0.8: monitoring code moved off the critical path
0.8.1: result deserialization moved off the critical path
Future improvements may yield 2-3x higher throughput
Is the scheduler the bottleneck in my cluster? Demo: tinyurl.com/sparkdemo
[Diagram: task launch → cluster → task completion, measuring task launch delay]
Spark Today User 1 User 2 User 3 Spark Context Query Compilation Storage Scheduling
Future Spark
User 1, User 2, User 3: each with its own query compilation
Benefits: high throughput, fault tolerance
Future Spark
User 1, User 2, User 3: each with its own query compilation
Storage: Tachyon
Scheduling with Sparrow Stage
Batch Sampling — 4 probes (d = 2): place a stage's m tasks on the least loaded of d·m workers
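In code, batch sampling can be sketched like this (a toy model with an assumed queue-length map, not Sparrow's actual implementation):

```python
import random

def batch_sample(worker_queues, m, d=2):
    """Place a stage's m tasks on the least loaded of d*m probed workers.

    worker_queues: dict mapping worker id -> current queue length.
    Returns the m worker ids with the shortest queues among those probed.
    """
    # Probe d*m distinct workers chosen uniformly at random.
    probed = random.sample(list(worker_queues), d * m)
    # Keep the m probed workers with the shortest queues.
    return sorted(probed, key=lambda w: worker_queues[w])[:m]

cluster = {f"worker{i}": random.randint(0, 9) for i in range(20)}
print(batch_sample(cluster, m=4))  # 4 placements chosen from 8 probes
```

Sampling per stage rather than per task is what distinguishes batch sampling from plain power-of-two-choices: the m tasks share one pool of d·m probes.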
Queue length is a poor predictor of wait time (e.g., similar queues with waits of 80 ms, 155 ms, and 530 ms) → poor performance on heterogeneous workloads
Late Binding — 4 probes (d = 2): place m tasks on the least loaded of d·m workers
Late Binding — a worker requests a task only when ready to run one: place m tasks on the least loaded of d·m workers
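Late binding can be sketched as follows (a toy model; `LateBindingScheduler` is my own invented name, not Sparrow's API): probes only reserve a slot, the scheduler binds a concrete task when a worker actually asks for one, and the d·m − m slower reservations resolve to no-ops.

```python
from collections import deque

class LateBindingScheduler:
    """Toy late-binding scheduler: tasks bind when a worker is ready."""

    def __init__(self, tasks):
        self.pending = deque(tasks)  # tasks not yet bound to any worker

    def worker_requests_task(self, worker_id):
        """Called when a reservation reaches the front of a worker's queue."""
        if self.pending:
            return self.pending.popleft()  # bind a real task, as late as possible
        return None  # every task already launched: this probe becomes a no-op

# m=2 tasks, d=2 probes each -> 4 reservations; workers reply in readiness order.
sched = LateBindingScheduler(["task0", "task1"])
print(sched.worker_requests_task("w3"))  # task0 (fastest worker wins)
print(sched.worker_requests_task("w1"))  # task1
print(sched.worker_requests_task("w7"))  # None (late reservation, no-op)
```

This sidesteps the queue-length problem: placement is decided by which workers actually free up first, not by a stale load estimate.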
What about constraints?
Per-Task Constraints — probe separately for each task
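A sketch of per-task probing under placement constraints (illustrative names and a synchronous loop; a real scheduler would probe asynchronously): each task samples only among workers that satisfy its constraint, e.g. the machines holding its input data.

```python
import random

def place_constrained_tasks(task_constraints, worker_queues, d=2):
    """Probe separately for each task among its allowed workers.

    task_constraints: dict task -> set of workers that may run it.
    worker_queues:    dict worker -> current queue length.
    """
    placement = {}
    for task, allowed in task_constraints.items():
        # Per-task probing: sample up to d candidates satisfying the constraint.
        candidates = random.sample(sorted(allowed), min(d, len(allowed)))
        # Place the task on the least loaded probed candidate.
        placement[task] = min(candidates, key=lambda w: worker_queues[w])
    return placement

queues = {"w1": 2, "w2": 0, "w3": 5}
constraints = {"taskA": {"w1", "w2"}, "taskB": {"w3"}}
print(place_constrained_tasks(constraints, queues))  # {'taskA': 'w2', 'taskB': 'w3'}
```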
Technique Recap Batch sampling + Late binding + Constraints
How well does Sparrow perform?
How does Sparrow compare to Spark's native scheduler?
[Chart: response time (ms) vs. task duration (ms) for Spark's native scheduler, Sparrow, and ideal]
100 16-core EC2 nodes, 10 tasks/job, 10 schedulers, 80% load
TPC-H Queries: Background
TPC-H: common benchmark for analytics workloads
Shark: SQL execution engine (runs on Spark; Spark scheduled by Sparrow)
TPC-H Queries
[Chart: response time (ms) for TPC-H queries q3, q4, q6, and q12 under random placement, Sparrow, and ideal; 5th/25th/50th/75th/95th percentiles]
100 16-core EC2 nodes, 10 schedulers, 80% load
Within 12% of ideal; median queuing delay of 9 ms
Policy Enforcement
Priorities: serve queues based on strict priorities (High Priority / Low Priority)
Fair Shares: serve queues using weighted fair queuing (User A: 75%, User B: 25%)
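The fair-share policy can be sketched with a minimal weighted fair queuing loop (`weighted_fair_pick` is my own toy helper, not how Sparrow's workers implement it): always serve the backlogged user furthest below their weighted share.

```python
def weighted_fair_pick(queues, served, weights):
    """Pick the next user's queue to serve under weighted fair queuing.

    queues:  dict user -> list of pending tasks
    served:  dict user -> number of tasks served so far
    weights: dict user -> share, e.g. {"A": 0.75, "B": 0.25}
    """
    backlogged = [u for u in queues if queues[u]]
    if not backlogged:
        return None
    # Lowest normalized service (served / weight) goes first.
    return min(backlogged, key=lambda u: served[u] / weights[u])

queues  = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
served  = {"A": 0, "B": 0}
weights = {"A": 0.75, "B": 0.25}
order = []
while any(queues.values()):
    user = weighted_fair_pick(queues, served, weights)
    order.append(queues[user].pop(0))
    served[user] += 1
print(order)  # A is served ~3x as often while both users are backlogged
```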
Weighted Fair Sharing
[Chart: running tasks vs. time (s) for User 0 and User 1]
Fault Tolerance
[Chart: query response time (ms) vs. time (s) for Spark Client 1 and Spark Client 2 as a scheduler fails]
Timeout: 100 ms · Failover: 5 ms · Re-launch queries: 15 ms
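The failover path can be sketched client-side (a toy with invented names; the talk does not show Sparrow's client code): detect a dead scheduler via timeout and re-launch the query on the next scheduler in the list.

```python
class FailoverClient:
    """Toy client that fails over across a list of schedulers."""

    def __init__(self, schedulers):
        self.schedulers = list(schedulers)  # ordered fallback list

    def submit(self, query):
        for submit_to in self.schedulers:
            try:
                return submit_to(query)  # normal path: first live scheduler
            except TimeoutError:
                continue  # e.g. no response within 100 ms -> fail over
        raise RuntimeError("all schedulers unreachable")

def dead_scheduler(query):
    raise TimeoutError  # simulates a crashed scheduler

def live_scheduler(query):
    return f"launched {query}"

client = FailoverClient([dead_scheduler, live_scheduler])
print(client.submit("q1"))  # launched q1
```

Because any scheduler can place any task, failover needs no state recovery: the client simply re-launches its outstanding queries elsewhere.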
Making Sparrow feature-complete: interfacing with the UI, delay scheduling, speculation
(1) Diagnosing a Spark scheduling bottleneck (2) Distributed, fault-tolerant scheduling with Sparrow
www.github.com/radlab/sparrow