Sparrow: Distributed, Low-Latency Spark Scheduling
Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
Outline
The Spark scheduling bottleneck
Sparrow's fully distributed, fault-tolerant technique
Sparrow's near-optimal performance
Spark Today User 1 User 2 User 3 Spark Context Query Compilation Storage Scheduling
Job Latencies Rapidly Decreasing
2004: MapReduce batch job · 2009: Hive query · 2010: Dremel query, in-memory Spark query · 2012: Impala query · 2013: Spark Streaming
Latencies falling from ~10 min. to ~10 sec. to ~100 ms to ~1 ms
Job latencies rapidly decreasing + Spark deployments growing in size Scheduling bottleneck!
Spark scheduler throughput: 1,500 tasks / second

Task duration    Cluster size (# 16-core machines)
10 seconds       1000
1 second         100
100 ms           10
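The cluster sizes in the table follow from back-of-the-envelope arithmetic (a sketch; `saturating_machines` is an illustrative helper, not part of Spark): each core completes 1/duration tasks per second, so a 1,500-tasks/second scheduler saturates once cluster cores ÷ task duration reaches its throughput.

```python
def saturating_machines(sched_tasks_per_sec, task_duration_sec, cores_per_machine=16):
    """Rough cluster size at which a centralized scheduler saturates.

    Each core finishes 1/task_duration_sec tasks per second, so the
    scheduler keeps up only while
    machines * cores_per_machine / task_duration_sec <= its throughput.
    """
    return sched_tasks_per_sec * task_duration_sec / cores_per_machine

for duration in (10, 1, 0.1):  # 10 s, 1 s, 100 ms tasks
    print(duration, "->", saturating_machines(1500, duration), "machines")
```

Before rounding this gives 937.5, 93.75, and 9.375 machines, matching the table's rough 1000 / 100 / 10 figures.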
Optimizing the Spark Scheduler
0.8: monitoring code moved off the critical path
0.8.1: result deserialization moved off the critical path
Future improvements may yield 2-3x higher throughput
Is the scheduler the bottleneck in my cluster? Demo: tinyurl.com/sparkdemo
[Diagram: task launch → cluster → task completion, measuring task launch delay]
Spark Today User 1 User 2 User 3 Spark Context Query Compilation Storage Scheduling
Future Spark
User 1, User 2, User 3: each with its own query compilation
Benefits: high throughput, fault tolerance
Future Spark
User 1, User 2, User 3: each with its own query compilation
Storage: Tachyon
Scheduling with Sparrow Stage
Batch Sampling — 4 probes (d = 2): place a stage's m tasks on the least loaded of d·m workers
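In code, batch sampling can be sketched like this (a toy model with an assumed queue-length map, not Sparrow's actual implementation):

```python
import random

def batch_sample(worker_queues, m, d=2):
    """Place a stage's m tasks on the least loaded of d*m probed workers.

    worker_queues: dict mapping worker id -> current queue length.
    Returns the m worker ids with the shortest queues among those probed.
    """
    # Probe d*m distinct workers chosen uniformly at random.
    probed = random.sample(list(worker_queues), d * m)
    # Keep the m probed workers with the shortest queues.
    return sorted(probed, key=lambda w: worker_queues[w])[:m]

cluster = {f"worker{i}": random.randint(0, 9) for i in range(20)}
print(batch_sample(cluster, m=4))  # 4 placements chosen from 8 probes
```

Sampling per stage rather than per task is what distinguishes batch sampling from plain power-of-two-choices: the m tasks share one pool of d·m probes.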
Queue length is a poor predictor of wait time (e.g., similar queues with waits of 80 ms, 155 ms, and 530 ms) → poor performance on heterogeneous workloads
Late Binding — 4 probes (d = 2): place m tasks on the least loaded of d·m workers
Late Binding — a worker requests a task only when ready to run one: place m tasks on the least loaded of d·m workers
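Late binding can be sketched as follows (a toy model; `LateBindingScheduler` is my own invented name, not Sparrow's API): probes only reserve a slot, the scheduler binds a concrete task when a worker actually asks for one, and the d·m − m slower reservations resolve to no-ops.

```python
from collections import deque

class LateBindingScheduler:
    """Toy late-binding scheduler: tasks bind when a worker is ready."""

    def __init__(self, tasks):
        self.pending = deque(tasks)  # tasks not yet bound to any worker

    def worker_requests_task(self, worker_id):
        """Called when a reservation reaches the front of a worker's queue."""
        if self.pending:
            return self.pending.popleft()  # bind a real task, as late as possible
        return None  # every task already launched: this probe becomes a no-op

# m=2 tasks, d=2 probes each -> 4 reservations; workers reply in readiness order.
sched = LateBindingScheduler(["task0", "task1"])
print(sched.worker_requests_task("w3"))  # task0 (fastest worker wins)
print(sched.worker_requests_task("w1"))  # task1
print(sched.worker_requests_task("w7"))  # None (late reservation, no-op)
```

This sidesteps the queue-length problem: placement is decided by which workers actually free up first, not by a stale load estimate.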
What about constraints?
Per-Task Constraints — probe separately for each task
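A sketch of per-task probing under placement constraints (illustrative names and a synchronous loop; a real scheduler would probe asynchronously): each task samples only among workers that satisfy its constraint, e.g. the machines holding its input data.

```python
import random

def place_constrained_tasks(task_constraints, worker_queues, d=2):
    """Probe separately for each task among its allowed workers.

    task_constraints: dict task -> set of workers that may run it.
    worker_queues:    dict worker -> current queue length.
    """
    placement = {}
    for task, allowed in task_constraints.items():
        # Per-task probing: sample up to d candidates satisfying the constraint.
        candidates = random.sample(sorted(allowed), min(d, len(allowed)))
        # Place the task on the least loaded probed candidate.
        placement[task] = min(candidates, key=lambda w: worker_queues[w])
    return placement

queues = {"w1": 2, "w2": 0, "w3": 5}
constraints = {"taskA": {"w1", "w2"}, "taskB": {"w3"}}
print(place_constrained_tasks(constraints, queues))  # {'taskA': 'w2', 'taskB': 'w3'}
```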
Technique Recap Batch sampling + Late binding + Constraints
How well does Sparrow perform?
How does Sparrow compare to Spark's native scheduler?
[Chart: response time (ms) vs. task duration (ms) for Spark's native scheduler, Sparrow, and ideal]
100 16-core EC2 nodes, 10 tasks/job, 10 schedulers, 80% load
TPC-H Queries: Background
TPC-H: common benchmark for analytics workloads
Shark: SQL execution engine (runs on Spark; Spark scheduled by Sparrow)
TPC-H Queries
[Chart: response time (ms) for TPC-H queries q3, q4, q6, and q12 under random placement, Sparrow, and ideal; 5th/25th/50th/75th/95th percentiles]
100 16-core EC2 nodes, 10 schedulers, 80% load
Within 12% of ideal; median queuing delay of 9 ms
Policy Enforcement
Priorities: serve queues based on strict priorities (High Priority / Low Priority)
Fair Shares: serve queues using weighted fair queuing (User A: 75%, User B: 25%)
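The fair-share policy can be sketched with a minimal weighted fair queuing loop (`weighted_fair_pick` is my own toy helper, not how Sparrow's workers implement it): always serve the backlogged user furthest below their weighted share.

```python
def weighted_fair_pick(queues, served, weights):
    """Pick the next user's queue to serve under weighted fair queuing.

    queues:  dict user -> list of pending tasks
    served:  dict user -> number of tasks served so far
    weights: dict user -> share, e.g. {"A": 0.75, "B": 0.25}
    """
    backlogged = [u for u in queues if queues[u]]
    if not backlogged:
        return None
    # Lowest normalized service (served / weight) goes first.
    return min(backlogged, key=lambda u: served[u] / weights[u])

queues  = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
served  = {"A": 0, "B": 0}
weights = {"A": 0.75, "B": 0.25}
order = []
while any(queues.values()):
    user = weighted_fair_pick(queues, served, weights)
    order.append(queues[user].pop(0))
    served[user] += 1
print(order)  # A is served ~3x as often while both users are backlogged
```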
Weighted Fair Sharing
[Chart: running tasks vs. time (s) for User 0 and User 1]
Fault Tolerance
[Chart: query response time (ms) vs. time (s) for Spark Client 1 and Spark Client 2 as a scheduler fails]
Timeout: 100 ms · Failover: 5 ms · Re-launch queries: 15 ms
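The failover path can be sketched client-side (a toy with invented names; the talk does not show Sparrow's client code): detect a dead scheduler via timeout and re-launch the query on the next scheduler in the list.

```python
class FailoverClient:
    """Toy client that fails over across a list of schedulers."""

    def __init__(self, schedulers):
        self.schedulers = list(schedulers)  # ordered fallback list

    def submit(self, query):
        for submit_to in self.schedulers:
            try:
                return submit_to(query)  # normal path: first live scheduler
            except TimeoutError:
                continue  # e.g. no response within 100 ms -> fail over
        raise RuntimeError("all schedulers unreachable")

def dead_scheduler(query):
    raise TimeoutError  # simulates a crashed scheduler

def live_scheduler(query):
    return f"launched {query}"

client = FailoverClient([dead_scheduler, live_scheduler])
print(client.submit("q1"))  # launched q1
```

Because any scheduler can place any task, failover needs no state recovery: the client simply re-launches its outstanding queries elsewhere.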
Making Sparrow feature-complete: interfacing with the UI, delay scheduling, speculation
(1) Diagnosing a Spark scheduling bottleneck (2) Distributed, fault-tolerant scheduling with Sparrow
www.github.com/radlab/sparrow