Coflow: Recent Advances and What's Next? Mosharaf Chowdhury, University of Michigan

1 Coflow: Recent Advances and What's Next? Mosharaf Chowdhury, University of Michigan

2 Research overview (figure) spanning Rack-Scale Computing, Datacenter-Scale Computing, and Geo-Distributed Computing. Topics: Coflow Networking; Apache Spark; Cluster File System; Proactive Analytics Before You Think!; Resource Allocation; DAG Scheduling; Cluster Caching; Fast Analytics Over the WAN. Labels shown alongside: Open Source, Facebook, Microsoft, Apache YARN, Alluxio.

3 Rack-Scale Computing (< 0.01 ms), Datacenter-Scale Computing (~1 ms), Geo-Distributed Computing (> 100 ms)

4 Big Data: The volume of data businesses want to make sense of is increasing. Increasing variety of sources: web, mobile, wearables, vehicles, scientific, … Cheaper disks, SSDs, and memory. Stalling processor speeds.

5 Big Datacenters for Massive Parallelism (figure: timeline, from 2005, of data-parallel systems): MapReduce, Hadoop, Dryad, DryadLINQ, Pregel, Dremel, Hive, Storm, GraphLab, Spark, Spark-Streaming, GraphX, BlinkDB. Reference: Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, NSDI 2012.

6 Distributed Data-Parallel Applications: Multi-stage dataflow, with computation interleaved with communication. Computation stage (e.g., Map, Reduce): distributed across many machines; tasks run in parallel. Communication stage (e.g., Shuffle): between successive computation stages; a communication stage cannot complete until all the data have been transferred. (Figure: Map Stage, then Shuffle, then Reduce Stage.)
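
The barrier on this slide can be made concrete with a small Python sketch. The `Flow` record, the machine names, and the finish times are hypothetical; the only point is that a communication stage ends at the maximum of its flows' finish times, not at their average.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    src: str       # machine running a map task (hypothetical name)
    dst: str       # machine running a reduce task (hypothetical name)
    finish: float  # time at which this flow's last byte arrives


def stage_completion_time(flows):
    """A communication stage completes only when its slowest flow completes."""
    return max(f.finish for f in flows)


def average_flow_completion_time(flows):
    """The per-flow metric that traditional networking approaches optimize."""
    return sum(f.finish for f in flows) / len(flows)


shuffle = [Flow("m1", "r1", 2.0), Flow("m2", "r1", 3.0), Flow("m2", "r2", 7.0)]
print(stage_completion_time(shuffle))         # 7.0: the reduce stage waits until then
print(average_flow_completion_time(shuffle))  # 4.0
```

Speeding up the two early flows would improve the average but leave the stage completion time unchanged.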

7 Communication is Crucial for Performance: Facebook jobs spend ~25% of their runtime, on average, in intermediate communication.[1] As SSD-based and in-memory systems proliferate, the network is likely to become the primary bottleneck. 1. Based on a month-long trace with 20,000 jobs and 150 million tasks, collected from a 3000-machine Facebook production MapReduce cluster.

8 Faster Communication Stages: Traditional Networking Approach. Flow: transfers data from a source to a destination; the independent unit of allocation, sharing, load balancing, and/or prioritization.

9 Existing Solutions (figure: timeline from the 1980s to the 2000s): GPS, WFQ, CSFQ, RED, ECN, XCP, RCP, DCTCP, D3, D2TCP, FCP, DeTail, PDQ, pFabric, grouped by objective into Per-Flow Fairness and Flow Completion Time. Independent flows cannot capture the collective communication behavior common in data-parallel applications.

10 Why Do They Fall Short? (Figure: a shuffle from senders s1, s2, … to receivers r1 and r2 through the datacenter fabric, showing its input links and output links.)

11 Why Do They Fall Short? (Figure: the individual flows of the same shuffle, from each sender to r1 and r2 across the datacenter fabric.)

12 Why Do They Fall Short? (Figure: per-flow fair sharing over time on the link to r1 and the link to r2.) Per-Flow Fair Sharing: Shuffle Completion Time = 5; Avg. Flow Completion Time = 3.66. Solutions focusing on flow completion time cannot further decrease the shuffle completion time.

13 Improve Application-Level Performance[1]: slow down faster flows to accelerate slower flows. Per-Flow Fair Sharing: Shuffle Completion Time = 5, Avg. Flow Completion Time = 3.66. Data-Proportional Allocation: Shuffle Completion Time = 4, Avg. Flow Completion Time = 4. (Figure: allocations over time on the links to r1 and r2 under each policy.) 1. Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM 2011.
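
The trade-off on slides 12 and 13 can be reproduced with a small simulation. The shuffle below is an assumption chosen because it yields the same numbers, not necessarily the slides' exact figure: senders s1 to s5 and receivers r1 and r2, every link carrying 1 unit per time step, s3 sending 2 units to each receiver and every other sender 1 unit. Per-flow fair sharing is computed by max-min progressive filling; the data-proportional allocation is simplified to giving each flow rate size/T, where T is the load on the busiest link.

```python
from collections import defaultdict

EPS = 1e-9

# Hypothetical shuffle (an assumption): (sender, receiver) -> units of data.
SHUFFLE = {
    ("s1", "r1"): 1, ("s2", "r1"): 1, ("s3", "r1"): 2,
    ("s3", "r2"): 2, ("s4", "r2"): 1, ("s5", "r2"): 1,
}


def links(flow):
    """A flow crosses its sender's uplink and its receiver's downlink (capacity 1 each)."""
    src, dst = flow
    return [("up", src), ("down", dst)]


def max_min_fair_rates(active):
    """Per-flow fair sharing via progressive filling: find the link offering the
    smallest equal share, fix its unfrozen flows at that rate, and repeat."""
    rates, used, unfrozen = {}, defaultdict(float), set(active)
    while unfrozen:
        count = defaultdict(int)
        for f in unfrozen:
            for link in links(f):
                count[link] += 1
        tightest = min(count, key=lambda lk: (1.0 - used[lk]) / count[lk])
        share = (1.0 - used[tightest]) / count[tightest]
        for f in [f for f in unfrozen if tightest in links(f)]:
            rates[f] = share
            unfrozen.remove(f)
            for link in links(f):
                used[link] += share
    return rates


def simulate_fair(shuffle):
    """Event-driven: recompute fair rates whenever some flow finishes."""
    remaining, finish, t = dict(shuffle), {}, 0.0
    while remaining:
        rates = max_min_fair_rates(remaining)
        dt = min(remaining[f] / rates[f] for f in remaining)
        t += dt
        for f in list(remaining):
            remaining[f] -= rates[f] * dt
            if remaining[f] <= EPS:
                finish[f] = t
                del remaining[f]
    return finish


def simulate_data_proportional(shuffle):
    """Each flow gets rate size / T, where T is the busiest link's load, so every
    flow finishes at T and no link is oversubscribed."""
    load = defaultdict(float)
    for f, size in shuffle.items():
        for link in links(f):
            load[link] += size
    T = max(load.values())
    return {f: T for f in shuffle}


def report(name, finish):
    shuffle_time = max(finish.values())
    avg_fct = sum(finish.values()) / len(finish)
    print(f"{name}: shuffle completion = {shuffle_time:g}, avg flow completion = {avg_fct:.2f}")


report("per-flow fair", simulate_fair(SHUFFLE))
report("data-proportional", simulate_data_proportional(SHUFFLE))
```

Under fair sharing the four 1-unit flows finish at time 3 and s3's flows at time 5 (average about 3.67); with the data-proportional allocation every flow finishes at 4. The average flow gets slower, but the shuffle, which is what the reduce stage waits for, finishes sooner.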

14 Coflow: a communication abstraction for data-parallel applications to express their performance goals: 1. Size of each flow; 2. Total number of flows; 3. Endpoints of individual flows; 4. Dependencies between coflows.
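
One way to picture the four pieces of information a coflow carries is as a small data structure. This is a minimal sketch; the class and field names (`FlowSpec`, `Coflow`, `depends_on`, `ready`) are hypothetical and not taken from any coflow library.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class FlowSpec:
    src: str         # 3. endpoint: sending machine
    dst: str         # 3. endpoint: receiving machine
    size_bytes: int  # 1. size of this flow


@dataclass
class Coflow:
    coflow_id: str
    flows: List[FlowSpec] = field(default_factory=list)
    depends_on: Set[str] = field(default_factory=set)  # 4. coflows that must finish first

    @property
    def num_flows(self) -> int:  # 2. total number of flows
        return len(self.flows)

    @property
    def total_bytes(self) -> int:
        return sum(f.size_bytes for f in self.flows)

    def ready(self, finished_ids) -> bool:
        """A coflow may start only after every coflow it depends on has finished."""
        return self.depends_on <= set(finished_ids)


shuffle = Coflow("shuffle-1", [FlowSpec("m1", "r1", 100), FlowSpec("m2", "r1", 300)])
aggregate = Coflow("agg-1", [FlowSpec("r1", "driver", 10)], depends_on={"shuffle-1"})
print(shuffle.num_flows, shuffle.total_bytes)               # 2 400
print(aggregate.ready([]), aggregate.ready(["shuffle-1"]))  # False True
```

Schedulers that do not know sizes in advance (non-clairvoyant ones) would have only the endpoints and dependencies, not `size_bytes`.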

15 Common communication patterns (figure): Single Flow, Parallel Flows, Broadcast, Aggregation, Shuffle, All-to-All.

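16 Three questions (figure: a datacenter abstracted as a fabric with ports 1 to N on each side): #1 How to schedule coflows online for faster completion of coflows? #2 How to schedule coflows online to meet more deadlines? #3 How to schedule coflows online for fair allocation of the network?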

17 Varys, Aalo & HUG: 1. Coflow Scheduler: faster, application-aware data transfers throughout the network. 2. Global Coordination: consistent calculation and enforcement of scheduler decisions. 3. The Coflow API: decouples network optimizations from applications, relieving developers and end users. References: 1. Efficient Coflow Scheduling with Varys, SIGCOMM 2014. 2. Efficient Coflow Scheduling Without Prior Knowledge, SIGCOMM 2015. 3. HUG: Multi-Resource Fairness for Correlated and Elastic Demands, NSDI 2016.

18 Benefits of Inter-Coflow Scheduling: Coflow 1 and Coflow 2 on Link 1 and Link 2, with flows of 3, 6, and 2 units. Fair Sharing: Coflow1 comp. time = 5, Coflow2 comp. time = 6. Smallest-Flow First[1,2]: Coflow1 comp. time = 5, Coflow2 comp. time = 6. The Optimal: Coflow1 comp. time = 3, Coflow2 comp. time = 6. (Figure: per-link timelines for L1 and L2 under each schedule.) 1. Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM 2012. 2. pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM 2013.
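
The three schedules can be checked with a short Python sketch. The flow placement is a reconstruction, an assumption consistent with the completion times on the slide: coflow 1 has a single 3-unit flow on link 1, while coflow 2 has a 2-unit flow on link 1 and a 6-unit flow on link 2; each link serves 1 unit per time step.

```python
# Flows of the example: (coflow, link) -> size in units (reconstructed, an assumption).
FLOWS = {("C1", "L1"): 3, ("C2", "L1"): 2, ("C2", "L2"): 6}


def fair_sharing(sizes):
    """Per-flow fair sharing on one link: active flows split its capacity equally."""
    remaining, finish, t = dict(sizes), {}, 0.0
    while remaining:
        n, smallest = len(remaining), min(remaining.values())
        t += smallest * n  # the smallest flow needs this long at rate 1/n
        for f in list(remaining):
            remaining[f] -= smallest
            if remaining[f] <= 1e-9:
                finish[f] = t
                del remaining[f]
    return finish


def serve_in_order(sizes, key):
    """Serve flows one at a time at the full link rate, in the order given by key."""
    finish, t = {}, 0.0
    for f in sorted(sizes, key=key):
        t += sizes[f]
        finish[f] = t
    return finish


def coflow_completion(policy):
    """Apply a per-link policy; a coflow completes when its last flow completes."""
    finish = {}
    for link in {link for _, link in FLOWS}:
        finish.update(policy({f: s for f, s in FLOWS.items() if f[1] == link}))
    cct = {}
    for (coflow, _), t in finish.items():
        cct[coflow] = max(cct.get(coflow, 0.0), t)
    return dict(sorted(cct.items()))


POLICIES = {
    "fair sharing": fair_sharing,
    "smallest-flow first": lambda fl: serve_in_order(fl, key=lambda f: fl[f]),
    "coflow 1 first": lambda fl: serve_in_order(fl, key=lambda f: f[0]),
}
for name, policy in POLICIES.items():
    print(name, coflow_completion(policy))
```

Fair sharing and smallest-flow first both give completion times of 5 and 6; serving coflow 1's flow first gives 3 and 6, so coflow 1 finishes 2 units sooner while coflow 2 is no later. Per-flow policies miss this because they see coflow 2's 2-unit flow as "small" without knowing that coflow 2 cannot finish before time 6 anyway.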

19 Inter-Coflow Scheduling is NP-Hard. (Figure: the same two coflows, with flows of 3, 6, and 2 units, mapped onto the datacenter's input and output links.) It generalizes Concurrent Open Shop Scheduling with Coupled Resources. Examples of concurrent open shop problems include job scheduling and caching blocks, and their solutions use an ordering heuristic; coflow scheduling must additionally consider matching constraints between input and output links.
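
Varys's ordering heuristic, as described in its paper, orders coflows by their bottleneck (smallest effective bottleneck first) and then gives each flow of a coflow just enough rate for all of them to finish together. The sketch below is a simplified rendering under stated assumptions, not Varys's implementation: every port has capacity 1, the port names are hypothetical, and leftover bandwidth is not backfilled.

```python
from collections import defaultdict


def bottleneck(flows, capacity=1.0):
    """Time a coflow would need with the fabric to itself: the load on its most
    heavily loaded input or output port, divided by port capacity."""
    load = defaultdict(float)
    for src, dst, size in flows:
        load[("in", src)] += size
        load[("out", dst)] += size
    return max(load.values()) / capacity


def smallest_bottleneck_first(coflows):
    """Ordering heuristic: schedule coflows with smaller bottlenecks first."""
    return sorted(coflows, key=lambda cid: bottleneck(coflows[cid]))


def rates_to_finish_together(flows, capacity=1.0):
    """Give each flow just enough rate that all flows end at the bottleneck time."""
    gamma = bottleneck(flows, capacity)
    return [(src, dst, size / gamma) for src, dst, size in flows]


# Hypothetical ports; the coflows mirror the two-coflow example of the previous slide.
COFLOWS = {
    "C2": [("in1", "out1", 2), ("in2", "out2", 6)],
    "C1": [("in1", "out1", 3)],
}
print(smallest_bottleneck_first(COFLOWS))       # ['C1', 'C2']
print(rates_to_finish_together(COFLOWS["C2"]))  # 2-unit flow at rate 1/3, 6-unit flow at rate 1.0
```

Slowing coflow 2's small flow to rate 1/3 costs coflow 2 nothing, since its 6-unit flow sets its completion time, and frees link capacity for other coflows.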

20 Many Problems to Solve (CCT = coflow completion time)
Scheduler | Clairvoyant | Objective | Optimal
Varys | Yes | Min CCT | No
Aalo | No | Min CCT | No
HUG | No | Fair CCT | Yes
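
A non-clairvoyant scheduler such as Aalo cannot sort coflows by size, because sizes are unknown up front. A minimal sketch of the idea, under assumptions: coflows move to lower-priority queues as the bytes they have already sent cross exponentially spaced thresholds. The threshold, growth factor, and queue count below are illustrative assumed defaults, and strict priority across queues (FIFO within a queue) is a simplification.

```python
MiB = 2**20


def queue_index(bytes_sent, first_threshold=10 * MiB, growth=10, num_queues=10):
    """Priority queue (0 = highest) for a coflow, based only on bytes sent so far.

    Queue 0 covers [0, first_threshold); queue k covers
    [first_threshold * growth**(k-1), first_threshold * growth**k); the last is unbounded.
    """
    threshold, q = first_threshold, 0
    while bytes_sent >= threshold and q < num_queues - 1:
        threshold *= growth
        q += 1
    return q


def priority_order(coflows):
    """coflows: {id: (arrival_time, bytes_sent)}. Lower queue first, then FIFO."""
    return sorted(coflows, key=lambda c: (queue_index(coflows[c][1]), coflows[c][0]))


active = {"A": (0.0, 500 * MiB), "B": (1.0, 2 * MiB), "C": (2.0, 50 * MiB)}
print([queue_index(sent) for _, sent in active.values()])  # [2, 0, 1]
print(priority_order(active))                              # ['B', 'C', 'A']
```

Coflows that have sent little stay in high-priority queues, so small coflows tend to finish first without anyone declaring their sizes.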

21 Coflow-Based Architecture: Centralized master-slave architecture. Applications use a client library to communicate with the master. Actual timing and rates are determined by the coflow scheduler. (Figure: a Master/Coordinator running the Coflow Scheduler; a Local Daemon on each machine, between the computation tasks and the network interface, exchanging coordination messages with the master.)
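
The division of labor on this slide can be sketched in a few lines of Python. Everything here is hypothetical: the class and method names are illustrative rather than the API of Varys or Aalo, and the scheduling rule (smallest coflow first, greedy rate assignment per port) is a deliberately simple placeholder for a real coflow scheduler.

```python
from collections import defaultdict


class CoflowMaster:
    """Hypothetical centralized coordinator: holds global coflow state and decides
    every flow's rate (here: smallest coflow first, greedy per port)."""

    def __init__(self, port_capacity=1.0):
        self.port_capacity = port_capacity
        self.coflows = {}  # coflow id -> list of (src, dst, size)

    # Calls that a client library inside the application would make.
    def register(self, coflow_id):
        self.coflows[coflow_id] = []

    def add_flow(self, coflow_id, src, dst, size):
        self.coflows[coflow_id].append((src, dst, size))

    def unregister(self, coflow_id):
        del self.coflows[coflow_id]

    def schedule(self):
        order = sorted(self.coflows, key=lambda c: sum(s for _, _, s in self.coflows[c]))
        free = defaultdict(lambda: self.port_capacity)
        rates = {}
        for cid in order:
            for src, dst, _ in self.coflows[cid]:
                rate = min(free[("in", src)], free[("out", dst)])
                rates[(cid, src, dst)] = rate
                free[("in", src)] -= rate
                free[("out", dst)] -= rate
        return rates


class LocalDaemon:
    """Runs on each machine; fetches (and in a real system would enforce) its rates."""

    def __init__(self, host, master):
        self.host, self.master = host, master

    def current_rates(self):
        return {k: r for k, r in self.master.schedule().items() if k[1] == self.host}


master = CoflowMaster()
master.register("small")
master.add_flow("small", "h1", "h2", 1)
master.register("big")
master.add_flow("big", "h1", "h3", 5)
master.add_flow("big", "h4", "h3", 5)
print(LocalDaemon("h1", master).current_rates())  # big's h1 flow waits at rate 0.0
print(LocalDaemon("h4", master).current_rates())
```

Applications only describe their coflows; the timing and rates come from the master, and each daemon merely applies the decisions for its own machine.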

22 Coflow API. Change the applications: at the very least, we need to know what a coflow is; for clairvoyant versions, we need more information. Changing the framework can enable ALL jobs to take advantage of coflows. DO NOT change the applications[1]: infer coflows from network traffic patterns, and design a robust coflow scheduler that can tolerate misestimations. Our current solution only works for coflows without dependencies; we need DAG support! 1. CODA: Toward Automatically Identifying and Scheduling Coflows in the Dark, SIGCOMM 2016.
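
To give a feel for "inferring coflows from traffic", here is a deliberately simplified stand-in, not CODA's identification algorithm: flows whose start times fall within a hypothetical gap `max_gap` of the previous flow are grouped into one candidate coflow.

```python
def infer_coflows(flows, max_gap=0.05):
    """Group flows into candidate coflows by start-time proximity.

    flows: list of (flow_id, start_time_seconds). A flow joins the current group
    if it starts within max_gap seconds of the previous flow (in start order).
    """
    groups, last_start = [], None
    for fid, start in sorted(flows, key=lambda f: f[1]):
        if last_start is None or start - last_start > max_gap:
            groups.append([])  # gap too large: start a new candidate coflow
        groups[-1].append(fid)
        last_start = start
    return groups


observed = [("a", 0.00), ("b", 0.01), ("c", 0.03), ("d", 1.00), ("e", 1.02)]
print(infer_coflows(observed))  # [['a', 'b', 'c'], ['d', 'e']]
```

A heuristic like this will sometimes merge two jobs that happen to start together or split one slow job, which is why the scheduler on top of it has to tolerate misestimated coflows.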

23 Performance Benefits of Using Coflows (figure: overhead over Varys, lower is better) for Varys, Per-Flow Fair, FIFO, Per-Flow Priority, FIFO-LM, and Aalo (NC), with schemes grouped into Fairness and Prioritization. References: 1. Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM 2011. 2. Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM 2012. 3. pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM 2013. 4. Decentralized Task-Aware Scheduling for Data Center Networks, SIGCOMM 2014.

24 The Need for Coordination: Coordination is necessary to determine, in real time, the coflow size (sum), coflow rates (max), and the partial order of coflows (ordering). It can be a large source of overhead. The overhead matters little for large coflows in slow networks, but it raises the question: how do we perform decentralized coflow scheduling? (Figure: average coordination time (ms) vs. number of (emulated) Aalo slaves.)
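
Why coordination is needed can be seen in a small aggregation sketch: each local daemon sees only its own machine's bytes, so the global view has to be assembled centrally. This assumes a hypothetical report format, and reading "rates (max)" as "the most loaded port limits a coflow" is our interpretation, not a statement of how Aalo computes rates.

```python
from collections import defaultdict


def coordinate(reports):
    """Combine per-daemon reports into the global view a scheduler needs.

    reports: one dict per local daemon, {coflow_id: {port: bytes sent so far}}.
    Returns (size per coflow: sum over daemons,
             most loaded port per coflow: max over ports,
             global order: smallest coflow first).
    """
    size = defaultdict(int)
    per_port = defaultdict(lambda: defaultdict(int))
    for report in reports:
        for cid, ports in report.items():
            for port, nbytes in ports.items():
                size[cid] += nbytes
                per_port[cid][port] += nbytes
    busiest = {cid: max(loads.values()) for cid, loads in per_port.items()}
    order = sorted(size, key=lambda cid: size[cid])
    return dict(size), busiest, order


daemon1 = {"A": {"h1": 40}, "B": {"h1": 5}}
daemon2 = {"A": {"h2": 60}, "B": {"h2": 5}}
print(coordinate([daemon1, daemon2]))
# ({'A': 100, 'B': 10}, {'A': 60, 'B': 5}, ['B', 'A'])
```

Neither daemon alone knows that coflow A has sent 100 bytes; every such round trip to the coordinator is the overhead the slide measures.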

25 Coflow-Aware Load Balancing: Especially useful in asymmetric topologies, for example, in the presence of switch or link failures. Provides an additional degree of freedom during path selection and for dynamically determining load-balancing granularity. Increased need for coordination, but at an even higher cost.

26 Coflow-Aware Routing: Relevant in topologies without full bisection bandwidth, when topologies have temporary in-network oversubscription, and in geo-distributed analytics. Scheduling-only solutions do not work well; this calls for joint routing-scheduling solutions that must take network utilization into account and must avoid frequent path changes. Increased need for coordination.

27 Coflows in Circuit-Switched Networks: Circuit switching is relevant again due to the rise of optical networks. It provides very high bandwidth, but setting up new circuits is expensive. Co-schedule applications and coflows: schedule tasks so that they can reuse already-established circuits, and perform in-network aggregation using existing circuits instead of waiting for new circuits to be created.

28 Extension to Multiple Resources[1]: A DAG of coflows is very similar to a job DAG of stages. The same principle applies, but with new challenges: consider both fungible (bandwidth) and non-fungible (cores) resources across the entire DAG. 1. Altruistic Scheduling in Multi-Resource Clusters, OSDI 2016.

29 Coflow: a communication abstraction for data-parallel applications to express their performance goals. Key open challenges: 1. Better theoretical understanding. 2. Efficient solutions to deal with decentralization, topologies, multi-resource settings, estimations over DAGs, circuit switching, etc. More information: 1. Papers: 2. Software/simulator/workloads: