Exploiting Inter-Flow Relationship for Coflow Placement in Data Centers. Xin Sunny Huang, T. S. Eugene Ng Rice University
1 Exploiting Inter-Flow Relationship for Coflow Placement in Data Centers. Xin Sunny Huang, T. S. Eugene Ng, Rice University
2 This Work: Optimizing Coflow performance has many benefits, such as avoiding application stragglers [1,2] and improving resource utilization [3,4]. Coflow placement is an unexplored but important factor in determining Coflow performance. 2D-Placement leverages the inter-flow relationship to find good placements for Coflows. [1] Orchestra (SIGCOMM '11) [2] Varys (SIGCOMM '14) [3] CARBYNE (OSDI '16) [4] YARN-ME (memory elasticity, in ATC '17)
3 Coflow: A Coflow [1] is a set of parallel flows produced by distributed applications (e.g., Hadoop and Spark). Performance is measured by Coflow Completion Time (CCT), i.e., the slowest flow's completion time. [Figure: three example Coflows: shuffle, aggregation, broadcast] [1] Chowdhury, M. et al. Coflow: An application layer abstraction for cluster networking (HotNets '12)
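As a minimal illustration of the CCT definition above, the sketch below computes a Coflow's completion time from per-flow finish times. The function name and the convention of measuring from the Coflow's arrival time are assumptions made for this example, not taken from the slides.

```python
def coflow_completion_time(flow_finish_times, arrival_time=0.0):
    """Return the CCT: time from Coflow arrival until its slowest flow finishes.

    flow_finish_times: absolute finish time of each flow, in seconds.
    arrival_time: when the Coflow arrived (assumed convention for this sketch).
    """
    if not flow_finish_times:
        raise ValueError("a Coflow needs at least one flow")
    return max(flow_finish_times) - arrival_time


# Three parallel flows of one shuffle; the slowest flow alone determines the CCT.
shuffle_cct = coflow_completion_time([4.0, 9.0, 6.5])
```

Speeding up the two faster flows would not change the result, which is why the slowest (bottleneck) flow is the one that matters.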
4 Coflow Scheduling: Prior works demonstrate the benefits of Coflow scheduling. Limitation: they assume a predetermined placement for Coflows, i.e., predetermined sender/receiver locations. Examples: Varys (SIGCOMM '14), Aalo (SIGCOMM '15), CODA (SIGCOMM '16), Sunflow (CoNEXT '16), etc. [Figure: existing and newly arriving Coflows on the network]
6 Coflow Placement: Coflow placement can be flexible (e.g., a cluster scheduler chooses the machines for the tasks in a stage). Placement and scheduling together decide Coflow performance. The placement problem: find the input/output ports at which to place the sender/receiver tasks of a newly arriving Coflow. This work: good placement under optimal scheduling.
12 Coflow Placement Constrained by Inter-Flow Relationship: Within a Coflow, the placements of the flows depend on one another.
17 Challenge #1: Intra-Coflow Bottleneck Delay. How to place a new Coflow on a network that already carries an existing Coflow? Only one Coflow needs to be considered in this example: it is prioritized under optimal scheduling and is therefore not sensitive to the other Coflow. Optimal placement: the bottleneck is at a receiver r; some output ports have less available bandwidth, so place r at a less busy output port. [Figure: senders s and receivers r of the new Coflow, and the input/output port loads of the network]
20 Challenge #2: Inter-Coflow Bottleneck Contentions. How to place? Optimal placement: an in-cast bottleneck at a receiver r, if placed on the busy ports, would heavily delay a lower-priority Coflow; place r at a less busy output port. [Figure: in-cast Coflow and the input/output port loads of the network]
21 Summary: Keys to Coflow Placement. Intra-Coflow: avoid delaying critical endpoints (the bottleneck). Inter-Coflow: avoid contentions among critical endpoints.
22 2D-Placement: a greedy heuristic. Step 1 (intra-Coflow): calculate endpoint demand, to identify the critical endpoints that require better placement. Step 2 (inter-Coflow): calculate the load on ports, to find ports with less contention. Step 3: place heavily loaded endpoints on less loaded ports, which avoids contention on critical endpoints. [Figure: step-by-step example placing the senders s and receivers r of a new Coflow on the input/output ports of a network that already carries an existing Coflow]
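The three steps above can be sketched as a small greedy routine. This is a minimal sketch under stated assumptions, not the authors' implementation: flows are given as a hypothetical mapping from (sender, receiver) to bytes, port load is a single number per port, each port hosts at most one endpoint of the new Coflow, and ties are broken by name. All function and variable names are invented for illustration.

```python
from collections import defaultdict


def endpoint_demand(flows):
    """Step 1: total bytes each sender must send and each receiver must receive."""
    senders, receivers = defaultdict(float), defaultdict(float)
    for (sender, receiver), size in flows.items():
        senders[sender] += size
        receivers[receiver] += size
    return dict(senders), dict(receivers)


def greedy_match(demand, port_load):
    """Step 3: pair the most demanding endpoint with the least loaded port."""
    if len(demand) > len(port_load):
        raise ValueError("not enough ports for the endpoints")
    # Heaviest endpoints first, lightest ports first (assumed tie-break: name).
    endpoints = sorted(demand, key=lambda e: (-demand[e], e))
    ports = sorted(port_load, key=lambda p: (port_load[p], p))
    return dict(zip(endpoints, ports))


def place_coflow(flows, in_port_load, out_port_load):
    """Place senders on input ports and receivers on output ports.

    Step 2 (load on ports) is assumed to be precomputed by the caller and
    passed in as in_port_load / out_port_load.
    """
    sender_demand, receiver_demand = endpoint_demand(flows)
    return (greedy_match(sender_demand, in_port_load),
            greedy_match(receiver_demand, out_port_load))


# Made-up example: s3 and r1 are the critical endpoints of the new Coflow.
flows = {("s1", "r1"): 10, ("s2", "r1"): 20, ("s3", "r1"): 30, ("s3", "r2"): 50}
in_load = {"in1": 0, "in2": 40, "in3": 5}
out_load = {"out1": 90, "out2": 0, "out3": 10}
sender_ports, receiver_ports = place_coflow(flows, in_load, out_load)
```

In this example the heaviest sender (s3, 80 units) lands on the idle input port in1, and the heaviest receiver (r1, 60 units) lands on the idle output port out2, while the busy port out1 is left unused.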
35 Simulation setup: Implemented a flow-level, discrete-event simulator. Workload [1]: a realistic trace derived from a 1-hour Facebook cluster traffic trace, with > 500 Coflows and > 700,000 flows. Baseline: flow-by-flow placement for Coflows (NEAT [3]). Coflow schedulers: Aalo [2] (this talk) and Varys [1] (paper), both designed to minimize average CCT by prioritizing small Coflows to avoid head-of-line (HOL) blocking. [1] Varys (SIGCOMM '14) [2] Aalo (SIGCOMM '15) [3] NEAT (CoNEXT '16)
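To see why prioritizing small Coflows lowers the average CCT, consider a deliberately simplified model: a single bottleneck link that serves one Coflow at a time. The sketch below is only that toy model, with made-up sizes and an invented function name; the real schedulers operate on a whole network and are considerably more involved.

```python
def average_completion_time(sizes, bandwidth=1.0):
    """Average completion time when jobs are served one at a time, in the given order."""
    if not sizes:
        raise ValueError("need at least one job")
    clock, total = 0.0, 0.0
    for size in sizes:
        clock += size / bandwidth   # this job finishes after all earlier ones
        total += clock
    return total / len(sizes)


coflow_sizes = [10.0, 1.0]  # one large and one small Coflow (made-up values)
small_first = average_completion_time(sorted(coflow_sizes))
large_first = average_completion_time(sorted(coflow_sizes, reverse=True))
```

Serving the small Coflow first gives an average of 6.0 time units; serving the large one first makes the small Coflow wait behind it (head-of-line blocking) and raises the average to 10.5.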
36 Improvement in Average CCT. [Figure: 2D-Placement's average CCT over NEAT's average CCT under Aalo, plotted against the traffic scale factor; lower is better] 2D-Placement improves over NEAT by up to […]% under Aalo scheduling.
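The plotted metric is a ratio of two averages. A minimal sketch of that arithmetic follows; the function name and the CCT values are made up for illustration, and reading "improvement" as one minus the ratio is an assumption of this example.

```python
def normalized_average_cct(ccts, baseline_ccts):
    """Average CCT of one placement divided by the baseline's average CCT."""
    if not ccts or not baseline_ccts:
        raise ValueError("both CCT lists must be non-empty")
    average = sum(ccts) / len(ccts)
    baseline_average = sum(baseline_ccts) / len(baseline_ccts)
    return average / baseline_average


# Made-up CCTs (seconds) for the same three Coflows under two placements.
ratio = normalized_average_cct([2.0, 4.0, 6.0], [2.0, 5.0, 8.0])
improvement_percent = (1.0 - ratio) * 100.0  # assumed reading of "improves by"
```

A ratio below 1 means the evaluated placement has the lower average CCT, which matches the "lower is better" direction of the plot.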
37 Improvement in Individual CCT. [Figure: individual CCT reduction (seconds) by 2D-Placement from NEAT under Aalo, plotted against the ratio of Coflow bottleneck L over link bandwidth B (seconds); higher is better] Small Coflows are prioritized and less sensitive to placement. Large Coflows are harder to place and more sensitive to placement. For large Coflows, 2D-Placement's CCT is only 0.85x that of NEAT under Aalo scheduling.
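The x-axis quantity L/B can be computed from a Coflow's flows. The sketch below assumes that the bottleneck L is the largest amount of data any single sender or receiver has to transfer; the slides do not spell out this definition, so treat it, the function name, and the example values as assumptions.

```python
from collections import defaultdict


def coflow_bottleneck_seconds(flows, link_bandwidth):
    """Return L / B, where L is the heaviest endpoint's total bytes (assumed definition)."""
    if not flows:
        raise ValueError("a Coflow needs at least one flow")
    sent, received = defaultdict(float), defaultdict(float)
    for (sender, receiver), size in flows.items():
        sent[sender] += size
        received[receiver] += size
    bottleneck = max(max(sent.values()), max(received.values()))
    return bottleneck / link_bandwidth


# Made-up Coflow: sizes in MB, link bandwidth in MB/s.
example_flows = {("s1", "r1"): 100, ("s2", "r1"): 50, ("s2", "r2"): 25}
seconds = coflow_bottleneck_seconds(example_flows, link_bandwidth=25)
```

Here receiver r1 must take in 150 MB, so L/B is 6 seconds; under this definition, no placement or schedule can finish the Coflow faster than that on 25 MB/s ports.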
38 More in paper: results under Varys scheduling and sensitivity to schedulers, among others.
39 Conclusions: This is the first study on Coflow placement, which has a decisive impact on Coflow performance. Coflow placement is more challenging due to inter-flow dependency. 2D-Placement leverages the inter-flow relationship to find good placements for Coflows.
40 Thank You! Xin Sunny Huang, T. S. Eugene Ng, Rice University
41 Backup slides
42 Sensitivity to Schedulers: 2D-Placement's improvement over NEAT is usually larger under Aalo scheduling. Aalo, lacking precise information about Coflow sizes, may temporarily violate the smallest-Coflow-first priority. NEAT optimizes placement based on the specific traffic priority used for scheduling, so it is prone to error when scheduling dynamics at runtime deviate from that priority. 2D-Placement optimizes placement for a more general case, independent of the scheduling.
43 Improvement in Average CCT. [Figure: 2D-Placement's average CCT over NEAT's average CCT under Aalo and Varys, plotted against the traffic scale factor; lower is better] 2D-Placement improves over NEAT by up to 6%.
44 Improvement in Individual CCT. [Figure: individual CCT reduction (seconds) by 2D-Placement from NEAT under Aalo and Varys, plotted against the ratio of Coflow bottleneck L over link bandwidth B (seconds)] For large Coflows, 2D-Placement's CCT is only 0.85x (roughly 0.9x) that of NEAT under Aalo (Varys) scheduling.
More informationGeneric Topology Mapping Strategies for Large-scale Parallel Architectures
Generic Topology Mapping Strategies for Large-scale Parallel Architectures Torsten Hoefler and Marc Snir Scientific talk at ICS 11, Tucson, AZ, USA, June 1 st 2011, Hierarchical Sparse Networks are Ubiquitous
More informationDC-DRF : Adaptive Multi- Resource Sharing at Public Cloud Scale. ACM Symposium on Cloud Computing 2018 Ian A Kash, Greg O Shea, Stavros Volos
DC-DRF : Adaptive Multi- Resource Sharing at Public Cloud Scale ACM Symposium on Cloud Computing 2018 Ian A Kash, Greg O Shea, Stavros Volos 1 Public Cloud DC hosting enterprise customers O(100K) servers,
More informationIncorporating DMA into QoS Policies for Maximum Performance in Shared Memory Systems. Scott Marshall and Stephen Twigg
Incorporating DMA into QoS Policies for Maximum Performance in Shared Memory Systems Scott Marshall and Stephen Twigg 2 Problems with Shared Memory I/O Fairness Memory bandwidth worthless without memory
More informationApplication of SDN: Load Balancing & Traffic Engineering
Application of SDN: Load Balancing & Traffic Engineering Outline 1 OpenFlow-Based Server Load Balancing Gone Wild Introduction OpenFlow Solution Partitioning the Client Traffic Transitioning With Connection
More informationTetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters
TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters Alexey Tumanov Timothy Zhu, Jun Woo Park, Michael Kozuch, Mor Harchol-Balter, Gregory R. Ganger http://www.pdl.cmu.edu
More informationAdding Capacity Points to a Wireless Mesh Network Using Local Search
Adding Capacity Points to a Wireless Mesh Network Using Local Search Joshua Robinson, Mustafa Uysal, Ram Swaminathan, Edward Knightly Rice University & HP Labs INFOCOM 2008 Multi-Tier Mesh Architecture
More informationABSTRACT I. INTRODUCTION
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISS: 2456-3307 Hadoop Periodic Jobs Using Data Blocks to Achieve
More informationIntroduction to ATM Technology
Introduction to ATM Technology ATM Switch Design Switching network (N x N) Switching network (N x N) SP CP SP CP Presentation Outline Generic Switch Architecture Specific examples Shared Buffer Switch
More informationHybrid 2.0 In search of the holy grail
Hybrid 2.0 In search of the holy grail A Talk for OWASP BeNeLux by Roger Thornton Founder/CTO Fortify Software Inc 2008 All Right Reserved Fortify Software Inc. 2 Before we Begin: Expectations Objectives
More informationStatistics Driven Workload Modeling for the Cloud
UC Berkeley Statistics Driven Workload Modeling for the Cloud Archana Ganapathi, Yanpei Chen Armando Fox, Randy Katz, David Patterson SMDB 2010 Data analytics are moving to the cloud Cloud computing economy
More informationLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive Clusters Mosharaf Chowdhury 1, Srikanth Kandula 2, Ion Stoica 1 1 UC Berkeley, 2 Microsoft Research {mosharaf, istoica}@cs.berkeley.edu, srikanth@microsoft.com
More informationCS 425 / ECE 428 Distributed Systems Fall 2015
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Measurement Studies Lecture 23 Nov 10, 2015 Reading: See links on website All Slides IG 1 Motivation We design algorithms, implement
More informationCoded Distributed Computing: Fundamental Limits and Practical Challenges
Coded Distributed Computing: Fundamental Limits and Practical Challenges Songze Li, Qian Yu, Mohammad Ali Maddah-Ali, and A. Salman Avestimehr Department of Electrical Engineering, University of Southern
More informationMixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp
MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp Hadoop Pig, Hive Hadoop + Enterprise storage?! Shared storage
More informationDatacenter Wide- area Enterprise
Datacenter Wide- area Enterprise Client LOAD- BALANCER Can t choose path : ( Servers Outline and goals A new architecture for distributed load-balancing joint (server, path) selection Demonstrate a nation-wide
More informationWorkload Characterization and Optimization of TPC-H Queries on Apache Spark
Workload Characterization and Optimization of TPC-H Queries on Apache Spark Tatsuhiro Chiba and Tamiya Onodera IBM Research - Tokyo April. 17-19, 216 IEEE ISPASS 216 @ Uppsala, Sweden Overview IBM Research
More information