High Node Count - Scalability Challenges for Interconnection Networks
1 High Node Count - Scalability Challenges for Interconnection Networks. Professor Olav Lysne, Simula Research Laboratory
2 Overview
- Congestion control
- Fault tolerance
- Scalable modular routing
State of the art: state of technology, state of knowledge, state of problem
3 CONGESTION CONTROL
4 Shared network resources could lead to network congestion and head-of-line (HOL) blocking. The InfiniBand CC mechanism relies on a closed-loop feedback control system to remove the congestion tree.
Figure labels: FECN, BECN, host, switch, congestion tree, HOL-blocked traffic (victim).
CC parameters at the host: CCT, CCT Index Increase, CCT Index Min, CCT Index Limit, CCT Index Timer.
CC parameters at the switch: Threshold, Marking Rate, Packet Size.
5 Experiments show that the HOL blocking leads to performance degradation when CC is not activated.
6 The InfiniBand CC mechanism is able to remove both the HOL blocking and the parking lot problem.
Parameter values: Threshold 15, Marking Rate 1, Packet Size 8, CCTI Increase 1, CCTI Limit 127, CCTI Min 0, CCTI Timer 150.
Figure panels: without CC, with CC.
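The closed loop behind these parameters can be illustrated with a small simulation. This is only a rough sketch under simplifying assumptions, not the InfiniBand specification's algorithm: the class and function names, the abstract queue-depth model, the contents of the congestion control table (CCT), and the `timer_every` step count are all invented for the example, and Marking Rate and Packet Size are left out. Only the parameter names and default values are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchPort:
    """Hypothetical switch port: marks a packet (FECN) when its queue is too deep."""
    threshold: int = 15  # treated here as an abstract queue depth (assumption)

    def marks_fecn(self, queue_depth: int) -> bool:
        return queue_depth > self.threshold


@dataclass
class Source:
    """Hypothetical sending host that throttles itself via a CCT index (CCTI)."""
    ccti_increase: int = 1
    ccti_limit: int = 127
    ccti_min: int = 0
    ccti: int = 0
    # Made-up table: a higher index means a longer delay between packets.
    cct: list = field(default_factory=lambda: list(range(128)))

    def on_becn(self) -> None:
        # A BECN from the destination pushes the index up, capped at the limit.
        self.ccti = min(self.ccti + self.ccti_increase, self.ccti_limit)

    def on_timer(self) -> None:
        # Each timer expiry lets the source recover by one step.
        self.ccti = max(self.ccti - 1, self.ccti_min)

    def injection_delay(self) -> int:
        return self.cct[self.ccti]


def simulate(queue_depths, timer_every):
    """Return the CCTI after each step for a sequence of observed queue depths."""
    port, src = SwitchPort(), Source()
    trace = []
    for step, depth in enumerate(queue_depths, start=1):
        if port.marks_fecn(depth):
            src.on_becn()  # the destination echoes the FECN back as a BECN
        if step % timer_every == 0:
            src.on_timer()
        trace.append(src.ccti)
    return trace


if __name__ == "__main__":
    print(simulate([20, 20, 20, 5, 5, 5], timer_every=2))
```

In this toy model the index rises while the queue stays above the threshold and decays once the congestion is gone, which is the feedback behaviour that the Marking_Rate and CCTI_Timer studies on the following slides explore.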
7 The average throughput of the victim flow as a function of the Marking_Rate (sw) and the CCTI_Timer (host).
8 The average combined throughput of the contributors as a function of the Marking_Rate and the CCTI_Timer.
9 Contributors may experience unfairness if an unfortunate CCTI_Timer value is chosen. When an unfortunate timer value is chosen, contributors experience unfairness among each other for an extended period of time each time a new contributor is added.
$\Delta = (\text{max value}) - (\text{min value})$
The treatment variation variable: $\mathrm{TVV} = \mathrm{Var}(\Delta_1, \Delta_2, \ldots, \Delta_n)$
10 The treatment variation variable rules out a large part of the parameter space.
Parameter values: Threshold 15, Marking Rate 1, Packet Size 8, CCTI Increase 1, CCTI Limit 127, CCTI Min 0, CCTI Timer 150.
11 InfiniBand Congestion Control in M9 (SUN DATACENTER INFINIBAND SWITCH 648)
- IBTA Specification 1.2 compliant
- 648 QDR/DDR/SDR 4x InfiniBand ports
- Three-stage internal full Clos network (non-blocking)
Traffic: 20% of the nodes send to everyone, 80% of the nodes send to 8 hotspots.
Further simulation studies:
- Different traffic patterns
- Other topologies (M24: SUN DATA CENTER SWITCH 3456)
[Two charts, axis in Gbps; legend as extracted: !HS,!CC HS,!CC HS, CC HS, CC HS, CC QP!HS, CC]
12 Congestion Control - State of the art
State of technology:
- InfiniBand Congestion Control: FECN/BECN
- Datacenter Ethernet: TBD
- Much more to be expected
State of knowledge:
- Regional Explicit Congestion Notification
- Improvements on FECN/BECN
- Parametrizations
- Dynamics?
- Impact on applications?
- Much more to do
13 Fault Tolerance - Living with faults Static Reconfiguration-based End-to-End Rerouting Local Rerouting
14 What is network deadlock? Deadlock is a cycle of packets, each waiting for the next packet in the cycle to proceed before it can proceed itself. Routing functions may be deadlock free; topologies may not. For almost all topologies there exist reasonable but deadlocking routing functions, as well as reasonable and deadlock-free routing functions.
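Whether a routing function can deadlock is commonly checked by building its channel dependency graph and looking for a cycle. The sketch below does this for a four-node ring and shows that the same topology admits both a deadlocking and a deadlock-free routing function. The function names and the two example route sets are invented for illustration, and each route set lists only a few multi-hop paths.

```python
from collections import defaultdict


def channel_dependencies(paths):
    """Build channel dependency edges from routed paths.

    Each path is a list of nodes and a channel is a directed link (u, v).
    A packet holding channel (a, b) may have to wait for channel (b, c),
    so consecutive channels on a path form a dependency.
    """
    deps = defaultdict(set)
    for path in paths:
        channels = list(zip(path, path[1:]))
        for held, wanted in zip(channels, channels[1:]):
            deps[held].add(wanted)
    return deps


def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(channel):
        colour[channel] = GREY
        for nxt in deps.get(channel, ()):
            if colour[nxt] == GREY:
                return True  # back edge: a cycle of waiting channels
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[channel] = BLACK
        return False

    return any(colour[ch] == WHITE and visit(ch) for ch in list(deps))


# Ring 0-1-2-3-0, always routing clockwise: the dependencies close a cycle.
clockwise = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]

# Same ring, but no multi-hop path crosses the link between 3 and 0.
no_wraparound = [[0, 1, 2], [1, 2, 3], [3, 2, 1], [2, 1, 0]]

if __name__ == "__main__":
    print(has_cycle(channel_dependencies(clockwise)))      # deadlock possible
    print(has_cycle(channel_dependencies(no_wraparound)))  # no cycle
```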
15 Static Fault Tolerance: Checkpoint, Reconfigure, Rollback, Restart. Requires topology-agnostic routing algorithms: LASH, TOR, LASH/TOR, L-turn, Segment-based, Up*/Down*.
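As an illustration of what topology-agnostic routing means, here is a rough sketch of the Up*/Down* rule: links are oriented with respect to a breadth-first spanning tree, and a legal path takes zero or more up links followed by zero or more down links, never going up again after going down. The orientation rule used here (BFS level, ties broken by node id) is the common textbook formulation. The function names and the five-node topology are made up for the example.

```python
from collections import deque


def bfs_levels(adj, root):
    """Distance of every node from the root of the spanning tree."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u, v, level):
    """Link u -> v is 'up' if v is closer to the root (ties: lower node id)."""
    return (level[v], v) < (level[u], u)


def up_down_path(adj, root, src, dst):
    """Shortest legal path: up links first, then down links."""
    level = bfs_levels(adj, root)
    start = (src, False)  # (node, already went down?)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, went_down = state
        if node == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt in adj[node]:
            up = is_up(node, nxt, level)
            if up and went_down:
                continue  # a down -> up turn is forbidden
            nxt_state = (nxt, went_down or not up)
            if nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


# Hypothetical irregular topology: a five-node ring rooted at node 0.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}

if __name__ == "__main__":
    print(up_down_path(adj, 0, 3, 2))  # the two-hop route 3-4-2 is illegal
    print(up_down_path(adj, 0, 1, 4))
```

Because the rule needs only a spanning tree, it can be recomputed for whatever topology remains after a fault, which is why algorithms of this kind fit the reconfigure step.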
16-27 Dynamic Reconfiguration (animation frames)
A deadlock will contain old packets waiting behind new packets as well as new packets waiting behind old packets.
Key idea: make sure that new packets never wait.
Figure labels across the frames: dependencies of Rold, dependencies of Rnew, TOKEN, STOP.
28 Views: Fully Connected Subnetworks for endpoint fault-tolerance. The fat-tree is divided into a set of sub-networks. Each of these constitutes a view.
29 Views: Fully Connected Subnetworks for endpoint fault-tolerance. Close-up of a subtree with 3 views. A link is present in one, and only one, view. Any path through the network is contained entirely within one view. Only bottom-tier switches (and the endnode connections) will contain traffic for several views.
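The two structural properties stated here, that every link belongs to exactly one view and that every path stays inside a single view, can be checked mechanically. The following is a minimal sketch; the switch names, the two-view layout, and the helper names are invented for illustration.

```python
def link_to_view(views):
    """Map each link to its view; fail if a link appears in more than one view."""
    owner = {}
    for view_id, links in views.items():
        for link in links:
            if link in owner:
                raise ValueError(
                    f"link {link} is in views {owner[link]} and {view_id}"
                )
            owner[link] = view_id
    return owner


def view_of_path(path_links, owner):
    """Return the single view containing the path.

    Returns None if the path spans several views, uses a link that is in
    no view, or is empty.
    """
    found = {owner.get(link) for link in path_links}
    if len(found) == 1 and None not in found:
        return found.pop()
    return None


# Hypothetical fragment: bottom-tier switches s1, s2 and upper switches t1, t2.
views = {
    "A": {("s1", "t1"), ("t1", "s2")},
    "B": {("s1", "t2"), ("t2", "s2")},
}

if __name__ == "__main__":
    owner = link_to_view(views)
    print(view_of_path([("s1", "t1"), ("t1", "s2")], owner))  # stays in view A
    print(view_of_path([("s1", "t1"), ("t2", "s2")], owner))  # mixes two views
```

In this fragment only the bottom-tier switches s1 and s2 appear in both views, matching the observation that only bottom-tier switches carry traffic for several views.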
30 FROOTS Dynamic Fault Tolerance Configuration 1 Configuration 2
31 Full VL for the non affected traffic, and two VLs for the traffic affected by faults.
32 Handling faults
33 Fault tolerance - State of the art
State of technology:
- Topology agnostic routing algorithms (OFED)
- Static Reconfiguration with LASH (OFED)
- Endpoint Dynamic Reconfiguration (APM in IBA)
State of knowledge:
- Dynamic Reconfiguration
- Local Rerouting
- New Compatible Routing function
34 Modularity of routing
35 What is the problem?
36 Dependencies aggregate (figure: channels c1 and c2). The aggregated dependencies in a switch fabric must either be identified and removed, or taken into consideration in how the fabric is used.
37 So... A configuration of a network of networks is free from deadlocks if its channel dependency graph, extended with the aggregated dependencies in the switches, is free from deadlocks.
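One way to read this condition is as an acyclicity check on the union of two edge sets: the ordinary channel dependencies of the network and the dependencies aggregated inside each switch fabric. The sketch below takes that reading, which is an interpretation and not something the slide spells out, and uses Kahn's topological sort for the check. The channel names and both edge sets are made up for illustration.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    succ = defaultdict(set)
    indeg = defaultdict(int)
    nodes = set()
    for a, b in edges:
        nodes.update((a, b))
        if b not in succ[a]:
            succ[a].add(b)
            indeg[b] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    removed = 0
    while ready:
        n = ready.popleft()
        removed += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return removed == len(nodes)


def deadlock_free(network_deps, aggregated_switch_deps):
    """Check the channel dependency graph extended with in-switch dependencies."""
    return is_acyclic(set(network_deps) | set(aggregated_switch_deps))


# Hypothetical example: the routing alone is acyclic, but a dependency
# hidden inside a switch fabric (c3 -> c1) closes a cycle.
network_deps = {("c1", "c2"), ("c2", "c3")}
aggregated = {("c3", "c1")}

if __name__ == "__main__":
    print(is_acyclic(network_deps))
    print(deadlock_free(network_deps, aggregated))
```

The example shows why a routing function that looks safe at the network level can still deadlock once the internal behaviour of the switches is taken into account.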
38 Well what about local fault tolerance?
39 Modularity of routing
State of technology: not present.
State of knowledge: wide open, but there is an approach.
40 "There is a way to do it better - find it!" (T. A. Edison) "Simplicity is the ultimate sophistication." (L. da Vinci)