BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers

1 BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers
Chuanxiong Guo (1), Guohan Lu (1), Dan Li (1), Haitao Wu (1), Xuan Zhang (1, 2), Yunfeng Shi (1, 3), Chen Tian (1, 4), Yongguang Zhang (1), Songwu Lu (1, 5)
Affiliations: 1: Microsoft Research Asia, 2: Tsinghua, 3: PKU, 4: HUST, 5: UCLA
{chguo,lguohan,danil,hwu}@microsoft.com, xuan-zhang05@mails.tsinghua.edu.cn, shiyunfeng@pku.edu.cn, tianchen@mail.hust.edu.cn, ygz@microsoft.com, slu@cs.ucla.edu
Presented by: Rami Jiossy at Technion

2 Container-based Modular Data Center (MDC)
- A couple of thousand servers in a 20- to 40-foot shipping container
- Difficult to service an MDC once deployed
- Sun Microsystems states that the system can be made operational for 1% of the cost of building a traditional data center
- Main benefits:
  - High mobility
  - Just plug in: power, water (cooling), network
  - Increased cooling efficiency
  - Manufacturing and hardware administration savings

3 BCube Network Architecture
- Design and implementation derived from data-intensive applications and MDC requirements
- Graceful performance degradation upon server/switch failures
- Supports various bandwidth-intensive traffic patterns: one-to-one, one-to-several, one-to-all, all-to-all
- Uses only COTS mini-switches (low expense)

4 BCube structure
[Figure: a BCube1 with level-1 switches <1,0> to <1,3> and level-0 (BCube0) switches <0,0> to <0,3>; legend: switch, server]
Connecting rule: the i-th server in the j-th BCube0 connects to the j-th port of the i-th level-1 switch.
Example: server 13 is connected to switches <0,1> and <1,3>.
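The connecting rule can be checked with a short sketch. It assumes a server address is a tuple of digits (a_k, ..., a_0) and that the level-l switch of a server is named by the address with digit a_l removed; the function names `switches_of` and `build_bcube` are illustrative and not taken from the slides.

```python
from itertools import product

def switches_of(server, k):
    """Return (level, switch_id, port) for every switch a server attaches to.

    `server` is a tuple of digits (a_k, ..., a_0), so tuple index 0 holds a_k.
    """
    links = []
    for level in range(k + 1):
        pos = k - level                               # tuple position of digit a_level
        switch_id = server[:pos] + server[pos + 1:]   # address with a_level removed
        links.append((level, switch_id, server[pos])) # the port number is a_level
    return links

def build_bcube(n, k):
    """Map every server address of a BCube_k with n-port switches to its links."""
    return {s: switches_of(s, k) for s in product(range(n), repeat=k + 1)}

topology = build_bcube(n=4, k=1)
# Server 13: port 3 of switch <0,1> and port 1 of switch <1,3>.
print(topology[(1, 3)])
```

For server 13 (the server with index 3 in BCube0 number 1), the sketch yields the level-0 switch <0,1> and port 1 of the level-1 switch <1,3>, which agrees with the example on the slide.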

5 Bigger BCube: 3 levels (k=2)

6 Notations and Observations
A BCube_k has:
- k+1 levels: 0 through k
- n-port switches, same count at each level ($n^k$)
- $n^{k+1}$ total servers, $(k+1)n^k$ total switches
- Example: n=8, k=3 gives 4 levels connecting 4096 servers using 512 8-port switches at each level
A server is assigned a BCube address $(a_k, a_{k-1}, \ldots, a_0)$ where $a_i \in [0, n-1]$ for $i \in [0, k]$.
Neighboring server addresses differ in exactly one digit [$h(a,b) = 1$].
How many neighbors does a server have? $(k+1)(n-1)$.
Switches only connect to servers and act as dummy L2 crossbars.
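As a quick sanity check of these counts, the following sketch evaluates the formulas for the n=8, k=3 example. The function name `bcube_size` and the dictionary keys are illustrative only.

```python
def bcube_size(n, k):
    """Element counts of a BCube_k built from n-port switches."""
    return {
        "levels": k + 1,
        "switches_per_level": n ** k,
        "servers": n ** (k + 1),
        "switches_total": (k + 1) * n ** k,
        "neighbors_per_server": (k + 1) * (n - 1),
    }

size = bcube_size(n=8, k=3)
print(size)
```

With n=8 and k=3 this gives 4096 servers, 512 switches per level (2048 in total), and 28 one-hop neighbors per server.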

7 How to route from Server 00 to Server 21?
1. Decide on a permutation π of the digit indices 0 through k.
2. Correct the digits of the server address one at a time, in the order dictated by π.
What is the diameter of a BCube network?
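The two steps above can be sketched as follows. This is a minimal illustration, assuming addresses are digit tuples (a_k, ..., a_0) and that `perm` lists digit indices with 0 meaning a_0; the name `bcube_routing` is chosen here for readability and is not claimed to match the authors' code.

```python
def bcube_routing(src, dst, perm):
    """Correct the digits of src toward dst in the order given by perm.

    Returns the servers visited after src; each step changes exactly one
    digit, so consecutive servers share a switch.
    """
    k = len(src) - 1
    cur = list(src)
    path = []
    for digit in perm:
        pos = k - digit                 # tuple position of digit a_digit
        if cur[pos] != dst[pos]:
            cur[pos] = dst[pos]
            path.append(tuple(cur))
    return path

print(bcube_routing((0, 0), (2, 1), [1, 0]))   # [(2, 0), (2, 1)]
print(bcube_routing((0, 0), (2, 1), [0, 1]))   # [(0, 1), (2, 1)]
```

Because at most k+1 digits can differ between two addresses, a path built this way never needs more than k+1 server-to-server hops, which hints at the answer to the diameter question.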

8 Parallel paths in BCube
Two paths between servers A and B are parallel if they are node-disjoint and switch-disjoint.
THEOREM 2. Let $A = a_k a_{k-1} \ldots a_0$ and $B = b_k b_{k-1} \ldots b_0$ be two servers with $a_i \neq b_i$ for all $i \in [0, k]$. Then for the permutations
$\pi_0 = [i_0, (i_0 - 1) \bmod (k+1), \ldots, (i_0 - k) \bmod (k+1)]$
$\pi_1 = [i_1, (i_1 - 1) \bmod (k+1), \ldots, (i_1 - k) \bmod (k+1)]$
with $i_1 \neq i_0$ and $i_0, i_1 \in [0, k]$, BCubeRouting produces two parallel paths.
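Theorem 2 can be illustrated on a small case. The sketch below is an assumption-laden example rather than the authors' implementation: it builds the rotated permutations, routes by digit correction, and checks that the resulting paths share no intermediate server and no switch. Helper names (`rotated_perm`, `route`, `resources`, `are_parallel`) are hypothetical, and a switch is identified by its level plus the address with that digit removed.

```python
from itertools import combinations

def rotated_perm(start, k):
    """[i, (i-1) mod (k+1), ..., (i-k) mod (k+1)] for i = start."""
    return [(start - j) % (k + 1) for j in range(k + 1)]

def route(src, dst, perm):
    """Digit-correcting routing; addresses are tuples (a_k, ..., a_0)."""
    k = len(src) - 1
    cur, path = list(src), []
    for digit in perm:
        pos = k - digit
        if cur[pos] != dst[pos]:
            cur[pos] = dst[pos]
            path.append(tuple(cur))
    return path

def resources(src, path):
    """Intermediate servers and switches used by a path (destination excluded)."""
    k = len(src) - 1
    servers, switches = set(path[:-1]), set()
    prev = src
    for node in path:
        pos = next(i for i in range(k + 1) if prev[i] != node[i])
        switches.add((k - pos, prev[:pos] + prev[pos + 1:]))
        prev = node
    return servers, switches

def are_parallel(src, p, q):
    servers_p, switches_p = resources(src, p)
    servers_q, switches_q = resources(src, q)
    return servers_p.isdisjoint(servers_q) and switches_p.isdisjoint(switches_q)

# A and B differ in every digit, as the theorem requires.
A, B, k = (0, 0, 0), (1, 2, 3), 2
paths = [route(A, B, rotated_perm(i, k)) for i in range(k + 1)]
all_parallel = all(are_parallel(A, p, q) for p, q in combinations(paths, 2))
print(paths)
print(all_parallel)
```

In this example every pair of the three rotated permutations yields parallel paths, consistent with the theorem's claim for any two distinct starting indices.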

9 Multi-paths for one-to-one traffic
THEOREM 3. There are k+1 parallel paths between any two servers in a BCube_k (constructed by the BuildPathSet algorithm).
Useful when a pair of servers exchanges a large amount of data.
[Figure: BCube1 with switches <1,0> to <1,3> and <0,0> to <0,3>]

10 Speedup for one-to-several traffic
THEOREM 4. Server A and a set of servers $\{d_i \mid d_i \text{ is A's level-}i \text{ neighbor}\}$ form an edge-disjoint complete graph.
Writing to r servers is r times faster than pipeline replication.
[Figure: BCube1 with switches <1,0> to <1,3> and <0,0> to <0,3>, showing paths P1 and P2]

11 Speedup for one-to-all traffic
One server transmits to all other servers. Cases like upgrading a system image or distributing application binaries.
THEOREM 5. There are k+1 edge-disjoint spanning trees in a BCube_k.
A source can deliver a file of size L to all the other servers in time $\frac{L}{k+1}$ in a BCube_k.

12 ABT for all-to-all traffic
- All-to-all: shuffles data among all servers.
- Flow = connection between two servers (path).
- Aggregate bottleneck throughput (ABT) = number of flows × throughput of the bottleneck flow. It reflects the capacity of the network.
- In BCube there are no bottlenecks, since all links are used equally.
THEOREM 6. The ABT for a BCube network is $\frac{n}{n-1}(N-1)$, where n is the switch port number and N is the total server count.
The ABT for BCube increases linearly with the number of servers.
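A small worked example makes the linear growth concrete. The sketch assumes every link has a normalized rate of 1, so the result is expressed in multiples of one link's bandwidth; that normalization and the function name `bcube_abt` are assumptions made for illustration.

```python
from fractions import Fraction

def bcube_abt(n, k):
    """ABT = n/(n-1) * (N-1) for a BCube_k with N = n^(k+1) servers.

    Assumes a normalized link rate of 1.
    """
    total_servers = n ** (k + 1)
    return Fraction(n, n - 1) * (total_servers - 1)

print(bcube_abt(4, 1))   # 16 servers
print(bcube_abt(8, 3))   # 4096 servers
```

For the 16-server BCube1 this gives 20, and for the 4096-server example it gives 4680, so the ABT per server stays close to n/(n-1) as the network grows.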

13 BCube Source Routing (BSR)
- Server-centric source routing.
- The source server decides the best path for a flow and encodes the path in the packet header. (How is "best" chosen?)
- Intermediate servers only forward packets based on the packet header.
- Packet header when sending from server 00 to 13: Path(00,13) = {02,22,23,13}

14 Path Selection
BSR design goals: scalability, routing performance.
Source server:
1. Constructs k+1 paths using BuildPathSet.
2. Probes all these paths (no link status broadcasting).
3. If a path is not found, uses BFS to find an alternative (after removing all others).
4. Uses a metric to select the best path (maximum available bandwidth / end-to-end delay).
Intermediate servers:
- Update the probe bandwidth: min(PacketBW, InBW, OutBW).
- If the next hop is not found, return a failure to the source.
Destination server:
- Updates the probe bandwidth: min(PacketBW, InBW).
- Sends the probe response to the source on the reverse path.
During path selection, the source server sends on one of the selected parallel paths and switches paths if a better one is found.
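The probing rules above can be sketched as a small simulation. Everything here is an assumption made for illustration: the per-hop dictionaries, the `reachable` flag standing in for "next hop not found", the `src_out_bw` parameter, and the choice of maximum available bandwidth as the only metric. It is not the BSR implementation.

```python
import math

def probe(hops, src_out_bw=math.inf):
    """Simulate one probe along a path.

    `hops` lists the servers after the source; the last entry is the
    destination. Returns the available bandwidth, or None on failure.
    """
    bw = src_out_bw
    last = len(hops) - 1
    for i, hop in enumerate(hops):
        if not hop["reachable"]:
            return None                                  # failure reported to the source
        if i == last:
            bw = min(bw, hop["in_bw"])                   # destination: min(PacketBW, InBW)
        else:
            bw = min(bw, hop["in_bw"], hop["out_bw"])    # relay: min(PacketBW, InBW, OutBW)
    return bw

def select_path(candidates):
    """Pick the candidate with the largest probed bandwidth, if any is usable."""
    probed = {name: probe(hops) for name, hops in candidates.items()}
    usable = {name: bw for name, bw in probed.items() if bw is not None}
    return max(usable, key=usable.get) if usable else None

candidates = {
    "via 20": [{"reachable": True, "in_bw": 1.0, "out_bw": 0.4},
               {"reachable": True, "in_bw": 0.9}],
    "via 01": [{"reachable": True, "in_bw": 0.8, "out_bw": 0.7},
               {"reachable": True, "in_bw": 0.9}],
    "via 31": [{"reachable": False, "in_bw": 0.0, "out_bw": 0.0},
               {"reachable": True, "in_bw": 1.0}],
}
best = select_path(candidates)
print(best)   # via 01
```

In this made-up scenario the path through server 01 wins with an available bandwidth of 0.7, while the path whose relay is unreachable is dropped from consideration.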

15 Path Adaptation
- The source performs path selection periodically (say, every 10 seconds) to adapt to failures and changes in network conditions.
- If a failure is received, the source switches to an available path and waits for the next timer to expire before starting the next selection round, rather than starting one immediately.
- Randomness in the timer is usually used to avoid path oscillation.

16 Packet Forwarding
Each server has two components:
- Neighbor status table
  - (k+1)×(n-1) entries
  - Maintained by the neighbor maintenance protocol (updated upon probing / packet forwarding)
  - Uses NHI (next hop index) encoding for indexing neighbors ([DP:DV])
    - DP: diff digit (2 bits for 2 levels)
    - DV: value of diff digit (rest of bits)
  - Almost static (except Status)
- Packet forwarding procedure
  - Intermediate servers update the next hop MAC address on the packet if the next hop is alive
  - Intermediate servers update status from the packet
  - One table lookup

| NHI | Output port | MAC   | Status |
|-----|-------------|-------|--------|
| 0:0 | 0           | Mac20 | 1      |
| 0:1 | 0           | Mac21 | 1      |
| 0:2 | 0           | Mac22 | 0      |
| 1:0 | 1           | Mac03 | 0      |
| 1:1 | 1           | Mac13 | 1      |
| 1:3 | 1           | Mac33 | 1      |

17 Path compression and fast packet forwarding
- A traditional address array needs 16 bytes: Path(00,13) = {02,22,23,13}
- The Next Hop Index (NHI) array needs 4 bytes: Path(00,13) = {0:2,1:2,0:3,1:1}
[Figure: BCube1 with switches <1,0> to <1,3> and <0,0> to <0,3>, marking the forwarding node and the next hop]
Forwarding table of server 23:

| NHI | Output port | MAC   | Status |
|-----|-------------|-------|--------|
| 0:0 | 0           | Mac20 | 1      |
| 0:1 | 0           | Mac21 | 1      |
| 0:2 | 0           | Mac22 | 0      |
| 1:0 | 1           | Mac03 | 0      |
| 1:1 | 1           | Mac13 | 1      |
| 1:3 | 1           | Mac33 | 1      |
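The compression from an address array to an NHI array can be sketched as below. The one-byte layout (DP in the upper 2 bits, DV in the lower 6 bits) is an assumption chosen to be consistent with "2 bits for DP, rest of bits for DV" and the 4-byte total; the reading of Status 1 as "alive" is also an assumption, and the helper names are hypothetical.

```python
DV_BITS = 6   # assumed layout: one byte per hop, DP in the top 2 bits, DV in the low 6

def to_nhi(src, path):
    """Turn a list of full addresses into (DP, DV) pairs, one per hop."""
    k = len(src) - 1
    prev, nhi = src, []
    for node in path:
        pos = next(i for i in range(k + 1) if prev[i] != node[i])
        nhi.append((k - pos, node[pos]))   # DP = differing digit, DV = its new value
        prev = node
    return nhi

def pack(nhi_array):
    """Pack each (DP, DV) pair into a single byte."""
    return bytes((dp << DV_BITS) | dv for dp, dv in nhi_array)

# Forwarding table of server 23 from the slide: NHI -> (output port, MAC, status).
TABLE_23 = {
    (0, 0): (0, "Mac20", 1), (0, 1): (0, "Mac21", 1), (0, 2): (0, "Mac22", 0),
    (1, 0): (1, "Mac03", 0), (1, 1): (1, "Mac13", 1), (1, 3): (1, "Mac33", 1),
}

def forward(table, nhi):
    """One table lookup; returns (port, MAC) only if the neighbor is marked alive."""
    port, mac, status = table[nhi]
    return (port, mac) if status == 1 else None

nhi = to_nhi((0, 0), [(0, 2), (2, 2), (2, 3), (1, 3)])
print(nhi)                          # [(0, 2), (1, 2), (0, 3), (1, 1)]
print(len(pack(nhi)))               # 4
print(forward(TABLE_23, (1, 1)))    # (1, 'Mac13')
```

The derived NHI array matches Path(00,13) = {0:2,1:2,0:3,1:1} from the slide, and the last hop at server 23 resolves with a single lookup to output port 1 and Mac13.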

18 DCell

19 Graceful degradation
The metric: aggregate bottleneck throughput (ABT) under different server and switch failure rates (simulation based).
[Figure: two panels, "Server failure" and "Switch failure", each comparing BCube, Fat-tree, and DCell]

20 Routing to external networks
Ethernet has a two-level link rate hierarchy: 1G for end hosts and 10G for the uplink aggregator.
[Figure: BCube1 with an aggregator, 10G and 1G links, switches <1,0> to <1,3> and <0,0> to <0,3>, and four gateway servers]

21 Implementation
[Figure: implementation architecture.
Software: app; kernel with TCP/IP protocol driver, BCube driver (intermediate driver) and Ethernet miniport driver. BCube driver components: BCube configuration, neighbor maintenance, packet send/recv, packet fwd, BSR path probing & selection, flow-path cache, Ava_band calculation. Interfaces IF 0, IF 1, ..., IF k.
Hardware: Intel PRO/1000 PT Quad Port Server Adapter; NetFPGA with neighbor maintenance, packet fwd, Ava_band calculation, server ports.]

22 Testbed
A BCube testbed:
- 16 servers (Dell Precision 490 workstation with Intel 2.00 GHz dual-core CPU, 4 GB DRAM, 160 GB disk) in a BCube1 (4 BCube0s)
- 8 4-port mini-switches (D-Link 8-port Gigabit switch DGS-1008D)
- Utilizes only 2 ports of the 4 ports in the switch
- NIC: Intel Pro/1000 PT quad-port Ethernet NIC
- NetFPGA: because of PCI interface limitations (160 Mb/s), the software implementation is used

23 CPU Overhead for Packet Forwarding
- Packet forwarding is ideally placed at the hardware level.
- In the testbed, the MTU is limited to a 9 KB threshold.

24 Bandwidth-intensive application support
[Figure: per-server throughput]

25 Support for all-to-all traffic
[Figure: total throughput for all-to-all]

26 Conclusions
- By installing a small number of network ports at each server, using COTS mini-switches as crossbars, and putting routing intelligence at the server side, BCube forms a server-centric architecture.
- We have shown that BCube significantly accelerates one-to-x traffic patterns and provides high network capacity for all-to-all traffic.
- The BSR routing protocol further enables graceful performance degradation.
- Future work will study how to scale the current server-centric design from a single container to multiple containers.

27 Q & A


More information

Primavera Compression Server 5.0 Service Pack 1 Concept and Performance Results

Primavera Compression Server 5.0 Service Pack 1 Concept and Performance Results - 1 - Primavera Compression Server 5.0 Service Pack 1 Concept and Performance Results 1. Business Problem The current Project Management application is a fat client. By fat client we mean that most of

More information

EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures

EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures Haiyang Shi, Xiaoyi Lu, and Dhabaleswar K. (DK) Panda {shi.876, lu.932, panda.2}@osu.edu The Ohio State University

More information

Future Routing Schemes in Petascale clusters

Future Routing Schemes in Petascale clusters Future Routing Schemes in Petascale clusters Gilad Shainer, Mellanox, USA Ola Torudbakken, Sun Microsystems, Norway Richard Graham, Oak Ridge National Laboratory, USA Birds of a Feather Presentation Abstract

More information

Advanced Computer Networks. Flow Control

Advanced Computer Networks. Flow Control Advanced Computer Networks 263 3501 00 Flow Control Patrick Stuedi Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Last week TCP in Datacenters Avoid incast problem - Reduce

More information

Next Generation Architecture for NVM Express SSD

Next Generation Architecture for NVM Express SSD Next Generation Architecture for NVM Express SSD Dan Mahoney CEO Fastor Systems Copyright 2014, PCI-SIG, All Rights Reserved 1 NVMExpress Key Characteristics Highest performance, lowest latency SSD interface

More information

Agenda. Sun s x Sun s x86 Strategy. 2. Sun s x86 Product Portfolio. 3. Virtualization < 1 >

Agenda. Sun s x Sun s x86 Strategy. 2. Sun s x86 Product Portfolio. 3. Virtualization < 1 > Agenda Sun s x86 1. Sun s x86 Strategy 2. Sun s x86 Product Portfolio 3. Virtualization < 1 > 1. SUN s x86 Strategy Customer Challenges Power and cooling constraints are very real issues Energy costs are

More information

CS550. TA: TBA Office: xxx Office hours: TBA. Blackboard:

CS550. TA: TBA   Office: xxx Office hours: TBA. Blackboard: CS550 Advanced Operating Systems (Distributed Operating Systems) Instructor: Xian-He Sun Email: sun@iit.edu, Phone: (312) 567-5260 Office hours: 1:30pm-2:30pm Tuesday, Thursday at SB229C, or by appointment

More information

PCI Express x8 Single Port SFP+ 10 Gigabit Server Adapter (Intel 82599ES Based) Single-Port 10 Gigabit SFP+ Ethernet Server Adapters Provide Ultimate

PCI Express x8 Single Port SFP+ 10 Gigabit Server Adapter (Intel 82599ES Based) Single-Port 10 Gigabit SFP+ Ethernet Server Adapters Provide Ultimate NIC-PCIE-1SFP+-PLU PCI Express x8 Single Port SFP+ 10 Gigabit Server Adapter (Intel 82599ES Based) Single-Port 10 Gigabit SFP+ Ethernet Server Adapters Provide Ultimate Flexibility and Scalability in Virtual

More information

Deduplication Storage System

Deduplication Storage System Deduplication Storage System Kai Li Charles Fitzmorris Professor, Princeton University & Chief Scientist and Co-Founder, Data Domain, Inc. 03/11/09 The World Is Becoming Data-Centric CERN Tier 0 Business

More information

Ref: A. Leon Garcia and I. Widjaja, Communication Networks, 2 nd Ed. McGraw Hill, 2006 Latest update of this lecture was on

Ref: A. Leon Garcia and I. Widjaja, Communication Networks, 2 nd Ed. McGraw Hill, 2006 Latest update of this lecture was on IP Routing Routing is the process performed by routers to transfer packets from the source machine to the destination. Unlike switches, routers are configured by a network administrator. Routers share

More information

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme NET1343BU NSX Performance Samuel Kommu #VMworld #NET1343BU Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no

More information

- Hubs vs. Switches vs. Routers -

- Hubs vs. Switches vs. Routers - 1 Layered Communication - Hubs vs. Switches vs. Routers - Network communication models are generally organized into layers. The OSI model specifically consists of seven layers, with each layer representing

More information

Scalable Data Center Multicast. Reporter: 藍于翔 Advisor: 曾學文

Scalable Data Center Multicast. Reporter: 藍于翔 Advisor: 曾學文 Scalable Data Center Multicast Reporter: 藍于翔 Advisor: 曾學文 Outline Introduction Scalable multicast Conclusion Comparison Reference Exploring Efficient and Scalable Multicast Routing in Future Data Center

More information

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage A Dell Technical White Paper Dell Database Engineering Solutions Anthony Fernandez April 2010 THIS

More information

Toward a unified architecture for LAN/WAN/WLAN/SAN switches and routers

Toward a unified architecture for LAN/WAN/WLAN/SAN switches and routers Toward a unified architecture for LAN/WAN/WLAN/SAN switches and routers Silvano Gai 1 The sellable HPSR Seamless LAN/WLAN/SAN/WAN Network as a platform System-wide network intelligence as platform for

More information

WHITE PAPER: BEST PRACTICES. Sizing and Scalability Recommendations for Symantec Endpoint Protection. Symantec Enterprise Security Solutions Group

WHITE PAPER: BEST PRACTICES. Sizing and Scalability Recommendations for Symantec Endpoint Protection. Symantec Enterprise Security Solutions Group WHITE PAPER: BEST PRACTICES Sizing and Scalability Recommendations for Symantec Rev 2.2 Symantec Enterprise Security Solutions Group White Paper: Symantec Best Practices Contents Introduction... 4 The

More information

Performance Characterization of the Dell Flexible Computing On-Demand Desktop Streaming Solution

Performance Characterization of the Dell Flexible Computing On-Demand Desktop Streaming Solution Performance Characterization of the Dell Flexible Computing On-Demand Desktop Streaming Solution Product Group Dell White Paper February 28 Contents Contents Introduction... 3 Solution Components... 4

More information

Lecture 16: Router Design

Lecture 16: Router Design Lecture 16: Router Design CSE 123: Computer Networks Alex C. Snoeren Eample courtesy Mike Freedman Lecture 16 Overview End-to-end lookup and forwarding example Router internals Buffering Scheduling 2 Example:

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction In a packet-switched network, packets are buffered when they cannot be processed or transmitted at the rate they arrive. There are three main reasons that a router, with generic

More information

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Torben Kling-Petersen, PhD Presenter s Name Principle Field Title andengineer Division HPC &Cloud LoB SunComputing Microsystems

More information

Multi-resource Energy-efficient Routing in Cloud Data Centers with Network-as-a-Service

Multi-resource Energy-efficient Routing in Cloud Data Centers with Network-as-a-Service in Cloud Data Centers with Network-as-a-Service Lin Wang*, Antonio Fernández Antaº, Fa Zhang*, Jie Wu+, Zhiyong Liu* *Institute of Computing Technology, CAS, China ºIMDEA Networks Institute, Spain + Temple

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 19: Networks and Distributed Systems

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 19: Networks and Distributed Systems S 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2004 Lecture 19: Networks and Distributed Systems 19.0 Main Points Motivation for distributed vs. centralized systems

More information

InfiniBand SDR, DDR, and QDR Technology Guide

InfiniBand SDR, DDR, and QDR Technology Guide White Paper InfiniBand SDR, DDR, and QDR Technology Guide The InfiniBand standard supports single, double, and quadruple data rate that enables an InfiniBand link to transmit more data. This paper discusses

More information

Milestone Solution Partner IT Infrastructure Components Certification Report

Milestone Solution Partner IT Infrastructure Components Certification Report Milestone Solution Partner IT Infrastructure Components Certification Report Dell MD3860i Storage Array Multi-Server 1050 Camera Test Case 4-2-2016 Table of Contents Executive Summary:... 3 Abstract...

More information

Tagger: Practical PFC Deadlock Prevention in Data Center Networks

Tagger: Practical PFC Deadlock Prevention in Data Center Networks Tagger: Practical PFC Deadlock Prevention in Data Center Networks Shuihai Hu*(HKUST), Yibo Zhu, Peng Cheng, Chuanxiong Guo* (Toutiao), Kun Tan*(Huawei), Jitendra Padhye, Kai Chen (HKUST) Microsoft CoNEXT

More information

Towards a Robust Protocol Stack for Diverse Wireless Networks Arun Venkataramani

Towards a Robust Protocol Stack for Diverse Wireless Networks Arun Venkataramani Towards a Robust Protocol Stack for Diverse Wireless Networks Arun Venkataramani (in collaboration with Ming Li, Devesh Agrawal, Deepak Ganesan, Aruna Balasubramanian, Brian Levine, Xiaozheng Tie at UMass

More information

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed ASPERA HIGH-SPEED TRANSFER Moving the world s data at maximum speed ASPERA HIGH-SPEED FILE TRANSFER Aspera FASP Data Transfer at 80 Gbps Elimina8ng tradi8onal bo

More information

Use of the Internet SCSI (iscsi) protocol

Use of the Internet SCSI (iscsi) protocol A unified networking approach to iscsi storage with Broadcom controllers By Dhiraj Sehgal, Abhijit Aswath, and Srinivas Thodati In environments based on Internet SCSI (iscsi) and 10 Gigabit Ethernet, deploying

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #5 1/29/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 From last class Outline

More information