Link Aggregation: A Server Perspective

Shimon Muller, Sun Microsystems, Inc. May 2007

Supporters
> Howard Frazier, Broadcom Corp.

Outline
> The Good, The Bad & The Ugly
> LAG and Network Virtualization
> Networking and Multi-Threading
> Summary

The Good

Potential for linear throughput scaling
> Assumptions:
  > Total throughput is an aggregation of multiple conversations (flows)
  > The conversations are uniformly distributed across the physical links (a distributor sketch follows this slide)
  > Packet ordering must be maintained at all times

Works well for applications where a large number of network flows is a given
> Web Tier servers
  > Thousands/millions of flows
> Statistical multiplexing works for almost any traffic distribution algorithm
> No flow dominates the bandwidth
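To make the ordering assumption concrete, here is a minimal sketch of a hash-based distributor. It is illustrative only, not Sun's or any vendor's implementation; the FNV-1a hash and the IPv4 4-tuple key are assumptions. Because every packet of a conversation carries the same key, it always maps to the same link, so per-flow ordering holds, and with thousands of flows the hash spreads load statistically.

```c
/* Minimal sketch of a hash-based LAG distributor (illustrative, not
 * any particular vendor's implementation). Every packet of a flow
 * carries the same 4-tuple, so it always maps to the same link and
 * per-flow ordering is preserved; with thousands of flows the links
 * load-balance statistically. */
#include <stddef.h>
#include <stdint.h>

struct flow_key {
    uint32_t saddr, daddr;   /* IPv4 source/destination address */
    uint16_t sport, dport;   /* TCP/UDP source/destination port */
};

/* FNV-1a: a simple, well-known hash; real distributors vary. */
static uint32_t flow_hash(const struct flow_key *k)
{
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof *k; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Pick one of nlinks physical links for this flow. */
static unsigned pick_link(const struct flow_key *k, unsigned nlinks)
{
    return flow_hash(k) % nlinks;
}
```

With a uniform hash and no dominant flow, each of the nlinks members carries roughly 1/nlinks of the traffic, which is exactly the statistical-multiplexing assumption the slide states.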

The Bad

For some applications linear scaling is not a given
> Back-end Tier servers: Data Warehousing, Databases, OLTP, etc.
  > Dozens of connections at most
  > Can't assume statistical multiplexing
> Need to dynamically manage the flow spreading over the LAG
  > Moving flows around is too complicated

Single-flow throughput is limited to the speed of a single physical link
> Directly affects the performance of some Application Tier servers
  > Bulk data transfers: file servers, backup servers, etc.

The LAG distributor can have a performance impact
> On the host, typically implemented in the driver or just above it
> Requires packet inspection (see the sketch below)
  > The deeper the inspection, the better the spreading, but the higher the overhead
  > Duplicates protocol stack processing
  > Encrypted traffic... Layer violations...
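The inspection-depth tradeoff can be sketched directly. These are hypothetical helpers, and the offsets assume untagged Ethernet/IPv4/TCP frames; VLAN tags, IPv6, or tunnels would change them. Hashing only the Ethernet addresses is cheap but spreads poorly when many flows share a MAC pair, while reaching the L4 ports spreads better and re-does work the protocol stack performs anyway.

```c
/* Sketch of the inspection-depth tradeoff (assumes untagged
 * Ethernet/IPv4/TCP frames; offsets differ with VLAN tags, IPv6,
 * etc.). Hashing L2 addresses is cheap but coarse; reaching the L4
 * ports spreads better, at the cost of re-parsing headers that the
 * protocol stack will parse again. */
#include <stdint.h>
#include <string.h>

/* Cheap: hash only the destination + source MAC (first 12 bytes). */
static uint32_t l2_key(const uint8_t *frame)
{
    uint32_t h = 0;
    for (int i = 0; i < 12; i++)
        h = h * 31 + frame[i];
    return h;
}

/* Deeper: reach through the IPv4 header to the TCP/UDP ports. */
static uint32_t l4_key(const uint8_t *frame)
{
    const uint8_t *ip = frame + 14;        /* after 14B Ethernet header */
    unsigned ihl = (ip[0] & 0x0f) * 4;     /* IPv4 header length */
    const uint8_t *l4 = ip + ihl;
    uint8_t bytes[12];
    memcpy(bytes, ip + 12, 8);             /* src + dst IPv4 address */
    memcpy(bytes + 8, l4, 4);              /* src + dst port */
    uint32_t h = 0;
    for (int i = 0; i < 12; i++)
        h = h * 31 + bytes[i];
    return h;
}
```

Note also what the sketch cannot handle, matching the slide's last bullets: if the payload is encrypted at L3 (e.g., IPsec ESP), the ports are not visible, and inspecting them at all is a layer violation from the link layer's point of view.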

The Ugly

Latency penalty
> Dominates the performance of transactional applications
  > Typically a request-response exchange, followed by a bulk data transfer
  > Measured in single-digit microseconds
> Round-trip latency is inversely proportional to the speed of the physical link
  > Common scenario: a small packet (request/response) stuck behind a large packet (bulk transfer)
  > A typical LAG distributor will map all of a flow's packets to the same physical link
  > Time to send a packet: 1.23 usec on one link of a 4x10Gb LAG vs. 0.31 usec on a 40Gb link (worked out below)
> The LAG distributor serialization point adds latency
  > Packet inspection
  > Mutex lock contention

The LAG distributor creates a serialization point for Tx traffic
> Breaks the parallelism paradigm for multi-core/multi-threaded computing
> Breaks the network virtualization story
> Interferes with efficient network b/w provisioning, capacity planning and QoS
> Forces a choice between suboptimal traffic spreading and sophisticated h/w
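The wire-time figures follow from frame size and link rate alone. This sketch reproduces them, assuming a 1538-byte full-size frame (1500B MTU plus Ethernet header, FCS, preamble, and inter-frame gap), which is consistent with the numbers on the slide.

```c
/* Worked example behind the slide's numbers: serialization time for
 * one full-size frame. 1538 bytes = 1500B MTU + 14B header + 4B FCS
 * + 8B preamble + 12B inter-frame gap (an assumption consistent with
 * the slide's figures). A 4x10Gb LAG still sends each frame over a
 * single 10Gb link, so the frame sees 10Gb/s, not 40Gb/s. */
#include <stdio.h>

int main(void)
{
    const double bits = 1538 * 8;   /* 12304 bits on the wire */
    printf("One 10G link of a 4x10Gb LAG: %.2f usec\n", bits / 10e9 * 1e6);
    printf("Single 40Gb link:             %.2f usec\n", bits / 40e9 * 1e6);
    return 0;   /* prints 1.23 and 0.31 */
}
```

This is why the small request "stuck behind" a bulk-transfer frame pays roughly 4x the serialization delay on the LAG compared to the single fat pipe.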

Network Interface Virtualization

[Diagram: Logical Domains 1-3, each with its own V-Ether driver, connect through the Hypervisor to a Service Domain containing V-Ether Bridges and the device driver, which sits on the I/O Bridge. Data movement occurs directly between the Logical Domains and the virtualized, shared network interface.]

LAG and Network Virtualization

Network virtualization goals
> Pool all the networking resources in a system
> Dynamically provision network resources to applications in compute domains, with fine granularity and QoS
> Enforce isolation between the compute domains

Network virtualization usage models
> Today network virtualization is done using the proxy model
  > All network traffic goes through the Service Domain
  > Creates a performance bottleneck
  > Breaks parallelism for network processing
> The next generation of n/w virtualization will provide a direct path to a shared NIC

The role of LAG
> Efficient LAG distribution algorithms require a complete view of all the network flows in the system
  > Implies the use of the proxy model
> Doing LAG in Guest Domains will be suboptimal due to a limited number of flows
> Complicates bandwidth provisioning and QoS
  > May need to split the bandwidth across multiple physical links

Networking in a Thread-Rich System

[Diagram: threads n through n+3 arranged in parallel (P/2) and pipelined (I/2) configurations.]

An arbitrary combination of parallel and pipeline semantics
> Can assign threads to do very specific chores with minimal latency
> Use parallelism to improve latency and throughput
> Use pipelining to improve throughput

LAG and Multi-Threading

New techniques for optimizing server network performance
> Typically a combination of parallelism and pipelining
> Server performance can be optimized without re-writing applications
> Use the threads to distribute the work intelligently

Multi-threading does not necessarily imply more n/w flows
> Nor is there a need to assume that for a faster pipe
  > No need to rely on statistics
> Can use the threads to speed up the throughput of a single connection (a pipeline sketch follows this slide)
> It doesn't take too many network connections to saturate a 10Gb pipe today
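As a minimal illustration of the pipelining idea (a sketch under simplifying assumptions, not how any production stack is structured): TX processing for one connection is split into a preparation stage and a transmit stage on separate threads, so a single flow's throughput is no longer bounded by one thread. A POSIX pipe stands in for what would realistically be a lock-free ring between stages.

```c
/* Minimal sketch of a two-stage TX pipeline for a single connection.
 * Stage 1 does protocol processing, stage 2 does the send; the pipe
 * stands in for a lock-free ring between pipeline stages. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int stagefd[2];   /* pipe: stage 1 -> stage 2 */

static void *prepare_stage(void *arg)
{
    (void)arg;
    for (uint32_t seq = 0; seq < 8; seq++) {
        /* ...headers, checksums, segmentation for packet 'seq'... */
        write(stagefd[1], &seq, sizeof seq);   /* hand off to stage 2 */
    }
    close(stagefd[1]);                         /* signal end of stream */
    return NULL;
}

static void *transmit_stage(void *arg)
{
    (void)arg;
    uint32_t seq;
    while (read(stagefd[0], &seq, sizeof seq) == sizeof seq)
        printf("tx packet %u\n", seq);         /* DMA to the NIC here */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pipe(stagefd);
    pthread_create(&t1, NULL, prepare_stage, NULL);
    pthread_create(&t2, NULL, transmit_stage, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Because the handoff is in-order, packet ordering on the single connection is preserved while the two stages overlap in time; this is the pipelining gain the slide refers to, with no extra flows and no reliance on statistical multiplexing.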

Summary

LAG is a good thing...
...but not as good as a faster pipe!