FM4000: A Scalable, Low-latency 10 GigE Switch for High-performance Data Centers


A Scalable, Low-latency 10 GigE Switch for High-performance Data Centers
Uri Cummings, Rebecca Collins, Virat Agarwal, Dan Daly, Fabrizio Petrini, Michael Perrone, Davide Pasetto
Hot Interconnects 17 (Aug 2009)
Columbia University; IBM TJ Watson Research Center, US; IBM Computational Science Center, Ireland

What is the FM4000?
- 10 GigE silicon L2/L3 switch/router
- 300 ns latency
- Parallel multicast
- CEE feature set

Convergence
CEE: Converged Enhanced Ethernet. Converge what?
- Technologies?
- Requirements?
[Chart: traffic classes plotted by allowed loss vs. latency budget. Storage, HPC, and IPC tolerate no loss ("never") and have microsecond latency budgets; TCP/IP tolerates some loss and has a millisecond latency budget.]

Latency Becomes Key Metric
All non-IP flows are latency sensitive. Old foes incur new costs:
- Loss latency: retransmits become too expensive
- Queuing latency: big buffers have big side effects

Network Speed       1 B Queue Delay       9 KB Queue Delay
1 Gigabit/sec       0.008 microseconds    72 microseconds
10 Gigabit/sec      0.0008 microseconds   7.2 microseconds

Cut-through avoids per-hop queuing.
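The table's entries follow directly from serialization delay: a backlog of B bytes on a link of rate R bits/s takes 8B/R seconds to drain. A minimal sketch (the helper name is illustrative, not from the talk):

```python
def queue_delay_us(queue_bytes: int, link_bps: float) -> float:
    """Time to drain a backlog of `queue_bytes` at `link_bps`,
    in microseconds (pure serialization delay, no propagation)."""
    return queue_bytes * 8 / link_bps * 1e6

# Reproduces the table: a 9 KB backlog costs 72 us at 1 Gb/s
# but only 7.2 us at 10 Gb/s; a single byte costs 8 ns vs. 0.8 ns.
print(queue_delay_us(9000, 1e9))    # 72.0
print(queue_delay_us(9000, 10e9))   # 7.2
```

Note that a faster link shrinks the delay per queued byte but does not remove queuing delay, which is why the slide points to cut-through forwarding as the real fix.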

Converged Network Requirements
Latency becomes a system-level metric.
- Guaranteed lossless: non-blocking architectures, unicast & multicast
- Flow control: throttle senders before loss occurs; link-level (PFC) and end-to-end (QCN); fast response time is critical
- Transmission selection: transmit latency-sensitive traffic first; IP traffic can queue when it needs to
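The note that fast response time is critical can be made concrete: after a PAUSE is sent, the receiver must still absorb everything already in flight. A rough headroom estimate, using the standard PFC sizing argument rather than anything from the talk (function and parameter names are made up for illustration):

```python
def pfc_headroom_bytes(link_bps: float, cable_m: float,
                       response_ns: float, mtu: int = 9000) -> float:
    """Worst-case buffer needed after sending a link-level PAUSE:
    bytes in flight over a round trip on the cable, plus the sender's
    response delay, plus one maximum-size frame already in transmission
    at each end. Assumes ~5 ns/m propagation delay."""
    round_trip_s = 2 * cable_m * 5e-9 + response_ns * 1e-9
    return link_bps / 8 * round_trip_s + 2 * mtu

# 100 m of cable at 10 Gb/s with a 500 ns responder:
print(pfc_headroom_bytes(10e9, 100, 500))   # 19875.0 bytes
```

The in-flight term scales linearly with both link rate and response time, which is why slow flow-control response forces exactly the big buffers the previous slide warns against.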

Convergence
Where do transaction networks fall? Can they share the same wire?
[Chart: the same allowed-loss vs. latency-budget plot, with Transaction joining Storage, HPC, and IPC in the no-loss, microsecond-budget corner; TCP/IP allows some loss with a millisecond budget.]

Converged Network Requirements
- Step 1: Distribute market information. Must be multicast, low latency, and low jitter.
- Step 2: End users process updates. End users prefer updates at the same time.
- Step 3: Profit.

OPRA Feeds: Exponential Growth
[Chart: OPRA market peak data rates (million messages per second), y-axis 0 to 1, growing exponentially over time.]

Transaction Network Requirements
- Latency is THE system-level metric
- Provisioned lossless
- Transmission selection

Problem Statement
1. Can CEE switches be used to implement a transaction network? Yes, using low-latency Ethernet.
2. Can I use this same network to check my email? Yes, using CEE-enabled switches.
3. Can this functionality be demonstrated without building out an entire network? Yes, by pushing the envelope.

Architecture
Output-queued shared-memory switch.

Architecture
- N ports * (min-frame-size frame rate) = packet rate: 24 ports * 14.88 Mpackets/sec = 357 Mpps
- N ports * 10 GigE = data rate (input and output): 24 ports * 10 GigE = 240 Gigabits/sec
- Output scheduler: runs at double the packet rate
- Shared memory: loads and stores at the data rate
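The 14.88 Mpps and 357 Mpps figures fall out of the minimum Ethernet frame plus its per-frame overhead on the wire. A quick check (assumes the standard 8-byte preamble and 12-byte inter-frame gap):

```python
MIN_FRAME = 64    # bytes: minimum Ethernet frame
OVERHEAD = 20     # bytes: 8-byte preamble + 12-byte inter-frame gap
LINK_BPS = 10e9   # 10 GigE line rate
PORTS = 24

per_port_pps = LINK_BPS / ((MIN_FRAME + OVERHEAD) * 8)
total_pps = PORTS * per_port_pps
total_bps = PORTS * LINK_BPS

print(round(per_port_pps / 1e6, 2))   # 14.88 Mpps per port
print(round(total_pps / 1e6))         # 357 Mpps aggregate
print(total_bps / 1e9)                # 240.0 Gb/s in each direction
```

The aggregate packet rate, not the bit rate, is what stresses the forwarding pipeline and the output scheduler, which is why the next slides quote rates in Mpps.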

Forwarding Pipeline
- Ethernet-specific tables: MAC, ARP & IP tables
- Fully flexible TCAM: ACLs, flow tracking/forwarding, etc.
- Port mapping: global addressing for thousands of ports in a fabric
- QoS and congestion management
Runs at frame rate (> 360 Mpps) and sits in the forwarding latency path.

Multicast Replication
- Replication on egress
- Pointer tracking
- Parallel draining at each port: 67.2 ns loop
- Multicast copies transmit synchronized within 67.2 ns

Egress Scheduling
8 queues per port. Two-level scheduler:
- Schedule first by group
- Schedule second by priority
Min bandwidth guarantees via deficit round-robin; max bandwidth guarantees via bandwidth shaper groups.
[Diagram: example configuration with the 8 queues feeding strict-priority, DRR, and shaping groups into a priority-pause-aware scheduler; colors represent priority sets.]
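The min-bandwidth mechanism named on the slide, deficit round-robin, can be sketched in a few lines. This is a generic DRR illustration (the queue contents and the `deficit_round_robin` name are invented for the example), not the FM4000's hardware implementation:

```python
from collections import deque

def deficit_round_robin(queues, quantum):
    """Serve queues of frame sizes (bytes) fairly by bytes, not frames.
    Each non-empty queue earns `quantum` bytes of credit per round and
    sends frames while its credit covers the frame at its head."""
    deficits = [0] * len(queues)
    order = []                     # (queue_index, frame_size) in tx order
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                deficits[i] = 0    # an empty queue banks no credit
                continue
            deficits[i] += quantum
            while q and q[0] <= deficits[i]:
                deficits[i] -= q[0]
                order.append((i, q.popleft()))
    return order

# One queue of 1500 B frames, one of minimum-size frames: DRR
# interleaves them so small frames are not starved by large ones.
tx = deficit_round_robin([deque([1500, 1500]), deque([64] * 4)], quantum=1500)
print(tx)
```

Counting credit in bytes rather than frames is what makes the guarantee a bandwidth guarantee: a queue of 64 B frames and a queue of 1500 B frames each get the same share of the link, not the same number of transmit slots.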

Measuring Performance
Existing RFC benchmarks: most RFC performance tests are unlikely to occur in a real network; they define the performance envelope with extreme corner-case traffic flows.
Testing goal: start with the RFCs, then expand to converged multicast traffic.

Port to Port Benchmark (RFC 2544)
- No inherent congestion: expect zero queuing
- Inability to forward a rate incurs loss and queuing latencies

Port to Port Benchmark Results
Zero loss, nanosecond-scale latency across frame sizes.

23 to 1 Flow Control
- Single hot spot: 23 simultaneous congesting flows into one port
- Each sender is flow controlled to 1/23rd of line rate
- Throttling must be fair
[Diagram: traffic from 23 ingress ports of the 24-port switch converging on a single egress port.]

23 to 1 Benchmark Results
Zero loss; ports throttled evenly (within 2%).

1 to 24 Multicast Test
- Single multicast transmitter
- Loopback suppression disabled: traffic hairpin-turns back to the sender
- Non-blocking multicast forwarding

Multicast Background Traffic Test
Add broadcast traffic on all ports; the switch is filled with 100% broadcast traffic.

Multicast Background Traffic Results
Prioritization guarantees bandwidth and latency.

Multicast Background Traffic
Without background traffic:
- 300 ns latency, 1-to-24 multicast
- Nanosecond jitter between multicast copies
With low-priority background traffic:
- Adds up to one low-priority frame's worth of queuing

Conclusions
Tests demonstrate:
- Non-blocking architecture
- Cut-through latency
- Predictable delay under stress
- Lossless flow control on all ports
- Multicast transmission selection of high vs. low priority
Data determines the latency budget:
- Same wire for transactions and IP incurs latency on transactions
- Separate wires, same switch incurs no penalty

Contributors
- Rebecca Collins, IBM TJ Watson Research Center, US and Columbia University
- Virat Agarwal, IBM TJ Watson Research Center, US
- Fabrizio Petrini, IBM TJ Watson Research Center, US
- Michael Perrone, IBM TJ Watson Research Center, US
- Davide Pasetto, IBM Computational Science Center, Ireland
- Uri Cummings, Fulcrum Microsystems
- Dan Daly, Fulcrum Microsystems

Many thanks to the team for putting together this paper! Questions?