Rx Stack Accelerator for 10 GbE Integrated NIC

Size: px
Start display at page:

Download "Rx Stack Accelerator for 10 GbE Integrated NIC"

Transcription

1 IEEE Hot Interconnects 20, Santa Clara, CA, Aug , 2012 Rx Stack Accelerator for 10 GbE Integrated NIC F. Abel, C. Hagleitner IBM Research Zurich Switzerland F. Verplanken IBM Systems & Technology Group La Gaude France 2009 IBM Corporation

2 EI3 RX EI3 TX L2 L2 L2 CI HEA/ MC L2 Comp BIC BIC RegX MC Crypto XML Discrete vs Integrated Network Interface Controller Discrete NIC (dnic) eripheral ASIC device Marketed as: Ethernet Controller Converged Controller Integrated NIC (inic) Sun Niagara 2 Freescale QorIQ TM family IBM oweren TM 410 mm 2 (1.43 billion transistors) CI/Ethernet Network Interface Card (NIC) inic ~3% BIC BIC AT 2 AT 3 BUS AT 0 AT 1 BEC Converged Network Adapter (CNA) DDR3 DDR3 DDR3 DDR3 LAN On Motherboard (LOM) Higher performance, lower latency Lower power consumption Significant cost reduction Alter the general-purpose nature of the computer complex 2

3 Outline Hardware context of this work oweren TM / Host Ethernet Adapter Functional requirements Architecture of the Rx Stack Accelerator Data path acket parser acket handler Results Implementation erformance Summary and conclusions 3

4 Hardware Context 4

5 Context: oweren TM / Host Ethernet Adapter (HEA) AT 0 AT 1 AT 2 AT 3 Mem HY Mem HY LL LL LL LL LL LL L L L L LL LL Comp / Decomp Crypto XML attern Engine 2MB L2 2MB L2 2MB L2 2MB L2 MC MC BIC BIC Bus Bus External Controller Root/E Engine CI Exp Gen 2 BIC Root Engine CI Exp Gen 2 BIC Quad-port 10 GbE NIC a.k.a Host Ethernet Adapter (HEA) bus Access Macro ervasive 4x 10GE MAC or 4x 1GE MAC EI3 EI3 EI3 x8 HY x8 HY x8 HY x4 HY Misc I/O 5

6 Overview of the Host Ethernet Adapter (HEA) 4 10 GbE state-of-art Ethernet controller featuring: I/O virtualization support Through 128 queue pairs Internal layer-2 switch for partition-to-partition data traffic Flexible queue selection and scheduling assist Rx and Tx protocol acceleration Low-latency through direct processor bus attachment and cache injection Multi-core scaling, Interrupt and receive coalescing assist, 9 KB Jumbo frame support, Memory address protection... Equivalent functionality and performance levels as modern discrete NICs Small footprint (13 mm 2 ) and low power (2.6 W) SoC Bus Host Interface Tx Acc. Tx Tx Tx Acc. Acc. Acc. Rx Acc. Rx Rx Rx Acc Acc Acc XGMAC XGMAC XGMAC XGMAC XGMAC XGMAC XGXSCS XGCS XGCS XGCS XGXSCS XGCS XGCS XGCS 6

7 RXACC Functional Requirements 7

8 Target Domain Requirements oweren TM targets network-facing applications Must cope with a large spectrum of protocols (13+), traffic characteristics and policies E.g. routers, firewalls, intrusion-prevention systems and network analytics Must be able to adapt to changes Handle other existing or emerging protocols Virtualized I/Os Must sustain 16 Gb/s (internal layer-2 switch) Ethernet: A trucking service for application data rotocol layered architecture (i.e., encapsulation) permutations Iv4t ISL DIX VLAN QinQ SA SNA oe Iv4 TC ICMv6 L1.5 L2 L2 L2 L2.5 L3 L4 ayload L2 8 MLS MLS MLS MLS MLS MLS Iv6 Iv6t UD Ipv6 Ipv6 Ext. Ext. Hdr. Ipv6 Ext. Hdr. Ipv6 Ext. Hdr. Hdr.

9 Distribution of the rotocol Stacks U, 558 L5, 424 L4, L4, 2118 RH, 186 LF, 329 MF, 331 FF, 651 I6t, 186 Iv6, I6, M, 85 A, 25 DIX, SA, 1079 SNA, 1078 Q, VLAN, 1183 QQ, 362, 972 I4t, 197 I4, 940 M1, 1788 M6, 394 M5, 665 M4, 941 M3, 1231 M2, 1510 MLS2,

10 10 Offloaded Service Requirements WHY TC Rx+Tx processing ~3000 instructions (assuming zero-copy and checksum offload) 10 GbE Occurrence of 64 B packets 67.2 ns (14.8 Mfps) A generally accepted rule of thumb 1 GHz / 1 Gb/s WHAT (business as usual) 'per-byte' operations check-summing 'per-packet' operations header processing: VLAN, MAC, flow identification, QoS determination, discard, errors, traffic steering HOW (different) Flexible way rogrammable parsing and processing (rule-based) Meta-data descriptors Save instructions per frame E.g., rotocol stack signature (31b), rotocol statck offsets (64b) D I X S A S N A V L A N Q i n Q o E M L S 1 M L S 2 M L S 3 M L S 4 M L S 5 M L S 6 I v 4 I v 4 t I v 6 I v 6 t F F r g M F r g L F r g R H d r L 4 L 5 N u N u N u N u N u I n c o m M a l f d A b o r t N u L4 rot. L2.5 os. L3 osition L4 osition ayload osition

11 Rx Stack Accelerator 11

12 Rx Stack rocessing Can be decomposed in three major tasks: (t1) arsing Identify protocol stack + position of protocol fields (t2) Data extraction Locate and retrieve data to be processed (t3) rocessing Execute instructions based on identified rules Filtering (MAC, VLAN), VLAN extraction, Rx queue assignment, checksum verification, flow determination, discard, counters increments, Main characteristics exhibited by protocol processing applications [Jantsch1998] (c1) Intensive use of pattern matching especially on headers (c2) Complex and control dominated flow many nested if-then-else and case structures (c3) Intensive use of irregular memory accesses various sizes and patterns 12

13 RXACC Architecture A Transport Triggered Architecture (TTA) w/ 5 transport buses (1 byte/bus) Cut-through architecture low latency efficient irregular mem. access (c3) clk = 625 MHz octaword-out Data ath Transport Src FU 1 (180 bits) meta-data acket Handler Sockets Dst acket arser (programmable) FU N Ctrl Frame signature saves instructions (avg.) IL architecture instruction-level parallelism parallel, independent, and pipelined coprocessors (FUs) performance scalability hardware efficiency hardware modularity ultimate RISC architecture - only 1 instruction move data octaword-in (16 B) rog. FSM + Balanced Routing Table search algorithm (B-FSM) efficient pattern-matching (c1) efficient one-cycle multiway branching (c2) 13

14 Data ath (D) 14

15 Multi-field acket Inspection & Data Extraction QQ/VLAN/SA/SNA/Iv6/UD Off.5 Off.4 Off.3 Off.2 Off.1 F Off.5 Off.4 Off.3 Off.2 Off.1 F 15 Current fields of interest Next fields of interest

16 Data ath Architecture Cut-through architecture Relaxed buffering (2 16B) + Low latency Flexible data extraction (any 5 bytes within the 32B window) Entire packet is exposed to the parser Can inspect and match any data of the frame D00 D01 D02 D03 D04 D05 D06 D07 D08 D09 D10 D11 D12 D13 D14 D15 Status HOST I/F Legend: L2 (DIX) L3 (Iv4) Extracted fields: Frmtr = 16 Offset1 = 2 Offset2 = 3 Offset3 = 7 Offset4 = 8 Offset5 = 9 R H Tot. Len. R L MAC DA Id. MAC SA r Hd. ot. Csum Ether Type Status SA DA Status M 00 M 0 B C N T SrcOp I/F Tranport I/F B1-B5 Frmtr Off(1-5) D00 D01 D02 D03 D04 D05 D06 D07 D08 D09 D10 D11 D12 D13 D14 D15 Status XGMAC I/F 16

17 acket arser () 17

18 acket arser Design space: micro-coded limited branch capabilities multi-ghz operation power finite state machine (FSM) efficient but inflexible programmable finite state machine (pfsm) high performance and energy efficient Iv4_SA S22 Rule Current State Input symbols Input 1 Input 2 Input 3 Next State riority R44 S22 XXXX_XXXXb XXXX_XXXXb XXXX_XXXXb S24 0 R52 S24 XXXX_0101b 0001_0001b X00X_XXXXb S65 0 Iv4_DA_rot S24 R53 S24 XXXX_XXXXb 0001_0001b X00X_XXXXb S82 1 R61 S24 XXXX_XXXXb 0010_1001b X00X_XXXXb S36 2 R112 S24 XXXX_0101b 0000_0110b X00X_XXXXb S Iv4_UD S65 Iv4_TC S44 Iv4_UL4 S36 State transition diagram State transition rules ('X' symbol represents a don't care bit) HW Engine We use the B-FSM architecture [van Lunteren2006] 'B' stands for Balanced Routing Table search algorithm (BaRT) Originally designed for longest matching prefix searches (i.e. routing table lookups)

19 Output H I/F acket arser Architecture Input D I/F B1 N -B3 N Data ath Controller FInc, Src1-Src5 Range Comparator RCC Address Generator H next-state State Register output Rule Selector input Transition Rule Memory B-FSM engine bits 256 rules (1 rule = 160 bits) S C B1 C -B3 C STATE RCR B1 B2 B3 S N Match 1 F Transition Rule Selector Match 2 G H' Match 3 LUT Match 4 R1 R2 R3 R4 FWR Transition Rule Memory (F, F', G, G', H') Reg. File A,B Stat Reg. Input Rule Structure 19 Test part Result part H part S C Input symbols S N Output symbols H sym.

20 acket Handler (H) 20

21 acket Handler Architecture Functional Units (FU) Independent coprocessors (I-blocks) Concurrent operation IL erformance Configurable through MMIO Implement TTA socket registers: O = operand register(s) T = trigger register (note: result registers are not implemented) FU example: the HASHER 32 meta-data Hasher (XORs) Symmetry Bit Mask MMIO config. FU 1 acket Handler FU N O 1 O 2 O 3 O N T Sockets Transport Dst Ctrl Transport Decode 21 Dst

22 Rule Format 160 bits CS Compare Value Compare Mask NS Src 1 Src 2 Src 3 Src 4 Src 5 Dst 1 Dst 2 Dst 3 Dst 4 Dst 5 Test art (62 bits) Result art (variable-length instruction) (87 bits) Frame pointer increment (Long/Short Fixed/Variable) Source of the operands (i.e. offsets w.r.t the frame pointer) Destination of the operands (i.e. functional unit registers) H art (9 bits) Simplified Instruction Set Frame pointer instructions e.g. ffpinc(4); // fixed increment e.g. vfpinc(1); // variable increment Move instruction e.g.: move(d(4), CSI.O1); // unicast destination e.g.: move(d(12), HSH.O1 & CSI.O1 & CST.O9); // multicast destination (socket write sharing) 22

23 erformance and Implementation Results 23

24 Implementation Results Area (in 45 nm SOI technology) 1 RXACC = 0.7 mm 2 4 RXACC = 21.5 % of HEA (entire HEA = 13 mm 2 ) Clock Frequency 625 MHz (27% of core frequency) ower (estimate) RXACC 0.15 W (entire HEA = 2.6 W) Transition Rule Memory Utilization 256 rules 64 x (4 x 160) bits = 40 kbits 68 states out of 128 (53 %) 221 rules out of 256 (86 %) Transition rule memory + ECC logic = 15 % of RXACC 24

25 RXACC Bandwidth Bandwidth (Gb/s) DIX/Iv4/TC DIX/Iv6/UD DIX/QQ/VLAN/oE/MLS/MLS/MLS/MLS/Iv4/TC DIX/QQ/VLAN/Iv6-HBH-ROUT-FRAG/TC Theoretical limit of 10 GbE media (Inter-frame gap=12b - reample=8b) rotocol stacks do not fit (missing bars) Frame length (bytes) 25

26 Summary & Conclusions rocessor compute complex + integrated network I/O complex IFF high-computation performance (1) + high-chip density (2) + low-power (3) + flexibility (4) RXACC delivers (1) + (2) + (3) + (4) (1) erformance 15 Mfps, 20 Gb/s (at relaxed clock frequency 625 MHz) Saves hundreds of CU cycles per frame (2-3) Area and power efficiency 0.7 mm 2 (45 nm SOI), 0.15 W (4) Flexibility protocol permutations rogrammable parsing and processing Rule-based Can parse new emerging standards (e.g. SDN) Can inspect header and payload ragmatic approach w/ one code-set per application RXACC = TTA + pfsm = novel Application Specific rocessor Key enabler for integrated NICs The architecture has headroom to scale towards GbE 26

27 Rx Stack Accelerator for 10 GbE Integrated NIC Thank you IEEE Hot Interconnects 20, Santa Clara, CA, Aug , IBM Corporation

P51: High Performance Networking

P51: High Performance Networking P51: High Performance Networking Lecture 6: Programmable network devices Dr Noa Zilberman noa.zilberman@cl.cam.ac.uk Lent 2017/18 High Throughput Interfaces Performance Limitations So far we discussed

More information

PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES

PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES Greg Hankins APRICOT 2012 2012 Brocade Communications Systems, Inc. 2012/02/28 Lookup Capacity and Forwarding

More information

A 400Gbps Multi-Core Network Processor

A 400Gbps Multi-Core Network Processor A 400Gbps Multi-Core Network Processor James Markevitch, Srinivasa Malladi Cisco Systems August 22, 2017 Legal THE INFORMATION HEREIN IS PROVIDED ON AN AS IS BASIS, WITHOUT ANY WARRANTIES OR REPRESENTATIONS,

More information

A Next Generation Home Access Point and Router

A Next Generation Home Access Point and Router A Next Generation Home Access Point and Router Product Marketing Manager Network Communication Technology and Application of the New Generation Points of Discussion Why Do We Need a Next Gen Home Router?

More information

100 GBE AND BEYOND. Diagram courtesy of the CFP MSA Brocade Communications Systems, Inc. v /11/21

100 GBE AND BEYOND. Diagram courtesy of the CFP MSA Brocade Communications Systems, Inc. v /11/21 100 GBE AND BEYOND 2011 Brocade Communications Systems, Inc. Diagram courtesy of the CFP MSA. v1.4 2011/11/21 Current State of the Industry 10 Electrical Fundamental 1 st generation technology constraints

More information

An Intelligent NIC Design Xin Song

An Intelligent NIC Design Xin Song 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) An Intelligent NIC Design Xin Song School of Electronic and Information Engineering Tianjin Vocational

More information

Advanced Computer Networks. End Host Optimization

Advanced Computer Networks. End Host Optimization Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct

More information

INT 1011 TCP Offload Engine (Full Offload)

INT 1011 TCP Offload Engine (Full Offload) INT 1011 TCP Offload Engine (Full Offload) Product brief, features and benefits summary Provides lowest Latency and highest bandwidth. Highly customizable hardware IP block. Easily portable to ASIC flow,

More information

PacketShader: A GPU-Accelerated Software Router

PacketShader: A GPU-Accelerated Software Router PacketShader: A GPU-Accelerated Software Router Sangjin Han In collaboration with: Keon Jang, KyoungSoo Park, Sue Moon Advanced Networking Lab, CS, KAIST Networked and Distributed Computing Systems Lab,

More information

INT G bit TCP Offload Engine SOC

INT G bit TCP Offload Engine SOC INT 10011 10 G bit TCP Offload Engine SOC Product brief, features and benefits summary: Highly customizable hardware IP block. Easily portable to ASIC flow, Xilinx/Altera FPGAs or Structured ASIC flow.

More information

The iflow Address Processor Forwarding Table Lookups using Fast, Wide Embedded DRAM

The iflow Address Processor Forwarding Table Lookups using Fast, Wide Embedded DRAM Enabling the Future of the Internet The iflow Address Processor Forwarding Table Lookups using Fast, Wide Embedded DRAM Mike O Connor - Director, Advanced Architecture www.siliconaccess.com Hot Chips 12

More information

440GX Application Note

440GX Application Note Overview of TCP/IP Acceleration Hardware January 22, 2008 Introduction Modern interconnect technology offers Gigabit/second (Gb/s) speed that has shifted the bottleneck in communication from the physical

More information

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level

More information

COMP211 Chapter 4 Network Layer: The Data Plane

COMP211 Chapter 4 Network Layer: The Data Plane COMP211 Chapter 4 Network Layer: The Data Plane All material copyright 1996-2016 J.F Kurose and K.W. Ross, All Rights Reserved Computer Networking: A Top Down Approach 7 th edition Jim Kurose, Keith Ross

More information

NOW Handout Page 1. Recap: Gigaplane Bus Timing. Scalability

NOW Handout Page 1. Recap: Gigaplane Bus Timing. Scalability Recap: Gigaplane Bus Timing 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Address Rd A Rd B Scalability State Arbitration 1 4,5 2 Share ~Own 6 Own 7 A D A D A D A D A D A D A D A D CS 258, Spring 99 David E. Culler

More information

NetFPGA Hardware Architecture

NetFPGA Hardware Architecture NetFPGA Hardware Architecture Jeffrey Shafer Some slides adapted from Stanford NetFPGA tutorials NetFPGA http://netfpga.org 2 NetFPGA Components Virtex-II Pro 5 FPGA 53,136 logic cells 4,176 Kbit block

More information

NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications

NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications Outline RDMA Motivating trends iwarp NFS over RDMA Overview Chelsio T5 support Performance results 2 Adoption Rate of 40GbE Source: Crehan

More information

Outline. Limited Scaling of a Bus

Outline. Limited Scaling of a Bus Outline Scalability physical, bandwidth, latency and cost level of integration Realizing rogramming Models network transactions protocols safety input buffer problem: N-1 fetch deadlock Communication Architecture

More information

Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA

Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA Weirong Jiang, Viktor K. Prasanna University of Southern California Norio Yamagaki NEC Corporation September 1, 2010 Outline

More information

Gigabit Ethernet XMVR LAN Services Modules

Gigabit Ethernet XMVR LAN Services Modules Gigabit Ethernet XMVR LAN Services Modules Ixia's Gigabit Ethernet XMVR LAN Services Modules (LSMs) offer Layer 2-3 network testing functionality in a single test system. Each test port supports wire-speed

More information

FMS18 Invited Session 101-B1 Hardware Acceleration Techniques for NVMe-over-Fabric

FMS18 Invited Session 101-B1 Hardware Acceleration Techniques for NVMe-over-Fabric Flash Memory Summit 2018 Santa Clara, CA FMS18 Invited Session 101-B1 Hardware Acceleration Techniques for NVMe-over-Fabric Paper Abstract: The move from direct-attach to Composable Infrastructure is being

More information

Gigabit Ethernet XMVR LAN Services Modules

Gigabit Ethernet XMVR LAN Services Modules Gigabit Ethernet XMVR LAN Services Modules Ixia's Gigabit Ethernet XMVR LAN Services Modules (LSMs) offer Layer 2-3 network testing functionality in a single test system. Each test port supports wire-speed

More information

QuickSpecs. HP Z 10GbE Dual Port Module. Models

QuickSpecs. HP Z 10GbE Dual Port Module. Models Overview Models Part Number: 1Ql49AA Introduction The is a 10GBASE-T adapter utilizing the Intel X722 MAC and X557-AT2 PHY pairing to deliver full line-rate performance, utilizing CAT 6A UTP cabling (or

More information

A 10GE Monitoring System. Ariën Vijn

A 10GE Monitoring System. Ariën Vijn A 10GE Monitoring System Ariën Vijn arien@ams-ix.net Agenda - Introduction The role of an internet exchange (IX). - The problem to be solved. Real life examples - The chosen solution for that problem *

More information

Network Processors. Nevin Heintze Agere Systems

Network Processors. Nevin Heintze Agere Systems Network Processors Nevin Heintze Agere Systems Network Processors What are the packaging challenges for NPs? Caveat: I know very little about packaging. Network Processors What are the packaging challenges

More information

MSC8156 Ethernet Interface

MSC8156 Ethernet Interface June 21, 2010 MSC8156 Ethernet Interface QUICC Engine Ethernet Programming Andrew Temple NMG DSP Applications Reg. U.S. Pat. & Tm. Off. BeeKit, BeeStack, CoreNet, the Energy Efficient Solutions logo, Flexis,

More information

CN-100 Network Analyzer Product Overview

CN-100 Network Analyzer Product Overview CN-100 Network Analyzer Product Overview CN-100 network analyzers offer an extremely powerful yet cost effective solution for today s complex networking requirements. Test Ethernet or ATM networks with

More information

KeyStone C66x Multicore SoC Overview. Dec, 2011

KeyStone C66x Multicore SoC Overview. Dec, 2011 KeyStone C66x Multicore SoC Overview Dec, 011 Outline Multicore Challenge KeyStone Architecture Reminder About KeyStone Solution Challenge Before KeyStone Multicore performance degradation Lack of efficient

More information

Software Datapath Acceleration for Stateless Packet Processing

Software Datapath Acceleration for Stateless Packet Processing June 22, 2010 Software Datapath Acceleration for Stateless Packet Processing FTF-NET-F0817 Ravi Malhotra Software Architect Reg. U.S. Pat. & Tm. Off. BeeKit, BeeStack, CoreNet, the Energy Efficient Solutions

More information

10G bit UDP Offload Engine (UOE) MAC+ PCIe SOC IP

10G bit UDP Offload Engine (UOE) MAC+ PCIe SOC IP Intilop Corporation 4800 Great America Pkwy Ste-231 Santa Clara, CA 95054 Ph: 408-496-0333 Fax:408-496-0444 www.intilop.com 10G bit UDP Offload Engine (UOE) MAC+ PCIe INT 15012 (Ultra-Low Latency SXUOE+MAC+PCIe+Host_I/F)

More information

INT-1010 TCP Offload Engine

INT-1010 TCP Offload Engine INT-1010 TCP Offload Engine Product brief, features and benefits summary Highly customizable hardware IP block. Easily portable to ASIC flow, Xilinx or Altera FPGAs INT-1010 is highly flexible that is

More information

Support for Smart NICs. Ian Pratt

Support for Smart NICs. Ian Pratt Support for Smart NICs Ian Pratt Outline Xen I/O Overview Why network I/O is harder than block Smart NIC taxonomy How Xen can exploit them Enhancing Network device channel NetChannel2 proposal I/O Architecture

More information

Scalable Distributed Memory Machines

Scalable Distributed Memory Machines Scalable Distributed Memory Machines Goal: Parallel machines that can be scaled to hundreds or thousands of processors. Design Choices: Custom-designed or commodity nodes? Network scalability. Capability

More information

Automatic Generation of 100 Gbps Packet Parsers from P4 Description

Automatic Generation of 100 Gbps Packet Parsers from P4 Description Automatic Generation of 100 Gbps acket arsers from 4 Description avel Benáček 1 Viktor uš 1 Hana Kubátová 2 1 Liberouter CESNET 2 Faculty of Information Technology Czech Technical University in rague H

More information

4. Shared Memory Parallel Architectures

4. Shared Memory Parallel Architectures Master rogram (Laurea Magistrale) in Computer cience and Networking High erformance Computing ystems and Enabling latforms Marco Vanneschi 4. hared Memory arallel Architectures 4.4. Multicore Architectures

More information

Programmable Forwarding Planes at Terabit/s Speeds

Programmable Forwarding Planes at Terabit/s Speeds Programmable Forwarding Planes at Terabit/s Speeds Patrick Bosshart HIEF TEHNOLOGY OFFIER, BREFOOT NETWORKS nd the entire Barefoot Networks team Hot hips 30, ugust 21, 2018 Barefoot Tofino : Domain Specific

More information

Keystone Architecture Inter-core Data Exchange

Keystone Architecture Inter-core Data Exchange Application Report Lit. Number November 2011 Keystone Architecture Inter-core Data Exchange Brighton Feng Vincent Han Communication Infrastructure ABSTRACT This application note introduces various methods

More information

Design principles in parser design

Design principles in parser design Design principles in parser design Glen Gibb Dept. of Electrical Engineering Advisor: Prof. Nick McKeown Header parsing? 2 Header parsing? Identify headers & extract fields A???? B???? C?? Field Field

More information

Power Technology For a Smarter Future

Power Technology For a Smarter Future 2011 IBM Power Systems Technical University October 10-14 Fontainebleau Miami Beach Miami, FL IBM Power Technology For a Smarter Future Jeffrey Stuecheli Power Processor Development Copyright IBM Corporation

More information

Lecture 16: Network Layer Overview, Internet Protocol

Lecture 16: Network Layer Overview, Internet Protocol Lecture 16: Network Layer Overview, Internet Protocol COMP 332, Spring 2018 Victoria Manfredi Acknowledgements: materials adapted from Computer Networking: A Top Down Approach 7 th edition: 1996-2016,

More information

Fundamental Questions to Answer About Computer Networking, Jan 2009 Prof. Ying-Dar Lin,

Fundamental Questions to Answer About Computer Networking, Jan 2009 Prof. Ying-Dar Lin, Fundamental Questions to Answer About Computer Networking, Jan 2009 Prof. Ying-Dar Lin, ydlin@cs.nctu.edu.tw Chapter 1: Introduction 1. How does Internet scale to billions of hosts? (Describe what structure

More information

Programmable NICs. Lecture 14, Computer Networks (198:552)

Programmable NICs. Lecture 14, Computer Networks (198:552) Programmable NICs Lecture 14, Computer Networks (198:552) Network Interface Cards (NICs) The physical interface between a machine and the wire Life of a transmitted packet Userspace application NIC Transport

More information

PacketExpert PDF Report Details

PacketExpert PDF Report Details PacketExpert PDF Report Details July 2013 GL Communications Inc. 818 West Diamond Avenue - Third Floor Gaithersburg, MD 20878 Phone: 301-670-4784 Fax: 301-670-9187 Web page: http://www.gl.com/ E-mail:

More information

TXS 10/100 Mbps and Gigabit Ethernet Load Modules

TXS 10/100 Mbps and Gigabit Ethernet Load Modules TXS 10/100 Mbps and Gigabit Ethernet Load Modules Ixia's TXS family of Ethernet load modules offer complete layer 2-7 network and application testing functionality in a single platform. Wire-speed layer

More information

Motivation CPUs can not keep pace with network

Motivation CPUs can not keep pace with network Deferred Segmentation For Wire-Speed Transmission of Large TCP Frames over Standard GbE Networks Bilic Hrvoye (Billy) Igor Chirashnya Yitzhak Birk Zorik Machulsky Technion - Israel Institute of technology

More information

Cubro Packetmaster EX32100

Cubro Packetmaster EX32100 Cubro Packetmaster EX32100 PRODUCT OVERVIEW Network Packet Broker (NPB) At a glance The Packetmaster EX32100 is a network packet broker and network controller switch that aggregates, filters and load balances

More information

Developing deterministic networking technology for railway applications using TTEthernet software-based end systems

Developing deterministic networking technology for railway applications using TTEthernet software-based end systems Developing deterministic networking technology for railway applications using TTEthernet software-based end systems Project n 100021 Astrit Ademaj, TTTech Computertechnik AG Outline GENESYS requirements

More information

LM1000STXR4 Gigabit Ethernet Load Module

LM1000STXR4 Gigabit Ethernet Load Module Gigabit Ethernet Load Module Gigabit Ethernet Load Module Ixia's Gigabit Ethernet Load Modules offer complete Layer 2-3 network and routing/bridging protocol testing functionality in a single platform.

More information

cs144 Midterm Review Fall 2010

cs144 Midterm Review Fall 2010 cs144 Midterm Review Fall 2010 Administrivia Lab 3 in flight. Due: Thursday, Oct 28 Midterm is this Thursday, Oct 21 (during class) Remember Grading Policy: - Exam grade = max (final, (final + midterm)/2)

More information

IsoStack Highly Efficient Network Processing on Dedicated Cores

IsoStack Highly Efficient Network Processing on Dedicated Cores IsoStack Highly Efficient Network Processing on Dedicated Cores Leah Shalev Eran Borovik, Julian Satran, Muli Ben-Yehuda Outline Motivation IsoStack architecture Prototype TCP/IP over 10GE on a single

More information

Topic & Scope. Content: The course gives

Topic & Scope. Content: The course gives Topic & Scope Content: The course gives an overview of network processor cards (architectures and use) an introduction of how to program Intel IXP network processors some ideas of how to use network processors

More information

CS 856 Latency in Communication Systems

CS 856 Latency in Communication Systems CS 856 Latency in Communication Systems Winter 2010 Latency Challenges CS 856, Winter 2010, Latency Challenges 1 Overview Sources of Latency low-level mechanisms services Application Requirements Latency

More information

End-to-End Adaptive Packet Aggregation for High-Throughput I/O Bus Network Using Ethernet

End-to-End Adaptive Packet Aggregation for High-Throughput I/O Bus Network Using Ethernet Hot Interconnects 2014 End-to-End Adaptive Packet Aggregation for High-Throughput I/O Bus Network Using Ethernet Green Platform Research Laboratories, NEC, Japan J. Suzuki, Y. Hayashi, M. Kan, S. Miyakawa,

More information

Introduction to Routers and LAN Switches

Introduction to Routers and LAN Switches Introduction to Routers and LAN Switches Session 3048_05_2001_c1 2001, Cisco Systems, Inc. All rights reserved. 3 Prerequisites OSI Model Networking Fundamentals 3048_05_2001_c1 2001, Cisco Systems, Inc.

More information

GIGABIT ETHERNET XMVR LAN SERVICES MODULES

GIGABIT ETHERNET XMVR LAN SERVICES MODULES GIGABIT ETHERNET XMVR LAN SERVICES MODULES DATA SHEET Ixia's Gigabit Ethernet XMVR LAN Services Modules (LSMs) offer Layer 2-3 network testing functionality in a single test system. Each test port supports

More information

QuickSpecs. Overview. HPE Ethernet 10Gb 2-port 535 Adapter. HPE Ethernet 10Gb 2-port 535 Adapter. 1. Product description. 2.

QuickSpecs. Overview. HPE Ethernet 10Gb 2-port 535 Adapter. HPE Ethernet 10Gb 2-port 535 Adapter. 1. Product description. 2. Overview 1. Product description 2. Product features 1. Product description HPE Ethernet 10Gb 2-port 535FLR-T adapter 1 HPE Ethernet 10Gb 2-port 535T adapter The HPE Ethernet 10GBase-T 2-port 535 adapters

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision

Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision At-A-Glance Unified Computing Realized Today, IT organizations assemble their data center environments from individual components.

More information

An Introduction to the QorIQ Data Path Acceleration Architecture (DPAA) AN129

An Introduction to the QorIQ Data Path Acceleration Architecture (DPAA) AN129 July 14, 2009 An Introduction to the QorIQ Data Path Acceleration Architecture (DPAA) AN129 David Lapp Senior System Architect What is the Datapath Acceleration Architecture (DPAA)? The QorIQ DPAA is a

More information

Math 230 Assembly Programming (AKA Computer Organization) Spring MIPS Intro

Math 230 Assembly Programming (AKA Computer Organization) Spring MIPS Intro Math 230 Assembly Programming (AKA Computer Organization) Spring 2008 MIPS Intro Adapted from slides developed for: Mary J. Irwin PSU CSE331 Dave Patterson s UCB CS152 M230 L09.1 Smith Spring 2008 MIPS

More information

High Performance Packet Processing with FlexNIC

High Performance Packet Processing with FlexNIC High Performance Packet Processing with FlexNIC Antoine Kaufmann, Naveen Kr. Sharma Thomas Anderson, Arvind Krishnamurthy University of Washington Simon Peter The University of Texas at Austin Ethernet

More information

Internet II. CS10 : Beauty and Joy of Computing. cs10.berkeley.edu. !!Senior Lecturer SOE Dan Garcia!!! Garcia UCB!

Internet II. CS10 : Beauty and Joy of Computing. cs10.berkeley.edu. !!Senior Lecturer SOE Dan Garcia!!!  Garcia UCB! cs10.berkeley.edu CS10 : Beauty and Joy of Computing Internet II!!Senior Lecturer SOE Dan Garcia!!!www.cs.berkeley.edu/~ddgarcia CS10 L17 Internet II (1)! Why Networks?! Originally sharing I/O devices

More information

Chapter 4 Network Layer: The Data Plane

Chapter 4 Network Layer: The Data Plane Chapter 4 Network Layer: The Data Plane Lu Su Assistant Professor Department of Computer Science and Engineering State University of New York at Buffalo Adapted from the slides of the book s authors Computer

More information

Ruler: High-Speed Packet Matching and Rewriting on Network Processors

Ruler: High-Speed Packet Matching and Rewriting on Network Processors Ruler: High-Speed Packet Matching and Rewriting on Network Processors Tomáš Hrubý Kees van Reeuwijk Herbert Bos Vrije Universiteit, Amsterdam World45 Ltd. ANCS 2007 Tomáš Hrubý (VU Amsterdam, World45)

More information

Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms

Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms Sayantan Sur, Matt Koop, Lei Chai Dhabaleswar K. Panda Network Based Computing Lab, The Ohio State

More information

The Network Processor Revolution

The Network Processor Revolution The Network Processor Revolution Fast Pattern Matching and Routing at OC-48 David Kramer Senior Design/Architect Market Segments Optical Mux Optical Core DWDM Ring OC 192 to OC 768 Optical Mux Carrier

More information

On the cost of tunnel endpoint processing in overlay virtual networks

On the cost of tunnel endpoint processing in overlay virtual networks J. Weerasinghe; NVSDN2014, London; 8 th December 2014 On the cost of tunnel endpoint processing in overlay virtual networks J. Weerasinghe & F. Abel IBM Research Zurich Laboratory Outline Motivation Overlay

More information

POWER7: IBM's Next Generation Server Processor

POWER7: IBM's Next Generation Server Processor POWER7: IBM's Next Generation Server Processor Acknowledgment: This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002 Outline

More information

Programmable Logic Design Grzegorz Budzyń Lecture. 15: Advanced hardware in FPGA structures

Programmable Logic Design Grzegorz Budzyń Lecture. 15: Advanced hardware in FPGA structures Programmable Logic Design Grzegorz Budzyń Lecture 15: Advanced hardware in FPGA structures Plan Introduction PowerPC block RocketIO Introduction Introduction The larger the logical chip, the more additional

More information

Lecture 7: Ethernet Hardware Addressing and Frame Format. Dr. Mohammed Hawa. Electrical Engineering Department, University of Jordan.

Lecture 7: Ethernet Hardware Addressing and Frame Format. Dr. Mohammed Hawa. Electrical Engineering Department, University of Jordan. Lecture 7: Ethernet Hardware Addressing and Frame Format Dr. Mohammed Hawa Electrical Engineering Department University of Jordan EE426: Communication Networks MAC Addresses The shared medium in a LAN

More information

Portable 2-Port Gigabit Wirespeed Streams Generator & Network TAP

Portable 2-Port Gigabit Wirespeed Streams Generator & Network TAP Portable 2-Port Gigabit Wirespeed Streams Generator & Network TAP NuDOG-301C OVERVIEW NuDOG-301C is a handheld device with two Gigabit ports for Ethernet testing. The main functions of NuDOG-301C include

More information

Scalable Multiprocessors

Scalable Multiprocessors arallel Computer Organization and Design : Lecture 7 er Stenström. 2008, Sally A. ckee 2009 Scalable ultiprocessors What is a scalable design? (7.1) Realizing programming models (7.2) Scalable communication

More information

Parallel Programming Platforms

Parallel Programming Platforms arallel rogramming latforms Ananth Grama Computing Research Institute and Department of Computer Sciences, urdue University ayg@cspurdueedu http://wwwcspurdueedu/people/ayg Reference: Introduction to arallel

More information

1G Bit TCP+UDP Offload Engine (TOE+UOE) Hardware IP Core

1G Bit TCP+UDP Offload Engine (TOE+UOE) Hardware IP Core Intilop Corporation 4800 Great America Pkwy Ste-231 Santa Clara, CA 95054 Ph: 408-496-0333 Fax:408-496-0444 www.intilop.com 1G bit TCP+UDP Offload Engine MAC + Host_IF (Same PHY Port) INT 2511 (Ultra-Low

More information

Impact of Cache Coherence Protocols on the Processing of Network Traffic

Impact of Cache Coherence Protocols on the Processing of Network Traffic Impact of Cache Coherence Protocols on the Processing of Network Traffic Amit Kumar and Ram Huggahalli Communication Technology Lab Corporate Technology Group Intel Corporation 12/3/2007 Outline Background

More information

Hardware and Software Architecture. Chapter 2

Hardware and Software Architecture. Chapter 2 Hardware and Software Architecture Chapter 2 1 Basic Components The x86 processor communicates with main memory and I/O devices via buses Data bus for transferring data Address bus for the address of a

More information

1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects

1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects Overview of Interconnects Myrinet and Quadrics Leading Modern Interconnects Presentation Outline General Concepts of Interconnects Myrinet Latest Products Quadrics Latest Release Our Research Interconnects

More information

NIC-PCIE-4RJ45-PLU PCI Express x4 Quad Port Copper Gigabit Server Adapter (Intel I350 Based)

NIC-PCIE-4RJ45-PLU PCI Express x4 Quad Port Copper Gigabit Server Adapter (Intel I350 Based) NIC-PCIE-4RJ45-PLU PCI Express x4 Quad Port Copper Gigabit Server Adapter (Intel I350 Based) Quad-port Gigabit Ethernet server adapters designed with performance enhancing features and new power management

More information

Packet Classification and Termination in a Protocol Processor

Packet Classification and Termination in a Protocol Processor Packet Classification and Termination in a Protocol Processor Ulf Nordqvist and Dake Liu Department of Electrical Engineering Linköping University, SE-581 83 Linköping, Sweden Phone: +46-13-28-{2903, 1256}

More information

Xen Network I/O Performance Analysis and Opportunities for Improvement

Xen Network I/O Performance Analysis and Opportunities for Improvement Xen Network I/O Performance Analysis and Opportunities for Improvement J. Renato Santos G. (John) Janakiraman Yoshio Turner HP Labs Xen Summit April 17-18, 27 23 Hewlett-Packard Development Company, L.P.

More information

INT Gbit Ethernet MAC Engine

INT Gbit Ethernet MAC Engine intelop INT-10000 10-Gbit Ethernet MAC Product Brief, features and benefits summary Highly customizable hardware IP block. Easily portable to Structured ASIC flow, Custom ASICs/SoCs, Xilinx FPGAs INT-10000

More information

SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture

SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture 2012 MELLANOX TECHNOLOGIES 1 SwitchX - Virtual Protocol Interconnect Solutions Server / Compute Switch / Gateway Virtual Protocol Interconnect

More information

EECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007

EECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007 EECS 5 - Components and Design Techniques for Digital Systems Lec 2 RTL Design Optimization /6/27 Shauki Elassaad Electrical Engineering and Computer Sciences University of California, Berkeley Slides

More information

Netronome NFP: Theory of Operation

Netronome NFP: Theory of Operation WHITE PAPER Netronome NFP: Theory of Operation TO ACHIEVE PERFORMANCE GOALS, A MULTI-CORE PROCESSOR NEEDS AN EFFICIENT DATA MOVEMENT ARCHITECTURE. CONTENTS 1. INTRODUCTION...1 2. ARCHITECTURE OVERVIEW...2

More information

Lecture 3. The Network Layer (cont d) Network Layer 1-1

Lecture 3. The Network Layer (cont d) Network Layer 1-1 Lecture 3 The Network Layer (cont d) Network Layer 1-1 Agenda The Network Layer (cont d) What is inside a router? Internet Protocol (IP) IPv4 fragmentation and addressing IP Address Classes and Subnets

More information

SEN366 (SEN374) (Introduction to) Computer Networks

SEN366 (SEN374) (Introduction to) Computer Networks SEN366 (SEN374) (Introduction to) Computer Networks Prof. Dr. Hasan Hüseyin BALIK (12 th Week) The Internet Protocol 12.Outline Principles of Internetworking Internet Protocol Operation Internet Protocol

More information

vnetwork Future Direction Howie Xu, VMware R&D November 4, 2008

vnetwork Future Direction Howie Xu, VMware R&D November 4, 2008 vnetwork Future Direction Howie Xu, VMware R&D November 4, 2008 Virtual Datacenter OS from VMware Infrastructure vservices and Cloud vservices Existing New - roadmap Virtual Datacenter OS from VMware Agenda

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Understanding Basic 802.1ah Provider Backbone Bridge

Understanding Basic 802.1ah Provider Backbone Bridge Understanding Basic 802.1ah Provider Backbone Bridge Contents Introduction Prerequisites Requirements Components Used IEEE 802.1ah Provider Backbone Bridging Overview Terminologies Used PBB Components

More information

Disruptive Innovation in ethernet switching

Disruptive Innovation in ethernet switching Disruptive Innovation in ethernet switching Lincoln Dale Principal Engineer, Arista Networks ltd@aristanetworks.com AusNOG 2012 Ethernet switches have had a pretty boring existence. The odd speed increase

More information

Part IV: 3D WiNoC Architectures

Part IV: 3D WiNoC Architectures Wireless NoC as Interconnection Backbone for Multicore Chips: Promises, Challenges, and Recent Developments Part IV: 3D WiNoC Architectures Hiroki Matsutani Keio University, Japan 1 Outline: 3D WiNoC Architectures

More information

The Interconnection Structure of. The Internet. EECC694 - Shaaban

The Interconnection Structure of. The Internet. EECC694 - Shaaban The Internet Evolved from the ARPANET (the Advanced Research Projects Agency Network), a project funded by The U.S. Department of Defense (DOD) in 1969. ARPANET's purpose was to provide the U.S. Defense

More information

FPGA Augmented ASICs: The Time Has Come

FPGA Augmented ASICs: The Time Has Come FPGA Augmented ASICs: The Time Has Come David Riddoch Steve Pope Copyright 2012 Solarflare Communications, Inc. All Rights Reserved. Hardware acceleration is Niche (With the obvious exception of graphics

More information

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism

More information

PE310G4TSF4I71 Quad Port SFP+ 10 Gigabit Ethernet PCI Express Time Stamp Server Adapter Intel Based

PE310G4TSF4I71 Quad Port SFP+ 10 Gigabit Ethernet PCI Express Time Stamp Server Adapter Intel Based PE310G4TSF4I71 Quad Port SFP+ 10 Gigabit Ethernet PCI Express Time Stamp Server Adapter Intel Based Product Description Silicom s 40 Gigabit Ethernet PCI Express Time Stamping server adapter is designed

More information

Introduction to Infiniband

Introduction to Infiniband Introduction to Infiniband FRNOG 22, April 4 th 2014 Yael Shenhav, Sr. Director of EMEA, APAC FAE, Application Engineering The InfiniBand Architecture Industry standard defined by the InfiniBand Trade

More information

UDP1G-IP reference design manual

UDP1G-IP reference design manual UDP1G-IP reference design manual Rev1.1 14-Aug-18 1 Introduction Comparing to TCP, UDP provides a procedure to send messages with a minimum of protocol mechanism, but the data cannot guarantee to arrive

More information

Last Lecture: Network Layer

Last Lecture: Network Layer Last Lecture: Network Layer 1. Design goals and issues 2. Basic Routing Algorithms & Protocols 3. Addressing, Fragmentation and reassembly 4. Internet Routing Protocols and Inter-networking 5. Router design

More information

Chapter 4 Network Layer: The Data Plane. Part A. Computer Networking: A Top Down Approach

Chapter 4 Network Layer: The Data Plane. Part A. Computer Networking: A Top Down Approach Chapter 4 Network Layer: The Data Plane Part A All material copyright 996-06 J.F Kurose and K.W. Ross, All Rights Reserved Computer Networking: A Top Down Approach 7 th Edition, Global Edition Jim Kurose,

More information

Chapter 5 Reading Organizer After completion of this chapter, you should be able to:

Chapter 5 Reading Organizer After completion of this chapter, you should be able to: Chapter 5 Reading Organizer After completion of this chapter, you should be able to: Describe the operation of the Ethernet sublayers. Identify the major fields of the Ethernet frame. Describe the purpose

More information