FPGAs and Networking

Similar documents
Lighting the Blue Touchpaper for UK e-science - Closing Conference of ESLEA Project The George Hotel, Edinburgh, UK March, 2007

inettest a 10 Gigabit Ethernet Test Unit

DTN End Host performance and tuning

Commissioning and Using the 4 Gigabit Lightpath from Onsala to Jodrell Bank

QuickSpecs. HP Z 10GbE Dual Port Module. Models

An FPGA-Based Optical IOH Architecture for Embedded System

ARISTA: Improving Application Performance While Reducing Complexity

NetFPGA Hardware Architecture

Optimizing the GigE transfer What follows comes from company Pleora.

6.9. Communicating to the Outside World: Cluster Networking

High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK

Investigation of the Performance of 100Mbit and Gigabit Ethernet Components Using Raw Ethernet Frames

Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses

10GE network tests with UDP. Janusz Szuba European XFEL

The CMS Event Builder

Multi-Gigabit Transceivers Getting Started with Xilinx s Rocket I/Os

46PaQ. Dimitris Miras, Saleem Bhatti, Peter Kirstein Networks Research Group Computer Science UCL. 46PaQ AHM 2005 UKLIGHT Workshop, 19 Sep

PCIe 10G 5-Speed. Multi-Gigabit Network Card

End-user systems: NICs, MotherBoards, Disks, TCP Stacks & Applications

Optimizing Performance: Intel Network Adapters User Guide

Altos R320 F3 Specifications. Product overview. Product views. Internal view

Myricom s Myri-10G Software Support for Network Direct (MS MPI) on either 10-Gigabit Ethernet or 10-Gigabit Myrinet

5-Speed NBASE-T Network. Controller Card

OneCore Storage SDK 5.0

Evaluation of the LDC Computing Platform for Point 2

M100 GigE Series. Multi-Camera Vision Controller. Easy cabling with PoE. Multiple inspections available thanks to 6 GigE Vision ports and 4 USB3 ports

PCIe 10G SFP+ Network Card

Product Overview. Programmable Network Cards Network Appliances FPGA IP Cores

FELIX the new detector readout system for the ATLAS experiment

M100 GigE Series. Multi-Camera Vision Controller. Easy cabling with PoE. Multiple inspections available thanks to 6 GigE Vision ports and 4 USB3 ports

Programmable Logic Design Grzegorz Budzyń Lecture. 15: Advanced hardware in FPGA structures

An Intelligent NIC Design Xin Song

IBM POWER8 100 GigE Adapter Best Practices

Improving Packet Processing Performance of a Memory- Bounded Application

Advanced Computer Networks. End Host Optimization

Computer buses and interfaces

1 Port PCI Express 10 Gigabit Ethernet Network Card - PCIe x4 10Gb NIC

FELI. : the detector readout upgrade of the ATLAS experiment. Soo Ryu. Argonne National Laboratory, (on behalf of the FELIX group)

The new detector readout system for the ATLAS experiment

Open Source Traffic Analyzer

PCI Express x8 Single Port SFP+ 10 Gigabit Server Adapter (Intel 82599ES Based) Single-Port 10 Gigabit SFP+ Ethernet Server Adapters Provide Ultimate

Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet

PCI Express x8 Quad Port 10Gigabit Server Adapter (Intel XL710 Based)

10 Gbit/s Challenge inside the Openlab framework

2-Port PCI Express 10GBase-T Ethernet Network Card - with Intel X540 Chip

10Gbps TCP/IP streams from the FPGA

The NE010 iwarp Adapter

Achieving 98Gbps of Crosscountry TCP traffic using 2.5 hosts, 10 x 10G NICs, and 10 TCP streams

OE2G2I35 Dual Port Copper Gigabit Ethernet OCP Mezzanine Adapter Intel I350BT2 Based

Acer AR320 F1 specifications

Acer AC100 Server Specifications

Video capture using GigE Vision with MIL. What is GigE Vision

A-GEAR 10Gigabit Ethernet Server Adapter X520 2xSFP+

Altos T310 F3 Specifications

5051 & 5052 PCIe Card Overview

JMR ELECTRONICS INC. WHITE PAPER

Avoid Bottlenecks Using PCI Express-Based Embedded Systems

Network Design Considerations for Grid Computing

Schematic. A: Overview of the Integrated Detector Readout Electronics and DAQ-System. optical Gbit link. 1GB DDR Ram.

RiceNIC. Prototyping Network Interfaces. Jeffrey Shafer Scott Rixner

Hardware NVMe implementation on cache and storage systems

Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances

Implementing Ultra Low Latency Data Center Services with Programmable Logic

FPGA Implementation of RDMA-Based Data Acquisition System Over 100 GbE

Acer AR320 F2 Specifications

Solving the Data Transfer Bottleneck in Digitizers

Myri-10G Myrinet Converges with Ethernet

VM Migration Acceleration over 40GigE Meet SLA & Maximize ROI

ROM Status Update. U. Marconi, INFN Bologna

Internet data transfer record between CERN and California. Sylvain Ravot (Caltech) Paolo Moroni (CERN)

Announcements Homework: Next Week: Research Paper:

Acer AT310 F2 Specifications

Cisco Ultra Packet Core High Performance AND Features. Aeneas Dodd-Noble, Principal Engineer Daniel Walton, Director of Engineering October 18, 2018

Ethernet transport protocols for FPGA

LANCOM Techpaper IEEE n Indoor Performance

Ronald van der Pol

nforce 680i and 680a

Compute Node Design for DAQ and Trigger Subsystem in Giessen. Justus Liebig University in Giessen

Interrupt Swizzling Solution for Intel 5000 Chipset Series based Platforms

fit-pc Intense 2 Overview

The Myricom ARC Series with DBL

D1.1 Server Scalibility

The Myricom ARC Series of Network Adapters with DBL

Machine Vision Camera Interfaces. Korean Vision Show April 2012

Network Interface Architecture and Prototyping for Chip and Cluster Multiprocessors

No More Waiting Around

PC BASED REAL TIME DATA EXCHANGE ON 10GbE OPTICAL NETWORK USING RTOS*

I/O Channels. RAM size. Chipsets. Cluster Computing Paul A. Farrell 9/8/2011. Memory (RAM) Dept of Computer Science Kent State University 1

OpenFlow Software Switch & Intel DPDK. performance analysis

Fast-communication PC Clusters at DESY Peter Wegner DV Zeuthen

eslim SV Dual and Quad-Core Xeon Server Dual and Quad-Core Server Computing Leader!! ESLIM KOREA INC.

Key Points. Rotational delay vs seek delay Disks are slow. Techniques for making disks faster. Flash and SSDs

The Future of High-Performance Networking (The 5?, 10?, 15? Year Outlook)

Diamond Networks/Computing. Nick Rees January 2011

PE2G6SFPI35 Six Port SFP Gigabit Ethernet PCI Express Server Adapter Intel i350am4 Based

PC DESY Peter Wegner. PC Cluster Definition 1

In-chip and Inter-chip Interconnections and data transportations for Future MPAR Digital Receiving System

The Convergence of Storage and Server Virtualization Solarflare Communications, Inc.

PLUSOPTIC NIC-PCIE-2SFP+-V2-PLU

GRVI Phalanx. A Massively Parallel RISC-V FPGA Accelerator Accelerator. Jan Gray

Transcription:

FPGAs and Networking Marc Kelly & Richard Hughes-Jones University of Manchester 12th July 27 1

Overview of Work Looking into the usage of FPGA's to directly connect to Ethernet for DAQ readout purposes. Testing both 1 and 1 Gig systems. Evaluating the new generation of PCIExpress 1Gig Ethernet cards. Bringing it all together to form a test system. 12th July 27 2

Network Virtex4 Test Board 12th July 27 3

1 Gig FPGA Work Implemented a MAC + PHY layer inside Xilinx Virtex4 FPGA. Demonstrated working Ethernet between FPGA and remote hosts, however the learning curve is steep, issues with the Xilinx CoreGen design. Adaptable testing design with Network and User FPGA logic separated Once working it has proved reliable, the Network code as survived many re-spins of the design without failing. 12th July 27 4

Overview of Design User block Network block 12th July 27 5

Receiver Frame Jitter and Packet Loss 12 us (line speed) 9 Frame Jitter 12 us fpga man2_7mar7 8 1 7 1 12 us fpga man2_7mar7 13 us fpga man2_7mar7 5 4 5 4 3 N(n) 1 6 1 1 2 1 1 2 3 Frame spacing us 4 3 2 1 1 1 5 1 2 3 4 Frame spacing us 5 1 2 3 4 5 6 7 Num Packets Lost 8 9 25 us frame spacing Peak separation 4 5 us no coalescence 9 8 7 6 5 25 us fpga man2_7mar7 25 us fpga man2_7mar7 1 1 1 4 3 2 1 1 1 1 1 5 1 Frame Spacing us 15 1.2 1.8.6.4.2 25 us fpga man2_7mar7 Packet loss 1 2 3 4 Frame spacing us 5 1 2 3 4 5 6 7 8 9 Num Packets lost Plots by Richard, stolen from one of his talks. 12th July 27 6

1 Gig FPGA Work FPGAs can drive Ethernet. It is easy once configuration hurdle is passed. More testing under way. Request response style operations to pull data out FPGA to simulate an event building scenario. Planned Upgrade to 1Gig Ethernet. Do all tests at 1Gig. Perform some initial testing to try to determine the stability of the RocketIO TX/RX latency. 12th July 27 7

1Gig Ethernet Work Richard has nice new 1Gig PCI Express cards made my Myricom. 8xLane design, in theory that has more than enough bandwidth to deal with 1Gig link. Using loaned high end servers as host machines. Performing standard network testing operations using his normal tools. Aim is to determine the suitability of 1Gig systems. 12th July 27 8

High-end Server PCs Boston/Supermicro X7DBE Two Dual Core Intel Xeon Woodcrest 513 2 GHz Independent 1.33GHz FSBuses 53 MHz FD Memory (serial) Parallel access to 4 banks Chipsets: Intel 5P MCH PCIe & Memory ESB2 PCI-X GE etc. PCI 3 8 lane PCIe buses 3* 133 MHz PCI-X 2 Gigabit Ethernet SATA 12th July 27 9

Notice rate for 8972 byte packet ~.2% packet loss in 1M packets in receiving host Recv Wire rate Mbit/s gig6 5_myri1GE 1 9 8 7 6 5 4 3 2 1 1 bytes 1472 bytes 2 bytes 3 bytes 4 bytes 5 bytes 6 bytes 7 bytes 8 bytes 8972 bytes Sending host, 3 CPUs idle For <8 µs packets, 1 CPU is >9% in kernel mode inc ~1% soft int 1 2 Spacing between frames us 1 %cpu1 kernel snd 1 GigE Back2Back: UDP Throughput Kernel 2.6.2-web1_pktd-plus Myricom 1G-PCIE-8A-R Fibre rx-usecs=25 Coalescence ON MTU 9 bytes Max throughput 9.4 Gbit/s Receiving host 3 CPUs idle For <8 µs packets, 1 CPU is 7-8% in kernel mode inc ~15% soft int 4 1 bytes gig6 5_myri1GE 1472 bytes 8 2 bytes 6 3 bytes 4 4 bytes C 2 5 bytes 6 bytes 7 bytes 5 1 15 2 25 Spacing between frames us 3 35 4 8 bytes 8972 bytes 1 bytes gig6 5_myri1GE 1 8 6 4 2 1472 bytes 2 bytes 3 bytes 4 bytes 5 bytes 6 bytes By Richard, stolen from one of his talks. 12th July 27 3 % cpu1 kernel rec 1 2 Spacing between frames us 3 7 bytes 4 8 bytes 8972 bytes 1

1 GigE Back2Back: UDP Latency 5 y =.28x + 21.937 4 Latency 22 µs & very well behaved Latency Slope.28 µs/byte B2B Expect:.268 µs/byte Mem.4 PCI-e.54 1GigE.8 PCI-e.54 Mem.4 12 3 2 1 1 2 3 4 5 6 7 8 1 Histogram FWHM ~1 2 us 12 64 bytes gig6 5 6 3 bytes gig6 5 1 1 5 8 8 4 6 6 4 2 2 2 1 2 4 6 8 Latency us 2 4 6 Latency us 8 89 bytes gig6 5 3 4 9 Message length bytes 6 gig6 5_Myri1GE_rxcoal= Latency us Motherboard: Supermicro X7DBE Chipset: Intel 5P MCH CPU: 2 Dual Intel Xeon 513 2 GHz with 496k L2 cache Mem bus: 2 independent 1.33 GHz PCI-e 8 lane Linux Kernel 2.6.2-web1_pktd-plus Myricom NIC 1G-PCIE-8A-R Fibre myri1ge v1.2. + firmware v1.4.1 rx-usecs= Coalescence OFF MSI=1 Checksums ON tx_boundary=496 MTU 9 bytes 2 4 6 Latency us By Richard, stolen from one of his talks. 12th July 27 11 8

B2B UDP with memory access Achievable UDP Throughput mean 9.39 Gb/s sigma 16 mean 9.21 Gb/s sigma 37 mean 9.2 sigma 3 gig6 5_udpmon_membw 96 Recv Wire rate Mbit/s Send UDP traffic B2B with 1GE On receiver run independent memory write task L2 Cache 496 k Byte Write 8k Byte blocks in loop 1% user mode 95 UDP UDP+cpu1 UDP+cpu3 94 93 92 91 9 Packet loss 1 2 3 Trial number 4 5 gig6 5_udpmon_membw 3.5 % Packet loss mean.4% mean 1.4 % mean 1.8 % CPU load: 6 7 UDP UDP+cpu1 UDP+cpu3 3 2.5 2 1.5 1.5 Cpu Cpu1 Cpu2 Cpu3 : 6.% us, 74.7% :.% us,.% :.% us,.% : 1.% us,.% sy, sy, sy, sy,.%.%.%.% ni,.3% id, ni, 1.% id, ni, 1.% id, ni,.% id, By Richard, stolen from one of his talks. 12th July 27.%.%.%.% 1 2 3 Trial number wa, wa, wa, wa, 1.3%.%.%.% 4 5 hi, 17.7% si, hi,.% si, hi,.% si, hi,.% si, 6.%.%.%.% 7 st st st st 12

1Gig Ethernet Work New generation of servers are capable of supporting 1Gig Ethernet, doing real work and NOT being overloaded. New generation of Cards are very capable of supporting 1Gig Ethernet. Things are however Chipset/Server design dependant. Have to make sure the architecture is sound. High bandwidth, low contention designs needed. Need modern host OS, latest drivers etc. 12th July 27 13