Entry Stage for the CBM First-level Event Selector

1 Entry Stage for the CBM First-level Event Selector
Dirk Hutter for the CBM Collaboration
FIAS Frankfurt Institute for Advanced Studies, Goethe-Universität Frankfurt am Main, Germany
CHEP 2018 Conference, Sofia, Bulgaria

2 The CBM Experiment at FAIR
(Figure: FAIR facility in Darmstadt, Germany, and the CBM electron-hadron setup with MVD, STS, RICH, TRD, TOF, ECAL, MUCH, PSD and the dipole magnet)
- Fixed-target heavy-ion experiment at FAIR
- Physics goal: exploration of the QCD phase diagram
- Extreme reaction rates of up to 10 MHz and track densities of up to 1000 tracks in the aperture
- Conventional trigger architecture not feasible; full online event reconstruction needed
- Self-triggering, free-streaming readout electronics
- Event selection done exclusively in the FLES HPC cluster

3 First-level Event Selector (FLES)
(Figure: CBM service building and the Green IT Cube data center, roughly 700 m apart)
- FLES is designed as an HPC cluster built from commodity PC hardware
- First phase: full input connectivity, but limited processing and networking
- FPGA-based custom PCIe input interface; total input data rate > 1 TB/s
- Timeslice building via InfiniBand network: combines the data from all input links into self-contained, overlapping processing intervals and distributes them to compute nodes
- RDMA data transfer very convenient for timeslice building
- Located in the Green IT Cube data center: cost-efficient infrastructure sharing; maximum CBM online computing power is only needed a fraction of the time, so computing resources can be combined and shared
- Consequences: transmit 10 Tbit/s over a 700 m distance, needs single-mode optics, increased link latency

4 Initial CBM DAQ/FLES Architecture
(Diagram: FEE on the detector, GBT links to DPBs in the counting room, custom optical link over ~700 m to FLIB/PCIe in FLES entry nodes, InfiniBand to FLES processing nodes; a single FLES cluster in the Green Cube)
- Single flat cluster design with two FPGA-based stages:
  - Data Processing Board (DPB): aggregation and preprocessing
  - FLES Interface Board (FLIB): data preparation and interface to standard COTS hardware
- Long-range connection to the Green Cube via custom optical links
- Design revisited in the context of the CBM modularized start version and the use of GBTx front-ends:
  - Consider future upgradability
  - Maximize the use of standard hardware

5 Optimized DAQ/FLES Architecture
(Diagram: FEE, GBT links to the CRI with PCIe interface in FLES entry nodes, long-range InfiniBand over ~700 m between the entry cluster and the compute cluster, FLES processing nodes in the compute cluster)
- Combine DPB and FLIB into a single FPGA board (CRI), similar to ATLAS, LHCb and ALICE
- Interface to standard hardware via PCIe
- Split computing into two dedicated clusters: only the small, specialized entry cluster holds custom hardware; relaxed requirements on the processing cluster design
- Long-range connection to the Green Cube via standard network equipment (e.g., long-range InfiniBand)

6 FLES Entry Stage Design
- Input from the detector systems: 4800 GBT links (21.5 Tbit/s total bandwidth); < 50% link occupancy, 10 Tbit/s peak
- 100 CRI boards with 48 links each and PCIe 4.0 x8 (3.0 x16)
- CRI prototype candidates: FLX-712, HTG-Z920
- Output to the processing cluster: no significant data reduction is possible in the entry cluster; averaging out spill intensity variations in the entry stage can smooth the data rate; averaging over the machine duty cycle is foreseen in the processing cluster
- Connectivity to ~600 processing nodes; absolute worst case 10 Tbit/s peak
- Possible entry node configuration: 2 CRI + 2 HCAs (50 or 100 Gbit/s) per node feasible
- Current entry node prototype: ASUS ESC8000
(A small sanity check of these throughput figures follows below.)
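The throughput figures above can be cross-checked with a few lines of arithmetic. The C++ sketch below derives the per-link rate from the quoted totals and applies the 50% occupancy and the two-CRI node configuration from the slide; it is a plausibility check, not a statement about the final system design.

// Back-of-the-envelope check of the entry-stage throughput figures quoted on
// the slide above. The per-link rate is derived from the quoted totals
// (21.5 Tbit/s over 4800 GBT links); occupancy and node configuration are
// taken from the slide as well.
#include <cstdio>

int main() {
    const double total_input_tbit_s = 21.5;  // quoted total GBT bandwidth
    const int    gbt_links          = 4800;  // quoted number of GBT links
    const double occupancy          = 0.50;  // "< 50% link occupancy"
    const int    links_per_cri      = 48;    // 100 CRI boards with 48 links each

    const double per_link_gbit_s = total_input_tbit_s * 1e3 / gbt_links;
    const double peak_tbit_s     = total_input_tbit_s * occupancy;
    const double per_cri_gbit_s  = per_link_gbit_s * links_per_cri * occupancy;
    const double per_node_gbit_s = 2 * per_cri_gbit_s;  // 2 CRI per entry node

    std::printf("per GBT link          : %5.2f Gbit/s\n", per_link_gbit_s);
    std::printf("peak total input      : %5.2f Tbit/s\n", peak_tbit_s);
    std::printf("per CRI (peak)        : %5.1f Gbit/s, must fit the PCIe interface\n", per_cri_gbit_s);
    std::printf("per entry node (2 CRI): %5.1f Gbit/s, served by the node's 2 HCAs\n", per_node_gbit_s);
    return 0;
}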

7 Reminder: InfiniBand Networks
- High-throughput, low-latency switched fabric network, widely used in HPC applications
- RDMA semantics perfectly suitable for FLES timeslice building
- Lossless fabric, i.e. packets are not routinely dropped
- Credit-based flow control on the link level: each (logical) link supplies credits to the sending device denoting the available buffer space
- Quality-of-service features: virtual lanes (VLs) are separate logical communication links that share a single physical link; up to 15 virtual lanes plus one management lane per link
- Typically deployed as a fat-tree (Clos) network
- Next generation: HDR (200 Gbit/s), 40-port switches, e.g. the QM87xx series; allows port splitting to HDR100 with 80 ports per switch
(Figure: excerpt from a Mellanox Socket Direct product brief; the adapter is split across two PCIe cards so that network traffic no longer has to traverse the inter-socket link, reducing overhead and latency)
(A minimal port-attribute query sketch follows below.)
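The per-port link and virtual-lane attributes mentioned here can be read out through the libibverbs API. The following minimal sketch assumes a Linux host with libibverbs and at least one InfiniBand HCA; it only enumerates devices and prints port attributes (state, active speed/width and the max_vl_num capability field) and does not set up any communication.

// Minimal sketch: enumerate InfiniBand devices and print the port attributes
// relevant to this slide, including the virtual-lane capability (max_vl_num,
// an encoded capability field) and the active speed/width.
#include <infiniband/verbs.h>
#include <cstdint>
#include <cstdio>

int main() {
    int num_devices = 0;
    ibv_device** devices = ibv_get_device_list(&num_devices);
    if (!devices) { std::perror("ibv_get_device_list"); return 1; }
    for (int i = 0; i < num_devices; ++i) {
        ibv_context* ctx = ibv_open_device(devices[i]);
        if (!ctx) continue;
        ibv_device_attr dev_attr{};
        if (ibv_query_device(ctx, &dev_attr) == 0) {
            for (uint8_t port = 1; port <= dev_attr.phys_port_cnt; ++port) {
                ibv_port_attr port_attr{};
                if (ibv_query_port(ctx, port, &port_attr) != 0) continue;
                std::printf("%s port %u: state=%d active_speed=%d active_width=%d max_vl_num=%d\n",
                            ibv_get_device_name(devices[i]), port,
                            port_attr.state, port_attr.active_speed,
                            port_attr.active_width, port_attr.max_vl_num);
            }
        }
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devices);
    return 0;
}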

8 Long-haul Connections
- Challenge: bridge 700 m between the entry and processing clusters; buffer = bandwidth × round-trip time
- Physical layer: typical intra-data-center techniques do not reach that far; use medium- or long-range single-mode optics, e.g. CWDM4 with up to 2 km reach
- Protocol: all reliable communication protocols suffer from high bandwidth-delay products; higher delays require more buffering or result in lower bandwidth
- Mellanox InfiniBand switches allow switching off VLs, which collapses the buffers from unused VLs into one larger buffer; idea endorsed by Mellanox, to be tested in a real lab setup
(Plot: achievable bandwidth in Gbit/s versus RTT in µs; theory for a 100 Gbit/s link with 0.75 µs of allowed delay)
(A short bandwidth-delay calculation follows below.)
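To make the bandwidth-delay argument concrete, the sketch below evaluates buffer = bandwidth × RTT for the 100 Gbit/s link and the 0.75 µs delay budget quoted on the slide; the ~5 ns/m one-way propagation delay in fiber is an assumed round value used only for illustration.

// Bandwidth-delay product sketch for the long-haul link discussed above.
#include <cstdio>

int main() {
    const double bandwidth_bit_s = 100e9;    // 100 Gbit/s link
    const double budget_s        = 0.75e-6;  // delay the available buffers can cover
    const double fiber_delay_s_m = 5e-9;     // assumed propagation delay per metre
    const double distance_m      = 700.0;

    const double rtt_s = 2.0 * distance_m * fiber_delay_s_m;  // cable delay only
    // Buffer needed to keep a reliable protocol at full rate over this RTT.
    const double buffer_bytes = bandwidth_bit_s * rtt_s / 8.0;
    // Bandwidth actually sustainable if the buffers only cover 'budget_s'.
    const double sustainable_bit_s =
        rtt_s <= budget_s ? bandwidth_bit_s : bandwidth_bit_s * budget_s / rtt_s;

    std::printf("round-trip time over %.0f m: %.2f us\n", distance_m, rtt_s * 1e6);
    std::printf("buffer needed at full rate : %.1f kB\n", buffer_bytes / 1e3);
    std::printf("sustainable bandwidth      : %.1f Gbit/s\n", sustainable_bit_s / 1e9);
    return 0;
}

With these numbers the round trip over the fiber alone is already several microseconds, which is why collapsing the buffers of unused virtual lanes into one larger buffer is attractive.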

9 Long-haul Connection Test Stand
(Figure: QSFP transceiver and a box containing an 8-fiber ribbon of 400 m and an 8-fiber ribbon of 800 m)
- Testing requires a connection with a given delay
- Box with 8x 400 m + 8x 800 m single-mode fiber; G.657.A1 bend-insensitive fiber to allow a compact size
- Each box is sufficient for 1 PSM4 connection or 4 LR4/CWDM4 connections using MPO-LC breakouts
- Allows measurements in 400 m steps, e.g. 400 m, 800 m, 1200 m, until the patching exceeds the loss budget of ~3.5 dB
- First lab tests with InfiniBand EDR are very promising

10 Possible FLES Network
(Diagram: fat-tree-like topology with entry nodes on HDR100 and processing nodes on HDR sharing a common spine plus a local spine; annotations include 10.8 Tbit/s inter-cluster and 1.2 Tbit/s local traffic, up to 120 Gb/s and up to 612 Gb/s per link group, ~1/6 blocking, switch-port and cable counts, and fibers < 600 m OS2)
- Fat-tree-like network; the sub-clusters share a common spine
- The compute cluster can have a 1/6 blocking ratio
- Timeslices are built across the long-range connection
- Local spine for minimal connectivity without the compute cluster

11 Network with CBM-local Timeslice Building
(Diagram: entry-node sub-cluster on HDR100; annotations include 5-20 Tbit/s local (shared) and 12 Tbit/s inter-cluster traffic, switch and cable counts, fibers < 600 m OS2; the inter-cluster switches could be routers)
- Build timeslices into entry-node memory, then send fully built timeslices to the Green Cube (GC)
- The additionally needed network resources are feasible with HDR
- Timeslice building is decoupled from the GC network: more control over the critical building process, no long-link delay
- Independent of the GC network architecture, flexible against changes
- Better scaling to more processing nodes (PNs)
- InfiniBand routers additionally allow separating the subnet control

12 Thanks for your attention
Dirk Hutter, CBM

13 Local TS Building + Application-Level Routing
(Diagram: 120 EN nodes with 2x HDR each; annotations include application-level routing, 12 Tbit/s local and 12 Tbit/s inter-cluster traffic, switch and cable counts, fibers < 600 m OS2)
- Use a separate HCA (port) instead of the TS building network to send timeslices to the GC
- Full separation between the networks; a mix of different network technologies is possible
- Less flexible in terms of link balancing towards the GC

14 FLES Data Management Framework
(Diagram: data path from the FEE over the 10 Gbit/s custom optical link to the FLIM on the FLIB, via device driver and FLIB server into shared memory, timeslice building between input node and compute node over IB verbs and the HCAs, and reconstruction/analysis reading from shared memory)
- RDMA-based timeslice building (flesnet), works in close conjunction with the FLIB hardware design
- Paradigms: do not copy data in memory, maximize throughput
- Based on microslices with a configurable overlap; delivers fully built timeslices to the reconstruction code
- Direct DMA into InfiniBand send buffers; shared-memory interface; timeslice building via InfiniBand RDMA, true zero-copy; indexed access to timeslice data
- Prototype implementation available: C++, Boost, IB verbs
- Measured flesnet timeslice building (8+8 nodes, including ring-buffer synchronization, overlapping timeslices): ~5 GByte/s throughput per node
- Prototype software successfully used in several CBM beam tests
(A minimal shared-memory plus RDMA-registration sketch follows below.)
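The zero-copy idea on this slide, a buffer placed in POSIX shared memory and registered in parallel for InfiniBand RDMA, can be illustrated with a minimal sketch. This is not flesnet's actual code; the segment name and buffer size are invented for the example, and error handling is reduced to the bare minimum.

// Minimal sketch: create a POSIX shared-memory buffer, map it, and register
// the same pages with libibverbs so they could be used directly as an RDMA
// send buffer (no copy between the shared-memory consumer and the network).
#include <infiniband/verbs.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const size_t buf_size = size_t{1} << 24;  // 16 MiB example buffer

    // 1. Create and map a POSIX shared-memory segment.
    int shm_fd = shm_open("/fles_example_data", O_CREAT | O_RDWR, 0600);
    if (shm_fd < 0 || ftruncate(shm_fd, buf_size) != 0) { std::perror("shm"); return 1; }
    void* buf = mmap(nullptr, buf_size, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
    if (buf == MAP_FAILED) { std::perror("mmap"); return 1; }

    // 2. Register the same memory with the first InfiniBand device so the HCA
    //    can access it for RDMA.
    int n = 0;
    ibv_device** devs = ibv_get_device_list(&n);
    if (!devs || n == 0) { std::fprintf(stderr, "no InfiniBand devices found\n"); return 1; }
    ibv_context* ctx = ibv_open_device(devs[0]);
    ibv_pd* pd = ctx ? ibv_alloc_pd(ctx) : nullptr;
    ibv_mr* mr = pd ? ibv_reg_mr(pd, buf, buf_size,
                                 IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ)
                    : nullptr;
    if (!mr) { std::perror("ibv_reg_mr"); return 1; }
    std::printf("buffer %p registered, lkey=0x%x rkey=0x%x\n", buf, mr->lkey, mr->rkey);

    // Cleanup; a real input node keeps the registration for its lifetime.
    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    munmap(buf, buf_size);
    shm_unlink("/fles_example_data");
    return 0;
}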

15 RDMA Timeslice Building
(Diagram: dual ring buffers in the input node and, per input link, in the compute node, holding microslice content, microslice descriptors and timeslice descriptors; size examples from the diagram: 2 GB for ~200k microslices / ~2k timeslices, 8 MB for ~1M microslices / ~10k timeslices, 10 MB for ~1k microslices / ~10 timeslices, and ~1 kB of timeslice descriptors for ~80 timeslices)
- Two pairs of ring buffers for each input link
- Second buffer: an index table pointing to the variable-sized data in the first buffer
- Copy contiguous blocks of microslices via RDMA (exception: blocks crossing the buffer border)
- Lazy update of the buffer status between nodes to reduce the transaction rate
(A hypothetical miniature of this scheme follows below.)
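The dual ring buffer scheme can be pictured with a hypothetical miniature: microslice content goes into a data ring, an index-table entry into a descriptor ring, and a block of consecutive microslices maps to one contiguous region that can be shipped with a single RDMA write, except when it crosses the buffer border. Types, field names and sizes below are illustrative, not flesnet's actual classes (compile with C++17).

// Illustrative dual ring buffer: data ring for variable-sized microslice
// content plus a descriptor ring acting as an index table.
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <utility>
#include <vector>

struct MicrosliceDescriptor {  // index-table entry (illustrative fields only)
    uint64_t offset;           // start of the content in the data ring
    uint32_t size;             // content size in bytes
};

class DualRingBuffer {
public:
    DualRingBuffer(size_t data_bytes_log2, size_t desc_count_log2)
        : data_(size_t{1} << data_bytes_log2),
          desc_(size_t{1} << desc_count_log2) {}

    // Append one microslice: copy the content into the data ring and record a
    // descriptor. NB: no overwrite protection; a real buffer tracks the reader.
    void push(const uint8_t* content, uint32_t size) {
        const size_t mask = data_.size() - 1;
        for (uint32_t i = 0; i < size; ++i)  // byte-wise for clarity only
            data_[(data_write_ + i) & mask] = content[i];
        desc_[desc_write_ & (desc_.size() - 1)] = {data_write_, size};
        data_write_ += size;
        ++desc_write_;
    }

    // Contiguous data region(s) holding microslices [first, last): one segment
    // normally, two if the block crosses the ring-buffer border.
    std::vector<std::pair<size_t, size_t>> rdma_segments(uint64_t first, uint64_t last) const {
        const size_t mask = data_.size() - 1;
        const uint64_t begin = desc_[first & (desc_.size() - 1)].offset;
        const MicrosliceDescriptor& d = desc_[(last - 1) & (desc_.size() - 1)];
        const uint64_t end = d.offset + d.size;
        const size_t b = begin & mask, e = end & mask;
        if (b < e || begin == end) return {{b, e - b}};
        return {{b, data_.size() - b}, {0, e}};  // wrapped: two segments
    }

private:
    std::vector<uint8_t> data_;
    std::vector<MicrosliceDescriptor> desc_;
    uint64_t data_write_ = 0;  // monotonically increasing write positions
    uint64_t desc_write_ = 0;
};

int main() {
    DualRingBuffer rb(12, 8);  // tiny example: 4 KiB data ring, 256 descriptors
    uint8_t ms[100];
    std::memset(ms, 0xAB, sizeof ms);
    for (int i = 0; i < 50; ++i) rb.push(ms, sizeof ms);  // forces a wrap-around
    for (auto [off, len] : rb.rdma_segments(40, 42))      // block crossing the border
        std::printf("RDMA segment: offset=%zu length=%zu\n", off, len);
    return 0;
}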

16 FLES Input Data Path
(Diagram: FLES input module with per-channel microslice handling, DMA multiplexer and PCIe, writing into a dual ring buffer in shared memory; size examples: 2 GB data buffer for ~200k microslices / ~2k timeslices, 8 MB descriptor buffer for ~1M microslices / ~10k timeslices)
- Microslice: timeslice substructure, constant in experiment time, allows overlapping timeslices
- Full-offload DMA engine: transmits microslices via PCIe/DMA directly into user-space buffers
- Buffers are placed in POSIX shared memory and can be registered in parallel for InfiniBand RDMA
- Pair of ring buffers for each link: a data buffer for the microslice content and a descriptor buffer for the index table and microslice metadata
(An illustrative layout sketch follows below.)
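As a rough illustration of how such a per-link buffer pair could be laid out in shared memory for both the DMA producer and a software consumer, the sketch below defines a hypothetical header, descriptor ring and data ring. Field names and the synchronization details are assumptions for the example, not the actual FLIB/flesnet format.

// Hypothetical shared-memory layout for one link: [header][descriptor ring][data ring].
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <new>
#include <sys/mman.h>

struct MicrosliceMeta {      // one index-table entry per microslice
    uint64_t start_time_ns;  // microslice start in experiment time
    uint64_t offset;         // position of the content in the data ring
    uint32_t size;           // content size in bytes
    uint16_t component_id;   // input link / detector component
    uint16_t flags;          // e.g. truncation or CRC-error markers
};

struct LinkBufferHeader {
    std::atomic<uint64_t> desc_written;  // advanced by the DMA side
    std::atomic<uint64_t> desc_read;     // advanced by the consumer
    uint64_t desc_count;                 // power of two
    uint64_t data_bytes;                 // power of two
};

inline MicrosliceMeta* descriptor_ring(LinkBufferHeader* h) {
    return reinterpret_cast<MicrosliceMeta*>(h + 1);
}
inline uint8_t* data_ring(LinkBufferHeader* h) {
    return reinterpret_cast<uint8_t*>(descriptor_ring(h) + h->desc_count);
}
// Consumer-side access: locate the content of microslice 'index'.
inline const uint8_t* microslice_content(LinkBufferHeader* h, uint64_t index) {
    const MicrosliceMeta& d = descriptor_ring(h)[index & (h->desc_count - 1)];
    return data_ring(h) + (d.offset & (h->data_bytes - 1));
}

int main() {
    const uint64_t desc_count = 1 << 10, data_bytes = 1 << 20;
    const size_t total = sizeof(LinkBufferHeader)
                       + desc_count * sizeof(MicrosliceMeta) + data_bytes;
    // Anonymous mapping stands in for shm_open() + mmap() of a named segment.
    void* mem = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }
    auto* h = new (mem) LinkBufferHeader{{0}, {0}, desc_count, data_bytes};

    // Pretend the DMA engine delivered one microslice, then publish the index.
    const char payload[] = "example microslice";
    descriptor_ring(h)[0] = {0, 0, sizeof payload, 0, 0};
    std::memcpy(data_ring(h), payload, sizeof payload);
    h->desc_written.store(1, std::memory_order_release);

    std::printf("microslice 0: '%s'\n",
                reinterpret_cast<const char*>(microslice_content(h, 0)));
    munmap(mem, total);
    return 0;
}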

17 Interface to Online Reconstruction Code
(Diagram: a timeslice consisting of components 0 and 1, each a sequence of microslice descriptors and contents along the time axis, with core and overlap regions marked)
Basic idea: for each timeslice, an instance of the reconstruction code...
- ...is given direct indexed access to all corresponding data
- ...uses detector-specific code to understand the contents of the microslices
- ...applies adjustments (fine calibration) to detector time stamps if necessary
- ...finds, reconstructs and analyzes the contained events
Timeslice data management concept:
- Timeslice: two-dimensional indexed access to microslices; overlap according to the detector time precision
- Interface to the online reconstruction software: the timeslice is self-contained; calibration and configuration data are distributed to all nodes; no network communication is required during reconstruction and analysis
(An illustrative access-interface sketch follows below.)
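The two-dimensional indexed access described here can be pictured with the following hypothetical interface. It mimics the concept (components × microslices with a core and an overlap region) but is not the actual CBM/flesnet timeslice API; names and the toy data in main() are invented for the example.

// Illustrative timeslice view handed to one instance of the reconstruction code.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Microslice {
    uint64_t start_time_ns;        // microslice start in experiment time
    std::vector<uint8_t> content;  // detector-specific raw data
};

class TimesliceView {
public:
    TimesliceView(std::vector<std::vector<Microslice>> components, size_t core_microslices)
        : components_(std::move(components)), core_(core_microslices) {}

    size_t num_components() const { return components_.size(); }
    size_t num_microslices(size_t c) const { return components_[c].size(); }
    size_t num_core_microslices() const { return core_; }  // the rest is overlap
    const Microslice& microslice(size_t c, size_t m) const { return components_[c][m]; }

private:
    std::vector<std::vector<Microslice>> components_;
    size_t core_;
};

// A toy "reconstruction" pass: loop over all components and microslices;
// detector-specific unpacking and event finding would go where the printf is.
void reconstruct(const TimesliceView& ts) {
    for (size_t c = 0; c < ts.num_components(); ++c)
        for (size_t m = 0; m < ts.num_microslices(c); ++m) {
            const Microslice& ms = ts.microslice(c, m);
            std::printf("component %zu, microslice %zu: t=%llu ns, %zu bytes%s\n",
                        c, m, static_cast<unsigned long long>(ms.start_time_ns),
                        ms.content.size(),
                        m >= ts.num_core_microslices() ? " (overlap)" : "");
        }
}

int main() {
    TimesliceView ts({{{0, {1, 2, 3}}, {1000, {4, 5}}},  // component 0
                      {{0, {6}}, {1000, {7, 8, 9}}}},    // component 1
                     1 /* core microslices; the last one is overlap */);
    reconstruct(ts);
    return 0;
}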
