Selection of hardware platform for CBM Common Readout Interface

Similar documents
The new detector readout system for the ATLAS experiment

FELIX the new detector readout system for the ATLAS experiment

The ALICE trigger system for LHC Run 3

MiniDAQ1 A COMPACT DATA ACQUISITION SYSTEM FOR GBT READOUT OVER 10G ETHERNET 22/05/2017 TIPP PAOLO DURANTE - MINIDAQ1 1

Data Acquisition in Particle Physics Experiments. Ing. Giuseppe De Robertis INFN Sez. Di Bari

FELIX: the New Detector Readout System for the ATLAS Experiment

The Many Dimensions of SDR Hardware

S2C K7 Prodigy Logic Module Series

New slow-control FPGA IP for GBT based system and status update of the GBT-FPGA project

Timing distribution and Data Flow for the ATLAS Tile Calorimeter Phase II Upgrade

Nutaq. PicoSDR FPGA-based, MIMO-Enabled, tunable RF SDR solutions PRODUCT SHEET I MONTREAL I NEW YORK I. nutaq. .com QUEBEC

FELI. : the detector readout upgrade of the ATLAS experiment. Soo Ryu. Argonne National Laboratory, (on behalf of the FELIX group)

Entry Stage for the CBM First-level Event Selector

The Design and Testing of the Address in Real Time Data Driver Card for the Micromegas Detector of the ATLAS New Small Wheel Upgrade

Xu Wang Hardware Engineer Facebook, Inc.

Optimizing latency in Xilinx FPGA Implementations of the GBT. Jean-Pierre CACHEMICHE

XMC-RFSOC-A. XMC Module Xilinx Zynq UltraScale+ RFSOC. Overview. Key Features. Typical Applications. Advanced Information Subject To Change

RT2016 Phase-I Trigger Readout Electronics Upgrade for the ATLAS Liquid-Argon Calorimeters

XMC-SDR-A. XMC Zynq MPSoC + Dual ADRV9009 module. Preliminary Information Subject To Change. Overview. Key Features. Typical Applications

Frontend Control Electronics for the LHCb upgrade Hardware realization and test

COSMOS Architecture and Key Technologies. June 1 st, 2018 COSMOS Team

BES-III off-detector readout electronics for the GEM detector: an update

Nutaq. PicoSDR FPGA-based, MIMO-Enabled, tunable RF SDR solutions PRODUCT SHEET I MONTREAL I NEW YORK I. nutaq. .com QUEBEC

Scintillator Data Acquisition System

EyeCheck Smart Cameras

VPX3-ZU1. 3U OpenVPX Module Xilinx Zynq UltraScale+ MPSoC with FMC HPC Site. Overview. Key Features. Typical Applications

XMC-ZU1. XMC Module Xilinx Zynq UltraScale+ MPSoC. Overview. Key Features. Typical Applications

arxiv: v2 [nucl-ex] 6 Nov 2008

DHCAL Readout Back End

Ethernet transport protocols for FPGA

Interface electronics

CBMnet as FEE ASIC Backend

Strategies for Deploying RFSoC Technology for SIGINT, DRFM and Radar Applications. Rodger Hosking Pentek, Inc. WInnForum Webinar November 8, 2018

The White Rabbit Project

arxiv: v1 [nucl-ex] 26 Oct 2008

Centre de Physique des Particules de Marseille. The PCIe-based readout system for the LHCb experiment

New! New! New! New! New!

VE883. 4K HDMI Optical Extender (K1, MM) / 10km (K2, SM))

Level-1 Data Driver Card of the ATLAS New Small Wheel Upgrade Compatible with the Phase II 1 MHz Readout

Field Programmable Gate Array (FPGA) Devices

Update on PRad GEMs, Readout Electronics & DAQ

S100 Series. Compact Smart Camera. High Performance: Dual Core Cortex-A9 processor and Xilinx FPGA. acquisition and preprocessing

TE EE-S Starter Kit

Prototyping NGC. First Light. PICNIC Array Image of ESO Messenger Front Page

New! New! New! New! New!

Introduction Technology Equipment Performance Current developments Conclusions. White Rabbit. A quick introduction. Javier Serrano

NVMe-IP Introduction for Xilinx Ver1.7E

Virtex 6 FPGA Broadcast Connectivity Kit FAQ

VPX645. NVMe HBA with RAID (0, 1, 5, 6, 10, 50 and 60), 3U VPX. Key Features. Benefits. 3U VPX NVMe Host Bus Adapter with Full support for RAID

Deployment of the CMS Tracker AMC as backend for the CMS pixel detector

SOFTWARE DEFINED RADIO

Switch v3 Hardware Status and Plans

VPX3-ZU1. 3U OpenVPX Module Xilinx Zynq UltraScale+ MPSoC with FMC HPC Site. Overview. Key Features. Typical Applications

Functional Specification

SRS scalable readout system Status and Outlook.

In-chip and Inter-chip Interconnections and data transportations for Future MPAR Digital Receiving System

Acquisition system for the CLIC Module.

Ron Emerick, Oracle Corporation

FPGA Implementation of RDMA-Based Data Acquisition System Over 100 GbE

A generic firmware core to drive the Front-End GBT-SCAs for the LHCb upgrade

1-Port 10G SFP+ Fiber Optic Network Card - PCIe - Intel Chip - MM

PCIe 10G SFP+ Network Card

ROM Status Update. U. Marconi, INFN Bologna

Validation of the front-end electronics and firmware for LHCb vertex locator.

TTC/TTS Tester (TTT) Module User Manual

THE Large Hadron Collider (LHC) will undergo a series of. FELIX: the New Detector Interface for the ATLAS Experiment

A HT3 Platform for Rapid Prototyping and High Performance Reconfigurable Computing

Improving Packet Processing Performance of a Memory- Bounded Application

INT G bit TCP Offload Engine SOC

PCI Express 4-Port Industrial Serial I/O Cards

MDC Optical Endpoint. Outline Motivation / Aim. Task. Solution. No Summary Discussion. Hardware Details, Sharing Experiences and no Politics!

HPE FlexNetwork Tbps Fabric with 8-port 1/10GbE SFP+ and 2-port 40GbE QSFP+ Main Processing Unit - LSQM1SRP8X2QE0 (JH209A, JH217A)

5051 & 5052 PCIe Card Overview

FPGA based microserver for high performance real-time computing in Adaptive Optics

NVMe-IP Introduction for Xilinx Ver1.8E

DP-8020 Hardware User Guide. UG1328 (v 1.20) December 6, 2018

EMU FED. --- Crate and Electronics. ESR, CERN, November B. Bylsma, S. Durkin, Jason Gilmore, Jianhui Gu, T.Y. Ling. The Ohio State University

Realize the Genius of Your Design

VPX754. Intel Xeon SoC, 3U VPX, PCIe Gen3. Key Features. Benefits VPX754

USCMS HCAL FERU: Front End Readout Unit. Drew Baden University of Maryland February 2000

Signal Conversion in a Modular Open Standard Form Factor. CASPER Workshop August 2017 Saeed Karamooz, VadaTech

. Micro SD Card Socket. SMARC 2.0 Compliant

White Rabbit Module for NI CompactRIO Daniel Florin and David Wolf Physik Institut / Universität Zürich

UltraScale Architecture and Product Overview

White Paper. ORSPI4 Field-Programmable System-on-a-Chip Solves Design Challenges for 10 Gbps Line Cards

IGLOO2 Evaluation Kit Webinar

A new approach to front-end electronics interfacing in the ATLAS experiment

OBSAI Protocol Tester

Radiation-Hard/High-Speed Parallel Optical Links

CMX Hardware Overview

TOE10G-IP Multisession Demo Instruction Rev Nov-16

Copyright 2017 Xilinx.

Physics Procedia 00 (2011) Technology and Instrumentation in Particle Physics 2011

FASTER DAQ Network May 16th 2018 Clermont-Ferrant

FPGA system development What you need to think about. Frédéric Leens, CEO

4-Port Gigabit Ethernet Network Card - PCI Express, Intel I350 NIC

Designing with the Xilinx 7 Series PCIe Embedded Block. Tweet this event: #avtxfest

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

VT988. Key Features. Benefits. High speed 16 ADC at 3 GSPS with Synchronous Capture VT ADC for synchronous capture

LSI SAS e PCI Express to 6Gb/s Serial Attached SCSI (SAS) Host Bus Adapter

Transcription:

Selection of hardware platform for CBM Common Readout Interface W.M.Zabołotnya, G.H.Kasprowicza, A.P.Byszuka, D.Emschermannb, M.Gumińskia, K.T.Poźniaka, R.Romaniuka Institute of Electronic Systems Warsaw University of Technology b GSI-Helmholtzzentrum für Schwerionenforschung GmbH a Presenter: Wojciech M. Zabołotny 1/25

Original concept of the CBM readout CROB GBT links DPB Data hro nc ds, Sy an ck, m m Clo us co tatus no sy s bu CROB Optical link Data FLES computing node Statu resp s, Comm onse a s, Da nd ta Timeslice Builder and FLES InfiniBand CROB FEE Computer center ~700 m Cloc k nizat, Synchr i o com on, Cont man d s ro l Setup and Control Commands, Commands responses ~100 m FLES input node CBM service building FLIB CBM detector cave TFC ECS 2/25

STS-XYTER tester STS Rx CLK DO DI DI DI DI E-Link blackbox model DI 1Gbps Ethernet Interface STS CMD Processor STS/MUCHXYTER2 model STS CMD Tx STS Readout FIFO LVDS Interface LVDS Interface AFCK FPGA IPbus Controller The DPB firmware blocks were tested in the STS-XYTER tester. The tester uses the black-box model of the GBTx and E-Links and IPbus-based readout with limited bandwidth. STS/MUCH-XYTER2 Interface Core Slide presented on the CBM Collaboration meeting 03.2017 3/25

J1B Forth CPU J1B I/O registers J1B I/O registers GBTx on VLDB Chipscope GBT-FPGA The tester firmware uses the GBTFPGA IP Core from CERN (currently version 5.0.0) AFCK FPGA HDMI connectors To test the operation of the GBTX ASIC in modes needed in the STS/MUCH readout (160 Mb/s GBT frame for downlink and 320 Mb/s wide bus for uplink), the dedicated AFCK-GBT-tester has been created. 4.8 Gb/s link USB UART interface Integration of the tester with GBT Testing using both interactive mode and complex routines was possible due to the addition of Forth CPU J1B with SwapForth to the firmware. Slide presented on the CBM Collaboration meeting 03.2017 4/25

Hardware used for GBT testing Test setup based on the AFCK board supplemented with the FMS14 board and the e-link FMC. Two HDMI cables with HDMI-micro to HDMI-mini adapters. Shifted clock routed via separate coaxial cable to the U.FL connector on the AFCK (required disabling of Si570 though!) Slide presented on the CBM Collaboration meeting 03.2017 5/25

FADE Interface 1Gbps Ethernet Interface IPbus Controller STS Rx 10 Gbps Ethernet Interface STS CMD Tx STS CMD Processor AFCK FPGA STS Readout FIFO GBT-compatible controller GBT-FPGA STS/MUCHXYTER2 model LVDS Interface GBTx 4.8 Gb/s link Current state of the GBT-based STS-XYTER tester STS/MUCH-XYTER2 Interface Core 6/25

Previous concept of the CBM readout CROB GBT links DPB Data hro nc ds, Sy an ck, m m Clo us co tatus no sy s bu CROB Optical link Data FLES computing node Statu resp s, Comm onse a s, Da nd ta Timeslice Builder and FLES InfiniBand CROB FEE Computer center ~700 m FPGA! Cloc k nizat, Synchr i o com on, Cont man d s ro l Setup and Control Commands, Commands responses ~100 m FLES input node CBM service building FLIB CBM detector cave PC TFC ECS 7/25

New concept of the CBM readout CROB FEE CBM service building ~100 m Cloc k nizat, Synchr i o com on, Cont man d s ro l Statu resp s, Comm onse a s, Da nd ta CROB ~700 m FPGA! Timeslice Builder and FLES InfiniBand Optical link PCIe CRI Data Data TFC Ethernet Setup and Control Commands, Commands responses CROB Clock, Synchronous commands, busy status GBT links Computer center PC FLES computing node CBM detector cave PC ECS 8/25

Requirements to the CRI board Handling of the required number of GBT links to receive the data from the CROBs Providing the sufficient PCIe bandwidth to send the received data to the FLES input node Providing the sufficient amount of logical resources to concentrate the data 9/25

CROB board x x (STS) (TRD) ROB-3 A ROB-3 will interface to 3 MGTs on the DPB. Based on the slide presented by David Emschermann @ CBM ASIC workshop in Darjeeling 22.02.2017 10/25

Available optical transceivers Source: https://www.finisar.com/optical-transceiver s/ftl410qd2c QSFP+: Finisar FTL410QD2C 22 with approximate price of $300 for 4 duplex channels. 12-channel - Foxconn MiniPOD 23 modules with the approximate price of $360 for simplex channels (either Tx or Rx). Source: Foxconn 12-channel - Foxconn MicroPOD 24 modules with the approximate price of $130 for 12 Tx channels and $280 for 12 Rx channels. Source: Foxconn 24-channel - Finisar FBOTD10SL1C00 25 with the approximate price of $860 for 24 duplex channels. The RX only FBRTP10SL1C00 modules are also available. Source: Foxconn 11/25

Optical input possibilities It seems that the best cost per channel may be achieved with the 12-channel MicroPOD modules. The smallest configurations with a reasonable utilization of optical channels are the following: With two 12-channel receivers and one 12-channel transmitter, we can connect 8 CROBs to a single CRI. In this case, we waste 4 transmit channels (may be used as spares). With three 12-channel receivers and one 12-channel transmitter, we can connect 12 CROBs to a single CRI. In this case, we fully utilize optical channels. 12/25

Input data bandwidth analysis The number of handled CROBs defines also the required PCIe bandwidth Worst case analysis: 4.48 Gbps/GBT link, 13.44 Gbps/CROB, 108 Gbps for 8 CROBs, 162 Gbps for 12 CROBs Optimistic analysis: 9.41 Mhit/s/SMX2 chip, 20 bits of non-aggregated data/hit, 40 SMX2 chips/crob, 7.53 Gbps/CROB, 60.22 Gb/s for 8 CROBs, 90.34 Gb/s for 12 CROBs. 13/25

PCIe bandwidth analysis PCIe 3.0 or 3.1: 7.877 Gbps/lane, UltraScale+ devices may use up to 16 lanes. For pessimistic assessment: For optimistic assessment: we need 16 lanes for 8 CROBs (bandwidth 126 Gb/s - margin of 14%) we need two links each with 16 lanes for 12 CROBs (bandwidth of 252 Gb/s margin of 36%). a single link with 8 lanes should be sufficient for 8 CROBs (bandwidth of 63 Gb/s) but the margin of 4% is probably too low, and therefore the 16-lanes link will be needed (margin over 50%). The same link will be needed in case of 12 CROBs (margin of 28%). The PCIe 4.0: 15.752 Gbps/lane, UltraScale+ devices may use up to 8 lanes. For pessimistic assessment: one 8-lanes link for 8 CROBs two 8-lanes links for 12 CROBs For optimistic assessment one 8-lanes link both for 8 and 12 CROBs 14/25

Required MGT resources #CROBs #lanes PCIe 3 #lanes PCIe 4 opt. pess. opt. pess. #GBTs 8 CROBs 24 16 16 8 8 12 CROBs 36 16 32 8 16 15/25

Total number of required MGT resources (+1 for TFC) For PCIe 3 For PCIe 4 #CROBs opt. pess. opt. pess. 8 CROBs 41 41 33 33 12 CROBs 53 69 45 53 16/25

Available FPGAs KU11P KU15P ZU11EG ZU17EG ZU19EG System Logic Cells 653,100 1,143,450 653,100 926,124 1,143,450 Block RAM Blocks 600 984 600 796 984 UltraRAM Blocks 80 128 80 102 128 Enclosure FFVE 1517 FFVE 1517 FFVA 1760 FFVC 1760 FFVC 1760 FFVD 1760 FFVC 1760 FFVD 1760 Number of transceivers GTH+GTY 32+20 32+24 44+32 32+16 32+16 44+28 32+16 44+28 Total number of transceivers 52 56 76 48 48 72 Example price $3500 $4600 $7900 $5000? $3900 $4900 $5400 48 72 $4400 $4900 17/25

Cost analysis The estimated cost per CROB (assuming $130 for MicroPOD Tx and $280 for MicroPOD Rx): (2 $280 + $130 + $3500) /8CROBS $524 in case of 8CROB version with XCKU11P in FFVE1517 enclosure. (3 $280 + $130 + $4600) /12CROBS $464 in case of 12CROB version with XCKU15P in FFVA1517 enclosure (optimistic assessment). For ZynqMP chips: (2 $280 + $130 + $3900) /8CROBS $574 in case of 8CROB version with ZU11EG in FFVC1760 enclosure. (3 $280 + $130 + $4900) /12CROBS $489 in case of 12CROB version with ZU17EG in FFVD1760 enclosure. 18/25

Available FPGAs KU11P KU15P ZU11EG ZU17EG ZU19EG System Logic Cells 653,100 1,143,450 653,100 926,124 1,143,450 Block RAM Blocks 600 984 600 796 984 UltraRAM Blocks 80 128 80 102 128 Enclosure FFVE 1517 FFVE 1517 FFVA 1760 FFVC 1760 FFVC 1760 FFVD 1760 FFVC 1760 FFVD 1760 Number of transceivers GTH+GTY 32+20 32+24 44+32 32+16 32+16 44+28 32+16 44+28 Total number of transceivers 52 56 76 48 48 72 Example price $3500 $4600 $7900 $5000? $3900 $4900 $5400 48 72 $4400 $4900 19/25

Availability of resources Preliminary analysis of resources defines number of BRAMs (for data sorting): 320 BRAMs for 8 CROBs, 480 BRAMs for 12 CROBs Consumption of other resources will be assessed after the first CRI firmwares are created. Upgrade possibilities (in case of unsufficient resources, in the same package): XCKU11P XCKU15P ZU11EG ZU17EG ZU17EG ZU19EG 20/25

Distribution of time and fast control The CROB modules (and further boards) require precise synchronization Time synchronization (PPS) Reference clock signal synchronous control commands. Possible solutions: Support for White Rabbit or similar protocol in every CRI board. The clock is received from the optical signal from the SFP transceiver. Dedicated receiver board use of a standard WR node in the form of a PCIe card. The clock and timing signals distributed via pear-to-pear or daisychain connectors (e.g. minisas). Delivery of both clock and control messages using dedicated optical or copper links to each CRI board from external distribution unit (e.g. by the TTC-PON system developed at CERN) Selected FPGAs may implement all those options, but additional hardware (e.g. the jitter cleaner must be provided by the board) 21/25

Is it reasonable to use ZynqMP? We get quad-core ARM almost for free! DDR4 RAM must be mounted 2400MT/s throughput not sufficient to pass the data through the RAM even for 64 bit data width Use of ARM core for control Without the DDR4 RAM in bare metal configuration with 1MB L2 memory With slow 32-bit LPDDR memory even under control of Linux OS Usability of ZynqMP chips must be evaluated 22/25

Visualization of the possible CRI board 23/25

Conclusions It is possible to create the CRI board supporting 8 or 12 CROBs in a PCIe card form factor, based on new UltraScale+ FPGAs from Xilinx The 12 CROB version, gives sligthtly lower cost/channel (12%) but is more technologically challenging Information from the team responsible for FLES input nodes also suggest to use the 8 CROB version Final decision will be possible after the protype firmware is implemented. 24/25

Thank you for your attention! 25/25