Computing Infrastructure for Online Monitoring and Control of High-throughput DAQ Electronics

Size: px

Start display at page:

Download "Computing Infrastructure for Online Monitoring and Control of High-throughput DAQ Electronics"

Oswin Curtis
6 years ago
Views:

Computing Infrastructure for Online Monitoring and Control of High-throughput DAQ S. Chilingaryan, M. Caselle, T. Dritschler, T. Farago, A. Kopmann, U. Stevanovic, M.

1 Computing Infrastructure for Online Monitoring and Control of High-throughput DAQ S. Chilingaryan, M. Caselle, T. Dritschler, T. Farago, A. Kopmann, U. Stevanovic, M. Vogelgesang Hardware, Software, and Network Organization Picosecond Sampling for Terahertz Synchrotron Radiation KIT University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association Prototype of Streaming PCIe Camera Developed in house

2 Requirements Handling of Sensors with data rates up to ~ 8 GB/s (8-12 bit) Real-time control loop based on 2D Images + Online compression In-flow 8 GB/s, unpacked up to 16 GB/s (16 bit) Slow control loop based on 3D Tomographic Images In-flow 4 GB/s, unpacked 32 GB/s (single-precision floating-point) Raw data storage at full speed, i.e. 4 GB/s Long-term storage at 1 GB/s Integration with Tango Control System Low administrative effort Transfer up to 8 GB/s rates 4 GB/s 4 GB/s 1 GB/s 0,25 GB/s Outer control loop Camera Camera PC PCIe + GPUDirect 2 UFO Control System Infiniband Real-Time Storage iser Long-term Storage GlusterFS NFS Archiving Real-time control loop

Concepts Programmable DAQ electronics with PCI-express interface Distributed control system based on Infiniband interconnects GPU-based computing Multiple levels of scalability Cheap off-the-shelf

3 Concepts Programmable DAQ electronics with PCI-express interface Distributed control system based on Infiniband interconnects GPU-based computing Multiple levels of scalability Cheap off-the-shelf components GFlops Prototype of Streaming PCIe Camera Developed in house Poster FPI Xeon/SP GeForce/SP Xeon/DP GeForce/DP 2012 Tesla/SP Historical trends of CPU and GPU performance 2014 Tesla/DP Easily scalable Quad-SLI 35 Tflops for ~ 5000 EUR

UFO Control Network Control Room Beam-line Compute Center

(Optical) 10 Gig Ethernet Infiniband (Electrical) Storage

Facility Camera Station Ethernet IB router Storage Node

4 UFO Control Network Control Room Beam-line Compute Center PCIe x8 gen3 Master Server (lots of memory) Infiniband (Optical) 10 Gig Ethernet Infiniband (Electrical) Storage Node OpenCL Node OpenCL Node Archive Large Scale Data Facility Camera Station Ethernet IB router Storage Node Talks on Data Analysis Lab: Scalable Control Network 4 WCO202 FCO202

5 Master Server Cameras FDR Infiniband Ethernet 56 Gbit/s 10 Gb/s Storage LSDF Large Scale Data Facility Internal PCO.edge PCO.dimax. SuperMicro 7047GR-TRF (Intel C602 Chipset) CPU: 2 x Xeon E5-2680v2 ( total 20 cores at 2.8 Ghz) GPUs: 7 x NVIDIA GTX Titan Memory: 256 GB (512GB max) Network: Intel 82598EB (10 Gb/s) Infiniband: 2 x Mellanox ConnectX-3 VPI Storage: Areca ARC-1880-ix-12 SAS Raid 8 x Samsung 840 Pro 510 (Raid0) High amount of memory Fast SSD-based Raid for overflow data 5

Master Server Cameras Internal PCO.edge PCO.dimax. FDR Infiniband Ethernet 56 Gbit/s 10 Gb/s External PCIe x16 (16 GB/s) Storage LSDF Large Scale Data Facility SFF8088 (2.

6 Master Server Cameras Internal PCO.edge PCO.dimax. FDR Infiniband Ethernet 56 Gbit/s 10 Gb/s External PCIe x16 (16 GB/s) Storage LSDF Large Scale Data Facility SFF8088 (2.4 GB/s) SuperMicro 7047GR-TRF (Intel C602 Chipset) CPU: 2 x Xeon E5-2680v2 ( total 20 cores at 2.8 Ghz) GPUs: 7 x NVIDIA GTX Titan Memory: 256 GB (512GB max) Network: Intel 82598EB (10 Gb/s) Infiniband: 2 x Mellanox ConnectX-3 VPI Storage: Areca ARC-1880-ix-12 SAS Raid 8 x Samsung 840 Pro 510 (Raid0) 16 x Hitachi A7K200 (Raid6) High amount of memory Fast SSD-based Raid for overflow data Easy scalability with external PCI express and SAS 6

7 Caching large data sets RamdDisk Samsung 840 (8) Intel X25E (1) 140 A7K2000 (16) A7K2000 (1) MB/s Using SSD drives may significantly increase random access performance to the data sets which are not fitting in memory completely. The big arrays of magnetic hard drives will not help unless multiple readers involved. 7

Camera Station Camera Link 850 MB/s Infiniband 56 Gbit/s PCI express (x8) Up to 8 GB/s Asus Z9PA-U8 (Intel C602 Chipset) CPU: Xeon E5-1620v2 ( total 4 cores at 3.

8 Camera Station Camera Link 850 MB/s Infiniband 56 Gbit/s PCI express (x8) Up to 8 GB/s Asus Z9PA-U8 (Intel C602 Chipset) CPU: Xeon E5-1620v2 ( total 4 cores at 3.7 Ghz) GPUs: NVIDIA GTX Titan Memory: 32 GB (128GB max) Infiniband: Mellanox ConnectX-3 VPI High-speed 4-channel memory IPMI-based remote control Optional fast SSD-based storage 4x high speed PCI express slots 8

NVIDIA GPUDirect memcpy performance Core i7

H2H all cores D2D/GPUDirect D2D 1 core 0 5

9b throughput Direct communication between

9 NVIDIA GPUDirect memcpy performance Core i7 950 Core i7-980x Core i x E H2H all cores D2D/GPUDirect D2D 1 core GB/s MVAPICH2 1.9b throughput Direct communication between GPUs, Network, and other devices on PCI express bus 9

Cluster Node Asus Z87-WS (Intel Z87 PCH Chipset) CPU: Core i5-4670 ( total 4 cores at 3.

10 Cluster Node Asus Z87-WS (Intel Z87 PCH Chipset) CPU: Core i ( total 4 cores at 3.4 Ghz) GPUs: 3 x NVIDIA GTX Titan Infiniband: Mellanox ConnectX-3 VPI Memory: 16 GB (32GB max) NVIDIA GTX Titan Memory: 6 GB at 288 GB/s Single-precision Gflops: 4500 Double-precision Gflops: Way SLI Low Price 10

11 Storage Protocols Network FS NFS Samba SSHFS Slow Cluster FS Lustre (patched kernel) Gluster FhGFS (close-sourced) Slow if few nodes Network Devices iscsi (slow) iser OCFS2 Gluster (over XFS) FhGFS (over XFS) OCFS2/iSER OCFS2/Local XFS/iSER XFS/Local iser Raw Write 11 Read MB/s

12 Handling High-speed Storage Data Write Read Buffer Cache 0 Buffered Storage Default data flow in Linux Direct AIO MB/s Buffer cache significantly limits maximal write performance Kernel AIO may be used to program IO scheduler to issue read requests without delays Optimizing I/O for maximum streaming performance using a single data source/receiver 13

13 UFO Storage Subsystem Master Server Client 1 High-speed Streaming storage Read: 2.3 GB/s Write: 2.5 GB/s Client 2 Compute Node 1 NFS 16 TB, XFS Compute Node 2 SoftRaid Level 0 8 TB GlusterFS 8 TB Compute Node 3 iser iser 8 TB 20 TB Storage Node 1 Raid6: 16 Hitachi 7K300, 28TB 14 8 TB 20 TB Storage Node 2 Raid6: 16 Hitachi 7K300, 28TB

14 Software Stack Device Device Motors Other slow device Tango Control KIRO ALPS Advanced Linux PCI Services LibPCO UFO Master Server Concert Control System SoftRaid Python XFS Gobject-Introspection LibUCA Unified Camera Access UFO GPU Image Processing Framework iser FastWriter Streaming Library PCO Drivers Camera Station OpenCL MPI UFO Storage Cluster Computing Cluster 15

High-Throughput Data Acquisition in a TANGO Environment

15 KIRO: Integration with Tango Tango over Corba over TCP over Infiniband is slow Tango/KIRO Poster FPI01 Using InfiniBand for High-Throughput Data Acquisition in a TANGO Environment Tango/Standard MB/s KIRO Architecture 16

16 Summary Only easy to get off-the-shelf components are used Our architecture can be easily scaled from a single PC to the cluster with performance in hundreds of teraflops. The reliable storage for data streaming with rates over 3 GB/s can be easily build based on 1 2 low-end servers. The electronic components can be distributed over large area and connected with high-speed using Optical Infiniband links. The provided software stack allows easily integrate new devices and processing algorithms. The Tango control system is extended to support high-speed communication over Infiniband Check related talks and posters: 17 Poster FPI01 WCO202 Using InfiniBand for High-Throughput Data Acquisition in a TANGO Environment Data Management at the Synchrotron Radiation Facility ANKA Poster FPI02 FCO202 Picosecond Sampling for Terahertz Synchrotron Radiation OpenGL-Based Data Analysis in Virtualized Self-Service Environments

17 Extra Slides 18

Ultra Fast X-ray Imaging of Scientific Processes with UFO On-Line Assessment and Data-Driven Process Control ANKA Optics and sample beam line manipulators Smart highspeed camera Online monitoring and

18 Ultra Fast X-ray Imaging of Scientific Processes with UFO On-Line Assessment and Data-Driven Process Control ANKA Optics and sample beam line manipulators Smart highspeed camera Online monitoring and evaluation Offline storage Goals High speed tomography Increase sample throughput Tomography of temporal processes Allow interactive quality assessment 19 Enable data driven control Auto-tunning optical system Tracking dynamic processes Finding area of interest

UFO Image Processing Framework acquisition flat field correction

................. Preprocessing Executed on CPUs sinogram generation FFT

Processing Glib/GObject, scripting language support with introspection

19 UFO Image Processing Framework acquisition flat field correction Reconstruction Executed on GPUs noise reduction Preprocessing Executed on CPUs sinogram generation FFT filter ifft back projection Storage Segmentation / meshing storage of raw data OpenSource Features Easy Algorithm Exchange Camera Abstraction Pipelined Processing Glib/GObject, scripting language support with introspection OpenCL + automated management of OpenCL buffers Filters Tomography & Laminography FBP, DFI, Algebraic Methods Non Local Means for Noise Reduction Optical Flow 20

ALPS Scripting LabVIEW Bash, Perl, Ruby, Python Control System

Python/GTK User Interface Remote Programming API Event Engine

Register List + Dynamic Registration API PCI Memory Access

Memory Mapping IRQ Handling Driver Access Layer Small & Easy

20 ALPS Scripting LabVIEW Bash, Perl, Ruby, Python Control System Integration pcitool Web Service GUI Command-line tool Python/GTK User Interface Remote Programming API Event Engine IPE Camera DMA Engine Northwest Logic DMA Register Access XML Register List + Dynamic Registration API PCI Memory Access Plain and FIFO Access Serialization, Software Registers DMA Memory Mapping IRQ Handling Driver Access Layer Small & Easy to Maintain PCI / PCI Express Board (Variety of FPGA Boards: KIT Camera, etc) 21

Performance-oriented instrumentation for high-speed synchrotron imaging

Performance-oriented instrumentation for high-speed synchrotron imaging S. Chilingaryan, M. Caselle, T. Dritschler, A. Kopmann, A. Shkarin, M. Vogelgesang acquisition flat field correction noise reduction