A Real Time Controller for E-ELT
|
|
- Georgina McLaughlin
- 5 years ago
- Views:
Transcription
1 A Real Time Controller for E-ELT Addressing the jitter/latency constraints Maxime Lainé, Denis Perret LESIA / Observatoire de Paris Project # funded by European Commission under program H2020-EU coordinated in H2020-FETHPC-2014
2 Green Flash RTC prototypes or E-ELT AO system European project: Horizon 2020 Research and innovation program 3 years project (a year has passed already) Low cost and low energy consumption RTC Different kind of accelerators characterization GPU FPGA XeonPhi Public/private sector partnership
3 E-ELT Numbers & Methods Case Method Dimension Encoding Frequency Size Throughput MCAO SCAO Shack- Hartmann Shack- Hartmann 1.6k x 1.6k 16 bits 500Hz 800 x bits 1000Hz SCAO Pyramid 240 x bits 1000Hz 40.96Mb (5,12Mo) 10.24Mb (1,28Mo) ~1Mb (~125Ko) Gb/s Gb/s 1 Gb/s Sequential Slopes/MVM processing Segmented Slopes/MVM processing
4 E-ELT Numbers & Methods Case Method Dimension Encoding Frequency Size Throughput MCAO SCAO Shack- Hartmann Shack- Hartmann 1.6k x 1.6k 16 bits 500Hz 800 x bits 1000Hz SCAO Pyramid 240 x bits 1000Hz 40.96Mb (5,12Mo) 10.24Mb (1,28Mo) ~1Mb (~125Ko) Gb/s Gb/s 1 Gb/s MCAO Case: 40,96 Mb with 40Gb/s network => 1ms latency
5 Latency and jitter constraints Latency typical optic fiber: 4,9µs/Km Closest city 130km (Antofagasta) implies 0.630ms latency (+transfer time) Jitter Under 10% of overall latency in order to be manageable Too much jitter Too much frame skips Correction stability hard to reach MCAO 2ms/iter, SCAO 1ms/iter The nearer the better The lesser the better
6 Usual Telescope RTC SPARTA Design for VLT scales RTC BOX: x86 CPU FPGA DSP Co-processing Cluster: X86 CPU At ELT scales: needs for a super-calculator in the observatory
7 E-ELT GreenFlash RTC Prototype Sensor(s) Low latency Low jitter DM 40 GbE network High bandwidth Low latency Real Time Controller 40 GbE network High bandwidth Supervisor Telemetry High framerate High bandwidth Low latency Fast storage high throughput High bandwidth High throughput
8 Legacy GPU programming main { setup(); while(run){ recv( ); cudamemcpy(, HostToDevice); computing_kernel<<<>>>( ); cudamemcpy(, DeviceToHost); send( ); } } PCIe GPU GPU RAM 10GbE NIC CPU CPU RAM
9 Legacy GPU Programming cudamemcopy() overhead times (5.12Mo in, 64Ko out) Kernel launches overhead times Both cases : jitter of 20 to 30 µsec
10 Legacy GPU Programming cudamemcopy() overhead times (5.12Mo in, 64Ko out) Kernel launches overhead times Both cases : jitter of 20 to 30 µsec (40 µsec sometimes)
11 Legacy GPU programming MCAO : image 40.96Mb, commands 512Kb Over 40GbE network transfer takes almost 1ms Same for cudamemcpy() operations camera integration transfer i i+1 i+2 i-1 i i+1 transfer handling data / exec handling sequential processing segmented processing Leaves not enough to no time for computations Dec 21st 2016
12 GPUDirect + Custom FPGA NIC Allows third party PCI-e device p2p access Goals : Negates latency overhead and reduce jitter induced by cudamemcpy cudamemcopy() overhead times (5,12Mo in, 64Ko out) Linux Kernel module: expose CUDA buffers phy@
13 Persistent CUDA kernel Exports computation loop on GPU Goals : Reduce kernel launch jitter & start computations as soon as data arrive Kernel launches overhead times Uses memory polling for data arrival detection
14 Persistent kernel + GPUDirect main { setup(); persistent_kernel<<<>>>( ); while(run){ waitgpu( ); startdmatransfer( ); } } persistent_kernel( ){ while(run){ pollmemory( ); notifycpu( ); } } PCIe GPU notify GPU RAM start CPU Problem: GPU can t command FPGA Mapped CPU memory in CUDA for GPU/CPU notification 10GbE FPGA NIC CPU RAM
15 Persistent kernel + GPUDirect FPGA writes/reads directly to/from GPU memory Using only writes would be better though DMA Host ram CPU app Camera control UDP Offload Engine Latency measurement Camera protocol handler DMC protocol handler FPGA NIC measures answers DMA DMA DMA DMA PCI-e start GPU ram Pixels ring buffer DM com buffer FPGA control Sync thread GPU polling kernel compute kernels notify
16 GPUDirect & FPGA NIC - RTT FPGA PLDA XPressG5 GPU Tesla C2070 OS Debian wheezy Camera EVT HS-2000M 10GbE network µsec SCAO Pyramid case: 240 x 240 pixels, encoded on 16b No GPUDirect GPUDirect + persistent kernel Results are coherent with expectations iterations
17 Persistent kernel + GPUDirect If it scales to MCAO case camera integration i i+1 i+2 transfer transfer handling data / exec handling sequential processing segmented processing i-1 i i+1
18 IOMemory mapping (CUDA 7.5) cumemhostregister( ) using CU_MEMHOSTREGISTER_IOMEMORY flag main { setup(); persistent_kernel <<<>>>( ); } persistent_kernel( ){ while(run){ pollmemory( ); actual_computation; startdmatransfer( ); } } PCIe GPU GPU RAM Mapping FPGA addresses into CUDA memory space allows DMA control fromgpu start CPU 10GbE FPGA NIC CPU RAM
19 IOMemory mapping UDP Offload Engine Latency measurement Camera protocol handler DMC protocol handler FPGA NIC measures answers DMA DMA DMA DMA DMA PCI-e start Host ram GPU ram Pixels ring buffer DM com buffer CPU app Camera control FPGA control Meas. Comp. GPU polling kernel compute kernels Little to no improvements, but CPU free for other kind of computations
20 Conclusion / Perspectives Using GPUDirect and a persistent kernel allow efficient data delivery to the Real Time Controller GPUDirect approach can be applied to Supervisor using interruptions from FPGA for CPU execution control (kernel launch) Simulation setup to benchmark ELT SCAO/MCAO scales thoroughly Test with new hardware: PLDA ExpressKUS FPGA and Nvidia K40/80 & P100 GPUs. (and Arria10 FPGAs in a near future) Develop computation modules on FPGA to further reduce GPU load e.g.: Slopes computation for segmented processing
21 Thank you for your attention (Questions?) Project # funded by European Commission under program H2020-EU coordinated in H2020-FETHPC-2014
A Real Time Controller for E-ELT
A Real Time Controller for E-ELT Addressing the jitter/latency constraints Maxime Lainé, Denis Perret LESIA / Observatoire de Paris Project #671662 funded by European Commission under program H2020-EU.1.2.2
More informationGTC 2017 Green Flash Persistent Kernel : Real-Time, Low-Latency and HighPerformance Computation on Pascal. Julien BERNARD
Green Flash Persistent Kernel : Real-Time, Low-Latency and HighPerformance Computation on Pascal Julien BERNARD Project #671662 funded by European Commission under program H2020-EU.1.2.2 coordinated in
More informationEnergy efficient real-time computing for extremely large telescopes with GPU
Energy efficient real-time computing for extremely large telescopes with GPU Florian Ferreira & Damien Gratadour Observatoire de Paris & Université Paris Diderot 1 Project #671662 funded by European Commission
More informationDriving the next generation of Extremely Large Telescopes using Adaptive Optics with GPUs
Driving the next generation of Extremely Large Telescopes using Adaptive Optics with GPUs Damien Gratadour LESIA, Observatoire de Paris Université Paris Diderot LESIA, Observatoire de Paris ANR grant ANR-12-MONU-0022
More informationDown selecting suitable manycore technologies for the ELT AO RTC. David Barr, Alastair Basden, Nigel Dipper and Noah Schwartz
Down selecting suitable manycore technologies for the ELT AO RTC David Barr, Alastair Basden, Nigel Dipper and Noah Schwartz GFLOPS RTC for AO workshop 27/01/2016 AO RTC Complexity 1.E+05 1.E+04 E-ELT
More informationFPGA based microserver for high performance real-time computing in Adaptive Optics
FPGA based microserver for high performance real-time computing in Adaptive Optics C. Patauner a, R. Biasi a, M. Andrighettoni a, G. Angerer a, D. Pescoller a, F. Porta a, D. Gratadour b a Microgate Srl,
More informationMartin Dubois, ing. Contents
Martin Dubois, ing Contents Without OpenNet vs With OpenNet Technical information Possible applications Artificial Intelligence Deep Packet Inspection Image and Video processing Network equipment development
More informationdcuda: Distributed GPU Computing with Hardware Overlap
dcuda: Distributed GPU Computing with Hardware Overlap Tobias Gysi, Jeremia Bär, Lukas Kuster, and Torsten Hoefler GPU computing gained a lot of popularity in various application domains weather & climate
More informationThe SPARTA Platform: Design, Status and. Adaptive Optics Systems (ESO)
The SPARTA Platform: Design, Status and Perspectives Marcos acossuárez Valles aes Adaptive Optics Systems (ESO) msuarez@eso.orgorg SPARTA Platform Targets ESO Standard Platform for Adaptive Optics Real-Time
More information6.9. Communicating to the Outside World: Cluster Networking
6.9 Communicating to the Outside World: Cluster Networking This online section describes the networking hardware and software used to connect the nodes of cluster together. As there are whole books and
More informationStudying GPU based RTC for TMT NFIRAOS
Studying GPU based RTC for TMT NFIRAOS Lianqi Wang Thirty Meter Telescope Project RTC Workshop Dec 04, 2012 1 Outline Tomography with iterative algorithms on GPUs Matri vector multiply approach Assembling
More informationMaximizing heterogeneous system performance with ARM interconnect and CCIX
Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable
More information27 March 2018 Mikael Arguedas and Morgan Quigley
27 March 2018 Mikael Arguedas and Morgan Quigley Separate devices: (prototypes 0-3) Unified camera: (prototypes 4-5) Unified system: (prototypes 6+) USB3 USB Host USB3 USB2 USB3 USB Host PCIe root
More informationELT-scale real-time control on Intel Xeon Phi and many core CPUs
ELT-scale real-time control on Intel Xeon Phi and many core CPUs David R. Jenkins, Alastair G. Basden, and Richard M. Myers CfAI, Department of Physics, Durham University, DH1 3LE, UK ABSTRACT The next
More informationEfficient CPU GPU data transfers CUDA 6.0 Unified Virtual Memory
Institute of Computational Science Efficient CPU GPU data transfers CUDA 6.0 Unified Virtual Memory Juraj Kardoš (University of Lugano) July 9, 2014 Juraj Kardoš Efficient GPU data transfers July 9, 2014
More informationEnergy-Efficient Data Transfers in Radio Astronomy with Software UDP RDMA Third Workshop on Innovating the Network for Data-Intensive Science, INDIS16
Energy-Efficient Data Transfers in Radio Astronomy with Software UDP RDMA Third Workshop on Innovating the Network for Data-Intensive Science, INDIS16 Przemek Lenkiewicz, Researcher@IBM Netherlands Bernard
More informationRealtime Signal Processing on Embedded GPUs
Realtime Signal Processing on Embedded s Dr. Matthias Rosenthal Armin Weiss Dr. Amin Mazloumian Institute of Embedded Systems Realtime Platforms Research Group Zurich University of Applied Sciences Motivation
More informationMPI + X programming. UTK resources: Rho Cluster with GPGPU George Bosilca CS462
MPI + X programming UTK resources: Rho Cluster with GPGPU https://newton.utk.edu/doc/documentation/systems/rhocluster George Bosilca CS462 MPI Each programming paradigm only covers a particular spectrum
More informationThe NE010 iwarp Adapter
The NE010 iwarp Adapter Gary Montry Senior Scientist +1-512-493-3241 GMontry@NetEffect.com Today s Data Center Users Applications networking adapter LAN Ethernet NAS block storage clustering adapter adapter
More informationDCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture
DCS-ctrl: A Fast and Flexible ice-control Mechanism for ice-centric Server Architecture Dongup Kwon 1, Jaehyung Ahn 2, Dongju Chae 2, Mohammadamin Ajdari 2, Jaewon Lee 1, Suheon Bae 1, Youngsok Kim 1,
More informationMoneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010
Moneta: A High-performance Storage Array Architecture for Nextgeneration, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed
More informationFlexible Architecture Research Machine (FARM)
Flexible Architecture Research Machine (FARM) RAMP Retreat June 25, 2009 Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan Bronson Christos Kozyrakis, Kunle Olukotun Motivation Why CPUs + FPGAs make sense
More informationVirtualization, Xen and Denali
Virtualization, Xen and Denali Susmit Shannigrahi November 9, 2011 Susmit Shannigrahi () Virtualization, Xen and Denali November 9, 2011 1 / 70 Introduction Virtualization is the technology to allow two
More informationFast packet processing in the cloud. Dániel Géhberger Ericsson Research
Fast packet processing in the cloud Dániel Géhberger Ericsson Research Outline Motivation Service chains Hardware related topics, acceleration Virtualization basics Software performance and acceleration
More informationRe-architecting Virtualization in Heterogeneous Multicore Systems
Re-architecting Virtualization in Heterogeneous Multicore Systems Himanshu Raj, Sanjay Kumar, Vishakha Gupta, Gregory Diamos, Nawaf Alamoosa, Ada Gavrilovska, Karsten Schwan, Sudhakar Yalamanchili College
More informationBuilding Real-Time Professional Visualization Solutions on GPUs. Kristof Denolf Samuel Maroy Ronny Dewaele
Building Real-Time Professional Visualization Solutions on GPUs Kristof Denolf Samuel Maroy Ronny Dewaele Page 2 Outline Barco s professional visualization solutions The need for performance portability
More informationMatrox Imaging White Paper
Reliable high bandwidth video capture with Matrox Radient Abstract The constant drive for greater analysis resolution and higher system throughput results in the design of vision systems with multiple
More informationEnd-to-End Adaptive Packet Aggregation for High-Throughput I/O Bus Network Using Ethernet
Hot Interconnects 2014 End-to-End Adaptive Packet Aggregation for High-Throughput I/O Bus Network Using Ethernet Green Platform Research Laboratories, NEC, Japan J. Suzuki, Y. Hayashi, M. Kan, S. Miyakawa,
More informationGPU-centric communication for improved efficiency
GPU-centric communication for improved efficiency Benjamin Klenk *, Lena Oden, Holger Fröning * * Heidelberg University, Germany Fraunhofer Institute for Industrial Mathematics, Germany GPCDP Workshop
More informationI/O Systems. Jo, Heeseung
I/O Systems Jo, Heeseung Today's Topics Device characteristics Block device vs. Character device Direct I/O vs. Memory-mapped I/O Polling vs. Interrupts Programmed I/O vs. DMA Blocking vs. Non-blocking
More informationNVIDIA GPUDirect Technology. NVIDIA Corporation 2011
NVIDIA GPUDirect Technology NVIDIA GPUDirect : Eliminating CPU Overhead Accelerated Communication with Network and Storage Devices Peer-to-Peer Communication Between GPUs Direct access to CUDA memory for
More informationLegUp: Accelerating Memcached on Cloud FPGAs
0 LegUp: Accelerating Memcached on Cloud FPGAs Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. 1 COMPUTE IS BECOMING SPECIALIZED 1 GPU Nvidia graphics cards are
More informationNVM PCIe Networked Flash Storage
NVM PCIe Networked Flash Storage Peter Onufryk Microsemi Corporation Santa Clara, CA 1 PCI Express (PCIe) Mid-range/High-end Specification defined by PCI-SIG Software compatible with PCI and PCI-X Reliable,
More informationAccelerating Data Centers Using NVMe and CUDA
Accelerating Data Centers Using NVMe and CUDA Stephen Bates, PhD Technical Director, CSTO, PMC-Sierra Santa Clara, CA 1 Project Donard @ PMC-Sierra Donard is a PMC CTO project that leverages NVM Express
More informationIntroduction I/O 1. I/O devices can be characterized by Behavior: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec
Introduction I/O 1 I/O devices can be characterized by Behavior: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections I/O Device Summary I/O 2 I/O System
More informationPCIe 10G SFP+ Network Card
PCIe 10G SFP+ Network Card User Manual Ver. 1.00 All brand names and trademarks are properties of their respective owners. Contents: Chapter 1: Introduction... 3 1.1 Product Introduction... 3 1.2 Features...
More informationASTRI/CTA data analysis on parallel and low-power platforms
ICT Workshop INAF, Cefalù 2015 Universidade de São Paulo Instituto de Astronomia, Geofisica e Ciencias Atmosferica ASTRI/CTA data analysis on parallel and low-power platforms Alberto Madonna, Michele Mastropietro
More informationGPUfs: Integrating a file system with GPUs
GPUfs: Integrating a file system with GPUs Mark Silberstein (UT Austin/Technion) Bryan Ford (Yale), Idit Keidar (Technion) Emmett Witchel (UT Austin) 1 Traditional System Architecture Applications OS CPU
More informationInterconnection Network for Tightly Coupled Accelerators Architecture
Interconnection Network for Tightly Coupled Accelerators Architecture Toshihiro Hanawa, Yuetsu Kodama, Taisuke Boku, Mitsuhisa Sato Center for Computational Sciences University of Tsukuba, Japan 1 What
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University Storage and Other I/O Topics I/O Performance Measures Types and Characteristics of I/O Devices Buses Interfacing I/O Devices
More informationStorage. Hwansoo Han
Storage Hwansoo Han I/O Devices I/O devices can be characterized by Behavior: input, out, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections 2 I/O System Characteristics
More informationAn Intelligent NIC Design Xin Song
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) An Intelligent NIC Design Xin Song School of Electronic and Information Engineering Tianjin Vocational
More informationMark Falco Oracle Coherence Development
Achieving the performance benefits of Infiniband in Java Mark Falco Oracle Coherence Development 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy
More informationAdvanced Computer Networks. End Host Optimization
Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct
More informationCUDA Optimizations WS Intelligent Robotics Seminar. Universität Hamburg WS Intelligent Robotics Seminar Praveen Kulkarni
CUDA Optimizations WS 2014-15 Intelligent Robotics Seminar 1 Table of content 1 Background information 2 Optimizations 3 Summary 2 Table of content 1 Background information 2 Optimizations 3 Summary 3
More informationDESIGN AND TESTING OF GPU BASED RTC FOR TMT NFIRAOS
Florence, Italy. Adaptive May 2013 Optics for Extremely Large Telescopes III ISBN: 978-88-908876-0-4 DOI: 10.12839/AO4ELT3.13172 DESIGN AND TESTING OF GPU BASED RTC FOR TMT NFIRAOS Lianqi Wang 1,a, 1 Thirty
More informationReducing adaptive optics latency using Xeon Phi many-core processors
doi:10.1093/mnras/stv1813 Reducing adaptive optics latency using Xeon Phi many-core processors David Barr, 1,2 Alastair Basden, 3 Nigel Dipper 3 andnoahschwartz 1 1 UK Astronomy Technology Centre, Royal
More informationMELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구
MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 Leading Supplier of End-to-End Interconnect Solutions Analyze Enabling the Use of Data Store ICs Comprehensive End-to-End InfiniBand and Ethernet Portfolio
More informationAsynchronous Peer-to-Peer Device Communication
13th ANNUAL WORKSHOP 2017 Asynchronous Peer-to-Peer Device Communication Feras Daoud, Leon Romanovsky [ 28 March, 2017 ] Agenda Peer-to-Peer communication PeerDirect technology PeerDirect and PeerDirect
More informationTECHNOLOGIES FOR IMPROVED SCALING ON GPU CLUSTERS. Jiri Kraus, Davide Rossetti, Sreeram Potluri, June 23 rd 2016
TECHNOLOGIES FOR IMPROVED SCALING ON GPU CLUSTERS Jiri Kraus, Davide Rossetti, Sreeram Potluri, June 23 rd 2016 MULTI GPU PROGRAMMING Node 0 Node 1 Node N-1 MEM MEM MEM MEM MEM MEM MEM MEM MEM MEM MEM
More informationVideo capture using GigE Vision with MIL. What is GigE Vision
What is GigE Vision GigE Vision is fundamentally a standard for transmitting video from a camera (see Figure 1) or similar device over Ethernet and is primarily intended for industrial imaging applications.
More informationImplementing Ultra Low Latency Data Center Services with Programmable Logic
Implementing Ultra Low Latency Data Center Services with Programmable Logic John W. Lockwood, CEO: Algo-Logic Systems, Inc. http://algo-logic.com Solutions@Algo-Logic.com (408) 707-3740 2255-D Martin Ave.,
More informationParallel waveform extraction algorithms for the Cherenkov Telescope Array Real-Time Analysis
Parallel waveform extraction algorithms for the Cherenkov Telescope Array Real-Time Analysis, a, Andrea Bulgarelli a, Adriano De Rosa a, Alessio Aboudan a, Valentina Fioretti a, Giovanni De Cesare a, Ramin
More informationComputer Systems Laboratory Sungkyunkwan University
I/O System Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Introduction (1) I/O devices can be characterized by Behavior: input, output, storage
More informationEXTENDING AN ASYNCHRONOUS MESSAGING LIBRARY USING AN RDMA-ENABLED INTERCONNECT. Konstantinos Alexopoulos ECE NTUA CSLab
EXTENDING AN ASYNCHRONOUS MESSAGING LIBRARY USING AN RDMA-ENABLED INTERCONNECT Konstantinos Alexopoulos ECE NTUA CSLab MOTIVATION HPC, Multi-node & Heterogeneous Systems Communication with low latency
More informationEuropean Organization for Astronomical Research in the Southern
State-of-the-art detector controller for ESO instruments Leander H. Mehrgan, Domingo Alvarez, Dietrich Baade, Claudio Cumani, Siegfried Eschbaumer, Gert Finger, Christoph Geimer, Derek Ives, Manfred Meyer,
More informationExploring System Coherency and Maximizing Performance of Mobile Memory Systems
Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Shanghai: William Orme, Strategic Marketing Manager of SSG Beijing & Shenzhen: Mayank Sharma, Product Manager of SSG ARM Tech
More informationChapter 6. Storage and Other I/O Topics
Chapter 6 Storage and Other I/O Topics Introduction I/O devices can be characterized by Behaviour: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections
More informationNext Generation Enterprise Solutions from ARM
Next Generation Enterprise Solutions from ARM Ian Forsyth Director Product Marketing Enterprise and Infrastructure Applications Processor Product Line Ian.forsyth@arm.com 1 Enterprise Trends IT is the
More information7 DAYS AND 8 NIGHTS WITH THE CARMA DEV KIT
7 DAYS AND 8 NIGHTS WITH THE CARMA DEV KIT Draft Printed for SECO Murex S.A.S 2012 all rights reserved Murex Analytics Only global vendor of trading, risk management and processing systems focusing also
More informationTightly Coupled Accelerators Architecture
Tightly Coupled Accelerators Architecture Yuetsu Kodama Division of High Performance Computing Systems Center for Computational Sciences University of Tsukuba, Japan 1 What is Tightly Coupled Accelerators
More informationFast End-to-End Multi-Conjugate AO Simulations Using Graphical Processing Units and the MAOS Simulation Code
Fast End-to-End Multi-Conjugate AO Simulations Using Graphical Processing Units and the MAOS Simulation Code Lianqi Wang 1a and Brent Ellerbroek 1 TMT Observatory Corportaion, 1111 South Arroyo Pkwy Suite
More informationInitial Evaluation of a User-Level Device Driver Framework
Initial Evaluation of a User-Level Device Driver Framework Stefan Götz Karlsruhe University Germany sgoetz@ira.uka.de Kevin Elphinstone National ICT Australia University of New South Wales kevine@cse.unsw.edu.au
More informationGPUfs: Integrating a file system with GPUs
GPUfs: Integrating a file system with GPUs Mark Silberstein (UT Austin/Technion) Bryan Ford (Yale), Idit Keidar (Technion) Emmett Witchel (UT Austin) 1 Building systems with GPUs is hard. Why? 2 Goal of
More informationThe Missing Piece of Virtualization. I/O Virtualization on 10 Gb Ethernet For Virtualized Data Centers
The Missing Piece of Virtualization I/O Virtualization on 10 Gb Ethernet For Virtualized Data Centers Agenda 10 GbE Adapters Built for Virtualization I/O Throughput: Virtual & Non-Virtual Servers Case
More informationHigh-Speed NAND Flash
High-Speed NAND Flash Design Considerations to Maximize Performance Presented by: Robert Pierce Sr. Director, NAND Flash Denali Software, Inc. History of NAND Bandwidth Trend MB/s 20 60 80 100 200 The
More informationI/O 1. Devices and I/O. key concepts device registers, device drivers, program-controlled I/O, DMA, polling, disk drives, disk head scheduling
I/O 1 Devices and I/O key concepts device registers, device drivers, program-controlled I/O, DMA, polling, disk drives, disk head scheduling reading Three Easy Pieces: Chapters 36-37 I/O 2 Sys/161 Device
More informationGPUfs: Integrating a file system with GPUs
ASPLOS 2013 GPUfs: Integrating a file system with GPUs Mark Silberstein (UT Austin/Technion) Bryan Ford (Yale), Idit Keidar (Technion) Emmett Witchel (UT Austin) 1 Traditional System Architecture Applications
More informationChapter 2 Computer-System Structure
Contents 1. Introduction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threads 6. CPU Scheduling 7. Process Synchronization 8. Deadlocks 9. Memory Management 10. Virtual
More informationMessaging Overview. Introduction. Gen-Z Messaging
Page 1 of 6 Messaging Overview Introduction Gen-Z is a new data access technology that not only enhances memory and data storage solutions, but also provides a framework for both optimized and traditional
More informationOpenOnload. Dave Parry VP of Engineering Steve Pope CTO Dave Riddoch Chief Software Architect
OpenOnload Dave Parry VP of Engineering Steve Pope CTO Dave Riddoch Chief Software Architect Copyright 2012 Solarflare Communications, Inc. All Rights Reserved. OpenOnload Acceleration Software Accelerated
More informationIsoStack Highly Efficient Network Processing on Dedicated Cores
IsoStack Highly Efficient Network Processing on Dedicated Cores Leah Shalev Eran Borovik, Julian Satran, Muli Ben-Yehuda Outline Motivation IsoStack architecture Prototype TCP/IP over 10GE on a single
More informationRealtime Signal Processing on Nvidia TX2 using CUDA
Realtime Signal Processing on Nvidia TX2 using CUDA Armin Weiss Dr. Amin Mazloumian Dr. Matthias Rosenthal Institute of Embedded Systems High Performance Multimedia Research Group Zurich University of
More information2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.
Ethernet Storage Fabrics Using RDMA with Fast NVMe-oF Storage to Reduce Latency and Improve Efficiency Kevin Deierling & Idan Burstein Mellanox Technologies 1 Storage Media Technology Storage Media Access
More informationRDMA and Hardware Support
RDMA and Hardware Support SIGCOMM Topic Preview 2018 Yibo Zhu Microsoft Research 1 The (Traditional) Journey of Data How app developers see the network Under the hood This architecture had been working
More informationTracing embedded heterogeneous systems
Tracing embedded heterogeneous systems P R O G R E S S R E P O R T M E E T I N G, D E C E M B E R 2015 T H O M A S B E R T A U L D D I R E C T E D B Y M I C H E L D A G E N A I S December 10th 2015 TRACING
More informationAn NVMe-based Offload Engine for Storage Acceleration Sean Gibb, Eideticom Stephen Bates, Raithlin
An NVMe-based Offload Engine for Storage Acceleration Sean Gibb, Eideticom Stephen Bates, Raithlin 1 Overview Acceleration for Storage NVMe for Acceleration How are we using (abusing ;-)) NVMe to support
More informationGPU acceleration on IB clusters. Sadaf Alam Jeffrey Poznanovic Kristopher Howard Hussein Nasser El-Harake
GPU acceleration on IB clusters Sadaf Alam Jeffrey Poznanovic Kristopher Howard Hussein Nasser El-Harake HPC Advisory Council European Workshop 2011 Why it matters? (Single node GPU acceleration) Control
More informationProgrammable Logic Design Grzegorz Budzyń Lecture. 15: Advanced hardware in FPGA structures
Programmable Logic Design Grzegorz Budzyń Lecture 15: Advanced hardware in FPGA structures Plan Introduction PowerPC block RocketIO Introduction Introduction The larger the logical chip, the more additional
More informationHow Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC
How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC Three Consortia Formed in Oct 2016 Gen-Z Open CAPI CCIX complex to rack scale memory fabric Cache coherent accelerator
More informationHA-PACS/TCA: Tightly Coupled Accelerators for Low-Latency Communication between GPUs
HA-PACS/TCA: Tightly Coupled Accelerators for Low-Latency Communication between GPUs Yuetsu Kodama Division of High Performance Computing Systems Center for Computational Sciences University of Tsukuba,
More informationReal-Time Support for GPU. GPU Management Heechul Yun
Real-Time Support for GPU GPU Management Heechul Yun 1 This Week Topic: Real-Time Support for General Purpose Graphic Processing Unit (GPGPU) Today Background Challenges Real-Time GPU Management Frameworks
More informationUpdate on PRad GEMs, Readout Electronics & DAQ
Update on PRad GEMs, Readout Electronics & DAQ Kondo Gnanvo University of Virginia, Charlottesville, VA Outline PRad GEMs update Upgrade of SRS electronics Integration into JLab DAQ system Cosmic tests
More informationEfficient Memory Mapped File I/O for In-Memory File Systems. Jungsik Choi, Jiwon Kim, Hwansoo Han
Efficient Memory Mapped File I/O for In-Memory File Systems Jungsik Choi, Jiwon Kim, Hwansoo Han Operations Per Second Storage Latency Close to DRAM SATA/SAS Flash SSD (~00μs) PCIe Flash SSD (~60 μs) D-XPoint
More informationNTRDMA v0.1. An Open Source Driver for PCIe NTB and DMA. Allen Hubbe at Linux Piter 2015 NTRDMA. Messaging App. IB Verbs. dmaengine.h ntb.
Messaging App IB Verbs NTRDMA dmaengine.h ntb.h DMA DMA DMA NTRDMA v0.1 An Open Source Driver for PCIe and DMA Allen Hubbe at Linux Piter 2015 1 INTRODUCTION Allen Hubbe Senior Software Engineer EMC Corporation
More informationAccelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh
Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary
More informationECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017
ECE 550D Fundamentals of Computer Systems and Engineering Fall 2017 Input/Output (IO) Prof. John Board Duke University Slides are derived from work by Profs. Tyler Bletsch and Andrew Hilton (Duke) IO:
More informationApplication Acceleration Beyond Flash Storage
Application Acceleration Beyond Flash Storage Session 303C Mellanox Technologies Flash Memory Summit July 2014 Accelerating Applications, Step-by-Step First Steps Make compute fast Moore s Law Make storage
More informationGemini Observatory. Multi-Conjugate Adaptive Optics Control System. The Gemini MCAO System (ICALEPCS, Geneva, October 2005) 1
Gemini Observatory Multi-Conjugate Adaptive Optics Control System The Gemini MCAO System (ICALEPCS, Geneva, October 2005) 1 The Gemini MCAO System Andy Foster Observatory Sciences Ltd William James House,
More informationResearch Faculty Summit Systems Fueling future disruptions
Research Faculty Summit 2018 Systems Fueling future disruptions Wolong: A Back-end Optimizer for Deep Learning Computation Jilong Xue Researcher, Microsoft Research Asia System Challenge in Deep Learning
More informationPC-based data acquisition I
FYS3240 PC-based instrumentation and microcontrollers PC-based data acquisition I Spring 2016 Lecture #8 Bekkeng, 20.01.2016 General-purpose computer With a Personal Computer (PC) we mean a general-purpose
More informationThe Dell Precision T3620 tower as a Smart Client leveraging GPU hardware acceleration
The Dell Precision T3620 tower as a Smart Client leveraging GPU hardware acceleration Dell IP Video Platform Design and Calibration Lab June 2018 H17415 Reference Architecture Dell EMC Solutions Copyright
More information10GE network tests with UDP. Janusz Szuba European XFEL
10GE network tests with UDP Janusz Szuba European XFEL Outline 2 Overview of initial DAQ architecture Slice test hardware specification Initial networking test results DAQ software UDP tests Summary 10GE
More informationLECTURE ON PASCAL GPU ARCHITECTURE. Jiri Kraus, November 14 th 2016
LECTURE ON PASCAL GPU ARCHITECTURE Jiri Kraus, November 14 th 2016 ACCELERATED COMPUTING CPU Optimized for Serial Tasks GPU Accelerator Optimized for Parallel Tasks 2 ACCELERATED COMPUTING CPU Optimized
More informationGPUs as better MPI Citizens
s as better MPI Citizens Author: Dale Southard, NVIDIA Date: 4/6/2011 www.openfabrics.org 1 Technology Conference 2011 October 11-14 San Jose, CA The one event you can t afford to miss Learn about leading-edge
More informationOpenMPDK and unvme User Space Device Driver for Server and Data Center
OpenMPDK and unvme User Space Device Driver for Server and Data Center Open source for maximally utilizing Samsung s state-of-art Storage Solution in shorter development time White Paper 2 Target Audience
More informationQuickSpecs. HP Z 10GbE Dual Port Module. Models
Overview Models Part Number: 1Ql49AA Introduction The is a 10GBASE-T adapter utilizing the Intel X722 MAC and X557-AT2 PHY pairing to deliver full line-rate performance, utilizing CAT 6A UTP cabling (or
More informationNLVMUG 16 maart Display protocols in Horizon
NLVMUG 16 maart 2017 Display protocols in Horizon NLVMUG 16 maart 2017 Display protocols in Horizon Topics Introduction Display protocols - Basics PCoIP vs Blast Extreme Optimizing Monitoring Future Recap
More informationWearDrive: Fast and Energy Efficient Storage for Wearables
WearDrive: Fast and Energy Efficient Storage for Wearables Reza Shisheie Cleveland State University CIS 601 Wearable Computing: A New Era 2 Wearable Computing: A New Era Notifications Fitness/Healthcare
More informationPCI Express High Speed Fabric. A complete solution for high speed system connectivity
PCI Express High Speed Fabric A complete solution for high speed system connectivity PCI Express Fabric Introduction Maximizing application performance is a combination of processing, communication, and
More information