Down selecting suitable manycore technologies for the ELT AO RTC. David Barr, Alastair Basden, Nigel Dipper and Noah Schwartz

1 Down selecting suitable manycore technologies for the ELT AO RTC David Barr, Alastair Basden, Nigel Dipper and Noah Schwartz

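2 AO RTC Complexity
RTC for AO workshop 27/01/2016
[Figure: AO RTC complexity in GFLOPS (logarithmic axis, up to 1.E+05) against year. Labels on the plot: SPHERE, VLT AOF, IFS SCAO, IFS LTAO, MOS, E-ELT EPICS.]
Table columns: System, Telescope, Type, Channels, WFS sub-aps, Frequency (Hz).
SCAO: 1 channel, 74x... sub-aps.
IFS, E-ELT, LTAO: 6 channels, 74x... sub-aps.
MOS, E-ELT, MOAO: 10 channels, 74x74 sub-aps, 250 Hz.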

3 Typical RTC and hardware
[Diagram labels: wavefront sensor camera; real-time control computer.]

4 Typical RTC and hardware: Tilera?

5 Typical RTC and hardware: Xeon Phi?

6 Tilera: WF pixel processing
Pixel calibration.
Sub-aperture processing: simple centre of gravity for a Shack-Hartmann WFS.
Tested for two scenarios: full frame and pipelining.
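The two processing steps named on this slide can be sketched as follows. This is a minimal illustration, not the code that ran on the Tilera card: the calibration model (dark subtraction followed by flat-field division) and the function names `calibrate` and `centre_of_gravity` are assumptions made for the example.

```python
def calibrate(pixels, dark, flat):
    """Subtract a dark frame and divide by a flat field, clipping at zero.

    The dark/flat model is an assumption; the slide only says "pixel calibration".
    """
    return [[max((p - d) / f, 0.0) for p, d, f in zip(prow, drow, frow)]
            for prow, drow, frow in zip(pixels, dark, flat)]


def centre_of_gravity(subap):
    """Return the (x, y) intensity-weighted centroid of one sub-aperture.

    Coordinates are pixel indices measured from the top-left pixel.
    """
    total = sum(sum(row) for row in subap)
    if total <= 0.0:
        # No light in the sub-aperture: fall back to the geometric centre.
        return ((len(subap[0]) - 1) / 2.0, (len(subap) - 1) / 2.0)
    sum_x = sum(x * v for row in subap for x, v in enumerate(row))
    sum_y = sum(y * v for y, row in enumerate(subap) for v in row)
    return (sum_x / total, sum_y / total)


# A 6x6 sub-aperture with one bright pixel at column 4, row 1 (made-up data).
raw = [[10.0] * 6 for _ in range(6)]
raw[1][4] = 110.0
dark = [[10.0] * 6 for _ in range(6)]
flat = [[1.0] * 6 for _ in range(6)]

cx, cy = centre_of_gravity(calibrate(raw, dark, flat))
print(cx, cy)
```

After calibration only the bright pixel is non-zero, so the centroid sits exactly on it.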

7 Tilera - Tile Gx-36
Multiple 10 Gbps Ethernet ports.
9, 16, 36 or 72 cores.
1.2 GHz.
Uses a C/C++ compiler: abstraction, portability.
Zero Overhead Linux (ZOL) mode: prevents Linux system-level calls on specific cores.

8 Full Frame: Mean execution time
74x74 sub-apertures, pixels per sub-aperture in brackets:
(16x16): 1764 µs. Detector: 1200 x ...
(10x10): 734 µs. Detector: 800 x 800 (e.g. E-ELT MOS single channel).
(6x6): 265 µs. Detector: 500 x 500 (e.g. E-ELT IFS SCAO).
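As a rough cross-check of these figures, the sketch below relates the sub-aperture geometry to the quoted detector sizes and divides each mean execution time by the number of illuminated pixels. It assumes that only the 74x74 sub-aperture pixels are processed and that the quoted detector sizes are rounded up; neither assumption is stated on the slide, and the function names are illustrative.

```python
def active_pixels_across(subaps_across, pixels_per_subap):
    """Number of detector pixels spanned by the sub-aperture grid in one dimension."""
    return subaps_across * pixels_per_subap


def time_per_pixel_ns(execution_time_us, subaps_across, pixels_per_subap):
    """Mean processing time per sub-aperture pixel, in nanoseconds."""
    n_pixels = active_pixels_across(subaps_across, pixels_per_subap) ** 2
    return execution_time_us * 1000.0 / n_pixels


# (pixels per sub-aperture, mean execution time in µs) as quoted on the slide.
cases = [(16, 1764.0), (10, 734.0), (6, 265.0)]

for pixels_per_subap, time_us in cases:
    across = active_pixels_across(74, pixels_per_subap)
    per_pixel = time_per_pixel_ns(time_us, 74, pixels_per_subap)
    print(f"{pixels_per_subap}x{pixels_per_subap}: {across} pixels across, "
          f"{per_pixel:.2f} ns per pixel")
```

Under these assumptions the grid spans 1184, 740 and 444 pixels, and the cost per pixel stays between about 1.2 and 1.4 ns in all three cases, which suggests that the full-frame time grows roughly in proportion to the pixel count.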

9 Full Frame: Stability
σ = 1.28 µs.
Detector approx. 500 x 500.
Execution time 265±6 µs.
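Stability figures of this kind can be derived from a series of per-frame timings, as sketched below. The slide does not say how the ±6 µs bound is defined; here it is assumed to be the largest deviation from the mean. The sample values are made up for illustration and are not measurements from the talk.

```python
import statistics


def summarise_timings(samples_us):
    """Return (mean, standard deviation, largest deviation from the mean) in µs."""
    mean = statistics.fmean(samples_us)
    sigma = statistics.pstdev(samples_us)
    peak = max(abs(s - mean) for s in samples_us)
    return mean, sigma, peak


# Hypothetical per-frame execution times in µs.
samples = [264.0, 265.0, 266.0, 265.0, 271.0, 259.0]

mean, sigma, peak = summarise_timings(samples)
print(f"execution time {mean:.0f}±{peak:.0f} µs, sigma = {sigma:.2f} µs")
```

For a real-time controller the peak deviation matters at least as much as σ, because a single late frame delays the correction applied to the mirror.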

10 Pipelining
To achieve the best performance, pixel processing starts as soon as a row of sub-apertures has arrived.
Detector approx. 500 x 500: WF processing delay <50 µs.
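The pipelining idea can be sketched as a loop that consumes the frame one row of sub-apertures at a time. This is a single-threaded illustration under assumed names (`subap_rows`, `centroid_band`, `pipelined_centroids`); in a real system the generator would be replaced by the camera readout, and each band would be processed while the next one is still arriving.

```python
def subap_rows(frame, pixels_per_subap):
    """Yield the frame one row of sub-apertures (a band of pixel rows) at a time."""
    for top in range(0, len(frame), pixels_per_subap):
        yield frame[top:top + pixels_per_subap]


def centroid_band(band, pixels_per_subap):
    """Centre of gravity of every sub-aperture in one band of pixel rows."""
    centre = (pixels_per_subap - 1) / 2.0
    centroids = []
    for left in range(0, len(band[0]), pixels_per_subap):
        total = sum_x = sum_y = 0.0
        for y, row in enumerate(band):
            for x, value in enumerate(row[left:left + pixels_per_subap]):
                total += value
                sum_x += x * value
                sum_y += y * value
        if total > 0.0:
            centroids.append((sum_x / total, sum_y / total))
        else:
            centroids.append((centre, centre))  # dark sub-aperture
    return centroids


def pipelined_centroids(band_source, pixels_per_subap):
    """Process each band as soon as it is delivered by band_source."""
    centroids = []
    for band in band_source:
        centroids.extend(centroid_band(band, pixels_per_subap))
    return centroids


# A 4x4 frame split into 2x2 sub-apertures (made-up data).
frame = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
]
result = pipelined_centroids(subap_rows(frame, 2), 2)
print(result)
```

Because all earlier bands are finished before the last one arrives, only the final band remains to be processed once readout completes, which is consistent with a processing delay far below the full-frame time.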

11 Company Stability and Direction
EZchip bought by Mellanox.
Facebook has bought some cards for testing and evaluation.
The future of the Tilera cards seems stable.

12 Matrix Vector Multiplication
[Diagram labels: wavefront sensor camera; real-time control computer.]
We are only looking at the Matrix Vector Multiplication (MVM) for the control calculation.
For E-ELT first light instruments, the MVM has the steepest increase in computational complexity, scaling as O(D^4).
MVM is a memory-bandwidth-limited routine.
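A minimal sketch of the operation and of where the O(D^4) scaling comes from. It assumes that D is the telescope diameter, that the sub-aperture size is fixed so the number of sub-apertures across the pupil grows in proportion to D, and that there is roughly one actuator per sub-aperture; the element count is deliberately crude and the function names are illustrative.

```python
def mvm(matrix, vector):
    """Dense matrix-vector multiply: one output (e.g. actuator command) per matrix row."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def control_matrix_elements(subaps_across):
    """Rough control-matrix size for a square grid of sub-apertures.

    Assumes two slopes (x and y) per sub-aperture and about one actuator per
    sub-aperture, so both dimensions of the matrix grow as subaps_across**2.
    """
    n_slopes = 2 * subaps_across ** 2
    n_actuators = subaps_across ** 2
    return n_slopes * n_actuators


print(mvm([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))
print(control_matrix_elements(80) / control_matrix_elements(40))
print(control_matrix_elements(74) / control_matrix_elements(40))
```

Doubling the number of sub-apertures across the pupil multiplies the matrix size by 16, and going from 40x40 to 74x74 multiplies it by about 11.7. Every element is read once per frame and used in a single multiply-add, which is why the routine is limited by memory bandwidth rather than by arithmetic.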

13 Xeon Phi: Mean performance (MVM)
Connects via PCIe (accelerator card, similar to GPUs).
Easy to program (similar to CPUs).
Good performance.
Large number of cores (60).
High memory bandwidth (320 GB/s).

14 Xeon Phi: Stability (MVM)
Good scalability: multiple Xeon Phis allow a speed-up by approx. ...
Poor stability, due to how the data transfer over PCIe is handled.
More details in (Barr et al, MNRAS 2015).

15 Memory Bandwidth
Devices compared: Dual Xeon E (CPU), NVIDIA K40 (1), Xeon Phi 5110p, K80 (2), Next Gen. Xeon Phi (3).
Table rows: Advertised (GB/s), Achieved (GB/s), Percentage.
Advertised (GB/s): 2x ... for the K80, ~500 for the next gen. Xeon Phi.
Percentage: ~ ... %, 79.5 %, 52.0 %.
Next gen. Xeon Phi achieved (GB/s): Low 250, High >400.
(1) Reguly I. Z. et al, PMAM 2014
(2) Deakin T. et al, (2015)
(3) Intel datasheet
Next-generation Xeon Phi is moving to an integrated CPU, removing the need to transfer data over PCIe.
Assumption for next gen.: same achievable memory BW.
Low: 50% of memory bandwidth achievable.
High: Intel's benchmark of >400 GB/s.
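For a bandwidth-limited MVM, a first-order time estimate is simply the size of the control matrix divided by the achieved memory bandwidth. The sketch below applies this to the Low (250 GB/s) and High (400 GB/s) cases from the slide. The matrix dimensions (8000 x 4000 single-precision values) are a hypothetical round-number choice for illustration, not figures taken from the talk, and `mvm_time_us` is an assumed name.

```python
def mvm_time_us(n_rows, n_cols, bandwidth_gb_s, bytes_per_element=4):
    """Time in µs to stream an n_rows x n_cols matrix once at the given bandwidth.

    Ignores the input and output vectors, caches and any compute cost.
    """
    matrix_bytes = n_rows * n_cols * bytes_per_element
    return matrix_bytes / (bandwidth_gb_s * 1e9) * 1e6


# Hypothetical matrix: 8000 x 4000 float32 values = 128 MB.
N_ROWS, N_COLS = 8000, 4000

low_us = mvm_time_us(N_ROWS, N_COLS, 250.0)   # "Low": 50% of ~500 GB/s
high_us = mvm_time_us(N_ROWS, N_COLS, 400.0)  # "High": Intel's >400 GB/s benchmark
print(f"low: {low_us:.0f} µs, high: {high_us:.0f} µs")
```

With this assumed matrix the model gives roughly 512 µs and 320 µs, the same order as the next-generation MVM estimates quoted later in the talk. The model is only a lower bound on the real time, since it treats the achieved bandwidth as sustained for the whole operation.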

16 Xeon Phi: Next gen. performance

17 Xeon Phi: Next gen. performance
Table columns: 40 x 40 (µs), 74 x 74 (µs).
Table rows: Xeon Phi 5110p; Next Gen Xeon Phi (Low, High); K...; K...

18 Xeon Phi: Power Consumption
Table columns: Processor, Release, Power Max (Watts).
Processors listed: Intel Xeon Phi 5110p, Intel Xeon Phi (Next Gen.), Intel Xeon E5-2650V3, NVIDIA K40, NVIDIA K80, Tile-Gx36, Tile-Mx (>30(?)).
Next-generation Xeon Phi: moving to Intel Atom cores, reducing power ... W while increasing performance.

19 Tilera: Power Consumption
Table columns: Processor, Release, Power Max (Watts).
Processors listed: Intel Xeon Phi 5110p, Intel Xeon Phi (Next Gen.), Intel Xeon E5-2650V3, NVIDIA K40, NVIDIA K80, Tile-Gx36, Tile-Mx (>30(?)).
Next-generation Tilera: moving to ARM processors, known for low power (~300 mW per core).

20 Overall performance estimate
Example: SCAO E-ELT first light instrument.
Valid sub-apertures: ~4K (74x74).
Detector approx. 500x500 (6x6 per sub-aperture).
Latency requirement: 1500 µs.
Image process & centre of gravity (TILERA): current gen. 50 µs; next gen. <50 µs.
MVM: current gen. single Xeon Phi 1140 µs (dual Xeon Phi 850 µs); next gen. low 500 µs, high 320 µs.
Time available for rest of loop: ~950 µs out of 1500 µs.
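The budget arithmetic behind this slide can be reproduced with a short calculation. It assumes that the pixel-processing delay and the MVM run one after the other and that the next-generation pixel stage takes its upper bound of 50 µs; the case labels and the name `time_left_us` are illustrative.

```python
LATENCY_REQUIREMENT_US = 1500.0


def time_left_us(pixel_us, mvm_us, requirement_us=LATENCY_REQUIREMENT_US):
    """Time left for the rest of the loop if the two stages run back to back."""
    return requirement_us - pixel_us - mvm_us


# (pixel processing delay, MVM time) in µs, taken from the slide.
cases = {
    "current gen., single Xeon Phi": (50.0, 1140.0),
    "current gen., dual Xeon Phi": (50.0, 850.0),
    "next gen., low": (50.0, 500.0),
    "next gen., high": (50.0, 320.0),
}

budget = {name: time_left_us(pixel, mvm) for name, (pixel, mvm) in cases.items()}
for name, left in budget.items():
    print(f"{name}: {left:.0f} µs left")
```

Under these assumptions the remaining time is 310 µs and 600 µs for the current-generation single and dual Xeon Phi cases, and 950 µs and 1130 µs for the next-generation low and high cases. The 950 µs figure agrees with the "~950 µs" quoted on the slide.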

21 Conclusions
TILERA:
Programmability/portability similar to CPU.
Very good stability.
Next gen. will have more memory BW and cores.
ELT ready!
Xeon Phi:
Programmability/portability similar to CPU.
Poor stability, mainly due to data transfer.
Next gen.: more memory bandwidth, no transfer (i.e. better stability).
Should be ELT ready (needs testing).

22 Thanks for listening. Any questions?


8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2 CSE 820 Graduate Computer Architecture Richard Enbody Dr. Enbody 1 st Day 2 1 Why Computer Architecture? Improve coding. Knowledge to make architectural choices. Ability to understand articles about architecture.

More information

Hardware NVMe implementation on cache and storage systems

Hardware NVMe implementation on cache and storage systems Hardware NVMe implementation on cache and storage systems Jerome Gaysse, IP-Maker Santa Clara, CA 1 Agenda Hardware architecture NVMe for storage NVMe for cache/application accelerator NVMe for new NVM

More information

PART-I (B) (TECHNICAL SPECIFICATIONS & COMPLIANCE SHEET) Supply and installation of High Performance Computing System

PART-I (B) (TECHNICAL SPECIFICATIONS & COMPLIANCE SHEET) Supply and installation of High Performance Computing System INSTITUTE FOR PLASMA RESEARCH (An Autonomous Institute of Department of Atomic Energy, Government of India) Near Indira Bridge; Bhat; Gandhinagar-382428; India PART-I (B) (TECHNICAL SPECIFICATIONS & COMPLIANCE

More information

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel

More information

Near Memory Computing Spectral and Sparse Accelerators

Near Memory Computing Spectral and Sparse Accelerators Near Memory Computing Spectral and Sparse Accelerators Franz Franchetti ECE, Carnegie Mellon University www.ece.cmu.edu/~franzf Co-Founder, SpiralGen www.spiralgen.com The work was sponsored by Defense

More information

EyeCheck Smart Cameras

EyeCheck Smart Cameras EyeCheck Smart Cameras 2 3 EyeCheck 9xx & 1xxx series Technical data Memory: DDR RAM 128 MB FLASH 128 MB Interfaces: Ethernet (LAN) RS422, RS232 (not EC900, EC910, EC1000, EC1010) EtherNet / IP PROFINET

More information

rcuda: an approach to provide remote access to GPU computational power

rcuda: an approach to provide remote access to GPU computational power rcuda: an approach to provide remote access to computational power Rafael Mayo Gual Universitat Jaume I Spain (1 of 60) HPC Advisory Council Workshop Outline computing Cost of a node rcuda goals rcuda

More information

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Juan C. Pichel Centro de Investigación en Tecnoloxías da Información (CITIUS) Universidade de Santiago de Compostela, Spain

More information

A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video

A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video Amrita Mazumdar Armin Alaghi Jonathan T. Barron David Gallup Luis Ceze Mark Oskin Steven M. Seitz University of Washington Google

More information

MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구

MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 Leading Supplier of End-to-End Interconnect Solutions Analyze Enabling the Use of Data Store ICs Comprehensive End-to-End InfiniBand and Ethernet Portfolio

More information

OpenPOWER Performance

OpenPOWER Performance OpenPOWER Performance Alex Mericas Chief Engineer, OpenPOWER Performance IBM Delivering the Linux ecosystem for Power SOLUTIONS OpenPOWER IBM SOFTWARE LINUX ECOSYSTEM OPEN SOURCE Solutions with full stack

More information

Big Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid Architectures

Big Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid Architectures Procedia Computer Science Volume 51, 2015, Pages 2774 2778 ICCS 2015 International Conference On Computational Science Big Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid

More information

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Michael Boyer, David Tarjan, Scott T. Acton, and Kevin Skadron University of Virginia IPDPS 2009 Outline Leukocyte

More information

Mapping MPI+X Applications to Multi-GPU Architectures

Mapping MPI+X Applications to Multi-GPU Architectures Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under

More information

EE 7722 GPU Microarchitecture. Offered by: Prerequisites By Topic: Text EE 7722 GPU Microarchitecture. URL:

EE 7722 GPU Microarchitecture. Offered by: Prerequisites By Topic: Text EE 7722 GPU Microarchitecture. URL: 00 1 EE 7722 GPU Microarchitecture 00 1 EE 7722 GPU Microarchitecture URL: http://www.ece.lsu.edu/gp/. Offered by: David M. Koppelman 345 ERAD, 578-5482, koppel@ece.lsu.edu, http://www.ece.lsu.edu/koppel

More information

THE LEADER IN VISUAL COMPUTING

THE LEADER IN VISUAL COMPUTING MOBILE EMBEDDED THE LEADER IN VISUAL COMPUTING 2 TAKING OUR VISION TO REALITY HPC DESIGN and VISUALIZATION AUTO GAMING 3 BEST DEVELOPER EXPERIENCE Tools for Fast Development Debug and Performance Tuning

More information

LDetector: A low overhead data race detector for GPU programs

LDetector: A low overhead data race detector for GPU programs LDetector: A low overhead data race detector for GPU programs 1 PENGCHENG LI CHEN DING XIAOYU HU TOLGA SOYATA UNIVERSITY OF ROCHESTER 1 Data races in GPU Introduction & Contribution Impact correctness

More information

"On the Capability and Achievable Performance of FPGAs for HPC Applications"

On the Capability and Achievable Performance of FPGAs for HPC Applications "On the Capability and Achievable Performance of FPGAs for HPC Applications" Wim Vanderbauwhede School of Computing Science, University of Glasgow, UK Or in other words "How Fast Can Those FPGA Thingies

More information

EDGE / FOG COMPUTING EVOLUTIONS LAURENT REMONT, CTO KONTRON-S&T

EDGE / FOG COMPUTING EVOLUTIONS LAURENT REMONT, CTO KONTRON-S&T EDGE / FOG COMPUTING EVOLUTIONS LAURENT REMONT, CTO KONTRON-S&T CURRENT IOT SOLUTIONS LIMITATIONS Sensors, actuators, simple IoT devices Gateway, PLC, Complex IoT device Cloud Not suitable for time critical

More information