Powering Real-time Radio Astronomy Signal Processing with latest GPU architectures

1 Powering Real-time Radio Astronomy Signal Processing with latest GPU architectures
Harshavardhan Reddy Suda, NCRA, India
Vinay Deshpande, NVIDIA, India
Bharat Kumar, NVIDIA, India

2 What signals are we processing?
- Digitized baseband signals from the 30 dual-polarized antennas of the GMRT
- The Giant Meter-wave Radio Telescope (GMRT) is a world-class instrument for studying astrophysical phenomena at low radio frequencies
- Located 80 km north of Pune, 160 km east of Mumbai
- Array telescope with 30 antennas of 45 m diameter, operating at meter wavelengths

3 GMRT
Supports two modes of operation :
- Interferometry (correlator)
- Array mode (beamformer)
Frequency bands : to 260 MHz, to 500 MHz, to 900 MHz, to 1600 MHz
Maximum instantaneous bandwidth : 400 MHz (Legacy GMRT = 32 MHz)
Effective collecting area (2-3% of SKA) :
- 30,000 sq m at lower frequencies
- 20,000 sq m at higher frequencies
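The two modes of operation can be contrasted with a small sketch. It is illustrative only and is not taken from the GMRT code: the function names `correlate` and `beamform` and the toy voltages are assumptions. For a single spectral channel, the correlator multiplies each input by the complex conjugate of every input (itself included), while the beamformer applies a phase to each input and adds them coherently.

```python
import cmath


def correlate(voltages):
    """Cross-multiply every pair (i <= j) of complex voltages for one channel."""
    m = len(voltages)
    return {(i, j): voltages[i] * voltages[j].conjugate()
            for i in range(m) for j in range(i, m)}


def beamform(voltages, phases):
    """Phase-rotate each voltage and add them coherently (array mode)."""
    return sum(v * cmath.exp(-1j * p) for v, p in zip(voltages, phases))


# Toy data (hypothetical): 4 inputs seeing the same unit signal with different phases.
phases = [0.0, 0.3, 0.6, 0.9]
voltages = [cmath.exp(1j * p) for p in phases]

visibilities = correlate(voltages)
beam = beamform(voltages, phases)
print(len(visibilities))  # number of products is M(M+1)/2
print(abs(beam))          # coherent sum of the phase-aligned inputs
```

In a real back-end both operations run per spectral channel after the Fourier transform, and the correlator products are accumulated over many time samples.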

4 The Giant Meter-wave Radio Telescope
A Google eye view

5 GMRT receiver chain
Signal processing in digital back-end
Image courtesy : Ajith Kumar, NCRA

6 Computation requirements
- Antenna signals : M = 64
- Sampler : maximum bandwidth 400 MHz
- Fourier Transform, O(N log N), 16k point spectral channels : 3 TFlops
- Phase Correction : 0.1 TFlops
- MAC (multiply-accumulate), M(M+1)/2 : 6.6 TFlops
- Total : ~10 TFlops
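The MAC figure can be approximated with a short back-of-the-envelope calculation. This is a sketch under stated assumptions, not the authors' own derivation: it assumes 8 real floating-point operations per complex multiply-accumulate (4 multiplies and 4 adds), and that each of the M inputs delivers complex spectral samples at a total rate equal to the 400 MHz bandwidth. The function name `mac_tflops` is hypothetical.

```python
def mac_tflops(num_inputs, bandwidth_hz, flops_per_cmac=8):
    """Estimate correlator MAC load in TFlops.

    Assumes every one of the M(M+1)/2 input pairs needs one complex
    multiply-accumulate per complex spectral sample, and that samples
    arrive at `bandwidth_hz` per input summed over all channels.
    """
    pairs = num_inputs * (num_inputs + 1) // 2
    return pairs * flops_per_cmac * bandwidth_hz / 1e12


M = 64                # antenna signals, from the slide
BANDWIDTH_HZ = 400e6  # maximum bandwidth, from the slide

estimate = mac_tflops(M, BANDWIDTH_HZ)
print(f"pairs = {M * (M + 1) // 2}, MAC ~ {estimate:.2f} TFlops")
```

Under these assumptions the estimate lands close to the 6.6 TFlops quoted on the slide, which suggests the slide counts operations in a similar way.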

7 Design : Time slicing model

8 Design : Time slicing model (a 4-node example)
Ant 1 ... Ant 16 : digitized data of baseband signals of antennas
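The slides show the time slicing model only as a diagram, so the following is a minimal sketch of one way such a scheme can work, not a description of the GMRT implementation. It assumes that consecutive time slices are handed to the nodes in round-robin order and that the node owning a slice processes the data of all antennas for that slice. The function name `assign_slices` and the slice count are hypothetical.

```python
def assign_slices(num_slices, num_nodes):
    """Map each node to the time-slice indices it processes (round-robin)."""
    schedule = {node: [] for node in range(num_nodes)}
    for s in range(num_slices):
        schedule[s % num_nodes].append(s)
    return schedule


NUM_NODES = 4      # 4-node example from the slide
NUM_ANTENNAS = 16  # Ant 1 ... Ant 16 from the slide

schedule = assign_slices(num_slices=8, num_nodes=NUM_NODES)
for node, slices in schedule.items():
    # Each node handles every antenna, but only for its own time slices.
    print(f"node {node}: slices {slices}, antennas 1..{NUM_ANTENNAS}")
```

The appeal of slicing in time rather than by antenna is that each node can compute all pairwise products for its slice without needing samples held by another node during the computation.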

9 Implementation
- 16 Dell T630 machines as Compute Nodes
- 16 ROACH (FPGA) boards with Atmel/e2v based ADCs, developed by the CASPER group, Berkeley, for digitization and packetization
- 32 Tesla K40c GPU cards for processing
- 36-port Mellanox InfiniBand switch for data sharing between Compute Nodes and Host Nodes
- Software : C/C++ and CUDA C programming with OpenMPI and OpenMP directives
- Developed in collaboration with Swinburne University, Australia

10 Implementation
Image courtesy : Irappa Halagalli, NCRA

11 Sample result : image of the Coma cluster
- Legacy GMRT 325 MHz : 350 μJy
- Upgraded GMRT MHz : 28 μJy
- Significantly lower noise RMS and better image quality with the upgraded GMRT
Dharam Vir Lal and Ishwar Chandra, NCRA

12 Computation Performance : K40
Table columns : Channels, FFT (Gflops), MAC (Gflops)
No. of antennas : 32 (dual pol)
CUDA 7.5

13 Motivation for next generation GPUs
Adding more compute-intensive applications :
- Multi-beamforming
- Processing on each beam (beam steering)
- Gated correlator
- FIR filtering with many taps for narrow-band mode implementation
The working GMRT system and code provide an excellent testing ground for the features of next generation GPUs
Performance measured and compared on GP100 and V100

14 Computation performance : K40 vs GP100
CUDA 7.5, ECC off
Performance follows cuFFT benchmarks for K40 and P100
Reference for K40 benchmark : CUDA 6.5 Performance Report, September 2014
Reference for P100 benchmark : CUDA 8 Performance Overview, November 2016

15 Computation performance : K40 vs GP100
CUDA 7.5, ECC off
No. of antennas : 32 (dual pol)

16 Computation performance : K40 vs GP100
CUDA 7.5, ECC off
Peak Performance : K TFlops GP TFlops
Peak Global Memory Bandwidth : K GB / sec GP GB / sec

17 Computation performance as % of Real-time
Bandwidth : 200 MHz
No. of antennas : 32 (dual pol)
Spectral Channels : 16384

18 Computation performance : GP100 vs V100
GP100 on CUDA 7.5
V100 on CUDA 9.1 (using PSG cluster)

19 Computation performance : GP100 vs V100
GP100 on CUDA 7.5
V100 on CUDA 9.1 (using PSG cluster)
No. of antennas : 32 (dual pol)

20 Computation performance : GP100 vs V100
GP100 on CUDA 7.5
V100 on CUDA 9.1 (using PSG cluster)
Peak Performance : GP TFlops V TFlops
Peak Global Memory Bandwidth : GP GB / sec V GB / sec

21 Reasons behind relatively low performance of MAC
- Non-contiguous Global Memory access at block level
- Low Arithmetic Intensity
- MAC input data format
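To make "low arithmetic intensity" concrete, the sketch below counts floating-point operations per byte loaded for the MAC step. It is a rough model under loud assumptions, none of which come from the slides: inputs are complex single-precision samples (8 bytes each), one complex multiply-accumulate costs 8 flops, accumulator traffic is ignored, and a `tile` x `tile` block of pairs shares its 2 x `tile` input samples. Tiling is a generic reuse technique used here only to show how intensity scales; the slides do not say the GMRT kernel uses it.

```python
def mac_intensity(tile, bytes_per_sample=8, flops_per_cmac=8):
    """Flops per byte loaded when a tile x tile block of pairs shares its inputs."""
    flops = flops_per_cmac * tile * tile        # one complex MAC per pair in the block
    bytes_loaded = 2 * tile * bytes_per_sample  # `tile` row inputs plus `tile` column inputs
    return flops / bytes_loaded


for tile in (1, 2, 4, 8):
    print(f"tile {tile}x{tile}: {mac_intensity(tile):.2f} flops/byte")
```

With no reuse (tile of 1) the model gives half a flop per byte, which would leave a kernel limited by memory bandwidth rather than by peak compute on any of the GPUs discussed.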

22 GPU kernel improvements
FFT :
- Single Precision to Half Precision floating point
MAC :
- Simplified Index Arithmetic
- Improved the L2 hit ratio : less than 5% to nearly 86%
- Vectorized loads
- Increased ILP (float4)
- Exposing more parallelism by increasing the occupancy
- Single Precision to Half Precision floating point : no performance gain

23 MAC : Performance gain with optimizations on V100
V100 on CUDA 9.1 (using PSG cluster)
No. of antennas : 32 (dual pol)

24 FFT : Performance gain with half precision on V100
V100 on CUDA 9.1 (using PSG cluster)

25 FFT : Error analysis with half precision in power spectrum
Spectral Channels : 2048
Batch size : 128

26 FFT : Error analysis with half precision in phase spectrum
Spectral Channels : 2048
Batch size : 128
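The kind of comparison behind these two slides can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses a naive DFT instead of cuFFT, a 64-point transform instead of 2048 channels, and it models half precision only by rounding the inputs and outputs to IEEE 754 binary16, leaving the internal arithmetic in double precision. The test tone, its amplitude and phase, and all function names are assumptions.

```python
import cmath
import math
import struct


def to_half(x):
    """Round a float to IEEE 754 binary16 and back (stdlib 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_complex(z):
    """Round the real and imaginary parts of z to half precision."""
    return complex(to_half(z.real), to_half(z.imag))


def dft(samples):
    """Naive O(N^2) discrete Fourier transform, used as a stand-in for an FFT."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)) for k in range(n)]


N, TONE_BIN = 64, 5
# Hypothetical test signal: one complex tone with amplitude 0.7 and phase 0.4 rad.
signal = [0.7 * cmath.exp(2j * math.pi * TONE_BIN * t / N + 0.4j) for t in range(N)]

reference = dft(signal)
rounded = [half_complex(z) for z in dft([half_complex(x) for x in signal])]

ref, approx = reference[TONE_BIN], rounded[TONE_BIN]
power_error = abs(abs(approx) ** 2 - abs(ref) ** 2) / abs(ref) ** 2
phase_error = abs(cmath.phase(approx / ref))
print(f"relative power error: {power_error:.2e}")
print(f"phase error (rad):    {phase_error:.2e}")
```

A real half-precision FFT also rounds every intermediate butterfly result, so its errors are expected to be larger than this storage-only model suggests.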

27 Going forward
- Improving MAC using Tensor Cores : potential 2x improvement
- Implementing the MAC optimizations and half-precision floating point FFT in the GMRT code
- Optimized FIR filtering routines in CUDA for narrow-band mode implementation
- Implementing multi-beamforming, beam steering and gated correlator

28 Acknowledgements
- Prof. Yashwant Gupta, Centre Director, NCRA
- Ajith Kumar B., Back-end group co-ordinator, GMRT, NCRA
- Sanjay Kudale, GMRT, NCRA
- Shelton Gnanaraj, GMRT, NCRA
- Andrew Jameson, Swinburne University, Australia
- Benjamin Barsdell, Swinburne University, Australia (now at NVIDIA)
- CASPER Group, Berkeley
- Digital Back-end Group, GMRT, NCRA
- Computer Group, GMRT, NCRA
- Control Room, GMRT

29 Thank You


More information

Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster

Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster Veerendra Allada, Troy Benjegerdes Electrical and Computer Engineering, Ames Laboratory Iowa State University &

More information

CASA. Algorithms R&D. S. Bhatnagar. NRAO, Socorro

CASA. Algorithms R&D. S. Bhatnagar. NRAO, Socorro Algorithms R&D S. Bhatnagar NRAO, Socorro Outline Broad areas of work 1. Processing for wide-field wide-band imaging Full-beam, Mosaic, wide-band, full-polarization Wide-band continuum and spectral-line

More information

CUDA OPTIMIZATION WITH NVIDIA NSIGHT ECLIPSE EDITION

CUDA OPTIMIZATION WITH NVIDIA NSIGHT ECLIPSE EDITION CUDA OPTIMIZATION WITH NVIDIA NSIGHT ECLIPSE EDITION WHAT YOU WILL LEARN An iterative method to optimize your GPU code Some common bottlenecks to look out for Performance diagnostics with NVIDIA Nsight

More information

Nvidia Tesla The Personal Supercomputer

Nvidia Tesla The Personal Supercomputer International Journal of Allied Practice, Research and Review Website: www.ijaprr.com (ISSN 2350-1294) Nvidia Tesla The Personal Supercomputer Sameer Ahmad 1, Umer Amin 2, Mr. Zubair M Paul 3 1 Student,

More information

MeerKAT Data Architecture. Simon Ratcliffe

MeerKAT Data Architecture. Simon Ratcliffe MeerKAT Data Architecture Simon Ratcliffe MeerKAT Signal Path MeerKAT Data Rates Online System The online system receives raw visibilities from the correlator at a sufficiently high dump rate to facilitate

More information

Dr. Evaldas Stankevičius, Regulatory and Security Expert.

Dr. Evaldas Stankevičius, Regulatory and Security Expert. 2018-08-23 Dr. Evaldas Stankevičius, Regulatory and Security Expert Email: evaldas.stankevicius@tele2.com 1G: purely analog system. 2G: voice and SMS. 3G: packet switching communication. 4G: enhanced mobile

More information

RASDRWin Companion Software for RASDR. Paul Oxley Retired AT&T Microwave Engineer David Fields Stan Kurtz

RASDRWin Companion Software for RASDR. Paul Oxley Retired AT&T Microwave Engineer David Fields Stan Kurtz RASDRWin Companion Software for RASDR Paul Oxley Retired AT&T Microwave Engineer David Fields Stan Kurtz Abstract: An update of the RASDR project will be presented. The paper demonstrates Windows control

More information

Developing A Universal Radio Astronomy Backend. Dr. Ewan Barr, MPIfR Backend Development Group

Developing A Universal Radio Astronomy Backend. Dr. Ewan Barr, MPIfR Backend Development Group Developing A Universal Radio Astronomy Backend Dr. Ewan Barr, MPIfR Backend Development Group Overview Why is it needed? What should it do? Key concepts and technologies Case studies: MeerKAT FBF and APSUSE

More information

Optimised all-to-all communication on multicore architectures applied to FFTs with pencil decomposition

Optimised all-to-all communication on multicore architectures applied to FFTs with pencil decomposition Optimised all-to-all communication on multicore architectures applied to FFTs with pencil decomposition CUG 2018, Stockholm Andreas Jocksch, Matthias Kraushaar (CSCS), David Daverio (University of Cambridge,

More information

The Control System for the Caltech Millimeter Array. Steve Scott OVRO

The Control System for the Caltech Millimeter Array. Steve Scott OVRO The Control System for the Caltech Millimeter Array Steve Scott OVRO Caltech Millimeter Wave Array 6 telescopes, 10 meters in diameter Simultaneous dual receivers (1mm & 3mm) 4GHz IF bandwidth 2x1GHz continuum

More information

CME 213 S PRING Eric Darve

CME 213 S PRING Eric Darve CME 213 S PRING 2017 Eric Darve Summary of previous lectures Pthreads: low-level multi-threaded programming OpenMP: simplified interface based on #pragma, adapted to scientific computing OpenMP for and

More information

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT Manycore and GPU Channelisers Seth Hall High Performance Computing Lab, AUT GPU Accelerated Computing GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate

More information

Research on performance dependence of cluster computing system based on GPU accelerators on architecture and number of cluster nodes

Research on performance dependence of cluster computing system based on GPU accelerators on architecture and number of cluster nodes Research on performance dependence of cluster computing system based on GPU accelerators on architecture and number of cluster nodes D. Akhmedov, S. Yelubayev, T. Bopeyev, F. Abdoldina, D. Muratov, R.

More information

SPIRAL, FFTX, and the Path to SpectralPACK

SPIRAL, FFTX, and the Path to SpectralPACK SPIRAL, FFTX, and the Path to SpectralPACK Franz Franchetti Carnegie Mellon University www.spiral.net In collaboration with the SPIRAL and FFTX team @ CMU and LBL This work was supported by DOE ECP and

More information

ERROR RECOGNITION and IMAGE ANALYSIS

ERROR RECOGNITION and IMAGE ANALYSIS PREAMBLE TO ERROR RECOGNITION and IMAGE ANALYSIS 2 Why are these two topics in the same lecture? ERROR RECOGNITION and IMAGE ANALYSIS Ed Fomalont Error recognition is used to determine defects in the data

More information

GPU-ACCELERATED SPECKLE MASKING RECONSTRUCTION ALGORITHM

GPU-ACCELERATED SPECKLE MASKING RECONSTRUCTION ALGORITHM Journal of the Korean Astronomical Society https://doi.org/10.5303/jkas.2018.51.3.65 51: 65 71, 2018 June pissn: 1225-4614 eissn: 2288-890X c 2018. The Korean Astronomical Society. All rights reserved.

More information

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms CS 590: High Performance Computing GPU Architectures and CUDA Concepts/Terms Fengguang Song Department of Computer & Information Science IUPUI What is GPU? Conventional GPUs are used to generate 2D, 3D

More information

Case Study: CyberSKA - A Collaborative Platform for Data Intensive Radio Astronomy

Case Study: CyberSKA - A Collaborative Platform for Data Intensive Radio Astronomy Case Study: CyberSKA - A Collaborative Platform for Data Intensive Radio Astronomy Outline Motivation / Overview Participants / Industry Partners Documentation Architecture Current Status and Services

More information

What s 5G? Dr Dean Economou Chief Transport Strategist, Telstra

What s 5G? Dr Dean Economou Chief Transport Strategist, Telstra What s 5G? Dr Dean Economou Chief Transport Strategist, Telstra Spoiler alert Page 2 5G key features Higher speeds for more users at once More consistent and reliable connections Lower delay (latency)

More information