Powering Real-time Radio Astronomy Signal Processing with latest GPU architectures
1 Powering Real-time Radio Astronomy Signal Processing with latest GPU architectures. Harshavardhan Reddy Suda (NCRA, India); Vinay Deshpande (NVIDIA, India); Bharat Kumar (NVIDIA, India)
2 What signals are we processing? Digitized baseband signals from the 30 dual-polarized antennas of the GMRT. The Giant Meter-wave Radio Telescope (GMRT) is a world-class instrument for studying astrophysical phenomena at low radio frequencies. It is located 80 km north of Pune and 160 km east of Mumbai. It is an array telescope with 30 antennas of 45 m diameter, operating at meter wavelengths.
3 GMRT. Supports two modes of operation: interferometry (correlator) and array mode (beamformer). Frequency bands: to 260 MHz, to 500 MHz, to 900 MHz, to 1600 MHz. Maximum instantaneous bandwidth: 400 MHz (legacy GMRT: 32 MHz). Effective collecting area (2-3% of SKA): 30,000 sq m at lower frequencies, 20,000 sq m at higher frequencies.
4 The Giant Meter-wave Radio Telescope: a Google eye view
5 GMRT receiver chain: signal processing in the digital back-end. Image courtesy: Ajith Kumar, NCRA
6 Computation requirements. Antenna signals (M = 64), sampler, maximum bandwidth 400 MHz. Fourier transform, O(N log N), 16k spectral channels: 3 TFlops. Phase correction: 0.1 TFlops. MAC (multiply-accumulate) over M(M+1)/2 signal pairs: 6.6 TFlops. Total: ~10 TFlops.
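The MAC figure on this slide can be reproduced with a short back-of-the-envelope calculation. The sketch below is an illustration, not the authors' own accounting: it assumes that after the Fourier transform each signal delivers one complex spectral sample per second for every Hz of bandwidth, and that one complex multiply-accumulate costs 8 floating point operations (4 multiplies and 4 adds). The function name `mac_rate_tflops` is hypothetical.

```python
def mac_rate_tflops(n_signals, bandwidth_hz, flops_per_complex_mac=8):
    """Estimate the correlator MAC load in TFlops.

    Assumptions (not stated on the slide):
      - every pair of signals, including each signal with itself,
        is multiplied and accumulated: M(M+1)/2 pairs;
      - each signal yields bandwidth_hz complex spectral samples per second;
      - one complex multiply-accumulate counts as 8 flops.
    """
    n_pairs = n_signals * (n_signals + 1) // 2
    return n_pairs * bandwidth_hz * flops_per_complex_mac / 1e12


if __name__ == "__main__":
    # M = 64 signals and 400 MHz bandwidth, as on the slide.
    print(f"MAC load: {mac_rate_tflops(64, 400e6):.2f} TFlops")
```

With M = 64 and 400 MHz this gives 2080 pairs and roughly 6.66 TFlops, in line with the 6.6 TFlops quoted on the slide.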
7 Design : Time slicing model
8 Design: Time slicing model, a 4-node example. Ant 1 to Ant 16: digitized baseband signal data of the antennas.
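The transcript preserves only the name of the design, so the following is a sketch of one common reading of a time-slicing model rather than a description of the GMRT code: every compute node processes data from all antennas, but only for its own slices of time, handed out round-robin. The function names and the slice counts are hypothetical.

```python
def assign_time_slices(n_slices, n_nodes):
    """Map each time-slice index to a compute node, round-robin.

    Assumption: slice s (covering all antennas) goes to node s mod n_nodes.
    """
    return {s: s % n_nodes for s in range(n_slices)}


def slices_for_node(node, n_slices, n_nodes):
    """List the time slices that one node processes."""
    return [s for s in range(n_slices) if s % n_nodes == node]


if __name__ == "__main__":
    n_nodes = 4    # the 4-node example named on the slide
    n_slices = 12  # hypothetical number of consecutive time slices
    for node in range(n_nodes):
        print(f"node {node}: slices {slices_for_node(node, n_slices, n_nodes)}")
```

Under this reading each node needs the data of every antenna for its slice, which is why the nodes have to exchange data over a fast interconnect.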
9 Implementation. 16 Dell T630 machines as compute nodes; 16 ROACH (FPGA) boards with Atmel/e2v-based ADCs, developed by the CASPER group, Berkeley, for digitization and packetization; 32 Tesla K40c GPU cards for processing; a 36-port Mellanox InfiniBand switch for data sharing between compute nodes and host nodes. Software: C/C++ and CUDA C programming with OpenMPI and OpenMP directives. Developed in collaboration with Swinburne University, Australia.
10 Implementation. Image courtesy: Irappa Halagali, NCRA
11 Sample result: image of the Coma cluster. Legacy GMRT 325 MHz: 350 μJy. Upgraded GMRT MHz: 28 μJy. Significantly lower noise RMS and better image quality with the upgraded GMRT. Dharam Vir Lal and Ishwar Chandra, NCRA
12 Computation performance: K40. Table columns: Channels, FFT (GFlops), MAC (GFlops). No. of antennas: 32 (dual pol). CUDA 7.5.
13 Motivation for next-generation GPUs. Adding more compute-intensive applications: multi-beamforming; processing on each beam (beam steering); gated correlator; FIR filtering with many taps for the narrow-band mode implementation. The working GMRT system and code provide an excellent testing ground for the features of next-generation GPUs. Performance was measured and compared on GP100 and V100.
14 Computation performance: K40 vs GP100. CUDA 7.5, ECC off. Performance follows the cuFFT benchmarks for K40 and P100. Reference for K40 benchmark: CUDA 6.5 Performance Report, September 2014. Reference for P100 benchmark: CUDA 8 Performance Overview, November 2016.
15 Computation performance: K40 vs GP100. CUDA 7.5, ECC off. No. of antennas: 32 (dual pol).
16 Computation performance: K40 vs GP100. CUDA 7.5, ECC off. Peak performance (TFlops) and peak global memory bandwidth (GB/s) compared for K40 and GP100.
17 Computation performance as % of real-time. Bandwidth: 200 MHz. No. of antennas: 32 (dual pol). Spectral channels: 16384.
18 Computation performance: GP100 vs V100. GP100 on CUDA 7.5; V100 on CUDA 9.1 (using PSG cluster).
19 Computation performance: GP100 vs V100. GP100 on CUDA 7.5; V100 on CUDA 9.1 (using PSG cluster). No. of antennas: 32 (dual pol).
20 Computation performance: GP100 vs V100. GP100 on CUDA 7.5; V100 on CUDA 9.1 (using PSG cluster). Peak performance (TFlops) and peak global memory bandwidth (GB/s) compared for GP100 and V100.
21 Reasons behind the relatively low performance of MAC: non-contiguous global memory access at block level; low arithmetic intensity; MAC input data format.
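The "low arithmetic intensity" point can be illustrated by counting flops per byte loaded for one spectral channel and one time sample. This is a rough sketch under stated assumptions, not a measurement of the GMRT kernel: inputs are taken to be complex numbers with 32-bit float components (8 bytes each), accumulators are assumed to stay in registers, and output writes are ignored. The function names are hypothetical.

```python
BYTES_PER_COMPLEX = 8   # assumption: two 32-bit floats per complex sample
FLOPS_PER_CMAC = 8      # 4 multiplies + 4 adds per complex multiply-accumulate


def intensity_no_reuse():
    """Flops per byte when every MAC reloads both operands from memory."""
    return FLOPS_PER_CMAC / (2 * BYTES_PER_COMPLEX)


def intensity_full_reuse(n_signals):
    """Flops per byte when each input sample is loaded once and then
    reused for all M(M+1)/2 signal pairs."""
    n_pairs = n_signals * (n_signals + 1) // 2
    flops = n_pairs * FLOPS_PER_CMAC
    bytes_loaded = n_signals * BYTES_PER_COMPLEX
    return flops / bytes_loaded


if __name__ == "__main__":
    print("no reuse  :", intensity_no_reuse(), "flops/byte")
    print("full reuse:", intensity_full_reuse(64), "flops/byte")
```

The gap between the two bounds (0.5 versus (M+1)/2 flops per byte) suggests why how well operands are reused from cache or registers can dominate MAC throughput.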
22 GPU kernel improvements. FFT: single precision to half precision floating point. MAC: simplified index arithmetic; improved L2 hit ratio (less than 5% to nearly 86%); vectorized loads; increased ILP (float4); exposing more parallelism by increasing the occupancy; single precision to half precision floating point: no performance gain.
23 MAC: performance gain with optimizations on V100. V100 on CUDA 9.1 (using PSG cluster). No. of antennas: 32 (dual pol).
24 FFT: performance gain with half precision on V100. V100 on CUDA 9.1 (using PSG cluster).
25 FFT: error analysis with half precision in the power spectrum. Spectral channels: 2048. Batch size: 128.
26 FFT: error analysis with half precision in the phase spectrum. Spectral channels: 2048. Batch size: 128.
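The kind of comparison behind these two error-analysis slides can be sketched in plain Python. This is an emulation under stated assumptions, not the cuFFT half-precision path used in the talk: arithmetic is done in double precision and every intermediate result is rounded to IEEE half precision, the input is a single test tone, and only one 2048-point transform is run instead of a batch of 128. All function names and the tone bin are hypothetical.

```python
import cmath
import math
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_complex_half(z):
    """Round the real and imaginary parts to half precision."""
    return complex(to_half(z.real), to_half(z.imag))


def fft(x, quantize=None):
    """Recursive radix-2 FFT; optionally round every butterfly output."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], quantize)
    odd = fft(x[1::2], quantize)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        a, b = even[k] + t, even[k] - t
        if quantize is not None:
            a, b = quantize(a), quantize(b)
        out[k], out[k + n // 2] = a, b
    return out


def half_precision_errors(n=2048, tone_bin=100):
    """Return (relative power error, absolute phase error in radians)
    at the tone bin, comparing emulated half precision with double."""
    signal = [complex(math.cos(2 * math.pi * tone_bin * i / n), 0.0)
              for i in range(n)]
    ref = fft(signal)
    half_in = [round_complex_half(s) for s in signal]
    approx = fft(half_in, quantize=round_complex_half)
    p_ref = abs(ref[tone_bin]) ** 2
    p_half = abs(approx[tone_bin]) ** 2
    power_err = abs(p_half - p_ref) / p_ref
    phase_err = abs(cmath.phase(approx[tone_bin]) - cmath.phase(ref[tone_bin]))
    return power_err, phase_err


if __name__ == "__main__":
    power_err, phase_err = half_precision_errors()
    print(f"relative power error at tone bin: {power_err:.3e}")
    print(f"phase error at tone bin (rad):    {phase_err:.3e}")
```

The test tone has unit amplitude so that intermediate values stay far below the half-precision maximum of 65504; real data would need comparable scaling before a half-precision transform.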
27 Going forward. Improving MAC using Tensor Cores: potential 2x improvement. Implementing the MAC optimizations and the half-precision floating point FFT in the GMRT code. Optimized FIR filtering routines in CUDA for the narrow-band mode implementation. Implementing multi-beamforming, beam steering and the gated correlator.
28 Acknowledgements. Prof. Yashwant Gupta, Centre Director, NCRA; Ajith Kumar B., Back-end group co-ordinator, GMRT, NCRA; Sanjay Kudale, GMRT, NCRA; Shelton Gnanaraj, GMRT, NCRA; Andrew Jameson, Swinburne University, Australia; Benjamin Barsdell, Swinburne University, Australia (now at NVIDIA); CASPER Group, Berkeley; Digital Back-end Group, GMRT, NCRA; Computer Group, GMRT, NCRA; Control Room, GMRT
29 Thank You