Applying Graphics Processor Acceleration in a Software Defined Radio Prototyping Environment

Applying Graphics Processor Acceleration in a Software Defined Radio Prototyping Environment (1/23)
GNU Radio with Graphics Processor Acceleration as a Standalone Package
Will Plishker, George F. Zaki, Shuvra S. Bhattacharyya (University of Maryland)
Charles Clancy, John Kuykendall (Laboratory for Telecommunication Sciences)

Outline (2/23)
- Introduction
- Related Work
- GNU Radio Background
- Graphics Processor Integration with GNU Radio
- Evaluation
- Summary

Motivation (3/23)
High-performance prototyping with GPUs is tempting:
- Potentially fast design times
- High FLOPs per dollar
...but hard:
- Must learn a new language
- Must expose or refactor parallelism

A Brief History of GPU Programming (4/23)
- OpenGL (1992) / DirectX (1995), plus shader programming in assembly
- Cg, GLSL, HLSL (early 2000s)
- Sh, Brook (2004)
- CUDA (2006)
- OpenCL (2008)

Goals for a GPU Rapid Prototyping Tool (5/23)
- Reusable
- Portable
- Transparent
- High-performance

Related Work (6/23)
GNU Radio acceleration:
- Cell
- FPGAs
- Tilera
- General Purpose Processor (GPP) multicore
CUDA integration with interpreted languages:
- PyCUDA
- MATLAB Multicore Toolbox

Related Work: PyCUDA (7/23)

Related Work: MATLAB (8/23)

Background: GNU Radio Design Flow (9/23)
- GNU Radio Companion (GRC)
- GNU Radio Python flowgraph, e.g.:

    class top(gr.top_block):
        def __init__(self):
            gr.top_block.__init__(self)
            ntaps = 256
            # Something vaguely like floating point ops
            self.flop = 2 * ntaps * options.npipelines
            # options and pipeline come from the surrounding benchmark script
            src = gr.null_source(gr.sizeof_float)
            head = gr.head(gr.sizeof_float, int(options.nsamples))
            self.connect(src, head)
            for n in range(options.npipelines):
                self.connect(head, pipeline(options.nstages, ntaps))

- Universal Software Radio Peripheral (USRP)
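
The pipeline hierarchical block referenced above is not shown on the slide; a minimal sketch of what it could look like, modeled on GNU Radio's classic mp-sched benchmark (gr.hier_block2 and gr.fir_filter_fff are the stock 3.x APIs; the exact taps are an assumption):

    class pipeline(gr.hier_block2):
        """A chain of nstages FIR filters ending in a null sink."""
        def __init__(self, nstages, ntaps):
            gr.hier_block2.__init__(self, "pipeline",
                                    gr.io_signature(1, 1, gr.sizeof_float),
                                    gr.io_signature(0, 0, 0))
            taps = ntaps * [1.0 / ntaps]  # assumed moving-average taps
            upstream = self
            for i in range(nstages):
                fir = gr.fir_filter_fff(1, taps)  # decimation of 1
                self.connect(upstream, fir)
                upstream = fir
            self.connect(upstream, gr.null_sink(gr.sizeof_float))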

Proposed Design Flow (10/23)
- Start from an unambiguous application description based on the application area itself: a GNU Radio flowgraph (GRC / GR)
- Capture it as dataflow using the Dataflow Interchange Format (DIF) package
- Use tools (or manual mapping) to optimize scheduling and assignment
- Generate an accelerated functional description for the GNU Radio engine via GRGPU

GNU Radio for Graphics Processing Units (GRGPU) (11/23)
(diagram: a GNU Radio C++ block (.cc) calls a GPU kernel through GRGPU's device_work(); the kernel is written in CUDA (.cu), synthesized to C++ (.cc) by nvcc, and linked with libtool against the CUDA libraries libcudart and libcutil)
- Stand-alone Python package

Build Process (12/23)
make-cuda (.cu -> .cc):
- nvcc acts as a software synthesizer, transforming CUDA code into target-specific GPU code
- Wraps functions in extern "C"
make:
- The resulting .cc files are just more inputs to the existing build structure
- Links against the CUDA runtime (cudart) and CUDA utilities (cutil)
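
A minimal sketch of what the make-cuda step amounts to, written here as a Python driver for illustration (the real make-cuda is a make rule; nvcc's --cuda mode, which translates a .cu file into host-compilable C++, is an assumption about the exact flags used):

    import glob
    import subprocess

    # Translate each CUDA kernel into a C++ source file that the
    # ordinary automake build can consume like any other .cc input.
    for cu in glob.glob("lib/*_kernel.cu"):
        cc = cu.replace(".cu", ".cc")
        subprocess.check_call(["nvcc", "--cuda", cu, "-o", cc])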

Using GRGPU (13/23)
Used just like gr in GNU Radio flowgraphs:

    import grgpu
    op = grgpu.op_cuda()

Dependent blocks:
- Host/device data movement is handled by other blocks
- Device pointers serve as the interface between GPU blocks
(see the full example on the next slide)

GRGPU Example (14/23)

    import math
    import threading
    from gnuradio import gr
    import grgpu

    total_length = 1024  # number of test samples (not specified on the slide)

    class GPU_Thread(threading.Thread):
        """GPU Thread"""
        def __init__(self):
            threading.Thread.__init__(self)
            self.tb = gr.top_block()
            src_data = [math.sin(x) for x in range(total_length)]
            src = gr.vector_source_f(src_data)
            h2d = grgpu.h2d_cuda()                 # host -> device copy
            taps = [0.1] * 60
            op = grgpu.fir_filter_fff_cuda(taps)   # FIR runs on the GPU
            d2h = grgpu.d2h_cuda()                 # device -> host copy
            self.dst = gr.vector_sink_f()
            self.tb.connect(src, h2d)
            self.tb.connect(h2d, op)
            self.tb.connect(op, d2h)
            self.tb.connect(d2h, self.dst)

(flowgraph: source -> H2D -> FIR (GPU-accelerated) -> D2H -> sink)
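
The slide ends at graph construction; to execute, the thread also needs a run() method. A minimal continuation (the driver lines are an assumption; tb.run() is the standard gr.top_block call):

        # inside GPU_Thread:
        def run(self):
            self.tb.run()  # execute the flowgraph to completion

    # hypothetical driver
    t = GPU_Thread()
    t.start()
    t.join()
    print(t.dst.data()[:8])  # first few filtered samples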

Multicore with GRGPU (15/23)
(diagram: a flowgraph partitioned across GPP 1, the GPU, and GPP 2, so host cores and the GPU process stages concurrently)
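
A minimal sketch of such a GPP/GPU/GPP partition as a flowgraph, reusing the grgpu blocks from the previous slide (the block choices and taps are illustrative assumptions; GNU Radio's thread-per-block scheduler lets the two CPU FIRs land on different cores):

    import math
    from gnuradio import gr
    import grgpu

    tb = gr.top_block()
    taps = [0.1] * 60
    src = gr.vector_source_f([math.sin(x) for x in range(4096)])

    cpu_fir1 = gr.fir_filter_fff(1, taps)      # stage on GPP 1
    h2d = grgpu.h2d_cuda()                     # move samples to the device
    gpu_fir = grgpu.fir_filter_fff_cuda(taps)  # stage on the GPU
    d2h = grgpu.d2h_cuda()                     # move results back to the host
    cpu_fir2 = gr.fir_filter_fff(1, taps)      # stage on GPP 2
    dst = gr.vector_sink_f()

    tb.connect(src, cpu_fir1, h2d, gpu_fir, d2h, cpu_fir2, dst)
    tb.run()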

Evaluation (16/23)
- Benchmark GPU options with the GNU Radio multicore benchmark (mp-sched)
- Hardware: 2 Intel Xeons (3 GHz) and 1 NVIDIA GTX 260
- GPU- and GPP-accelerated FIR blocks (CUDA and SSE acceleration)
- Vary: problem size and block assignment
- Measure: total execution time

Evaluation Application (17/23)
(diagram: the mp-sched test graph, a grid of FIR blocks between SRC and SNK; rows are pipelines, columns are stages)

Evaluation Application: Half CPU (18/23)
(diagram: the same grid with half of the FIR blocks assigned to the GPPs and half to the GPU)

Evaluation Application: One GPP (19/23)
(diagram: the same grid with one pipeline assigned to a GPP and the remaining pipelines to the GPU)

Evaluation Application: One GPU (20/23)
(diagram: the same grid with one pipeline assigned to the GPU and the remaining pipelines to GPPs)

Results (21/23)
(chart: total execution time in ms, from 0 to 1200, versus block assignment scheme: 1x20, 2x20, 3x20, 4x20)

Conclusions (22/23)
- GRGPU provides GPU acceleration to GNU Radio as a stand-alone library
- Facilitates design space exploration
Status:
- Planned to be contributed back to GNU Radio
- Few, but growing, CUDA-accelerated blocks
Future work:
- Automated mapping
- Multi-GPU support

Questions? (23/23)

Create GRGPU Block (24/23)
Python script:

    ./create_new_cuda_block.py <new_block_cuda>

Creates and internally renames:
    ./grc/grgpu_new_block_cuda.xml
    ./swig/grgpu_new_block_cuda.i
    ./lib/grgpu_new_block_cuda_kernel.cu
    ./lib/qa_grgpu_new_block_cuda.cc
    ./lib/grgpu_new_block_cuda.cc
    ./lib/qa_grgpu_new_block_cuda.h
    ./lib/grgpu_new_block_cuda.h
Rewrites:
    ./lib/Makefile.am
    ./lib/make-cuda
    ./swig/grgpu.i
    ./swig/Makefile.am
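
After the script runs and the package is rebuilt, the generated block would presumably be usable like any other GRGPU block; a hypothetical example (my_block_cuda is an invented name standing in for <new_block_cuda>):

    # after: ./create_new_cuda_block.py my_block_cuda  (hypothetical name)
    import grgpu
    op = grgpu.my_block_cuda()  # generated block, used between h2d/d2h blocks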