RT 3D FDTD Simulation of LF and MF Room Acoustics
- Laura Francis
Transcription
1 RT 3D FDTD Simulation of LF and MF Room Acoustics
ANDREA EMANUELE GRECO Id
ADVANCED COMPUTER ARCHITECTURES (A.A. 2010/11)
Prof. Ing. Cristina Silvano, Dr. Ing. Vittorio Zaccaria
2 Computer modeling techniques
Main applications: spatialization of sound or speech, computer games, architectural design tools, auralization and room acoustic simulations, etc.
Figures: Level Editor, Half Life; Symphony Hall, Boston.
Level of accuracy: depends strongly on the model used, on the application requirements and on the computational resources.
3 Goals of the Research
Goal: to show that real-time room acoustics simulation is possible over a limited bandwidth (low-mid frequencies) with the help of the parallel computation capabilities of a modern GPU.
Strategy: use of a real-time FDTD model for a modest-size geometry (ca. 100 m³), implemented on a GPU architecture.
Results: the system handles several simultaneous sound sources and a moving listener at no additional cost; the simulation runs at up to a 7 kHz sampling rate (taking the dispersion error limit into account, the actual bandwidth is 1.5 kHz); performance is compared across different schemes (SRL vs. IWB) and different geometry shapes and sizes.
4 Sound Propagation Modeling
Geometrical Methods (Ray-Based Modeling): efficient at high frequencies. Image-source and beam-tracing techniques are widely used.
PRO: fast, efficient, simple.
CON: lack of diffraction properties (typical of LF-MF behavior) and neglect of sound wave phase.
Numerical Methods (Wave-Based Modeling): model low-frequency behavior by solving the 3D wave equation. Several schemes exist (IDWM, ARD, FDTD).
PRO: high level of detail, efficient, well suited to parallel architectures such as GPUs.
CON: high computational expense (unavoidable, due to physical considerations): high f_s → small Δx → high number of nodes → high computational load.
Ideal Approach: a wave-based method at high sample rates, or hybrid models that apply the two approaches to different frequency bands.
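The chain "high f_s → small Δx → high number of nodes → high computational load" can be made concrete with a little arithmetic. The sketch below assumes the usual explicit-FDTD relation Δx = c / (λ f_s) with Courant number λ, a speed of sound of 344 m/s, and the SRL stability limit λ = 1/√3; the function names and the 100 m³ room are illustrative choices, not values taken from the slides beyond the room volume.

```python
import math

SPEED_OF_SOUND = 344.0               # m/s, assumed value
SRL_COURANT = 1.0 / math.sqrt(3.0)   # assumed: SRL scheme at its stability limit


def grid_spacing(fs_hz, courant, c=SPEED_OF_SOUND):
    """Grid spacing in metres from the relation f_s = c / (courant * dx)."""
    return c / (courant * fs_hz)


def node_count(volume_m3, fs_hz, courant, c=SPEED_OF_SOUND):
    """Approximate number of mesh nodes needed to fill a volume."""
    dx = grid_spacing(fs_hz, courant, c)
    return volume_m3 / dx ** 3


if __name__ == "__main__":
    for fs in (3500.0, 7000.0, 14000.0):
        n = node_count(100.0, fs, SRL_COURANT)
        # Node updates per second grow with f_s**4: nodes (f_s**3) times rate (f_s).
        print(f"f_s = {fs:7.0f} Hz  nodes = {n:12.0f}  updates/s = {n * fs:.3e}")
```

Doubling the sampling rate multiplies the node count by eight and the number of node updates per second by sixteen, which is why wave-based methods become expensive so quickly as the bandwidth grows.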
5 GPU-Enhanced Room Acoustic Modeling
GPUs: targeted mainly at graphics. New trend: increasing programmability and use for non-graphics tasks, i.e. General-Purpose GPU (GPGPU) computing.
A GPU implementation achieved an almost 70-fold performance gain over a CPU implementation in a 2D case. The parallelization gain is linear (doubling the number of processors doubles the performance as well).
Specific algorithms (wave-based methods) have been developed to be more generally parallelizable, making them suitable for multi-core processor architectures.
FDTD vs. GPUs: of the wave-based techniques, the FDTD method is the most straightforward to parallelize: the computation can be distributed to several processors operating independently of each other.
6 FDTD Compact Explicit Schemes
Main assumptions: rectangular grids and compact schemes. The space is discretized and modeled as a regular grid in which only the nearest neighbors of a node (which ones depends on the scheme) are needed to compute its new value.
The 3D mesh equation depends on several coefficients (λ, a, b), determined by the chosen FDTD scheme.
The sampling rate of the mesh is f_s = c / (λ Δx), where Δx is the grid spacing.
The Digital Waveguide Mesh methods form a subset of the FDTD schemes in which the relation between c and Δx is fixed by the mesh topology.
SRL scheme: computationally efficient (only one of the d_i coefficients, d_1, is non-zero, so only 6 neighbors are involved in the computation).
IWB scheme: covers the widest frequency range with the least dispersion, and is therefore best suited to real-time auralization.
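As an illustration of the 6-neighbor SRL update, here is a minimal pure-Python sketch. It assumes the standard rectilinear leapfrog form p(n+1) = λ² Σ(6 neighbors) + (2 − 6λ²) p(n) − p(n−1) with λ = 1/√3, and it simply leaves the outermost layer of nodes at zero instead of applying the boundary model described later in the slides. The function names are hypothetical.

```python
import math

LAMBDA = 1.0 / math.sqrt(3.0)  # assumed Courant number (SRL stability limit)


def make_grid(nx, ny, nz):
    """Allocate an nx * ny * nz pressure grid filled with zeros."""
    return [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]


def sampling_rate(dx, courant=LAMBDA, c=344.0):
    """Mesh sampling rate f_s = c / (courant * dx)."""
    return c / (courant * dx)


def srl_step(p_prev, p_curr):
    """One SRL time step over the interior nodes; the outer layer stays at zero."""
    nx, ny, nz = len(p_curr), len(p_curr[0]), len(p_curr[0][0])
    lam2 = LAMBDA ** 2
    p_next = make_grid(nx, ny, nz)
    for x in range(1, nx - 1):
        for y in range(1, ny - 1):
            for z in range(1, nz - 1):
                # Sum of the six axial neighbors at time step n.
                neighbours = (
                    p_curr[x + 1][y][z] + p_curr[x - 1][y][z]
                    + p_curr[x][y + 1][z] + p_curr[x][y - 1][z]
                    + p_curr[x][y][z + 1] + p_curr[x][y][z - 1]
                )
                p_next[x][y][z] = (
                    lam2 * neighbours
                    + (2.0 - 6.0 * lam2) * p_curr[x][y][z]
                    - p_prev[x][y][z]
                )
    return p_next


if __name__ == "__main__":
    prev = make_grid(5, 5, 5)
    curr = make_grid(5, 5, 5)
    curr[2][2][2] = 1.0  # unit impulse at the centre node
    nxt = srl_step(prev, curr)
    print(nxt[3][2][2], nxt[2][2][2])
```

With λ² = 1/3 the centre coefficient 2 − 6λ² vanishes, so after one step the impulse has moved entirely to the six neighbors, each receiving one third of it. On a GPU the three nested loops disappear: each thread evaluates the loop body for one node.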
7 Modern GPU Structure
Programming a GPU: the most popular API is CUDA by NVIDIA.
SIMT interface: the programmer writes a kernel that is executed by each thread; enough threads then have to be launched to accomplish the desired task. The underlying CUDA runtime runs those threads in parallel.
Warps: threads are grouped into warps on each SM. When all threads of a warp follow the same execution pattern there is no extra performance penalty. For performance reasons, threads in the same warp should access memory locations close to each other (spatial locality principle).
Advantage: this architecture suits data-parallel problems such as FDTD simulations, where it is sufficient to have a kernel that computes the FDTD equation for one node and then to launch one thread for each node in the mesh.
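Launching one thread per node implies a mapping between a flat thread index and a position in the mesh. The slides do not state the memory layout, so the sketch below assumes a linear layout with x varying fastest; under that assumption, consecutive thread indices touch adjacent memory locations, which is the access pattern the spatial locality remark asks for. Python is used only to show the index arithmetic, and the function names are hypothetical.

```python
def node_index(x, y, z, nx, ny):
    """Flat index of node (x, y, z) in an x-fastest linear layout."""
    return x + nx * (y + ny * z)


def node_coords(idx, nx, ny):
    """Inverse mapping: mesh coordinates handled by the thread with index idx."""
    x = idx % nx
    y = (idx // nx) % ny
    z = idx // (nx * ny)
    return x, y, z


if __name__ == "__main__":
    nx, ny, nz = 4, 3, 2
    # Neighbouring threads map to neighbouring x positions in the same row.
    for idx in range(6):
        print(idx, node_coords(idx, nx, ny))
```

In a kernel the flat index would typically be derived from the block and thread indices, and each thread would return early if its index is not smaller than the number of nodes.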
8 Performance Penalty Issues
Memory bandwidth limit: the link between the global memory and the SMs is a bottleneck. An FDTD simulation (10^6 nodes, f_s = 44.1 kHz) would need a data rate of at least 500 GB/s (4 bytes/node * 10^6 nodes/layer * 3 layers/update * 44,100 updates/sec), while the current memory bus bandwidths are around GB/s.
Latency: fetching data from the global memory is slow. The on-chip memory is more complicated to use, and often only a part of it is used, to store constants (constant memory).
Solutions: to hide the memory latencies, keep as many threads as possible in execution at a time (some threads execute while the others wait for their memory fetches to finish). Commonly advised value: thousands of threads in parallel. Cache memory provides fast access to the most often needed data items (the Fermi architecture by NVIDIA provides a 2-level cache hierarchy).
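The data-rate estimate above is a straightforward product, reproduced here as a small helper. The function name and default arguments are illustrative; the numbers are the ones quoted on the slide (4-byte values, three mesh layers touched per update, 10^6 nodes, 44.1 kHz).

```python
def required_bandwidth(nodes, fs_hz, bytes_per_node=4, layers_per_update=3):
    """Bytes per second moved if every update touches all layers of all nodes."""
    return bytes_per_node * nodes * layers_per_update * fs_hz


if __name__ == "__main__":
    bw = required_bandwidth(10**6, 44100)
    print(f"{bw / 1e9:.1f} GB/s")  # prints "529.2 GB/s"
```

The same helper shows how much the requirement drops at the sampling rates actually targeted: at 7 kHz with the same mesh size it is about 84 GB/s.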
9 Implementation System
System workflow: audio input stream → downsampling → feeding the signal to the sound source nodes → mesh update → output signal from the listener nodes → upsampling → audio output device (mono).
CPU: handles audio input and output, performs the required sampling rate conversions (integer factor) and computes the required filters.
GPU: performs the FDTD simulation iteratively. 1 time step = 2 kernel launches. The first launch updates the normal mesh nodes (interior, source, listener) with N threads, where N is the number of nodes in the mesh. The second launch updates the DIFs (the number of threads equals the number of boundary filters).
Computer setup: Intel Pentium Dual CPU E2180 (2.00 GHz), 2 GB RAM, Nvidia Quadro FX 5800 (4 GB RAM) connected to the PCIe bus (2 GB/s bandwidth).
Audio playback: Windows Wave-Out API.
GPU code: NVIDIA CUDA library.
10 FDTD modeling on GPU
Pressure update: only two separate memory areas are needed instead of three (n+1, n), because the new value p(n+1) of a node can overwrite its own p(n-1).
Node stored information: position, p(n), node type (source, listener, boundary).
Global memory allocation: mesh memory, node types, DIFs.
Kernel: separate kernels for the 2 schemes (SRL fetches only 6 neighbor values; IWB needs 27 values).
Sound sources and listeners: treated by the same kernel, but with their own buffers for input and output signals.
Sound source: the new excitation value is read from the input buffer and set as the actual value of the node.
Listeners: transparent (updated like other interior mesh nodes, but the value of time step n is also stored to the listener output buffer).
Advanced technique: two listener nodes for one listener allow smooth movement in the scene (the actual output signal is computed as a linear cross-fade of the two listener signals, to avoid transients when a listener moves from one node to another).
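The linear cross-fade between two listener nodes can be sketched in a few lines. The slides do not say over how many samples the fade runs or how the weights are scheduled; the version below assumes the fade spans exactly one block of samples, starting fully on the old node and ending fully on the new one. The function name is hypothetical.

```python
def crossfade(old_signal, new_signal):
    """Linearly fade from old_signal to new_signal over one block of samples."""
    n = len(old_signal)
    if n != len(new_signal):
        raise ValueError("both listener buffers must have the same length")
    if n == 1:
        # A single sample leaves no room for a ramp; take the new node's value.
        return [float(new_signal[0])]
    out = []
    for i in range(n):
        w = i / (n - 1)  # weight of the new listener node, ramps 0 -> 1
        out.append((1.0 - w) * old_signal[i] + w * new_signal[i])
    return out


if __name__ == "__main__":
    print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.0]
```

Because both listener nodes are ordinary interior nodes, reading a second one costs only one extra buffer write per time step, which is consistent with the claim that a moving listener adds no significant cost.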
11 Boundary Model and DIFs
Boundary model: the ghost points that lie outside the actual mesh are eliminated from the final equation and replaced by a DIF (an IIR filter).
0th-order filter → frequency-independent impedance boundary condition. Higher filter orders → frequency-dependent boundary conditions.
Frequency-independent boundaries consume less memory than higher-order DIFs, since no actual filter has to be stored.
Mesh geometry: one DIF for each boundary node, with order one or higher.
DIF update kernel: one kernel thread handles one DIF.
Computation time: increases with the order of the filter.
Memory allocation: the coefficients of the impedance filters are precomputed (constant memory). The memory needed for the actual filters is allocated from the global memory.
12 Simulation Results: Mesh Nodes
Geometries: two different rooms (living-room sized and concert-hall sized).
Real-time performance: the simulation runs for 512 steps. The maximum update frequency f_s is found by gradually decreasing Δx.
Dispersion: the chosen schemes cannot be compared just by looking at the maximum f_s, since they have different dispersion characteristics. A threshold for the maximum allowed dispersion (10%) is therefore used to obtain the upper limit frequency f_l, which describes the actual valid bandwidth.
Frequency limits: SRL f_l = 0.16 f_s, IWB f_l = 0.37 f_s.
Audibility of dispersion: depends heavily on the distance from the sound source.
The number of sound sources does not affect the performance in practice.
In this setup the boundary nodes are set to a frequency-independent impedance (effects superposition principle).
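The two limit factors make it easy to translate between a mesh sampling rate and the bandwidth that is actually valid under the 10% dispersion threshold. The sketch below only encodes the factors quoted on the slide; the dictionary and function names are hypothetical.

```python
# Valid-bandwidth factors at the 10 % dispersion threshold, as quoted on the slide.
LIMIT_FACTOR = {"SRL": 0.16, "IWB": 0.37}


def valid_bandwidth(fs_hz, scheme):
    """Upper limit frequency f_l for a given mesh sampling rate."""
    return LIMIT_FACTOR[scheme] * fs_hz


def required_fs(target_bandwidth_hz, scheme):
    """Mesh sampling rate needed to reach a target valid bandwidth."""
    return target_bandwidth_hz / LIMIT_FACTOR[scheme]


if __name__ == "__main__":
    for scheme in ("SRL", "IWB"):
        print(scheme, round(required_fs(1500.0, scheme)), "Hz for 1.5 kHz bandwidth")
```

For the same target bandwidth the IWB scheme needs a much lower mesh sampling rate than SRL, which is why comparing the two schemes by maximum f_s alone would be misleading even though each IWB node update fetches more neighbor values.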
13 Simulation Results: Boundary Nodes
Geometry: to test the performance of the DIFs a third geometry was used (a modified concert hall: the volume is the same, but the space is divided into 12 smaller rooms, giving more boundary nodes; Δx = 28 cm, 500k nodes).
Reference result: simulation of 512 time steps on a mesh with the same number of nodes, all of them of normal type (no boundary nodes).
Simulation results: the simulation is performed iteratively for different filter orders up to the 10th. For each case, the mesh computation time is recorded.
0th-order DIFs: the additional computational cost is minimal (<12%).
1st-order DIFs: the computation time increases remarkably (increasing the filter order above one increases the computation time only modestly).
Hall model: the relative increase is smaller (fewer boundary nodes).
SRL scheme: larger relative cost (filter update costs are equal in both schemes, but the cost of the actual node update is smaller in the SRL scheme).
14 References
- Lauri Savioja, "Real-Time 3D Finite-Difference Time-Domain Simulation of Low- and Mid-Frequency Room Acoustics," in Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 6-10.
- Jaakko S. Juntunen and Theodoros D. Tsiboukis, "Reduction of Numerical Dispersion in FDTD Method Through Artificial Anisotropy," IEEE Transactions on Microwave Theory and Techniques, vol. 48, no. 4, April 2000.
- Konrad Kowalczyk and Maarten van Walstijn, "Room Acoustics Simulation Using 3-D Compact Explicit FDTD Schemes," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, January 2011.
- Craig J. Webb and Stefan Bilbao, "Computing room acoustics with CUDA - 3D FDTD schemes with boundary losses and viscosity."
15 Thank You
Technology for a better society hetcomp.com 1 J. Seland, C. Dyken, T. R. Hagen, A. R. Brodtkorb, J. Hjelmervik,E Bjønnes GPU Computing USIT Course Week 16th November 2011 hetcomp.com 2 9:30 10:15 Introduction
More informationMartin Dubois, ing. Contents
Martin Dubois, ing Contents Without OpenNet vs With OpenNet Technical information Possible applications Artificial Intelligence Deep Packet Inspection Image and Video processing Network equipment development
More informationAn Architecture Using a Finite Difference Method to Calculate Realistic Sound Equalization in Games
An Architecture Using a Finite Difference Method to Calculate Realistic Sound Equalization in Games B. Moreira E.W. C. Gonzales M. Kischinhevsky MediaLab Computer Departament Universidade Federal Fluminense
More informationGPU-Based Simulation of Spiking Neural Networks with Real-Time Performance & High Accuracy
GPU-Based Simulation of Spiking Neural Networks with Real-Time Performance & High Accuracy Dmitri Yudanov, Muhammad Shaaban, Roy Melton, Leon Reznik Department of Computer Engineering Rochester Institute
More informationEfficient Tridiagonal Solvers for ADI methods and Fluid Simulation
Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Nikolai Sakharnykh - NVIDIA San Jose Convention Center, San Jose, CA September 21, 2010 Introduction Tridiagonal solvers very popular
More informationCUDA and GPU Performance Tuning Fundamentals: A hands-on introduction. Francesco Rossi University of Bologna and INFN
CUDA and GPU Performance Tuning Fundamentals: A hands-on introduction Francesco Rossi University of Bologna and INFN * Using this terminology since you ve already heard of SIMD and SPMD at this school
More informationCS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS
CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS 1 Last time Each block is assigned to and executed on a single streaming multiprocessor (SM). Threads execute in groups of 32 called warps. Threads in
More informationGraphics Processing Unit Architecture (GPU Arch)
Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics
More informationCUDA Experiences: Over-Optimization and Future HPC
CUDA Experiences: Over-Optimization and Future HPC Carl Pearson 1, Simon Garcia De Gonzalo 2 Ph.D. candidates, Electrical and Computer Engineering 1 / Computer Science 2, University of Illinois Urbana-Champaign
More informationCUDA PROGRAMMING MODEL. Carlo Nardone Sr. Solution Architect, NVIDIA EMEA
CUDA PROGRAMMING MODEL Carlo Nardone Sr. Solution Architect, NVIDIA EMEA CUDA: COMMON UNIFIED DEVICE ARCHITECTURE Parallel computing architecture and programming model GPU Computing Application Includes
More informationIntroduction to Multicore Programming
Introduction to Multicore Programming Minsoo Ryu Department of Computer Science and Engineering 2 1 Multithreaded Programming 2 Automatic Parallelization and OpenMP 3 GPGPU 2 Multithreaded Programming
More informationPOST-SIEVING ON GPUs
POST-SIEVING ON GPUs Andrea Miele 1, Joppe W Bos 2, Thorsten Kleinjung 1, Arjen K Lenstra 1 1 LACAL, EPFL, Lausanne, Switzerland 2 NXP Semiconductors, Leuven, Belgium 1/18 NUMBER FIELD SIEVE (NFS) Asymptotically
More informationImproving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm
Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Department of Computer Science and Engineering Sogang University, Korea Improving Memory
More informationFlux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters
Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,
More informationGeneric Polyphase Filterbanks with CUDA
Generic Polyphase Filterbanks with CUDA Jan Krämer German Aerospace Center Communication and Navigation Satellite Networks Weßling 04.02.2017 Knowledge for Tomorrow www.dlr.de Slide 1 of 27 > Generic Polyphase
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationOptical Flow Estimation with CUDA. Mikhail Smirnov
Optical Flow Estimation with CUDA Mikhail Smirnov msmirnov@nvidia.com Document Change History Version Date Responsible Reason for Change Mikhail Smirnov Initial release Abstract Optical flow is the apparent
More informationMemory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB
Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar
More informationMaximizing Face Detection Performance
Maximizing Face Detection Performance Paulius Micikevicius Developer Technology Engineer, NVIDIA GTC 2015 1 Outline Very brief review of cascaded-classifiers Parallelization choices Reducing the amount
More informationThe Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins Scientific Computing and Imaging Institute & University of Utah I. Uintah Overview
More informationScientific Computing on GPUs: GPU Architecture Overview
Scientific Computing on GPUs: GPU Architecture Overview Dominik Göddeke, Jakub Kurzak, Jan-Philipp Weiß, André Heidekrüger and Tim Schröder PPAM 2011 Tutorial Toruń, Poland, September 11 http://gpgpu.org/ppam11
More information