RT 3D FDTD Simulation of LF and MF Room Acoustics


1 RT 3D FDTD Simulation of LF and MF Room Acoustics
Andrea Emanuele Greco (Id …)
Advanced Computer Architectures (A.A. 2010/11)
Prof. Ing. Cristina Silvano, Dr. Ing. Vittorio Zaccaria

2 Computer Modeling Techniques
Main applications: spatialization of sound or speech, computer games, architectural design tools, auralization and room acoustic simulations, etc.
(Figures: level editor, Half-Life; Symphony Hall, Boston)
Level of accuracy: depends strongly on the model used, on the application requirements and on the available computational resources.

3 Goals of the Research
Goal: to show that real-time room acoustic simulation is possible over a limited bandwidth (low and mid frequencies) by exploiting the parallel computing capabilities of a modern GPU.
Strategy: a real-time FDTD model of a modest-size geometry (ca. 100 m³), implemented on a GPU architecture.
Results: the system handles several simultaneous sound sources and a moving listener at no additional cost; simulations run at sampling rates up to 7 kHz (once the dispersion error limit is taken into account, the actual valid bandwidth is 1.5 kHz); performance is compared across schemes (SRL vs. IWB) and across geometries of different shapes and sizes.

4 Sound Propagation Modeling
Geometrical methods (ray-based modeling): efficient at high frequencies; image-source and beam-tracing techniques are widely used.
PRO: fast, efficient, simple. CON: no diffraction (typical of LF-MF behavior), and the phase of the sound waves is neglected.
Numerical methods (wave-based modeling): model low-frequency behavior through the 3D wave equation; several schemes exist (IDWM, ARD, FDTD).
PRO: high level of detail, efficient, well suited to parallel architectures such as GPUs. CON: high computational expense, unavoidable for physical reasons: high f_s → small Δx → high number of nodes → high computational load.
Ideal approach: a wave-based method at high sample rates, or hybrid models that apply the two approaches to different frequency bands.
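The cost chain above can be made concrete with a small calculation. This is a minimal Python sketch, assuming c = 343 m/s and the relation f_s = c/(λΔx) from the FDTD schemes slide, with λ = 1/√3 as an assumed SRL value; the function names are illustrative only. Because the node count grows as f_s³ and every node is updated f_s times per second, the work grows as f_s⁴.

```python
import math

C_AIR = 343.0  # speed of sound in m/s (assumed room-temperature value)


def grid_spacing(fs, lam):
    """Grid spacing from fs = c / (lambda * dx), i.e. dx = c / (lambda * fs)."""
    return C_AIR / (lam * fs)


def mesh_cost(volume_m3, fs, lam):
    """Return (number of nodes, node updates per second) for a room of this volume."""
    dx = grid_spacing(fs, lam)
    nodes = volume_m3 / dx ** 3
    return nodes, nodes * fs


if __name__ == "__main__":
    lam_srl = 1 / math.sqrt(3)  # assumed Courant number for the SRL scheme
    for fs in (3500, 7000, 14000):
        nodes, rate = mesh_cost(100.0, fs, lam_srl)  # ca. 100 m^3 room, as in the goals slide
        print(f"fs={fs:6d} Hz  nodes={nodes:10.0f}  updates/s={rate:.3e}")
```

Doubling f_s multiplies the node count by 8 and the update rate by 16, which is why the valid bandwidth of a real-time mesh is limited to low and mid frequencies.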

5 GPU-Enhanced Room Acoustic Modeling
GPUs: originally targeted mainly at graphics. New trend: increased programmability and use for non-graphics tasks, i.e. general-purpose GPU (GPGPU) computing.
GPU implementations have shown an almost 70-fold performance gain over a CPU implementation in a 2D case. The parallelization gain is linear: doubling the number of processors doubles the performance.
Specific algorithms (wave-based methods) have been developed to be more generally parallelizable, so that they suit multi-core processor architectures.
FDTD vs. GPUs: among the wave-based techniques, FDTD is the most straightforward to parallelize, since the computation can be distributed over several processors operating independently of each other.
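Why FDTD distributes so easily can be shown with a small experiment: each new value depends only on the two previous time layers, so the mesh can be cut into slabs that are updated by independent workers in any order. This Python sketch is an illustration only; it assumes the SRL update with λ = 1/√3, a cubic grid whose outer shell is held at zero (an assumed pressure-release boundary), and uses Python threads to stand in for GPU processors.

```python
from concurrent.futures import ThreadPoolExecutor


def idx(x, y, z, n):
    """Flat index of node (x, y, z) in an n*n*n grid stored with x fastest."""
    return x + n * (y + n * z)


def srl_update_slab(prev, curr, nxt, n, z_lo, z_hi):
    """SRL update of the interior nodes whose z lies in [z_lo, z_hi).

    Reads only the old layers `prev` (p^{n-1}) and `curr` (p^n) and writes only
    its own slab of `nxt`, so slabs can be processed independently.
    """
    for z in range(max(z_lo, 1), min(z_hi, n - 1)):
        for y in range(1, n - 1):
            for x in range(1, n - 1):
                i = idx(x, y, z, n)
                face_sum = (curr[i - 1] + curr[i + 1] + curr[i - n] + curr[i + n]
                            + curr[i - n * n] + curr[i + n * n])
                nxt[i] = face_sum / 3.0 - prev[i]


def step(prev, curr, n, workers):
    """One time step with the z range split into `workers` slabs run concurrently."""
    nxt = [0.0] * n ** 3  # outer shell stays at 0 (assumed boundary)
    bounds = [(n * k // workers, n * (k + 1) // workers) for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(srl_update_slab, prev, curr, nxt, n, lo, hi)
                for lo, hi in bounds]
        for job in jobs:
            job.result()  # re-raise any exception from a worker
    return nxt


def simulate(n, steps, workers):
    """Propagate a unit impulse from the grid centre and return the final layer."""
    prev = [0.0] * n ** 3
    curr = [0.0] * n ** 3
    curr[idx(n // 2, n // 2, n // 2, n)] = 1.0
    for _ in range(steps):
        prev, curr = curr, step(prev, curr, n, workers)
    return curr
```

Running `simulate(8, 5, 1)` and `simulate(8, 5, 4)` gives bit-identical layers, because no slab reads anything another slab writes during the same step. The threads here demonstrate the independence, not a speedup.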

6 FDTD Compact Explicit Schemes
Main assumptions: rectangular grids and compact schemes. Space is discretized as a regular grid in which only the nearest neighbors of a node (which ones depends on the scheme) are needed to compute its new value.
The 3D mesh equation depends on several coefficients (λ, a, b), determined by the chosen FDTD scheme. The sampling rate of the mesh is f_s = c/(λΔx), where c is the speed of sound and Δx is the grid spacing.
Digital waveguide mesh methods form a subset of the FDTD schemes in which the relation between c and Δx is fixed by the mesh topology.
SRL (standard rectilinear) scheme: computationally efficient, since only one of the d_i coefficients, d_1, is non-zero, so only the 6 face neighbors take part in the computation.
IWB (interpolated wideband) scheme: covers the widest frequency range while having the least dispersion, which makes it best suited for real-time auralization.
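For reference, the compact explicit family of Kowalczyk and van Walstijn (2011) can be written as below. Here δ_x² is the undivided second difference p_{l+1} − 2p_l + p_{l−1} along x (likewise for y, z and t). The expansion into the weights d_1 to d_4 is our own derivation from this operator form, and the (λ, a, b) values quoted for SRL and IWB are the standard choices as we understand them, not values given on this slide.

```latex
\delta_t^2 p = \lambda^2 \left[ \delta_x^2 + \delta_y^2 + \delta_z^2
  + a \left( \delta_x^2\delta_y^2 + \delta_y^2\delta_z^2 + \delta_x^2\delta_z^2 \right)
  + b\, \delta_x^2\delta_y^2\delta_z^2 \right] p

p^{n+1}_{l,m,i} = d_1 \sum_{6\ \text{face}} p^{n} + d_2 \sum_{12\ \text{edge}} p^{n}
  + d_3 \sum_{8\ \text{corner}} p^{n} + d_4\, p^{n}_{l,m,i} - p^{n-1}_{l,m,i}

d_1 = \lambda^2 (1 - 4a + 4b), \quad d_2 = \lambda^2 (a - 2b), \quad
d_3 = \lambda^2 b, \quad d_4 = 2\left(1 - 3\lambda^2 + 6\lambda^2 a - 4\lambda^2 b\right)

\text{SRL: } a = b = 0,\ \lambda = 1/\sqrt{3} \;\Rightarrow\; d_1 = 1/3,\ d_2 = d_3 = d_4 = 0
\text{IWB: } a = 1/4,\ b = 1/16,\ \lambda = 1 \;\Rightarrow\; d_1 = 1/4,\ d_2 = 1/8,\ d_3 = 1/16,\ d_4 = -3/2
```

In both cases 6d_1 + 12d_2 + 8d_3 + d_4 = 2, so a uniform field stays uniform. The IWB update touches 6 + 12 + 8 + 1 = 27 values of layer n, which matches the 27 fetches mentioned for the IWB kernel.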

7 Modern GPU Structure
Programming a GPU: the most popular API is CUDA, by NVIDIA.
SIMT interface: the programmer writes a kernel executed by each thread, then launches enough threads to accomplish the desired task; the CUDA runtime runs those threads in parallel.
Warps: threads within each SM (streaming multiprocessor) are grouped into warps; as long as all threads of a warp follow the same execution pattern, there is no extra performance penalty. For performance reasons, threads in the same warp should access memory locations close to each other (spatial locality principle).
Advantage: this architecture suits data-parallel problems such as FDTD simulations, where it is sufficient to write a kernel that computes the FDTD equation for one node and then launch one thread per mesh node.
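One thread per node only satisfies the spatial locality rule if consecutive thread indices map to consecutive addresses. In CUDA the flat index is usually computed as `blockIdx.x * blockDim.x + threadIdx.x`; this Python sketch models that index and assumes an x-fastest memory layout and a warp width of 32. The helper names are hypothetical.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def thread_to_node(tid, nx, ny, nz):
    """Map a flat thread index to mesh coordinates, with x varying fastest."""
    if not 0 <= tid < nx * ny * nz:
        raise ValueError("thread index outside the mesh")
    x = tid % nx
    y = (tid // nx) % ny
    z = tid // (nx * ny)
    return x, y, z


def node_to_address(x, y, z, nx, ny):
    """Element offset of node (x, y, z) in a flat array laid out x-fastest."""
    return x + nx * (y + ny * z)


def warp_addresses(warp_id, nx, ny, nz):
    """Addresses of the node values read by the threads of one warp."""
    first = warp_id * WARP_SIZE
    last = min(first + WARP_SIZE, nx * ny * nz)
    return [node_to_address(*thread_to_node(t, nx, ny, nz), nx, ny)
            for t in range(first, last)]
```

For a 64 × 64 × 64 mesh, `warp_addresses(3, 64, 64, 64)` yields the contiguous range 96 to 127. The neighbor fetches at offsets ±1, ±nx and ±nx·ny are shifted copies of that range, so they stay contiguous across the warp as well.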

8 Performance Penalty Issues
Memory bandwidth limit: the link between global memory and the SMs is a bottleneck. An FDTD simulation with 10^6 nodes at f_s = 44.1 kHz would need a data rate of at least 500 GB/s (4 bytes/node × 10^6 nodes/layer × 3 layers/update × 44,100 updates/s ≈ 529 GB/s), while current memory bus bandwidths are around … GB/s.
Latency: fetching data from global memory is slow (the on-chip memory is more complicated to use, and often only part of it is used, to store constants in constant memory).
Solutions: to hide memory latency, keep as many threads as possible in execution at a time, so that some threads execute while others wait for their memory fetches to finish; a commonly advised value is thousands of threads in parallel. Cache memory provides fast access to the most frequently needed data items (NVIDIA's Fermi architecture provides a two-level cache hierarchy).
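The bandwidth estimate above is a single product; this Python sketch reproduces it and makes the parameters explicit. It assumes, as the slide does, that every update streams three full mesh layers of 4-byte values through the memory bus with no cache reuse; the function name is illustrative.

```python
def required_bandwidth(nodes, fs, bytes_per_node=4, layers=3):
    """Bytes/s if each update streams `layers` full copies of the mesh through the bus."""
    return bytes_per_node * nodes * layers * fs


if __name__ == "__main__":
    bw = required_bandwidth(1_000_000, 44_100)
    print(f"{bw / 1e9:.1f} GB/s")  # prints "529.2 GB/s"
```

The same mesh updated at 7 kHz needs 84 GB/s under the same assumptions, which shows why lowering f_s is the main lever for real-time operation.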

9 Implementation System
System workflow: audio input stream → downsampling → feeding the signal to the sound source nodes → mesh update → output signal from the listener nodes → upsampling → audio output device (mono).
CPU: handles audio input and output, performs the required sample-rate conversions (by integer factors) and computes the required filters.
GPU: performs the FDTD simulation iteratively; one time step = 2 kernel launches. The first launch updates the normal mesh nodes (interior, source, listener) with N threads, where N is the number of nodes in the mesh; the second launch updates the DIFs (digital impedance filters), with as many threads as there are boundary filters.
Computer setup: Intel Pentium Dual CPU E2180 (2.00 GHz), 2 GB RAM, NVIDIA Quadro FX 5800 (4 GB RAM) connected to the PCIe bus (2 GB/s bandwidth). Audio playback: Windows Wave-Out API. GPU code: NVIDIA CUDA library.
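The CPU side of this workflow can be sketched as follows. This is a minimal Python illustration, not the original implementation: it assumes a boxcar average as the anti-aliasing filter and linear interpolation for upsampling (both deliberately crude), and `mesh_step` is a hypothetical callable standing in for the two GPU kernel launches of one time step.

```python
import math


def choose_factor(audio_fs, max_mesh_fs):
    """Smallest integer M such that audio_fs / M does not exceed max_mesh_fs."""
    return math.ceil(audio_fs / max_mesh_fs)


def decimate(x, m):
    """Crude decimation: average each block of m samples (boxcar low-pass + downsample)."""
    return [sum(x[k:k + m]) / m for k in range(0, len(x) - m + 1, m)]


def interpolate(y, m):
    """Crude upsampling by m using linear interpolation; the last sample is held."""
    out = []
    for k, v in enumerate(y):
        nxt = y[k + 1] if k + 1 < len(y) else v
        out.extend(v + (nxt - v) * j / m for j in range(m))
    return out


def process_block(block, m, mesh_step):
    """Downsample, run one mesh step per low-rate sample, upsample the listener signal."""
    low = decimate(block, m)
    listener = [mesh_step(s) for s in low]  # on the GPU: 2 kernel launches per step
    return interpolate(listener, m)
```

With 44.1 kHz audio and a mesh limited to 7 kHz, `choose_factor(44100, 7000)` returns 7, so the mesh would run at 6.3 kHz. A real system would use proper low-pass filters for both conversions.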

10 FDTD Modeling on GPU
p(n+1) computation: only two separate memory areas are needed instead of three, because p(n+1) can overwrite p(n−1) in place (each node's new value needs only its own value at n−1).
Information stored per node: position, p(n), node type (source, listener, boundary).
Global memory allocation: mesh memory, node types, DIFs.
Kernels: separate kernels for the two schemes (SRL fetches only 6 neighbor values; IWB needs 27 values).
Sound sources and listeners: handled by the same kernel, but with their own buffers for input and output signals. Sound source: the new excitation value is read from the input buffer and set as the current value of the node. Listeners are transparent: they are updated like other interior mesh nodes, but the value at time step n is also stored in the listener output buffer.
Advanced technique: two listener nodes per listener allow smooth movement in the scene; the output signal is computed as a linear cross-fade of the two listener signals, to avoid transients when a listener moves from one node to another.
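The node-update logic of this slide can be modeled on the CPU. This Python sketch is an assumption-laden illustration: it uses the compact explicit weights d_1 to d_4 with the usual SRL and IWB parameters, holds the outer shell of the grid at zero instead of using DIF boundaries, and treats the source as a hard source. It shows the two-buffer trick, the scheme-dependent fetch count, the transparent listener and a linear cross-fade; all names are hypothetical.

```python
import itertools


def coefficients(lam2, a, b):
    """Weights d1..d4 of the compact explicit update, from lambda^2 and a, b."""
    return (lam2 * (1 - 4 * a + 4 * b), lam2 * (a - 2 * b), lam2 * b,
            2 * (1 - 3 * lam2 + 6 * lam2 * a - 4 * lam2 * b))


SRL = coefficients(1 / 3, 0.0, 0.0)    # lambda = 1/sqrt(3): d1 = 1/3, others 0
IWB = coefficients(1.0, 0.25, 1 / 16)  # lambda = 1: d1 = 1/4, d2 = 1/8, d3 = 1/16, d4 = -3/2


def neighbour_offsets(n):
    """Flat offsets of the 26 neighbours, grouped as face (6), edge (12), corner (8)."""
    groups = {1: [], 2: [], 3: []}
    for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
        k = abs(dx) + abs(dy) + abs(dz)
        if k:
            groups[k].append(dx + n * (dy + n * dz))
    return groups[1], groups[2], groups[3]


def run(n, coeffs, source, listener, excitation):
    """Simulate len(excitation) steps on an n^3 grid whose outer shell is held at 0.

    Only two buffers exist: `old` holds p^{n-1} and is overwritten in place with
    p^{n+1}, which is safe because each node's new value needs only its own p^{n-1}.
    """
    d1, d2, d3, d4 = coeffs
    face, edge, corner = neighbour_offsets(n)
    old, cur = [0.0] * n ** 3, [0.0] * n ** 3
    interior = [x + n * (y + n * z) for z in range(1, n - 1)
                for y in range(1, n - 1) for x in range(1, n - 1)]
    out = []
    for sample in excitation:
        out.append(cur[listener])  # transparent listener: record p^n, node updated normally
        for i in interior:
            s1 = sum(cur[i + o] for o in face)
            # SRL needs only the 6 face values; IWB also fetches 12 edge + 8 corner values
            s2 = sum(cur[i + o] for o in edge) if d2 else 0.0
            s3 = sum(cur[i + o] for o in corner) if d3 else 0.0
            old[i] = d1 * s1 + d2 * s2 + d3 * s3 + d4 * cur[i] - old[i]
        old[source] = sample  # hard source: new excitation value replaces the node value
        old, cur = cur, old   # p^{n+1} becomes the current layer
    return out


def crossfade(sig_a, sig_b):
    """Linear cross-fade from listener node A to listener node B over one block."""
    m = min(len(sig_a), len(sig_b))
    if m < 2:
        return list(sig_b[:m])
    return [sig_a[k] * (1 - k / (m - 1)) + sig_b[k] * k / (m - 1) for k in range(m)]
```

On a GPU the inner loop body becomes the kernel, executed by one thread per node; the swap of `old` and `cur` is just an exchange of two device pointers between launches.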

11 Boundary Model and DIFs
Boundary model: the ghost points that lie outside the actual mesh are eliminated from the final equation and replaced by a DIF (an IIR filter). A 0th-order filter gives a frequency-independent impedance boundary condition; higher filter orders give frequency-dependent boundary conditions. Frequency-independent boundaries need less memory than higher-order DIFs, since no actual filter state has to be stored.
Mesh geometry: one DIF for each boundary node, of order one or higher.
DIF update kernel: one thread handles one DIF; the computation time increases with the filter order.
Memory allocation: the coefficients of the impedance filters are precomputed and kept in constant memory; the memory for the actual filters is allocated from global memory.
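The memory layout described above (shared precomputed coefficients, per-node filter state) can be illustrated with a generic bank of IIR filters. This Python sketch is not the DIF boundary formulation itself, which folds the filter into the boundary-node update; it only shows why a 0th-order filter needs no state and why the per-filter cost grows with the order. The class name and the transposed direct form II structure are assumptions.

```python
class FilterBank:
    """One IIR filter per boundary node: shared coefficients, per-node state.

    `b` and `a` play the role of the precomputed table kept in constant memory;
    `state` stands for the per-filter memory allocated in global memory.
    """

    def __init__(self, b, a, num_filters):
        if len(b) != len(a) or a[0] != 1.0:
            raise ValueError("expect equal-length b and a with a[0] == 1")
        self.b, self.a = list(b), list(a)
        self.order = len(b) - 1
        self.state = [[0.0] * self.order for _ in range(num_filters)]  # empty if order 0

    def tick(self, k, x):
        """Transposed direct form II update of filter k; cost grows linearly with order."""
        b, a, s = self.b, self.a, self.state[k]
        y = b[0] * x + (s[0] if self.order else 0.0)
        for i in range(self.order):
            nxt = s[i + 1] if i + 1 < self.order else 0.0
            s[i] = b[i + 1] * x - a[i + 1] * y + nxt
        return y
```

With `FilterBank([1.0, 0.0], [1.0, -0.5], 100)`, feeding an impulse to filter 0 produces 1.0, 0.5, 0.25, while the other 99 states stay untouched; in the GPU version each `tick` would be one thread of the DIF kernel.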

12 Simulation Results: Mesh Nodes
Geometries: two different rooms (living-room sized and concert-hall sized).
Real-time performance: each simulation runs for 512 steps; the maximum update frequency f_s is found by gradually decreasing Δx.
Dispersion: the schemes cannot be compared just by their maximum f_s, since they have different dispersion characteristics. A threshold on the maximum allowed dispersion (10%) gives the upper frequency limit f_l describing the actual valid bandwidth: SRL f_l = 0.16 f_s, IWB f_l = 0.37 f_s.
Audibility of dispersion: depends heavily on the distance from the sound source.
In practice, the number of sound sources does not affect performance. In this setup the boundary nodes are set to a frequency-independent impedance (effects superposition principle).
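The dispersion limits change the comparison between schemes: what matters is the mesh needed to reach a given valid bandwidth, not the raw f_s. This Python sketch uses the f_l/f_s ratios from this slide, and assumes c = 343 m/s together with the usual Courant numbers (λ = 1/√3 for SRL, λ = 1 for IWB), which the slide does not state.

```python
import math

C_AIR = 343.0  # speed of sound in m/s (assumed)
# scheme -> (Courant number lambda, f_l / f_s at the 10% dispersion threshold)
# The ratios come from the slide; the lambda values are assumed standard choices.
SCHEMES = {"SRL": (1 / math.sqrt(3), 0.16), "IWB": (1.0, 0.37)}


def valid_bandwidth(scheme, fs):
    """Upper frequency limit f_l of the valid band for update rate fs."""
    return SCHEMES[scheme][1] * fs


def cost_for_bandwidth(scheme, f_l, volume_m3):
    """Update rate, grid spacing and node count needed to reach valid bandwidth f_l."""
    lam, ratio = SCHEMES[scheme]
    fs = f_l / ratio
    dx = C_AIR / (lam * fs)  # from fs = c / (lambda * dx)
    return fs, dx, volume_m3 / dx ** 3


if __name__ == "__main__":
    for name in SCHEMES:
        fs, dx, nodes = cost_for_bandwidth(name, 1500.0, 100.0)
        print(f"{name}: fs = {fs:6.0f} Hz, dx = {100 * dx:4.1f} cm, nodes = {nodes:8.0f}")
```

Under these assumptions, a 1.5 kHz valid band in a 100 m³ room needs roughly 2.4 times fewer nodes with IWB than with SRL, at a lower update rate, which has to be weighed against the 27 fetches per IWB node versus 6 for SRL.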

13 Simulation Results: Boundary Nodes
Geometry: to test DIF performance a third geometry was used, a modified concert hall with the same volume but divided into 12 smaller rooms, and therefore with more boundary nodes (Δx = 28 cm, 500k nodes).
Reference result: a 512-step simulation of a mesh with the same number of nodes, all of normal type (no boundary nodes).
Simulation results: the simulation was repeated for different filter orders up to the 10th, recording the mesh computation time in each case.
0th-order DIFs: the additional computational cost is minimal (<12%).
1st-order DIFs: the computation time increases remarkably, whereas raising the filter order above one increases it only modestly.
Hall model: the relative increase is smaller (fewer boundary nodes).
SRL scheme: larger relative cost (filter update costs are equal in both schemes, but the cost of the actual node update is smaller in the SRL scheme).
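Why partitioning the same volume multiplies the boundary nodes can be checked by counting on a voxel grid (for scale, 500k nodes at Δx = 28 cm correspond to roughly 11,000 m³). This Python sketch is illustrative only: it defines a boundary node as an air node with at least one solid face neighbor, and assumes one-node-thick interior walls splitting the box into a 4 × 3 arrangement of rooms; the real geometry is not given on the slide.

```python
def boundary_nodes(nx, ny, nz, walls_x=0, walls_y=0):
    """Count air nodes with at least one face neighbour that is solid (wall or outside).

    The box is split into (walls_x + 1) * (walls_y + 1) rooms by one-node-thick
    interior walls; this partition layout is an illustrative assumption.
    """
    solid_x = {nx * (k + 1) // (walls_x + 1) for k in range(walls_x)}
    solid_y = {ny * (k + 1) // (walls_y + 1) for k in range(walls_y)}
    faces = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

    def air(x, y, z):
        inside = 0 <= x < nx and 0 <= y < ny and 0 <= z < nz
        return inside and x not in solid_x and y not in solid_y

    count = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if air(x, y, z) and not all(air(x + dx, y + dy, z + dz)
                                            for dx, dy, dz in faces):
                    count += 1
    return count
```

For a 24 × 18 × 10 box, the open hall has 1504 boundary nodes, and `boundary_nodes(24, 18, 10, 3, 2)` (12 rooms) returns considerably more, even though it contains fewer air nodes; every extra boundary node adds one DIF update per time step.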

14 References
Lauri Savioja, "Real-Time 3D Finite-Difference Time-Domain Simulation of Low- and Mid-Frequency Room Acoustics," in Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 6-10, 2010.
Jaakko S. Juntunen and Theodoros D. Tsiboukis, "Reduction of Numerical Dispersion in FDTD Method Through Artificial Anisotropy," IEEE Transactions on Microwave Theory and Techniques, vol. 48, no. 4, April 2000.
Konrad Kowalczyk and Maarten van Walstijn, "Room Acoustics Simulation Using 3-D Compact Explicit FDTD Schemes," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, January 2011.
Craig J. Webb and Stefan Bilbao, "Computing Room Acoustics with CUDA - 3D FDTD Schemes with Boundary Losses and Viscosity."

15 Thank You
