A Flexible IIR Filtering Implementation for Audio Processing
Juergen Schmidt, Technicolor R&I, Hannover
Slide 2: Motivation (3D Audio)
Slide 3: Motivation (Loudspeaker Equalization)
Slide 4: Outline
- Infinite Impulse Response (IIR) filter: definition, decomposition
- IIR-filter architecture: derivation, recursion problem, second-order system application
- IIR-filter architecture for OpenCL
- Implementation: benchmarks, performance discussion
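Slide 5: IIR-Filter Definition
- IIR filter (Infinite Impulse Response filter, recursive filter)
- Equation: $H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}$, i.e. $y(n) = \sum_{k=0}^{M} b_k\, x(n-k) - \sum_{k=1}^{N} a_k\, y(n-k)$
- Characteristics:
  - Output y = f(input x, output y)
  - Stability and convergence are not guaranteed
  - Double-precision float required for high-quality audio applications
  - Low demand on processing power
- Example: 100 Hz Butterworth low-pass filter (figure)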
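A minimal Python sketch of the recursion described on this slide, y(n) = sum_k b_k x(n-k) - sum_{k>=1} a_k y(n-k), with the result divided by a_0 so that a_0 need not be 1. The helper name `iir_filter` is hypothetical and the code uses plain lists rather than any particular DSP library. It shows why the impulse response is "infinite": the feedback path keeps re-injecting earlier outputs.

```python
def iir_filter(b, a, x):
    """Direct-form IIR filter (hypothetical helper).

    Computes a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k],
    so every output depends on the input and on earlier outputs.
    """
    if a[0] == 0:
        raise ValueError("a[0] must be non-zero")
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):        # direct (feed-forward) path
            if n - k >= 0:
                acc += bk * x[n - k]
        for k in range(1, len(a)):        # recursive (feedback) path
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y


# y[n] = x[n] + 0.5*y[n-1]: a single impulse never fully dies out.
print(iir_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0, 0.0]))
# [1.0, 0.5, 0.25, 0.125, 0.0625]
```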
Slide 6: IIR-Filter Structure & Decomposition
- Direct IIR-filter structure
  - Direct and recursive paths
  - Coefficient quantization causes stability problems as the filter order increases
- Decomposition into a product of second-order sections: biquads
  - Partial fraction decomposition
  - Pole/zero analysis: complex-conjugate pole pairs yield real coefficients
  - Reduced coefficient quantization noise, improved stability
  - Standard solution for audio processing
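The following sketch, assuming the standard cascade (product) form, illustrates two points from this slide: a complex-conjugate pole pair r·e^(±jθ) always yields a section with real coefficients a1 = -2r·cos(θ) and a2 = r², and a chain of such biquads reproduces the higher-order direct-form filter. All function names and pole values are hypothetical; the slide also mentions partial fraction decomposition (the parallel form), which this sketch does not cover.

```python
import cmath
import math


def section_from_pole_pair(r, theta):
    """Denominator [1, a1, a2] of a second-order section with poles r*exp(+-j*theta).

    (1 - p z^-1)(1 - conj(p) z^-1) = 1 - 2 r cos(theta) z^-1 + r^2 z^-2,
    so a complex-conjugate pole pair gives purely real coefficients.
    """
    return [1.0, -2.0 * r * math.cos(theta), r * r]


def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def iir_filter(b, a, x):
    """Direct-form IIR filter; assumes a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
        y.append(acc)
    return y


def biquad_cascade(sections, x):
    """Feed x through each (b, a) second-order section in turn."""
    for b, a in sections:
        x = iir_filter(b, a, x)
    return x


# Two hypothetical sections, each with a double zero at z = -1.
b = [1.0, 2.0, 1.0]
a_first = section_from_pole_pair(0.9, 0.3)
a_second = section_from_pole_pair(0.8, 1.2)

p = cmath.rect(0.9, 0.3)                         # pole of the first section
print(abs(p * p + a_first[1] * p + a_first[2]))  # ~0: p is a root of z^2 + a1 z + a2

impulse = [1.0] + [0.0] * 49
y_cascade = biquad_cascade([(b, a_first), (b, a_second)], impulse)
# Same filter as one 4th-order direct form: multiply the section polynomials.
y_direct = iir_filter(poly_mul(b, b), poly_mul(a_first, a_second), impulse)
print(max(abs(u - v) for u, v in zip(y_cascade, y_direct)))  # round-off only
```

Both printed values should be at round-off level. In practice, design tools such as SciPy's `scipy.signal.tf2sos` and `scipy.signal.zpk2sos` perform this pole pairing automatically.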
Slide 7: IIR-Filter Decomposition Example
- Loudspeaker equalization
  - Averaged transfer function
  - Apply tolerances and target transfer function (TF)
  - IIR filter calculation: 40th order, stability problem!
  - Biquad decomposition: 20 biquads (BQ), stable filtering
- Listening room example: 36 channels x 20 BQ = 720 BQ
  - Theoretical minimum: ~10 floating-point operations per biquad per sample, ~350 MFLOPS
  - CPU? GPU?
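The listening-room numbers can be checked with a few lines of arithmetic. The slide does not state the sample rate; assuming 48 kHz (a common audio rate) reproduces the quoted ~350 MFLOPS:

```python
channels = 36
biquads_per_channel = 20          # 40th-order filter -> 20 second-order sections
flops_per_biquad_sample = 10      # ~5 multiplies + 4 adds per biquad, rounded up
sample_rate_hz = 48_000           # assumption: not stated on the slide

total_biquads = channels * biquads_per_channel
flops_per_second = total_biquads * flops_per_biquad_sample * sample_rate_hz
print(total_biquads)              # 720
print(flops_per_second / 1e6)     # 345.6 (MFLOPS)
```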
Slide 8: IIR-Filter Floating-Point Precision Comparison
(figure: filter output over samples, single-precision float vs. double-precision float)
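The comparison figure is not reproduced in this transcript. As an illustrative sketch only (not the author's experiment), the code below runs a filter of the type used as the example on slide 5, a 100 Hz second-order Butterworth low-pass, at an assumed 48 kHz sample rate, once in double precision and once with every coefficient, product and sum rounded to single precision via `struct`. The coefficient formulas follow the widely used RBJ Audio EQ Cookbook form; the helper names are hypothetical.

```python
import math
import struct


def f32(v):
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def butterworth_lowpass(fc, fs):
    """2nd-order Butterworth low-pass (bilinear transform, RBJ cookbook form)."""
    w0 = 2.0 * math.pi * fc / fs
    q = 1.0 / math.sqrt(2.0)                 # Butterworth Q
    alpha = math.sin(w0) / (2.0 * q)
    cw = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cw) / 2.0 / a0, (1.0 - cw) / a0, (1.0 - cw) / 2.0 / a0]
    a = [1.0, -2.0 * cw / a0, (1.0 - alpha) / a0]
    return b, a


def biquad_df1(b, a, x, rnd=lambda v: v):
    """Direct-form I biquad; rnd() is applied to every coefficient, product and sum."""
    b = [rnd(c) for c in b]
    a = [rnd(c) for c in a]
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in x:
        acc = rnd(b[0] * x0)
        acc = rnd(acc + rnd(b[1] * x1))
        acc = rnd(acc + rnd(b[2] * x2))
        acc = rnd(acc - rnd(a[1] * y1))
        acc = rnd(acc - rnd(a[2] * y2))
        x2, x1 = x1, x0
        y2, y1 = y1, acc
        out.append(acc)
    return out


b, a = butterworth_lowpass(100.0, 48_000.0)   # fs = 48 kHz is an assumption
step = [1.0] * 10_000                          # unit step; DC gain of the filter is 1
y64 = biquad_df1(b, a, step)                   # double precision
y32 = biquad_df1(b, a, step, rnd=f32)          # emulated single precision
print(f"1 + a1 + a2 = {1.0 + a[1] + a[2]:.3e}")  # tiny: poles sit very close to z = 1
print(f"double: |y - 1| = {abs(y64[-1] - 1.0):.1e}")
print(f"single: |y - 1| = {abs(y32[-1] - 1.0):.1e}")
```

Because the poles sit very close to z = 1, the sum 1 + a1 + a2 is tiny, so rounding errors of order 1e-7 in the coefficients and intermediate sums are strongly amplified at low frequencies; expect the single-precision error to be far larger than the double-precision one.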
Slide 9: IIR-Filter Architecture: Recursion Problem
- How can an IIR filter be implemented on massively parallel machines such as GPUs?
- First idea: parallelization over samples and channels
- Problem: each sample requires access to results that are being calculated in parallel
- No recursion support on the GPU!
- Sample-based parallelization is therefore inapplicable
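A small sketch of the recursion problem, using a hypothetical first-order filter: processing independent channels concurrently is safe, but computing all samples of one channel at the same time is not, because y[n] needs y[n-1]. Python threads are used only to show that channels are independent; they are not a model of GPU execution and give no real speedup here.

```python
from concurrent.futures import ThreadPoolExecutor

A1 = -0.5  # hypothetical first-order feedback coefficient: y[n] = x[n] - A1*y[n-1]


def sequential(x):
    """Correct recursive evaluation: y[n] can only start once y[n-1] is known."""
    y, prev = [], 0.0
    for v in x:
        prev = v - A1 * prev
        y.append(prev)
    return y


def sample_parallel_naive(x):
    """Every sample computed at the same time: y[n-1] is not available yet,
    so the recursive term is missing (taken as 0) and the result is wrong."""
    return [v - A1 * 0.0 for v in x]


channels = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]

# Channels are independent, so they can be processed concurrently.
with ThreadPoolExecutor() as pool:
    per_channel = list(pool.map(sequential, channels))

print(per_channel)                          # [[1.0, 0.5, 0.25, 0.125], [0.0, 2.0, 1.0, 0.5]]
print(sample_parallel_naive(channels[0]))   # [1.0, 0.0, 0.0, 0.0]
```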
Slide 10: IIR-Filter Architecture: Second-Order System Application
Slide 11: IIR-Filter Architecture: Workgroup Organization
Slide 12: IIR-Filter Architecture: Data Delay
Slide 13: IIR-Filter Architecture: Data Delay and Block Sizes
- Cumulative processing delay is critical for real-time applications
- Block size has a high impact on delay
  - Typical audio application: delay < 1 frame (20 ms), hence small block sizes
  - Resources: memory size scales with delay
- Arbitrary frame sizes and block sizes supported
  - Applicable to all audio frame sizes
(figure axis: block size)
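The architecture figures for slides 10 to 12 are not part of this transcript. As one plausible organization, and explicitly an assumption rather than the author's design, the sketch below pipelines a biquad cascade block by block: in every step each stage filters the block its predecessor produced in the previous step, so all stages can run concurrently while each stage's recursion stays sequential within its block. The output trails the input by (stages - 1) blocks, which illustrates why cumulative delay and buffer memory grow with block size. All names and coefficients are hypothetical.

```python
def make_biquad(b, a):
    """Stateful direct-form I biquad that filters one block per call (a[0] == 1)."""
    state = [0.0, 0.0, 0.0, 0.0]  # x[n-1], x[n-2], y[n-1], y[n-2]

    def process(block):
        x1, x2, y1, y2 = state
        out = []
        for x0 in block:
            y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
            x2, x1 = x1, x0
            y2, y1 = y1, y0
            out.append(y0)
        state[:] = [x1, x2, y1, y2]   # carry the recursion into the next block
        return out

    return process


def pipelined_cascade(stages, x, block):
    """Hypothetical block pipeline over a biquad cascade.

    In step t, stage k filters the block that stage k-1 produced in step t-1,
    so all stages could run concurrently (e.g. one work-item per stage).
    Returns the output and the step in which the first output block appears.
    """
    n_stages = len(stages)
    blocks = [x[i:i + block] for i in range(0, len(x), block)]
    blocks += [[0.0] * block for _ in range(n_stages - 1)]  # flush the pipeline
    handoff = [None] * n_stages   # handoff[k]: block waiting for stage k
    out, first_step = [], None
    for t, blk in enumerate(blocks):
        handoff[0] = blk
        # Every stage reads only buffers written in the previous step.
        results = [stages[k](handoff[k]) if handoff[k] is not None else None
                   for k in range(n_stages)]
        for k in range(1, n_stages):
            handoff[k] = results[k - 1]
        if results[-1] is not None:
            out.extend(results[-1])
            if first_step is None:
                first_step = t
    return out, first_step


coeffs = [([0.2, 0.4, 0.2], [1.0, -0.5, 0.25]),
          ([0.3, 0.0, -0.3], [1.0, 0.1, 0.2]),
          ([1.0, 1.0, 0.0], [1.0, -0.3, 0.0])]
signal = [1.0] + [0.0] * 31 + [0.5] * 8          # 40 samples

reference = signal
for b, a in coeffs:                               # plain sequential cascade
    reference = make_biquad(b, a)(reference)

piped, first = pipelined_cascade([make_biquad(b, a) for b, a in coeffs], signal, block=8)
print(first)                  # 2: output lags by (stages - 1) blocks
print(piped == reference)     # True: same samples, only delayed in time
```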
Slide 14: Implementation: GPU Benchmarks, Data Block Size Variation
- Single-GPU machine, OpenCL processing times
- Small block sizes require more processing power
- Large block sizes cause strong variation in processing times
- Optimal block size: 4 to 8 samples
(figure axes: samples, biquads)
Slide 15: Implementation: GPU Benchmarks, Filter & Channel Variation
- Single midrange GPU, normalized OpenCL processing times
- Balanced processing power
- Memory access causes no overhead
- Significant overhead for small block sizes
- Strong variation for a very high number of biquads
- Real-time application possible for all filter sizes
Slide 16: Implementation Comparison: CPU Benchmarks
- Dual quad-core CPUs
- Unchanged OpenCL code and implementation
- Normalized OpenCL processing times
- Balanced processing power
- Memory access causes overhead
- Real-time application possible up to ~500 biquads
Slide 17: Implementation: Performance Discussion
- Audio signal processing requires double-precision float
  - Often not supported on low-end GPUs
- OpenCL enables processing on both GPU and CPU
- Preferred processing: GPU
  - High filter count
  - The application benefits from removing load from the CPU
- GPU execution well balanced (excellent compiler!)
- CPU execution somewhat unbalanced due to memory access
- Real-time requirements
  - Low delay requires a small block size
  - Efficient processing requires a block size > 2
  - Recommended: block size of 4 or 8 samples
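A rough real-time budget for small block sizes, assuming a 48 kHz sample rate (not stated on the slides) and, hypothetically, one kernel dispatch per block. Smaller blocks mean lower delay but more dispatches per second, so any fixed per-dispatch overhead weighs more, which is in line with the benchmark observation that small block sizes carry significant overhead:

```python
FS = 48_000        # sample rate in Hz (assumption; not stated on the slides)
FRAME_MS = 20.0    # typical audio frame length from the slides

frame_samples = round(FS * FRAME_MS / 1000.0)   # samples per 20 ms frame

budget = {}
for block in (2, 4, 8):
    block_ms = 1000.0 * block / FS     # deadline to finish one block in real time
    blocks_per_second = FS // block    # dispatches per second (one per block assumed)
    budget[block] = (block_ms, blocks_per_second)
    print(f"block {block}: {block_ms:.3f} ms per block, "
          f"{blocks_per_second} dispatches/s, {frame_samples // block} blocks per frame")
# block 2: 0.042 ms per block, 24000 dispatches/s, 480 blocks per frame
# block 4: 0.083 ms per block, 12000 dispatches/s, 240 blocks per frame
# block 8: 0.167 ms per block, 6000 dispatches/s, 120 blocks per frame
```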
Slide 18: Summary
- Filter architecture that enables IIR filtering on GPUs
  - High-order IIR filters
  - Many audio channels
- IIR filter problems: stability, noise, delay
- OpenCL architecture
  - Dedicated structure with specialized buffers
  - First known implementation
- Performance
  - Applicable to both GPU and CPU
  - Real-time audio processing on a midrange GPU
Slide 19: Many thanks for your attention!
More information