GPU Acceleration of SAR/ISAR Imaging Algorithms
Gary Rubin, Earl V. Sager, Ph.D., and David H. Berger, Ph.D.

ABSTRACT
General-purpose graphics processing units (GPGPUs) provide increased processing capability for applications with a high degree of data parallelism. In the past few years, GPGPUs have become readily available in the commercial market, and off-the-shelf programming tools (e.g., CUDA from the NVIDIA Corporation and Jacket from Accelereyes, LLC) have made them more accessible to the technical community. SAR and ISAR imaging algorithms are inherently computationally intensive. To overcome the performance limitations of CPUs and traditional DSPs, simplified, computationally efficient algorithms are often used, but at the expense of the phase information available within the raw data. We have demonstrated that GPGPU acceleration of SAR/ISAR processing greatly improves the processing times of a less efficient (but more flexible) algorithm, making its use more practical. We have shown that GPGPUs can provide a performance improvement in excess of 30x for a backprojection-based SAR/ISAR imaging technique.

Keywords: Algorithms, Computations, Data Processing, Imaging, Inverse SAR, Radar, Signal Processing, Synthetic Aperture Radar

1.0 Introduction
For decades, Synthetic Aperture Radar (SAR) and Inverse SAR (ISAR) imaging techniques have been used to represent radar data in a way that is meaningful to a human analyst. The calculations necessary for these imaging techniques are computationally complex. Since the advent of SAR and ISAR imaging, computational throughput has always been a critical factor in the development, implementation, and use of processing algorithms. With the steady advance of computational power provided by CPUs, vector processors, and DSPs, imaging routines have become faster and faster over the years.
Chip makers, however, have reached a clock-speed plateau: processors with increasingly high transistor densities can no longer dissipate the heat associated with ever-higher clock speeds. This heat-dissipation challenge has driven the recent trend away from increasing clock speeds and toward an explosion of multicore, commercial-grade processors. Multicore processors offer the potential for extremely high throughput, although the ability to achieve it depends greatly on the nature of the computations being performed. This paper describes recent efforts to use massively parallel commercial GPUs to accelerate a Backprojection Algorithm (BPA)-based imaging routine. The paper begins with a discussion of the imaging algorithms, followed by a description of the acceleration process and results.

2.0 Imaging Algorithms
Over the years, radar scientists and engineers have developed a wide variety of imaging algorithms and implementations. These include Range Doppler (RDA), Polar Format (PFA), Chirp Scaling (CSA), Range Migration (RMA), and Backprojection or Time-Domain Correlation (BPA or TDC). Each algorithm has its own strengths and weaknesses, and choosing an optimal imaging algorithm typically depends on radar parameters and mission requirements [1,2].
For example, RDA provides a good balance between accuracy and efficiency, but at the expense of supportable bandwidth and aperture length [2]. CSA is computationally efficient, but it can limit scene size and image resolution. It also operates only on radar data that has not been de-chirped [1]. PFA can operate on de-chirped data, but it may introduce geometric distortion [1].

This paper focuses primarily on an implementation of BPA as described by [3]. BPA can image onto an arbitrary surface, can provide the phase and amplitude history of each image pixel, and applies readily to both SAR and ISAR imaging. BPA can be quite slow, but it exhibits a high degree of data parallelism. Here, data parallelism means parallelism that arises from simultaneous operations across large sets of data rather than from multiple threads of control [4].

2.1 Backprojection Algorithm Implementation
We have tested our imaging routines using an ISAR data set collected by an SPC MkV instrumentation radar. The MkV used a 256-step chirp from 8-12 GHz to measure a Saab 9000 hatchback as the Saab was rotated on a turntable. A 360-degree image of the Saab 9000 is shown in Figure 1.

Figure 1. ISAR image of Saab 9000 hatchback. Data represents a 256-step 8-12 GHz chirp and 360 degrees of rotation. Pixel resolution is 1 cm. The image was created using our BPA implementation.

Our implementation of BPA is similar to that described by [3] and comprises the following steps:
1. Perform a downrange DFT on the radar data to obtain a range (fast time) vs. position (slow time) data array, phase corrected to a reference range (Figure 2).
2. Form an image grid that defines the spatial location of each pixel relative to the radar. This image grid can be either planar or non-planar (Figure 3).
3. For each burst, calculate the slant range between each pixel and the radar location. The white traces in Figure 2 represent two examples of pixel slant range vs. burst number.
4. For each radar burst, use the pixel ranges calculated in Step 3 to assign radar range cells to pixels.
5. For each pixel, coherently sum each pulse's signal contribution.

Cross-range motion is provided either by the radar motion (SAR) or by the target motion (ISAR). In our implementation, the image grid is held fixed, while the radar position is imagined to rotate around the center of the image grid.

Figure 2. High-range-resolution (HRR) profiles vs. burst number for the car ISAR data. This figure represents zero-padded step-chirp data for 180 degrees of rotation. Solid (x in Figure 3) and dashed (+ in Figure 3) traces represent slant-range profiles for the pixels identified in Figure 3.

Figure 3. Image grid. Two arbitrary pixels are highlighted by the red + and magenta x. The black dot represents the ISAR center of rotation.
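The five steps above can be sketched end to end on a toy problem. The sketch below is written in Python rather than the authors' MATLAB, and it is illustrative only. The helper names `simulate_isar`, `range_profiles`, and `backproject` are hypothetical, the data is a single ideal point scatterer, and the grid is planar. Following the text, the grid stays fixed while the radar rotates about its center. The paper does not say how range cells are assigned in Step 4, so this sketch assumes the nearest zero-padded range cell, with the carrier phase restored at each pixel's exact slant range.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def simulate_isar(scatterer, angles, f0, df, n_steps, r0):
    """Hypothetical toy data: ideal step-chirp returns from one point scatterer.
    The radar sits at distance r0 from the grid center at each burst angle."""
    radar_xy = [(r0 * math.cos(a), r0 * math.sin(a)) for a in angles]
    data = []
    for rx, ry in radar_xy:
        r = math.hypot(scatterer[0] - rx, scatterer[1] - ry)
        data.append([cmath.exp(-4j * math.pi * (f0 + k * df) * r / C)
                     for k in range(n_steps)])
    return data, radar_xy


def range_profiles(data, f0, df, r0, nfft):
    """Step 1: phase-correct each burst to the reference range r0, then take a
    zero-padded inverse DFT over the step index k (the f0 carrier phase that
    remains is restored per pixel in Step 5)."""
    twiddle = [cmath.exp(2j * math.pi * i / nfft) for i in range(nfft)]
    profiles = []
    for burst in data:
        corrected = [s * cmath.exp(4j * math.pi * (f0 + k * df) * r0 / C)
                     for k, s in enumerate(burst)]
        profiles.append([sum(s * twiddle[(k * m) % nfft]
                             for k, s in enumerate(corrected))
                         for m in range(nfft)])
    return profiles


def backproject(profiles, radar_xy, grid_x, grid_y, f0, df, r0, nfft):
    """Steps 2-5 on a planar grid, looping over bursts."""
    bin_size = C / (2 * nfft * df)  # range spacing of one zero-padded cell
    image = [[0j] * len(grid_x) for _ in grid_y]  # Step 2: image grid
    for profile, (rx, ry) in zip(profiles, radar_xy):
        for iy, y in enumerate(grid_y):
            for ix, x in enumerate(grid_x):
                # Step 3: slant range relative to the reference range
                delta = math.hypot(x - rx, y - ry) - r0
                # Step 4: nearest range cell (wraps at c / (2 * df))
                cell = round(delta / bin_size) % nfft
                # Step 5: restore the carrier phase and sum coherently
                image[iy][ix] += profile[cell] * cmath.exp(
                    4j * math.pi * f0 * delta / C)
    return image


if __name__ == "__main__":
    f0, n_steps, r0, nfft = 8e9, 32, 100.0, 256
    df = 4e9 / n_steps                        # 4 GHz total bandwidth
    angles = [math.radians(-6 + 12 * i / 23) for i in range(24)]
    data, radar_xy = simulate_isar((0.1, -0.05), angles, f0, df, n_steps, r0)
    grid = [i / 100 for i in range(-20, 21)]  # 1 cm pixels over +/-20 cm
    img = backproject(range_profiles(data, f0, df, r0, nfft),
                      radar_xy, grid, grid, f0, df, r0, nfft)
    peak, iy, ix = max((abs(v), iy, ix) for iy, row in enumerate(img)
                       for ix, v in enumerate(row))
    print(f"brightest pixel at x={grid[ix]:.2f} m, y={grid[iy]:.2f} m")
```

The sketch keeps the loop structure the text describes. Step 1 is independent for every burst, and the pixel work in Steps 3-5 is independent for every pixel inside the burst loop. That per-pixel work is the part a vectorized or GPU implementation would evaluate in bulk.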
Target rotation causes the red and magenta pixels of Figure 3 to trace the lines shown in Figure 2. In our implementation, Step 1 is a data-parallel operation across the multiple radar bursts, while Step 5 is a data-parallel operation performed inside the Step 4 for loop over bursts.

3.0 Code Acceleration
GPU acceleration involves two processes. First, the code must be written so that its operations are highly data-parallel. For MathWorks' MATLAB, this requires that the code be vectorized (see Section 3.1). Second, once the algorithm has been implemented in a data-parallel manner, it can be targeted to the GPU, as described in Section 3.2.

3.1 Code Vectorization
The BPA algorithm described above has been implemented in MATLAB R2010a. Initially, the code was translated from FORTRAN to MATLAB and relied heavily on nested for loops. The code was then largely rewritten using the MATLAB art of vectorization. In MATLAB, vectorization refers to taking advantage of polymorphism, a language feature that allows the same line of code to apply to scalars, vectors, or matrices. MATLAB can perform these vectorized calculations much more efficiently than loops, and it automatically multithreads some operations [5]. Depending on the nature of the calculations, vectorization may involve a trade-off between CPU efficiency and memory usage, so memory limitations may prevent some vectorization.

A simple example of vectorization is as follows. Define two random data vectors:
    A=rand(10000,1);
    B=rand(10000,1);
Preallocate an output vector C, then perform the operation using a loop:
    C=zeros(10000,1);
    for indx=1:length(A)
        C(indx)=A(indx)*(B(indx)^2);
    end
Perform the same calculation as a vector operation. The .* and .^ operators denote element-by-element operations:
    C=A.*(B.^2);
Using a 2.67 GHz Intel Core i7-920, the vectorized implementation executed in roughly 40% of the time of the looped expression.

Another very useful vectorization function in MATLAB is bsxfun, which allows for efficient matrix-vector arithmetic. Consider the following example. Generate a random 2000x1000-element matrix:
    A=rand(2000,1000);
Preallocate an output matrix. Then, for each of the 1000 columns, calculate the mean value of the column and subtract it from each element in that column. Some vectorized calculations are used here as well, because CurrentCol is a vector and mean(CurrentCol) is a scalar:
    NuA1=zeros(size(A));
    for indx=1:size(A,2)
        CurrentCol=A(:,indx);
        NuA1(:,indx)=CurrentCol-mean(CurrentCol);
    end
Perform the same operation using the matrix implementation of mean and bsxfun. Here, MeanA is a 1x1000 row vector of column means, while A is a 2000x1000 matrix. bsxfun expands MeanA across the rows of A:
    MeanA=mean(A,1);
    NuA2=bsxfun(@minus,A,MeanA);
In this case, the bsxfun implementation runs approximately 4x faster than the loop iteration on a 2.67 GHz Intel Core i7-920.

3.2 GPU Implementation
Over the past several years, graphics processing unit (GPU) technology has experienced dramatic growth in computational performance (Figure 4).

Figure 4. Growth in NVIDIA GPU performance vs. CPU performance. Solid lines are single-precision GFLOPS; dashed lines are double-precision GFLOPS [6].

AMD/ATI and NVIDIA are leaders in the GPU market, and both support non-graphics general-purpose GPU (GPGPU) applications. We chose NVIDIA GPUs primarily because of the maturity and community support of NVIDIA's CUDA development environment.

To reduce schedule risk for the GPU acceleration effort described in this paper, we decided to avoid writing our own CUDA code. Instead, runtime improvements were achieved through the use of Accelereyes, LLC's Jacket software platform. Jacket serves as nearly transparent middleware, allowing MATLAB code to execute on CUDA-capable NVIDIA GPUs directly from the MATLAB development environment. Jacket achieves this by overloading most base MATLAB functions. When these functions are called using special Jacket data classes, Jacket does the following:
- builds an internal representation of the program being run;
- compiles that representation if necessary;
- performs the computation on the GPU;
- makes the results available to MATLAB if requested, leaving data GPU-resident as long as possible.
Because Accelereyes has written Jacket to work with existing MATLAB syntax, parallelizing operations for GPU use is essentially identical to the MATLAB vectorization described in Section 3.1.

3.3 Acceleration Results
Benchmarking was performed using the system described in Table 1.

Table 1. Benchmark system
CPU               Intel Core i7-920, 2.67 GHz
Motherboard       EVGA X58 SLI
Memory            12 GB DDR
OS (dual boot)    Windows 7 Professional 64-bit; CentOS 5.5
GPU 1             NVIDIA C1060 w/ 4 GB GDDR3 (~$1300)
GPU 2             NVIDIA GeForce 9800 GT w/ 512 MB
MATLAB version    R2010a
Jacket version    1.3

The Core i7-920 CPU provides four processing cores, each with two processing threads. Of these eight available processing threads, two are typically used during CPU benchmarking. The CPU resources could be applied more efficiently by using MATLAB's Parallel Computing Toolbox to spread the burst for-loop iterations across the multiple threads.
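Spreading the burst loop across threads, as suggested above, is valid because backprojection is a coherent sum over bursts. Any partition of the bursts into chunks therefore yields partial images that add up to the full image. The Python sketch below illustrates only that partitioning. `burst_contribution` is a hypothetical stand-in for the per-burst work of Steps 3-5 (a single phase term per pixel), not the authors' computation. A thread pool is used purely to show the structure: in CPython, pure-Python arithmetic generally needs processes rather than threads to gain real speed.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def burst_contribution(burst, pixels):
    """Hypothetical stand-in for Steps 3-5 of one burst: a unit-magnitude
    phase term for every pixel (not the authors' per-burst computation)."""
    return [cmath.exp(1j * 0.01 * burst * p) for p in pixels]


def backproject_chunk(bursts, pixels):
    """Coherently sum the contributions of one subset of bursts."""
    partial = [0j] * len(pixels)
    for b in bursts:
        for i, value in enumerate(burst_contribution(b, pixels)):
            partial[i] += value
    return partial


def backproject_parallel(n_bursts, pixels, n_workers=4):
    """Deal burst indices out to n_workers interleaved chunks, image each
    chunk in its own worker, then add the partial images together."""
    chunks = [range(w, n_bursts, n_workers) for w in range(n_workers)]
    image = [0j] * len(pixels)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(lambda c: backproject_chunk(c, pixels), chunks):
            for i, value in enumerate(partial):
                image[i] += value
    return image


if __name__ == "__main__":
    pixels = [0.1 * i for i in range(50)]
    serial = backproject_chunk(range(240), pixels)
    parallel = backproject_parallel(240, pixels)
    print(max(abs(a - b) for a, b in zip(serial, parallel)))
```

The serial and partitioned images agree to within floating-point rounding, because only the order of the additions changes. In MATLAB, the same structure would presumably map onto a parfor loop over bursts with the image accumulated as a reduction variable.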
MATLAB's Parallel Computing Toolbox can also be used in conjunction with Jacket to spread the processing among multiple GPUs.

For the imaging performance benchmarks, the dataset described in Table 2 and shown in Figure 1 was used.

Table 2. Benchmark dataset
Collection system        SPC MkV radar
Collection mode          ISAR
Waveform type            Step-chirp
Start frequency (GHz)    8
Stop frequency (GHz)     12
Chirp bandwidth (GHz)    4
Frequency steps          256
Angle start (rad)
Angle stop (rad)
Angle step (rad)
Range (m)                100
Subject                  Saab 9000 hatchback

It is understood that the pixel resolutions used for benchmarking are much finer than the resolution supported by the actual data. Although these resolutions are artificially high for this particular dataset, they were used to demonstrate computational performance for large images of the type that might be used for airborne or spaceborne SAR. Because the ISAR BPA implementation is virtually identical to the SAR BPA implementation, we believe it is valid to use an ISAR dataset to demonstrate image sizes that are more typical of SAR.

Execution times for the CPU-only (Core i7-920) BPA ISAR imaging algorithm are shown in Figure 5. The C1060 GPU-enabled runtimes are shown in Figure 6.

Figure 5. Runtimes for BPA ISAR imaging of Saab 9000 using Core i7-920 CPU under Win7 Pro 64-bit. Burst counts correspond to 10, 60, 120, 180, and 240-degree sectors.
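One way to check the statement that the benchmark pixels are finer than the data supports is the textbook slant-range resolution of a stepped-frequency waveform, which is set by its total bandwidth. This assumes the ideal, unweighted expression (windowing would broaden it) and the 4 GHz chirp bandwidth in Table 2:

```latex
\Delta R = \frac{c}{2B}
         = \frac{3\times10^{8}\ \mathrm{m/s}}{2 \times 4\times10^{9}\ \mathrm{Hz}}
         = 3.75\ \mathrm{cm}
```

By this estimate, even the 1 cm pixel spacing of Figure 1 samples the range dimension about 3.75 times more finely than the ideal range resolution.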
Figure 6. Runtimes for BPA ISAR imaging of Saab 9000 using NVIDIA C1060 GPGPU under Win7 Pro 64-bit. Burst counts correspond to 10, 60, 120, 180, and 240-degree sectors.

Speedup is defined as $S = T_{\mathrm{CPU}} / T_{\mathrm{GPU}}$, the ratio of CPU runtime to GPU runtime. Figure 7 shows the speedup for the runtimes shown in Figure 5 and Figure 6. We believe that the sharp decrease in speedup above 15 megapixels is due to a memory-efficiency threshold associated with the larger data arrays.

Figure 7. GPU speedups; C1060 vs. Core i7-920 under Win7 Pro 64-bit.

4.0 Related Work
In addition to the imaging acceleration described in this paper, SPC has also demonstrated GPU acceleration for surface-surveillance radar clutter reduction. For that processing, we demonstrated speedups of roughly 10x vs. the Core i7-920 and roughly 5x vs. a real-time DSP implementation. SPC has also begun GPU acceleration of PFA-based imaging routines; this work was still in progress at the time of publication.

5.0 Summary
We have demonstrated that highly parallel, computationally complex tasks, such as those associated with BPA SAR/ISAR imaging, can be greatly accelerated through the use of GPUs. We have demonstrated improvements in BPA runtime in excess of 30x. This means that processing that might take an entire workweek on a standard desktop PC can be completed on the GPU in a little over an hour. Such runtime improvements increase the practicality of BPA as a large-scale imaging routine.

The speedups presented in this paper should not be seen as an upper limit. It is very likely that further optimization of the BPA code could yield additional speed improvements. GPU performance is also expected to improve as CUDA and Jacket evolve.
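The workweek-to-hour comparison in the summary follows from inverting the speedup definition. The worked bound below assumes a 40-hour workweek; the paper does not state the number of hours. It also uses the reported speedup of at least 30x:

```latex
S = \frac{T_{\mathrm{CPU}}}{T_{\mathrm{GPU}}} \ge 30
\quad\Longrightarrow\quad
T_{\mathrm{GPU}} \le \frac{T_{\mathrm{CPU}}}{30}
               = \frac{40\ \mathrm{h}}{30}
               \approx 1.33\ \mathrm{h}
               \approx 80\ \mathrm{min}
```

Under that assumption, the bound is consistent with "a little over an hour."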
6.0 References
[1] Carrara, W.G., Goodman, R.S., and Majewski, R.M., Spotlight Synthetic Aperture Radar: Signal Processing Algorithms, Norwood, MA: Artech House, 1995.
[2] Cumming, I.G., and Wong, F.H., Digital Processing of Synthetic Aperture Radar Data, Norwood, MA: Artech House, 2005.
[3] Soumekh, M., Synthetic Aperture Radar Signal Processing with MATLAB Algorithms, New York: John Wiley & Sons, 1999.
[4] Hillis, W.D., and Steele, G.L., "Data Parallel Algorithms," Communications of the ACM 29, 12 (Dec. 1986), pp.
[5] "Which MATLAB functions benefit from multithreaded computation?", MATLAB Technical Solution.
[6] Source: NVIDIA, via personal correspondence.

7.0 Acknowledgments
Thanks to Gallagher Pryor, Dave Gibson, and others at Accelereyes, LLC for their inputs and technical advice.
More informationGPU-accelerated 3-D point cloud generation from stereo images
GPU-accelerated 3-D point cloud generation from stereo images Dr. Bingcai Zhang Release of this guide is approved as of 02/28/2014. This document gives only a general description of the product(s) or service(s)
More informationAccelerating CFD with Graphics Hardware
Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery
More informationSynthetic Aperture Radar Modeling using MATLAB and Simulink
Synthetic Aperture Radar Modeling using MATLAB and Simulink Naivedya Mishra Team Lead Uurmi Systems Pvt. Ltd. Hyderabad Agenda What is Synthetic Aperture Radar? SAR Imaging Process Challenges in Design
More informationPortland State University ECE 588/688. Graphics Processors
Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly
More informationA MATLAB Interface to the GPU
Introduction Results, conclusions and further work References Department of Informatics Faculty of Mathematics and Natural Sciences University of Oslo June 2007 Introduction Results, conclusions and further
More informationEMBEDDED VISION AND 3D SENSORS: WHAT IT MEANS TO BE SMART
EMBEDDED VISION AND 3D SENSORS: WHAT IT MEANS TO BE SMART INTRODUCTION Adding embedded processing to simple sensors can make them smart but that is just the beginning of the story. Fixed Sensor Design
More informationN-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo
N-Body Simulation using CUDA CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo Project plan Develop a program to simulate gravitational
More informationProfiling and Parallelizing with the OpenACC Toolkit OpenACC Course: Lecture 2 October 15, 2015
Profiling and Parallelizing with the OpenACC Toolkit OpenACC Course: Lecture 2 October 15, 2015 Oct 1: Introduction to OpenACC Oct 6: Office Hours Oct 15: Profiling and Parallelizing with the OpenACC Toolkit
More informationParallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer
Parallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer 2018 The MathWorks, Inc. 1 Practical Application of Parallel Computing Why parallel computing? Need faster
More informationDeep Learning Performance and Cost Evaluation
Micron 5210 ION Quad-Level Cell (QLC) SSDs vs 7200 RPM HDDs in Centralized NAS Storage Repositories A Technical White Paper Rene Meyer, Ph.D. AMAX Corporation Publish date: October 25, 2018 Abstract Introduction
More informationNVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield
NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host
More informationExecutable Requirements: Opportunities and Impediments
Executable Requirements: Oppotunities and Impediments Executable Requirements: Opportunities and Impediments G. A. Shaw and A. H. Anderson * Abstract: In a top-down, language-based design methodology,
More informationImproved Parallel Rabin-Karp Algorithm Using Compute Unified Device Architecture
Improved Parallel Rabin-Karp Algorithm Using Compute Unified Device Architecture Parth Shah 1 and Rachana Oza 2 1 Chhotubhai Gopalbhai Patel Institute of Technology, Bardoli, India parthpunita@yahoo.in
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationEngineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary
white paper Computer-Aided Engineering ANSYS Mechanical on Intel Xeon Processors Engineer Productivity Boosted by Higher-Core CPUs Engineers can be significantly more productive when ANSYS Mechanical runs
More informationACCELERATION OF IMAGE RESTORATION ALGORITHMS FOR DYNAMIC MEASUREMENTS IN COORDINATE METROLOGY BY USING OPENCV GPU FRAMEWORK
URN (Paper): urn:nbn:de:gbv:ilm1-2014iwk-140:6 58 th ILMENAU SCIENTIFIC COLLOQUIUM Technische Universität Ilmenau, 08 12 September 2014 URN: urn:nbn:de:gbv:ilm1-2014iwk:3 ACCELERATION OF IMAGE RESTORATION
More informationECE 8823: GPU Architectures. Objectives
ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading
More informationHigh performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli
High performance 2D Discrete Fourier Transform on Heterogeneous Platforms Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli Motivation Fourier Transform widely used in Physics, Astronomy, Engineering
More informationIntroduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono
Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied
More informationMulticore Hardware and Parallelism
Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3
More informationImplementation of Deep Convolutional Neural Net on a Digital Signal Processor
Implementation of Deep Convolutional Neural Net on a Digital Signal Processor Elaina Chai December 12, 2014 1. Abstract In this paper I will discuss the feasibility of an implementation of an algorithm
More informationParallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU
Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Lifan Xu Wei Wang Marco A. Alvarez John Cavazos Dongping Zhang Department of Computer and Information Science University of Delaware
More informationLECTURE 1. Introduction
LECTURE 1 Introduction CLASSES OF COMPUTERS When we think of a computer, most of us might first think of our laptop or maybe one of the desktop machines frequently used in the Majors Lab. Computers, however,
More informationIntroduction to GPU computing
Introduction to GPU computing Nagasaki Advanced Computing Center Nagasaki, Japan The GPU evolution The Graphic Processing Unit (GPU) is a processor that was specialized for processing graphics. The GPU
More informationNOISE SUSCEPTIBILITY OF PHASE UNWRAPPING ALGORITHMS FOR INTERFEROMETRIC SYNTHETIC APERTURE SONAR
Proceedings of the Fifth European Conference on Underwater Acoustics, ECUA 000 Edited by P. Chevret and M.E. Zakharia Lyon, France, 000 NOISE SUSCEPTIBILITY OF PHASE UNWRAPPING ALGORITHMS FOR INTERFEROMETRIC
More informationComputational Acceleration of Image Inpainting Alternating-Direction Implicit (ADI) Method Using GPU CUDA
Computational Acceleration of Inpainting Alternating-Direction Implicit (ADI) Method Using GPU CUDA Mutaqin Akbar mutaqin.akbar@gmail.com Pranowo pran@mail.uajy.ac.id Suyoto suyoto@mail.uajy.ac.id Abstract
More informationGPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC
GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of
More informationAn Efficient GPU-Based Implementation of the R-MSF-Algorithm for Remote Sensing Imagery
An Efficient GPU-Based Implementation of the R-MSF-Algorithm for Remote Sensing Imagery David Castro-Palazuelos 1,2,*, Daniel Robles-Valdez 1, and Deni Torres-Roman 1 1 Center for Advanced Research and
More informationCUDA Optimizations WS Intelligent Robotics Seminar. Universität Hamburg WS Intelligent Robotics Seminar Praveen Kulkarni
CUDA Optimizations WS 2014-15 Intelligent Robotics Seminar 1 Table of content 1 Background information 2 Optimizations 3 Summary 2 Table of content 1 Background information 2 Optimizations 3 Summary 3
More informationCS 426 Parallel Computing. Parallel Computing Platforms
CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:
More informationMassively Parallel Architectures
Massively Parallel Architectures A Take on Cell Processor and GPU programming Joel Falcou - LRI joel.falcou@lri.fr Bat. 490 - Bureau 104 20 janvier 2009 Motivation The CELL processor Harder,Better,Faster,Stronger
More informationCS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15
CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2017 Lecture 15 LAST TIME: CACHE ORGANIZATION Caches have several important parameters B = 2 b bytes to store the block in each cache line S = 2 s cache sets
More informationSince the invention of microprocessors at Intel in the late 1960s, Multicore Chips and Parallel Processing for High-End Learning Environments
old Learning on Demand Marcelo Hoffmann +1 650 859 3680; fax: +1 650 859 4544; electronic mail: mhoffmann@sric-bi.com Multicore Chips and Parallel Processing for High-End Learning Environments Why is this
More informationImproving Performance and Power of Multi-Core Processors with Wonderware and System Platform 3.0
Improving Performance and Power of Multi-Core Processors with Wonderware and Krishna Gummuluri, Wonderware Software Development Manager Introduction Wonderware has spent a considerable effort to enable
More information