GPU Acceleration of SAR/ISAR Imaging Algorithms

Gary Rubin, Earl V. Sager, Ph.D., David H. Berger, Ph.D.

ABSTRACT

General-Purpose Graphics Processing Units (GPGPUs) provide increased processing capability for applications with a high degree of data parallelism. In the past few years, GPGPUs have become readily available in the commercial market, and off-the-shelf programming tools (e.g., CUDA from NVIDIA Corporation and Jacket from Accelereyes, LLC) have made them more accessible to the technical community. SAR and ISAR imaging algorithms are inherently computationally intensive. To overcome the performance limitations of CPUs and traditional DSPs, simplified, computationally efficient algorithms are often used, but at the expense of the phase information available within the raw data. We have demonstrated that GPGPU acceleration greatly improves the processing times of a less efficient (but more flexible) algorithm, making its use more practical. We have shown that GPGPUs can provide performance improvement in excess of 30X for a backprojection-based SAR/ISAR imaging technique.

Keywords: Algorithms, Computations, Data Processing, Imaging, Inverse SAR, Radar, Signal Processing, Synthetic Aperture Radar

1.0 Introduction

For decades, Synthetic Aperture Radar (SAR) and Inverse SAR (ISAR) imaging techniques have been used to represent radar data in a way that is meaningful to a human analyst. The calculations necessary for these imaging techniques are computationally complex, and since the advent of SAR and ISAR imaging, computational throughput has been a critical factor in the development, implementation, and use of processing algorithms. With the steady advance of computational power provided by CPUs, vector processors, and DSPs, imaging routines have become steadily faster over the years. Chip makers, however, have reached a clock-speed plateau, as processors with increasingly high transistor densities are no longer able to dissipate the heat associated with increasingly high clock speeds. This heat dissipation challenge has shifted the trend away from ever-increasing clock speeds and toward an explosion of multi-core, commercial-grade processors. Multi-core processors offer the potential for extremely high throughput, but the ability to achieve it depends greatly on the nature of the computations being performed.

This paper describes recent efforts to use massively parallel commercial GPUs to accelerate a Backprojection Algorithm (BPA)-based imaging routine. The paper begins with a discussion of the imaging algorithms, followed by a description of the acceleration process and results.

2.0 Imaging Algorithms

Over the years, radar scientists and engineers have developed a wide variety of imaging algorithms and implementations. These include Range Doppler (RDA), Polar Format (PFA), Chirp Scaling (CSA), Range Migration (RMA), and Backprojection or Time-Domain Correlation (BPA or TDC). Each algorithm has its own strengths and weaknesses, and choosing an optimal imaging algorithm typically depends on radar parameters and mission requirements [1,2].

For example, RDA provides a good balance between accuracy and efficiency, but at the expense of bandwidth and aperture length [2]. CSA is computationally efficient, but can limit scene size and image resolution. It also operates only on radar data that has not been de-chirped [1]. PFA can operate on de-chirped data, but may introduce geometric distortion [1].

This paper focuses primarily on an implementation of BPA, as described by [3]. BPA can image to an arbitrary surface, can provide phase and amplitude history for an image pixel, and is easily applicable to both SAR and ISAR imaging. BPA can be quite slow, but it exhibits a high degree of data parallelism, in which parallelism comes from simultaneous operations across large sets of data rather than from multiple threads of control [4].

2.1 Backprojection Algorithm Implementation

We have tested our imaging routines using an ISAR data set collected by an SPC MkV instrumentation radar in [...]. The MkV used a 256-step chirp from 8-12 GHz to measure a Saab 9000 hatchback as the Saab was rotated on a turntable. A 360-degree image of the Saab 9000 is shown in Figure 1.

Figure 1. ISAR image of Saab 9000 hatchback. Data represents a 256-step, 8-12 GHz chirp and 360 degrees of rotation. Pixel resolution is 1 cm. The image was created using our BPA implementation.

Our implementation of BPA is similar to that described by [3] and comprises the following steps:

1. Perform a downrange DFT on the radar data to obtain a range (fast time) vs. position (slow time) data array, phase corrected to a reference range (Figure 2).
2. Form an image grid that defines the spatial location of each pixel relative to the radar. This image grid can be either planar or non-planar (Figure 3).
3. For each burst, calculate the slant range between each pixel and the radar location. The white traces in Figure 2 represent two examples of pixel slant range vs. burst number.
4. For each radar burst, use the pixel ranges calculated in step 3 to assign radar range cells to pixels.
5. For each pixel, coherently sum each pulse's signal contribution.

Cross-range motion is provided either by the radar motion (SAR) or the target motion (ISAR). In our implementation, the image grid is held fixed, while the radar position is imagined to rotate around the center of the image grid.

Figure 2. High-range-resolution (HRR) vs. burst number for car ISAR data. This figure represents zero-padded step-chirp data for 180 degrees of rotation. Solid ("x" in Figure 3) and dashed ("+" in Figure 3) traces represent slant-range profiles for the pixels identified in Figure 3.

Figure 3. Image grid. Two arbitrary pixels are highlighted by the red "+" and magenta "x". The black dot represents the ISAR center of rotation.
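The five steps listed in Section 2.1 can be sketched in a few lines of MATLAB. This is a minimal illustration on synthetic data, not the authors' code: the single point scatterer, the 64-step waveform, the angular sector, the grid spacing, and all variable names are assumptions chosen to keep the example small, and nearest-neighbor range-cell assignment is used for step 4.

```matlab
% Minimal backprojection sketch on synthetic data. All parameter values are
% hypothetical and are NOT those of the Saab 9000 dataset.
c     = 299792458;                    % speed of light (m/s)
f0    = 8e9;                          % start frequency (Hz)
K     = 64;                           % frequency steps per burst (assumed)
df    = 4e9/K;                        % frequency step (Hz)
R0    = 100;                          % reference range to grid center (m)
theta = linspace(-0.1, 0.1, 41);      % burst aspect angles (rad), assumed
tgt   = [0.30, -0.20];                % assumed point-scatterer position (m)

% Synthetic phase history, already phase corrected to the reference range.
f     = f0 + (0:K-1).' * df;                          % K-by-1 frequencies
radar = R0 * [cos(theta); sin(theta)];                % 2-by-N radar positions
dRt   = sqrt((radar(1,:)-tgt(1)).^2 + (radar(2,:)-tgt(2)).^2) - R0;  % 1-by-N
S     = exp(-1j*4*pi*f*dRt/c);                        % K-by-N samples

% Step 1: zero-padded downrange transform, one range profile per burst.
Nfft = 8*K;
hrr  = ifft(S, Nfft, 1);                              % Nfft-by-N
dr   = c/(2*df*Nfft);                                 % range-bin spacing (m)

% Step 2: planar image grid centered on the center of rotation.
x = -0.5:0.02:0.5;
y = -0.5:0.02:0.5;
[X, Y] = meshgrid(x, y);
img = zeros(size(X));

for n = 1:numel(theta)
    % Step 3: slant range from every pixel to the radar, relative to R0.
    dR = sqrt((radar(1,n)-X).^2 + (radar(2,n)-Y).^2) - R0;
    % Step 4: nearest range cell for every pixel (wrapped into 1..Nfft).
    bin = mod(round(dR/dr), Nfft) + 1;
    % Step 5: coherent sum with the carrier phase restored.
    img = img + reshape(hrr(bin(:), n), size(X)) .* exp(1j*4*pi*f0*dR/c);
end

[~, idx]   = max(abs(img(:)));
[row, col] = ind2sub(size(img), idx);
peak = [x(col), y(row)];              % location of the brightest pixel
```

Under these assumptions the brightest pixel should fall at or next to the assumed scatterer position. Steps 3 through 5 operate on the whole grid at once inside the burst loop, which is the kind of data-parallel structure the paper goes on to exploit.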

As the target rotates, the red and magenta pixels highlighted in Figure 3 trace the lines shown in Figure 2.

In our implementation, Step 1 is a data-parallel operation across the multiple radar bursts, while Step 5 is a data-parallel operation performed inside the Step 4 for loop.

3.0 Code Acceleration

There are two processes associated with GPU acceleration. First, the code must be written so that its operations are highly data-parallel. For The MathWorks' MATLAB, this requires that the code be vectorized (see Section 3.1). Second, once the algorithm has been implemented in a data-parallel manner, it can be targeted to the GPU, as described in Section 3.2.

3.1 Code Vectorization

The BPA algorithm described above has been implemented in MATLAB R2010a. Initially, the code was translated from FORTRAN to MATLAB and relied heavily on nested for loops. The code was then largely rewritten using the MATLAB art of vectorization. In MATLAB, vectorization refers to taking advantage of polymorphism, a compiler feature that allows the same line of code to apply to scalars, vectors, or matrices. MATLAB can perform these vectorized calculations much more efficiently than loops and automatically multithreads some operations [5]. Depending on the nature of the calculations, vectorization may involve a trade-off between CPU efficiency and memory usage. Memory limitations may therefore prevent some vectorization.

A simple example of vectorization is as follows. Define two random data vectors.

    A=rand(10000,1);
    B=rand(10000,1);

Preallocate an output vector C, then perform the operation using a loop.

    C=zeros(10000,1);
    for indx=1:length(A)
        C(indx)=A(indx)*(B(indx)^2);
    end

Perform the same calculation as a vector operation. The .* and .^ operators refer to element-by-element vector operations.

    C=A.*(B.^2);

Using a 2.67 GHz Intel Core i7-920, the vectorized implementation executed in roughly 40% of the time of the looped expression.

Another very useful vectorization function in MATLAB is bsxfun, which allows for efficient matrix-vector arithmetic. Consider the following example. Generate a random 2000x1000-element matrix.

    A=rand(2000,1000);

Preallocate an output matrix. For each of the 1000 columns, calculate the column mean and subtract it from each element in the column. Some vectorized calculations are used here as well, as CurrentCol is a vector and mean(CurrentCol) is a scalar.

    NuA1=zeros(size(A));
    for indx=1:size(A,2)
        CurrentCol=A(:,indx);
        NuA1(:,indx)=CurrentCol-mean(CurrentCol);
    end

Perform the same operation using the matrix implementation of mean and bsxfun. Here, MeanA is a vector, while A is a matrix.

    MeanA=mean(A,1);
    NuA2=bsxfun(@minus,A,MeanA);

In this case, the bsxfun implementation runs approximately 4x faster than the loop iteration on the 2.67 GHz Intel Core i7-920.

3.2 GPU Implementation

Over the past several years, graphics processing unit (GPU) technology has experienced dramatic growth in terms of computational performance (Figure 4).

Figure 4. Growth in NVIDIA GPU performance vs. CPU performance. Solid lines are single-precision

GFLOPS; dashed lines are double-precision GFLOPS [6].

AMD/ATI and NVIDIA are leaders in the GPU market, and both support non-graphics general-purpose GPU (GPGPU) applications. We have chosen to use NVIDIA GPUs due primarily to the maturity and community support of their CUDA development environment.

To reduce schedule risk for the GPU acceleration effort described in this paper, we decided to avoid writing our own CUDA code. Instead, improvements to runtimes were achieved through the use of Accelereyes, LLC's Jacket software platform. Jacket serves as nearly transparent middleware, allowing execution of MATLAB code on CUDA-capable NVIDIA GPUs directly from the MATLAB development environment. Jacket achieves this by overloading most base MATLAB functions. When these functions are called using special Jacket data classes, Jacket builds an internal representation of the program being run, compiles that representation if necessary, performs the computation on the GPU, and makes the results available to MATLAB if requested (leaving data GPU-resident as long as possible). Because Accelereyes has written Jacket to work with existing MATLAB syntax, parallelizing operations for GPU use is essentially identical to the MATLAB vectorization described in Section 3.1.

3.3 Acceleration Results

Benchmarking was performed using the system described in Table 1.

Table 1. Benchmark system

    CPU               Intel Core i7-920, 2.67 GHz
    Motherboard       EVGA X58 SLI
    Memory            12 GB DDR
    OS (dual boot)    Windows 7 Professional 64-bit; CentOS 5.5
    GPU 1             NVIDIA C1060 w/ 4 GB GDDR3 (~$1300)
    GPU 2             NVIDIA GeForce 9800 GT w/ 512 MB
    MATLAB Version    R2010a
    Jacket Version    1.3

The Core i7-920 CPU provides four processing cores, each with two processing threads. Of these eight available processing threads, two are typically used during CPU benchmarking. The CPU resources could be applied more efficiently by using MATLAB's Parallel Computing Toolbox to spread the burst for-loop iterations across the multiple threads.
Similarly, the Parallel Computing Toolbox can be used in conjunction with Jacket to spread the processing among multiple GPUs.

For the imaging performance benchmarks, the dataset described in Table 2 and shown in Figure 1 was used.

Table 2. Benchmark dataset

    Collection System        SPC MkV radar
    Collection Mode          ISAR
    Waveform Type            Step-chirp
    Start Frequency (GHz)    8
    Stop Frequency (GHz)     12
    Chirp Bandwidth (GHz)    4
    Frequency Steps          256
    Angle Start (rad)        [...]
    Angle Stop (rad)         [...]
    Angle Step (rad)         [...]
    Range (m)                100
    Subject                  Saab 9000 hatchback

It is understood that the pixel resolutions used for benchmarking are much higher than the resolution supported by the actual data. While these resolutions may be artificially high for this particular dataset, they were used to demonstrate computational performance for large image sizes of the type that might be used for airborne or spaceborne SAR. Because the ISAR BPA implementation is virtually identical to the SAR BPA implementation, we believe that it is valid to use an ISAR dataset to demonstrate image sizes that are more typical of SAR.

Execution times for the Core i7-920 CPU-only BPA ISAR imaging algorithm are shown in Figure 5. The C1060 GPU-enabled runtimes are shown in Figure 6.

Figure 5. Runtimes for BPA ISAR imaging of Saab 9000 using Core i7-920 CPU under Win7 Pro 64-bit. Burst counts correspond to 10-, 60-, 120-, 180-, and 240-degree sectors.
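As a rough check on the remark that the pixel resolutions exceed what the data supports, the standard range-resolution relation c/(2B) can be evaluated with the Table 2 waveform values. This sketch is not from the paper: the variable names are illustrative, the frequency step is assumed to be the bandwidth divided by the number of steps, and the 1 cm pixel is the value quoted for Figure 1 rather than a benchmark setting.

```matlab
% Rough resolution check using the Table 2 waveform parameters.
c        = 299792458;          % speed of light (m/s)
B        = (12 - 8) * 1e9;     % chirp bandwidth (Hz), from Table 2
nSteps   = 256;                % frequency steps, from Table 2
pixel    = 0.01;               % pixel size quoted for Figure 1 (m)

rangeRes = c / (2*B);          % slant-range resolution (m), about 3.75 cm
df       = B / nSteps;         % ASSUMPTION: uniform step of B/nSteps
unambRng = c / (2*df);         % unambiguous range extent (m) under that assumption
ratio    = rangeRes / pixel;   % pixels per range-resolution cell

fprintf('Range resolution: %.2f cm\n', 100*rangeRes);
fprintf('Unambiguous range extent: %.2f m\n', unambRng);
fprintf('Pixels per resolution cell: %.2f\n', ratio);
```

With a 4 GHz bandwidth the range resolution is a few centimeters, so a 1 cm pixel oversamples the scene several times over in range, consistent with the authors' statement.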

Figure 6. Runtimes for BPA ISAR imaging of Saab 9000 using NVIDIA C1060 GPGPU under Win7 Pro 64-bit. Burst counts correspond to 10-, 60-, 120-, 180-, and 240-degree sectors.

Speedup is defined as the ratio of CPU runtime to GPU runtime (CPU time / GPU time). Figure 7 shows the speedup for the runtimes shown in Figure 5 and Figure 6. We believe that the sharp decrease in speedup after 15 megapixels is due to a memory efficiency threshold associated with the larger data arrays.

Figure 7. GPU speedups: C1060 vs. Core i7-920 under Win7 Pro 64-bit.

4.0 Related Work

In addition to the imaging acceleration described in this paper, SPC has also demonstrated GPU acceleration for surface-surveillance radar clutter reduction. For that processing, we were able to demonstrate speedups of roughly 10x vs. the Core i7-920 and roughly 5x vs. a real-time DSP implementation. SPC has also begun GPU acceleration of PFA-based imaging routines. This work was still in progress at the time of publication of this paper.

5.0 Summary

We have demonstrated that highly parallel, computationally complex tasks, such as those associated with BPA SAR/ISAR imaging, can be greatly accelerated through the use of GPUs. We have demonstrated improvements in BPA runtime in excess of 30x, meaning that the GPU allows processing that might take an entire workweek on a standard desktop PC to be completed in a little over an hour. Such runtime improvements increase the practicality of BPA as a large-scale imaging routine.

The speedups presented in this paper should not be seen as an upper limit. It is very likely that additional speed improvements could be realized by further optimization of the BPA code. It is also anticipated that GPU performance will improve further as CUDA and Jacket evolve.
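The workweek comparison in the summary follows from simple arithmetic on the speedup ratio. The sketch below is illustrative only: the runtimes are hypothetical placeholders rather than values read from Figures 5 and 6, and a 40-hour workweek is assumed.

```matlab
% Speedup as defined for Figure 7: CPU runtime divided by GPU runtime.
speedup = @(cpuTime, gpuTime) cpuTime ./ gpuTime;

% HYPOTHETICAL runtimes in seconds, for illustration only (not measured data).
cpuTime = [120 600 3000];
gpuTime = [ 10  25  100];
s = speedup(cpuTime, gpuTime);        % element-wise ratios: 12, 24, 30

% Workweek comparison from the summary, assuming a 40-hour workweek.
workweekHours = 40;
gpuHours = workweekHours / 30;        % about 1.33 hours at a 30x speedup
fprintf('Speedups: %s\n', mat2str(s));
fprintf('40 h of CPU time at 30x: %.2f h\n', gpuHours);
```

At 30x, 40 hours of CPU processing corresponds to about 80 minutes on the GPU, which matches the "a little over an hour" figure.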
6.0 References

[1] Carrara, W.G., Goodman, R.S., and Majewski, R.M., Spotlight Synthetic Aperture Radar: Signal Processing Algorithms, Norwood, MA: Artech House, 1995.
[2] Cumming, I.G., and Wong, F.H., Digital Processing of Synthetic Aperture Radar Data, Norwood, MA: Artech House, 2005.
[3] Soumekh, M., Synthetic Aperture Radar Signal Processing with MATLAB Algorithms, New York: John Wiley & Sons, 1999.
[4] Hillis, W.D., and Steele, G.L., "Data Parallel Algorithms," Communications of the ACM 29, 12 (Dec. 1986).
[5] "Which MATLAB functions benefit from multithreaded computation?", MATLAB Technical Solution.
[6] Source: NVIDIA via personal correspondence.

7.0 Acknowledgments

Thanks to Gallagher Pryor, Dave Gibson, and others at Accelereyes, LLC for their input and technical advice.


More information

Parallel Computing with MATLAB

Parallel Computing with MATLAB Parallel Computing with MATLAB CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University

More information

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1 Lecture 15: Introduction to GPU programming Lecture 15: Introduction to GPU programming p. 1 Overview Hardware features of GPGPU Principles of GPU programming A good reference: David B. Kirk and Wen-mei

More information

Performance of computer systems

Performance of computer systems Performance of computer systems Many different factors among which: Technology Raw speed of the circuits (clock, switching time) Process technology (how many transistors on a chip) Organization What type

More information

Gedae cwcembedded.com. The CHAMP-AV6 VPX-REDI. Digital Signal Processing Card. Maximizing Performance with Minimal Porting Effort

Gedae cwcembedded.com. The CHAMP-AV6 VPX-REDI. Digital Signal Processing Card. Maximizing Performance with Minimal Porting Effort Technology White Paper The CHAMP-AV6 VPX-REDI Digital Signal Processing Card Maximizing Performance with Minimal Porting Effort Introduction The Curtiss-Wright Controls Embedded Computing CHAMP-AV6 is

More information

Accelerating image registration on GPUs

Accelerating image registration on GPUs Accelerating image registration on GPUs Harald Köstler, Sunil Ramgopal Tatavarty SIAM Conference on Imaging Science (IS10) 13.4.2010 Contents Motivation: Image registration with FAIR GPU Programming Combining

More information

Tesla GPU Computing A Revolution in High Performance Computing

Tesla GPU Computing A Revolution in High Performance Computing Tesla GPU Computing A Revolution in High Performance Computing Mark Harris, NVIDIA Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction to Tesla CUDA Architecture Programming & Memory

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods

More information

Optimization solutions for the segmented sum algorithmic function

Optimization solutions for the segmented sum algorithmic function Optimization solutions for the segmented sum algorithmic function ALEXANDRU PÎRJAN Department of Informatics, Statistics and Mathematics Romanian-American University 1B, Expozitiei Blvd., district 1, code

More information

Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics

Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Moysey Brio & Paul Dostert July 4, 2009 1 / 18 Sparse Matrices In many areas of applied mathematics and modeling, one

More information

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Joined Advanced Student School (JASS) 2009 March 29 - April 7, 2009 St. Petersburg, Russia G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Dmitry Puzyrev St. Petersburg State University Faculty

More information

Using GPUs to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation

Using GPUs to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation Using GPUs to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation GPU Technology Conference 2012 May 15, 2012 Thomas M. Benson, Daniel P. Campbell, Daniel A. Cook thomas.benson@gtri.gatech.edu

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

GPU-accelerated 3-D point cloud generation from stereo images

GPU-accelerated 3-D point cloud generation from stereo images GPU-accelerated 3-D point cloud generation from stereo images Dr. Bingcai Zhang Release of this guide is approved as of 02/28/2014. This document gives only a general description of the product(s) or service(s)

More information

Accelerating CFD with Graphics Hardware

Accelerating CFD with Graphics Hardware Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery

More information

Synthetic Aperture Radar Modeling using MATLAB and Simulink

Synthetic Aperture Radar Modeling using MATLAB and Simulink Synthetic Aperture Radar Modeling using MATLAB and Simulink Naivedya Mishra Team Lead Uurmi Systems Pvt. Ltd. Hyderabad Agenda What is Synthetic Aperture Radar? SAR Imaging Process Challenges in Design

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

A MATLAB Interface to the GPU

A MATLAB Interface to the GPU Introduction Results, conclusions and further work References Department of Informatics Faculty of Mathematics and Natural Sciences University of Oslo June 2007 Introduction Results, conclusions and further

More information

EMBEDDED VISION AND 3D SENSORS: WHAT IT MEANS TO BE SMART

EMBEDDED VISION AND 3D SENSORS: WHAT IT MEANS TO BE SMART EMBEDDED VISION AND 3D SENSORS: WHAT IT MEANS TO BE SMART INTRODUCTION Adding embedded processing to simple sensors can make them smart but that is just the beginning of the story. Fixed Sensor Design

More information

N-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo

N-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo N-Body Simulation using CUDA CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo Project plan Develop a program to simulate gravitational

More information

Profiling and Parallelizing with the OpenACC Toolkit OpenACC Course: Lecture 2 October 15, 2015

Profiling and Parallelizing with the OpenACC Toolkit OpenACC Course: Lecture 2 October 15, 2015 Profiling and Parallelizing with the OpenACC Toolkit OpenACC Course: Lecture 2 October 15, 2015 Oct 1: Introduction to OpenACC Oct 6: Office Hours Oct 15: Profiling and Parallelizing with the OpenACC Toolkit

More information

Parallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer

Parallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer Parallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer 2018 The MathWorks, Inc. 1 Practical Application of Parallel Computing Why parallel computing? Need faster

More information

Deep Learning Performance and Cost Evaluation

Deep Learning Performance and Cost Evaluation Micron 5210 ION Quad-Level Cell (QLC) SSDs vs 7200 RPM HDDs in Centralized NAS Storage Repositories A Technical White Paper Rene Meyer, Ph.D. AMAX Corporation Publish date: October 25, 2018 Abstract Introduction

More information

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host

More information

Executable Requirements: Opportunities and Impediments

Executable Requirements: Opportunities and Impediments Executable Requirements: Oppotunities and Impediments Executable Requirements: Opportunities and Impediments G. A. Shaw and A. H. Anderson * Abstract: In a top-down, language-based design methodology,

More information

Improved Parallel Rabin-Karp Algorithm Using Compute Unified Device Architecture

Improved Parallel Rabin-Karp Algorithm Using Compute Unified Device Architecture Improved Parallel Rabin-Karp Algorithm Using Compute Unified Device Architecture Parth Shah 1 and Rachana Oza 2 1 Chhotubhai Gopalbhai Patel Institute of Technology, Bardoli, India parthpunita@yahoo.in

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

Engineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary

Engineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary white paper Computer-Aided Engineering ANSYS Mechanical on Intel Xeon Processors Engineer Productivity Boosted by Higher-Core CPUs Engineers can be significantly more productive when ANSYS Mechanical runs

More information

ACCELERATION OF IMAGE RESTORATION ALGORITHMS FOR DYNAMIC MEASUREMENTS IN COORDINATE METROLOGY BY USING OPENCV GPU FRAMEWORK

ACCELERATION OF IMAGE RESTORATION ALGORITHMS FOR DYNAMIC MEASUREMENTS IN COORDINATE METROLOGY BY USING OPENCV GPU FRAMEWORK URN (Paper): urn:nbn:de:gbv:ilm1-2014iwk-140:6 58 th ILMENAU SCIENTIFIC COLLOQUIUM Technische Universität Ilmenau, 08 12 September 2014 URN: urn:nbn:de:gbv:ilm1-2014iwk:3 ACCELERATION OF IMAGE RESTORATION

More information

ECE 8823: GPU Architectures. Objectives

ECE 8823: GPU Architectures. Objectives ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading

More information

High performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli

High performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli High performance 2D Discrete Fourier Transform on Heterogeneous Platforms Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli Motivation Fourier Transform widely used in Physics, Astronomy, Engineering

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied

More information

Multicore Hardware and Parallelism

Multicore Hardware and Parallelism Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3

More information

Implementation of Deep Convolutional Neural Net on a Digital Signal Processor

Implementation of Deep Convolutional Neural Net on a Digital Signal Processor Implementation of Deep Convolutional Neural Net on a Digital Signal Processor Elaina Chai December 12, 2014 1. Abstract In this paper I will discuss the feasibility of an implementation of an algorithm

More information

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Lifan Xu Wei Wang Marco A. Alvarez John Cavazos Dongping Zhang Department of Computer and Information Science University of Delaware

More information

LECTURE 1. Introduction

LECTURE 1. Introduction LECTURE 1 Introduction CLASSES OF COMPUTERS When we think of a computer, most of us might first think of our laptop or maybe one of the desktop machines frequently used in the Majors Lab. Computers, however,

More information

Introduction to GPU computing

Introduction to GPU computing Introduction to GPU computing Nagasaki Advanced Computing Center Nagasaki, Japan The GPU evolution The Graphic Processing Unit (GPU) is a processor that was specialized for processing graphics. The GPU

More information

NOISE SUSCEPTIBILITY OF PHASE UNWRAPPING ALGORITHMS FOR INTERFEROMETRIC SYNTHETIC APERTURE SONAR

NOISE SUSCEPTIBILITY OF PHASE UNWRAPPING ALGORITHMS FOR INTERFEROMETRIC SYNTHETIC APERTURE SONAR Proceedings of the Fifth European Conference on Underwater Acoustics, ECUA 000 Edited by P. Chevret and M.E. Zakharia Lyon, France, 000 NOISE SUSCEPTIBILITY OF PHASE UNWRAPPING ALGORITHMS FOR INTERFEROMETRIC

More information

Computational Acceleration of Image Inpainting Alternating-Direction Implicit (ADI) Method Using GPU CUDA

Computational Acceleration of Image Inpainting Alternating-Direction Implicit (ADI) Method Using GPU CUDA Computational Acceleration of Inpainting Alternating-Direction Implicit (ADI) Method Using GPU CUDA Mutaqin Akbar mutaqin.akbar@gmail.com Pranowo pran@mail.uajy.ac.id Suyoto suyoto@mail.uajy.ac.id Abstract

More information

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of

More information

An Efficient GPU-Based Implementation of the R-MSF-Algorithm for Remote Sensing Imagery

An Efficient GPU-Based Implementation of the R-MSF-Algorithm for Remote Sensing Imagery An Efficient GPU-Based Implementation of the R-MSF-Algorithm for Remote Sensing Imagery David Castro-Palazuelos 1,2,*, Daniel Robles-Valdez 1, and Deni Torres-Roman 1 1 Center for Advanced Research and

More information

CUDA Optimizations WS Intelligent Robotics Seminar. Universität Hamburg WS Intelligent Robotics Seminar Praveen Kulkarni

CUDA Optimizations WS Intelligent Robotics Seminar. Universität Hamburg WS Intelligent Robotics Seminar Praveen Kulkarni CUDA Optimizations WS 2014-15 Intelligent Robotics Seminar 1 Table of content 1 Background information 2 Optimizations 3 Summary 2 Table of content 1 Background information 2 Optimizations 3 Summary 3

More information

CS 426 Parallel Computing. Parallel Computing Platforms

CS 426 Parallel Computing. Parallel Computing Platforms CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:

More information

Massively Parallel Architectures

Massively Parallel Architectures Massively Parallel Architectures A Take on Cell Processor and GPU programming Joel Falcou - LRI joel.falcou@lri.fr Bat. 490 - Bureau 104 20 janvier 2009 Motivation The CELL processor Harder,Better,Faster,Stronger

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2017 Lecture 15 LAST TIME: CACHE ORGANIZATION Caches have several important parameters B = 2 b bytes to store the block in each cache line S = 2 s cache sets

More information

Since the invention of microprocessors at Intel in the late 1960s, Multicore Chips and Parallel Processing for High-End Learning Environments

Since the invention of microprocessors at Intel in the late 1960s, Multicore Chips and Parallel Processing for High-End Learning Environments old Learning on Demand Marcelo Hoffmann +1 650 859 3680; fax: +1 650 859 4544; electronic mail: mhoffmann@sric-bi.com Multicore Chips and Parallel Processing for High-End Learning Environments Why is this

More information

Improving Performance and Power of Multi-Core Processors with Wonderware and System Platform 3.0

Improving Performance and Power of Multi-Core Processors with Wonderware and System Platform 3.0 Improving Performance and Power of Multi-Core Processors with Wonderware and Krishna Gummuluri, Wonderware Software Development Manager Introduction Wonderware has spent a considerable effort to enable

More information