Image and Video Processing on Parallel (GPU) and Heterogeneous Architectures (Multi-CPU/Multi-GPU)


1 Faculté Polytechnique
2nd Workshop of COST 0805, Open Network for High-Performance Computing on Complex Environments
Image and Video Processing on Parallel (GPU) and Heterogeneous Architectures (Multi-CPU/Multi-GPU)
26 Jan 2012
Sidi Ahmed Mahmoudi, Pierre Manneback
Computer Science Department, Faculty of Engineering, UMONS

2 Agenda
1. Introduction
2. Context
3. Image Processing on Parallel (GPU) and Heterogeneous Architectures
4. Video Processing on GPU
5. Proposed Framework for Multimedia Processing on Heterogeneous Architectures
6. Experimental Results
7. Conclusion
Université de Mons

3 Introduction
- Moore's law: "The number of transistors that can be placed on an integrated circuit would double every two years."
- In practice, CPU power doubled every 18 months until 2008.
- The law has no longer held in recent years for thermal reasons: CPU clock frequency is capped at about 4 GHz.
- Solution: multiply the computing units inside the CPU (multi-core and many-core processors).
- GPUs offer a large number of processing units, initially used for 3D rendering and video games.
- Birth of GPGPU (general-purpose computing on graphics processing units): using the GPU to perform tasks usually performed by the CPU.

4 Context
[Figure: CPU architecture (Control, Cache, ALUs, DRAM) next to GPU architecture (DRAM) and a Multi-CPU/Multi-GPU platform]
Hardware:
- High computing power of GPUs.
- Heterogeneous architectures: Multi-CPU/Multi-GPU.
Applications:
- Intensive processing of multimedia objects (images, videos, etc.).
- High intensity: large volumes of multimedia objects (HD, Full HD, etc.).
Constraints:
- Transfer time between CPU and GPU memories.
- Appropriate selection of the computing units (CPU and/or GPU) for processing.
- Complex management of heterogeneous architectures.
Objectives:
- Efficient multimedia processing on heterogeneous architectures (Multi-CPU/Multi-GPU).
- Efficient selection of the computing units depending on the type of media to process.

5 GPU Programming
- Brook GPU: since [year missing in the transcript].
- ATI Stream: for ATI cards.
- DirectX 11, OpenGL: GPGPU shaders.
- OpenCL: all types of GPUs.
- CUDA: for NVIDIA cards.
[Figures: ATI Radeon HD 4770; NVIDIA GTX 590]

6 GPU Programming: Runtimes for heterogeneous platforms
1. StarPU
- Developed at the LaBRI laboratory, Bordeaux, France.
- Exploits the full computing power of machines (multi-CPU/multi-GPU).
- Efficient scheduling strategies.
2. StarSs
- Developed at the university of Cataluña, Spain.
- Flexible programming model for multicores.
- Based mainly on CPUSS (multicore programming) and GPUSS (multi-GPU programming).
3. Grand Central Dispatch
- Developed by Apple, released for Mac systems.
- Optimizes application support for systems with multi-core processors.

7 Image Processing on GPU
Image processing is a natural fit for data-parallel processing:
- pixels can be mapped directly to threads
- a lot of data is shared between neighboring pixels
- high-resolution images require intensive computing
Advantage of CUDA and pixel shaders for GPU-based image processing:
- CUDA supports sharing image data with OpenGL and Direct3D applications
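The pixel-to-thread mapping can be illustrated with a minimal CPU-side sketch in Python. It only emulates how a 2-D grid of thread blocks covers an image; it is not the authors' code and it runs sequentially. The names `launch_grid` and `invert`, the 16x16 block size, and the choice of image inversion as the per-pixel operation are all assumptions made for illustration.

```python
def launch_grid(width, height, block=(16, 16)):
    """Yield the (x, y) pixel handled by each emulated GPU thread.

    Hypothetical helper: mimics the usual index computation
    x = blockIdx.x * blockDim.x + threadIdx.x (same for y).
    """
    grid_x = (width + block[0] - 1) // block[0]   # blocks needed along x
    grid_y = (height + block[1] - 1) // block[1]  # blocks needed along y
    for by in range(grid_y):
        for bx in range(grid_x):
            for ty in range(block[1]):
                for tx in range(block[0]):
                    x = bx * block[0] + tx
                    y = by * block[1] + ty
                    # Threads that fall outside the image do nothing.
                    if x < width and y < height:
                        yield x, y


def invert(image):
    """Invert an 8-bit grayscale image; each pixel is independent."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x, y in launch_grid(width, height):
        out[y][x] = 255 - image[y][x]
    return out


if __name__ == "__main__":
    print(invert([[0, 255], [10, 20]]))
```

Because every output pixel depends only on its own input pixel, the loop body could run in any order, which is what makes the one-thread-per-pixel mapping possible on a GPU.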

8 Image Processing on GPU
- Case 1: single image, with OpenGL visualization (without CPU-GPU transfer).
- Case 2: set of images, with results stored in CPU memory (with CPU-GPU transfer).

9 Image Processing on GPU
1. Classic image processing methods:
- Geometrical transformations (rotation, translation, etc.).
- Parallel processing between image pixels.
- GPU acceleration ranging from 10x to 40x compared to CPU.
2. Corner detection on GPU:
- Preliminary step for many computer vision algorithms.
- GPU implementation based on the Harris and Bouguet techniques.
- Efficiency: invariance to rotation, brightness, scale, etc.
3. Edge detection on GPU:
- GPU implementation based on the Deriche-Canny method.
- Efficiency: robustness to noise, reduced number of operations.
- Good quality of the detected contours.
[Figures: Input image; Detected corners on GPU; Detected contours on GPU]
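As a rough illustration of why corner detection parallelizes well, here is a minimal Python sketch of the standard Harris response, R = det(M) - k * trace(M)^2, computed independently for every pixel. It is not the implementation used in the slides: the central-difference gradients, the unweighted 3x3 window, the value k = 0.04, and the function name `harris_response` are assumptions chosen to keep the example short.

```python
def harris_response(image, k=0.04):
    """Return the Harris response for each pixel of a 2-D list of floats.

    Assumed settings (not from the slides): central differences,
    3x3 box window, k = 0.04. Border pixels are left at 0.
    """
    h, w = len(image), len(image[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Step 1: image gradients (one independent computation per pixel).
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    # Step 2: structure tensor summed over a 3x3 window, then the response.
    response = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[y][x] = det - k * trace * trace
    return response


if __name__ == "__main__":
    # Synthetic test image: a bright square in the lower-right quadrant.
    square = [[1.0 if x >= 6 and y >= 6 else 0.0 for x in range(12)]
              for y in range(12)]
    r = harris_response(square)
    print("corner:", r[6][6] > 0, "edge:", r[9][6] < 0, "flat:", r[2][2] == 0.0)
```

The response is positive at the corner of the square, negative along a straight edge, and zero in flat regions. Both passes read only a small neighborhood around each pixel, so each can be assigned to one GPU thread per pixel.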

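10 Image Processing on GPU
Case 1: single image (OpenGL visualization)
[Table: Corner detection using the Harris detector. Columns: image resolution, corner detection on CPU (ms), corner detection on GPU (ms), speedup. Four rows, the first starting at a resolution of 512*...; the measured values did not survive in the transcript.]
[Table: Edge detection using the Deriche-Canny method. Columns: image resolution, edge detection on CPU (ms), edge detection on GPU (ms), speedup. Four rows, the first starting at a resolution of 512*...; the measured values did not survive in the transcript.]
Case 2: set of images (resolution: 1476*1680)
[Table: Corner detection using the Harris detector. Columns: number of images, corner detection on CPU (s), corner detection on GPU (s), speedup. Surviving values: GPU times of 0.48 s, 1.35 s, 2.60 s and 4.29 s; speedup of 6.11 in the last row.]
[Table: Edge detection using the Deriche-Canny method. Columns: number of images, edge detection on CPU (s), edge detection on GPU (s), speedup. Surviving values: GPU times of 0.40 s, 0.98 s, 1.80 s and 3.43 s; speedup of 8.51 in the last row.]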

11 Multiple Image Processing on Heterogeneous Platforms (Multi-CPU/Multi-GPU)
Optimization:
- GPU streaming technique: overlaps kernel execution with memory copies between host and device.
- Streaming across multiple GPUs improved performance by about 25% for a medical image database processing application (experimental results).
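The benefit of overlapping copies with kernel execution can be estimated with a small timing model, sketched here in Python. It is a simplified illustration, not a measurement: it assumes one upload engine, one compute engine and one download engine that each process one image at a time, and the function names and stage durations are hypothetical. It does not attempt to reproduce the 25% figure reported above.

```python
def sequential_time(n, h2d, kernel, d2h):
    """Total time when each image is uploaded, processed and downloaded
    before the next one starts (no overlap)."""
    return n * (h2d + kernel + d2h)


def pipelined_time(n, h2d, kernel, d2h):
    """Total time when upload, kernel and download of different images
    overlap (assumed: three independent engines, images in order)."""
    upload_done = compute_done = download_done = 0.0
    for _ in range(n):
        # The upload engine starts the next image as soon as it is free.
        upload_done = upload_done + h2d
        # The kernel needs both its input and a free compute engine.
        compute_done = max(compute_done, upload_done) + kernel
        # The download needs the kernel result and a free download engine.
        download_done = max(download_done, compute_done) + d2h
    return download_done


if __name__ == "__main__":
    # Hypothetical stage times in milliseconds for 4 images.
    n, h2d, kernel, d2h = 4, 2, 5, 2
    print(sequential_time(n, h2d, kernel, d2h))  # 36
    print(pipelined_time(n, h2d, kernel, d2h))   # 24.0
```

With identical images the pipelined total equals one full pass plus (n - 1) times the slowest stage, so the gain is largest when transfer and compute times are of similar magnitude.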

12 Video Processing on GPU in Real Time
[Flowchart: the video is read in real time on the CPU; each image i (i <= N) is passed from the CPU to the GPU, processed in parallel with CUDA and displayed with OpenGL; i is then incremented (i = i + 1) and the loop repeats until the end of the video.]

13 Video Processing on GPU in Real Time
[Figures: Background subtraction on GPU; Point of interest detection on GPU]
[Plots: Performance of background subtraction on GPU; Performance of corner detection on GPU]

14 Video Processing on GPU in Real Time
[Plot: Performance of optical flow computation on GPU. Frames per second (FPS) versus video resolution for three platforms: dual-core CPU, GPU GTX 280, and GPU Tesla C2070 (Fermi). The axis values did not survive in the transcript.]

15 Proposed Framework for Heterogeneous Multimedia Processing
For heterogeneous computing, we use a scheduling strategy that gives priority to the GPU for high-intensity tasks and to the CPU for less intensive tasks.
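A minimal Python sketch of such a priority rule follows. The slides do not say how task intensity is measured or where the cut-off lies, so the intensity scores, the `threshold` parameter and the function name `schedule` are hypothetical; a real runtime would also account for data transfers and worker availability.

```python
def schedule(tasks, threshold):
    """Split tasks between GPU and CPU queues by intensity.

    tasks: list of (name, intensity) pairs; intensity is an assumed
    score where higher means more compute-intensive.
    Returns (gpu_queue, cpu_queue), most intensive tasks first.
    """
    gpu_queue, cpu_queue = [], []
    for name, intensity in sorted(tasks, key=lambda t: t[1], reverse=True):
        if intensity >= threshold:
            gpu_queue.append(name)   # high-intensity work goes to the GPU
        else:
            cpu_queue.append(name)   # lighter work stays on the CPU
    return gpu_queue, cpu_queue


if __name__ == "__main__":
    # Hypothetical intensity scores for three image-processing steps.
    tasks = [("edge_detection", 0.9),
             ("histogram_equalization", 0.2),
             ("corner_detection", 0.8)]
    gpu, cpu = schedule(tasks, threshold=0.5)
    print("GPU:", gpu)
    print("CPU:", cpu)
```

Sorting by decreasing intensity means the most demanding tasks reach the GPU queue first, which matches the stated priority but is only one of several reasonable orderings.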

16 Use Case 1: Vertebra Segmentation
Processing steps:
- Extraction of mean shape models of vertebrae.
- Input: set of medical images.
- Histogram equalization (improves contrast).
- Edge detection.
- Corner detection.
- Selection of vertebrae corners.
- Extraction of real vertebrae (ASM).
[Figure: Heterogeneous computing for vertebra segmentation. The stages are labeled, in order, "CPU treatments", "Hybrid processing Multi-CPU/Multi-GPU" and "CPU treatments".]

17 Use Case 2: Video Indexing
VideoCycle: indexing of video sequences based on feature extraction:
- Silhouette.
- Movement areas.
- Contours.
- Hu moments.
[Figures: Hybrid detection of contours; Hu moments extracted from edges]

18 Experimental Results: Performance
[Plot: Performance of edge + corner detection on heterogeneous architectures. Speedup versus image resolution (from 512x... up to ...x3936) for six configurations: GPU, 1GPU-2CPU, 2GPU, 2GPU-4CPU, 4GPU, 4GPU-8CPU.]
Note: the GPU streaming technique improved performance by about 25%.

19 Conclusion
The proposed framework allows parallel processing of images at two levels:
- Low level: parallel processing on the GPU between the pixels of an image (intra-image parallel processing).
- High level: simultaneous exploitation of both CPU and GPU cores (inter-image parallel processing).
GPU processing for high-definition video in real time: CUDA processing and OpenGL visualization.
Use of the CUDA streaming technique to overlap transfers with computations.
Current work:
- Computation of the intensity factor of each algorithm from several parameters (number of operations, number of memory accesses, dependency factor, etc.).
- Efficient selection of resources (CPU and/or GPU) for full exploitation of heterogeneous architectures.

20 Future Works
- A general framework enabling automatic selection of resources (CPU and/or GPU) depending on the intensity of image (single or multiple) and video processing applications.
- Exploitation of SDI capture (input/output) cards for real-time video processing using multiple outputs simultaneously.
- Fully integrated GPU-based solution for real-time video processing.
[Figure: Quadro SDI capture card]

21 Publications
Regular papers in journals:
- F. Lecron, S. A. Mahmoudi, M. Benjelloun, S. Mahmoudi and P. Manneback, "Heterogeneous Computing for Vertebra Detection and Segmentation in X-Ray Images", International Journal of Biomedical Imaging: Parallel Computation in Medical Imaging Applications, June.
- S. A. Mahmoudi, P. Manneback, C. Augonnet, S. Thibault, "Traitement d'Images sur Architectures Parallèles et Hétérogènes", Revue des sciences et technologies de l'information. In submission (submitted on 16/09/2011).
International conferences and workshops:
- S. A. Mahmoudi, P. Manneback, C. Augonnet, S. Thibault, "Détection optimale des coins et contours dans des bases d'images volumineuses sur architectures multi-cœurs hétérogènes", 20èmes Rencontres Francophones du Parallélisme, RenPar'20, Saint-Malo, France, May.
- S. A. Mahmoudi, S. Frémal, M. Bagein, P. Manneback, "Calcul intensif sur GPU : exemples en traitement d'images, en bio-informatique et en télécommunication", CIAE 2011: Colloque d'Informatique, Automatique et Electronique, Casablanca, Morocco, March.
- S. A. Mahmoudi, F. Lecron, P. Manneback, M. Benjelloun, S. Mahmoudi, "GPU-Based Segmentation of Cervical Vertebra in X-Ray Images", Workshop HPCCE, IEEE International Conference on Cluster Computing, Crete, Greece, September.
- S. A. Mahmoudi, P. Manneback, "Parallel Image Processing with CUDA and OpenGL", Network for High-Performance Computing on Complex Environments, COST Action IC 805, WG Meeting, Lisbon, Portugal, October.
- S. A. Mahmoudi, P. Manneback, "Traitements d'images sur GPU sous CUDA et OpenGL : application à l'imagerie médicale", Journées CIGIL: Calcul Intensif et Grilles Informatiques à Lille, Lille, France, December.
- S. A. Mahmoudi, P. Manneback, "Traitement d'objets multimédias sur GPU", Seconde journée scientifique du pôle hainuyer, Belgium, May.
Technical reports:
- S. Dupont, C. Frisson, S. A. Mahmoudi, X. Siebert, J. Urbain, T. Ravet, "MediaBlender: Interactive Multimedia Segmentation and Annotation", QPSR of the numediart research program, vol. 3, December.
- M. Mancas, R. B. Madkhour, S. A. Mahmoudi, T. Ravet, "VirTrack: Tracking for Virtual Studios", QPSR of the numediart research program, vol. 3, no. 1, pp. 1-4, March.
- M. Mancas, J. Tilmanne, R. Chessini, S. Hidot, C. Machy, S. A. Mahmoudi, T. Ravet, "MATRIX: Natural Interaction Between Real and Virtual Worlds", QPSR of the numediart research program, vol. 1, no. 5, January.
- M. Mancas, M. Bagein, N. Guichard, S. Hidot, C. Machy, S. A. Mahmoudi, X. Siebert, "AVS: Augmented Virtual Studio", QPSR of the numediart research program, vol. 1, no. 4, December.


23 Université de Mons 23

Faculté Polytechnique

Faculté Polytechnique Faculté Polytechnique INFORMATIQUE PARALLÈLE ET DISTRIBUÉE CHAPTER 7 : MULTI-CPU/MULTI-GPU PROCESSING APPLICATION FOR IMAGE AND VIDEO PROCESSING Sidi Ahmed Mahmoudi sidi.mahmoudi@umons.ac.be 13 December

More information

Multi-CPU/Multi-GPU Based Framework for Multimedia Processing

Multi-CPU/Multi-GPU Based Framework for Multimedia Processing Multi-CPU/Multi-GPU Based Framework for Multimedia Processing Sidi Mahmoudi, Pierre Manneback To cite this version: Sidi Mahmoudi, Pierre Manneback. Multi-CPU/Multi-GPU Based Framework for Multimedia Processing.

More information

Faculté Polytechnique

Faculté Polytechnique Faculté Polytechnique MULTIMEDIA RETRIEVAL & INDEXATION CHAPTER 7 : CLOUD & GPU FOR MULTIMEDIA RETRIEVAL Sidi Ahmed Mahmoudi sidi.mahmoudi@umons.ac.be 19 December 2017 PLAN Introduction I. Multimedia retrieval

More information

Performance Analysis of Sobel Edge Detection Filter on GPU using CUDA & OpenGL

Performance Analysis of Sobel Edge Detection Filter on GPU using CUDA & OpenGL Performance Analysis of Sobel Edge Detection Filter on GPU using CUDA & OpenGL Ms. Khyati Shah Assistant Professor, Computer Engineering Department VIER-kotambi, INDIA khyati30@gmail.com Abstract: CUDA(Compute

More information

Semi-Automatic Detection of Cervical Vertebrae in X-ray Images Using Generalized Hough Transform

Semi-Automatic Detection of Cervical Vertebrae in X-ray Images Using Generalized Hough Transform Semi-Automatic Detection of Cervical Vertebrae in X-ray Images Using Generalized Hough Transform Mohamed Amine LARHMAM, Saïd MAHMOUDI and Mohammed BENJELLOUN Faculty of Engineering, University of Mons,

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied

More information

Improving performances of an embedded RDBMS with a hybrid CPU/GPU processing engine

Improving performances of an embedded RDBMS with a hybrid CPU/GPU processing engine Improving performances of an embedded RDBMS with a hybrid CPU/GPU processing engine Samuel Cremer 1,2, Michel Bagein 1, Saïd Mahmoudi 1, Pierre Manneback 1 1 UMONS, University of Mons Computer Science

More information

GPU for HPC. October 2010

GPU for HPC. October 2010 GPU for HPC Simone Melchionna Jonas Latt Francis Lapique October 2010 EPFL/ EDMX EPFL/EDMX EPFL/DIT simone.melchionna@epfl.ch jonas.latt@epfl.ch francis.lapique@epfl.ch 1 Moore s law: in the old days,

More information

Neural Network Implementation using CUDA and OpenMP

Neural Network Implementation using CUDA and OpenMP Neural Network Implementation using CUDA and OpenMP Honghoon Jang, Anjin Park, Keechul Jung Department of Digital Media, College of Information Science, Soongsil University {rollco82,anjin,kcjung}@ssu.ac.kr

More information

Real-Time Scene Reconstruction. Remington Gong Benjamin Harris Iuri Prilepov

Real-Time Scene Reconstruction. Remington Gong Benjamin Harris Iuri Prilepov Real-Time Scene Reconstruction Remington Gong Benjamin Harris Iuri Prilepov June 10, 2010 Abstract This report discusses the implementation of a real-time system for scene reconstruction. Algorithms for

More information

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Joined Advanced Student School (JASS) 2009 March 29 - April 7, 2009 St. Petersburg, Russia G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Dmitry Puzyrev St. Petersburg State University Faculty

More information

An Extension of the StarSs Programming Model for Platforms with Multiple GPUs

An Extension of the StarSs Programming Model for Platforms with Multiple GPUs An Extension of the StarSs Programming Model for Platforms with Multiple GPUs Eduard Ayguadé 2 Rosa M. Badia 2 Francisco Igual 1 Jesús Labarta 2 Rafael Mayo 1 Enrique S. Quintana-Ortí 1 1 Departamento

More information

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS Agenda Forming a GPGPU WG 1 st meeting Future meetings Activities Forming a GPGPU WG To raise needs and enhance information sharing A platform for knowledge

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References This set of slides is mainly based on: CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory Slide of Applied

More information

A cache-aware performance prediction framework for GPGPU computations

A cache-aware performance prediction framework for GPGPU computations A cache-aware performance prediction framework for GPGPU computations The 8th Workshop on UnConventional High Performance Computing 215 Alexander Pöppl, Alexander Herz August 24th, 215 UCHPC 215, August

More information

Static Scene Reconstruction

Static Scene Reconstruction GPU supported Real-Time Scene Reconstruction with a Single Camera Jan-Michael Frahm, 3D Computer Vision group, University of North Carolina at Chapel Hill Static Scene Reconstruction 1 Capture on campus

More information

Communication Library to Overlap Computation and Communication for OpenCL Application

Communication Library to Overlap Computation and Communication for OpenCL Application Communication Library to Overlap Computation and Communication for OpenCL Application Toshiya Komoda, Shinobu Miwa, Hiroshi Nakamura Univ.Tokyo What is today s talk about? Heterogeneous Computing System

More information

GPU GPU CPU. Raymond Namyst 3 Samuel Thibault 3 Olivier Aumage 3

GPU GPU CPU. Raymond Namyst 3 Samuel Thibault 3 Olivier Aumage 3 /CPU,a),2,2 2,2 Raymond Namyst 3 Samuel Thibault 3 Olivier Aumage 3 XMP XMP-dev CPU XMP-dev/StarPU XMP-dev XMP CPU StarPU CPU /CPU XMP-dev/StarPU N /CPU CPU. Graphics Processing Unit GP General-Purpose

More information

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of

More information

Tesla GPU Computing A Revolution in High Performance Computing

Tesla GPU Computing A Revolution in High Performance Computing Tesla GPU Computing A Revolution in High Performance Computing Mark Harris, NVIDIA Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction to Tesla CUDA Architecture Programming & Memory

More information

HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D

HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D HPC with GPU and its applications from Inspur Haibo Xie, Ph.D xiehb@inspur.com 2 Agenda I. HPC with GPU II. YITIAN solution and application 3 New Moore s Law 4 HPC? HPC stands for High Heterogeneous Performance

More information

CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging

CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging Saoni Mukherjee, Nicholas Moore, James Brock and Miriam Leeser September 12, 2012 1 Outline Introduction to CT Scan, 3D reconstruction

More information

Fully Automatic Vertebra Detection in X-Ray Images Based on Multi-Class SVM

Fully Automatic Vertebra Detection in X-Ray Images Based on Multi-Class SVM Fully Automatic Vertebra Detection in X-Ray Images Based on Multi-Class SVM Fabian Lecron, Mohammed Benjelloun, Saïd Mahmoudi University of Mons, Faculty of Engineering, Computer Science Department 20,

More information

Comparison of High-Speed Ray Casting on GPU

Comparison of High-Speed Ray Casting on GPU Comparison of High-Speed Ray Casting on GPU using CUDA and OpenGL November 8, 2008 NVIDIA 1,2, Andreas Weinlich 1, Holger Scherl 2, Markus Kowarschik 2 and Joachim Hornegger 1 1 Chair of Pattern Recognition

More information

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter

More information

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008 Michael Doggett Graphics Architecture Group April 2, 2008 Graphics Processing Unit Architecture CPUs vsgpus AMD s ATI RADEON 2900 Programming Brook+, CAL, ShaderAnalyzer Architecture Challenges Accelerated

More information

Finite Element Integration and Assembly on Modern Multi and Many-core Processors

Finite Element Integration and Assembly on Modern Multi and Many-core Processors Finite Element Integration and Assembly on Modern Multi and Many-core Processors Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków,

More information

General Purpose Computing on Graphical Processing Units (GPGPU(

General Purpose Computing on Graphical Processing Units (GPGPU( General Purpose Computing on Graphical Processing Units (GPGPU( / GPGP /GP 2 ) By Simon J.K. Pedersen Aalborg University, Oct 2008 VGIS, Readings Course Presentation no. 7 Presentation Outline Part 1:

More information

Large Displacement Optical Flow & Applications

Large Displacement Optical Flow & Applications Large Displacement Optical Flow & Applications Narayanan Sundaram, Kurt Keutzer (Parlab) In collaboration with Thomas Brox (University of Freiburg) Michael Tao (University of California Berkeley) Parlab

More information

CS 179: GPU Programming

CS 179: GPU Programming CS 179: GPU Programming Introduction Lecture originally written by Luke Durant, Tamas Szalay, Russell McClellan What We Will Cover Programming GPUs, of course: OpenGL Shader Language (GLSL) Compute Unified

More information

THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS

THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS Computer Science 14 (4) 2013 http://dx.doi.org/10.7494/csci.2013.14.4.679 Dominik Żurek Marcin Pietroń Maciej Wielgosz Kazimierz Wiatr THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT

More information

AMS-HMI12: Assisted mobility supported by shared-control and advanced human-machine interfaces

AMS-HMI12: Assisted mobility supported by shared-control and advanced human-machine interfaces AMS-HMI12: Assisted mobility supported by shared-control and advanced human-machine interfaces RECI/EEI-AUT/0181/2012 Partners: ISR-UC (Principal Contractor), UC, APCC, IPT Period: 1/1/2013-31/12/2015

More information

Segmentation Using a Region Growing Thresholding

Segmentation Using a Region Growing Thresholding Segmentation Using a Region Growing Thresholding Matei MANCAS 1, Bernard GOSSELIN 1, Benoît MACQ 2 1 Faculté Polytechnique de Mons, Circuit Theory and Signal Processing Laboratory Bâtiment MULTITEL/TCTS

More information

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes. HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation

More information

Threading Hardware in G80

Threading Hardware in G80 ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &

More information

Accelerating image registration on GPUs

Accelerating image registration on GPUs Accelerating image registration on GPUs Harald Köstler, Sunil Ramgopal Tatavarty SIAM Conference on Imaging Science (IS10) 13.4.2010 Contents Motivation: Image registration with FAIR GPU Programming Combining

More information

Accelerating Mean Shift Segmentation Algorithm on Hybrid CPU/GPU Platforms

Accelerating Mean Shift Segmentation Algorithm on Hybrid CPU/GPU Platforms Accelerating Mean Shift Segmentation Algorithm on Hybrid CPU/GPU Platforms Liang Men, Miaoqing Huang, John Gauch Department of Computer Science and Computer Engineering University of Arkansas {mliang,mqhuang,jgauch}@uark.edu

More information

high performance medical reconstruction using stream programming paradigms

high performance medical reconstruction using stream programming paradigms high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming

More information

GPU programming. Dr. Bernhard Kainz

GPU programming. Dr. Bernhard Kainz GPU programming Dr. Bernhard Kainz Overview About myself Motivation GPU hardware and system architecture GPU programming languages GPU programming paradigms Pitfalls and best practice Reduction and tiling

More information

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods

More information

NVIDIA Parallel Nsight. Jeff Kiel

NVIDIA Parallel Nsight. Jeff Kiel NVIDIA Parallel Nsight Jeff Kiel Agenda: NVIDIA Parallel Nsight Programmable GPU Development Presenting Parallel Nsight Demo Questions/Feedback Programmable GPU Development More programmability = more

More information

Dense matching GPU implementation

Dense matching GPU implementation Dense matching GPU implementation Author: Hailong Fu. Supervisor: Prof. Dr.-Ing. Norbert Haala, Dipl. -Ing. Mathias Rothermel. Universität Stuttgart 1. Introduction Correspondence problem is an important

More information

CUDA Conference. Walter Mundt-Blum March 6th, 2008

CUDA Conference. Walter Mundt-Blum March 6th, 2008 CUDA Conference Walter Mundt-Blum March 6th, 2008 NVIDIA s Businesses Multiple Growth Engines GPU Graphics Processing Units MCP Media and Communications Processors PESG Professional Embedded & Solutions

More information

MAGMA. Matrix Algebra on GPU and Multicore Architectures

MAGMA. Matrix Algebra on GPU and Multicore Architectures MAGMA Matrix Algebra on GPU and Multicore Architectures Innovative Computing Laboratory Electrical Engineering and Computer Science University of Tennessee Piotr Luszczek (presenter) web.eecs.utk.edu/~luszczek/conf/

More information

1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7.
