Image and Video Processing on Parallel (GPU) and Heterogeneous Architectures (Multi-CPU/Multi-GPU)
1 Faculté Polytechnique
2nd Workshop of COST 0805 Open Network for High-Performance Computing on Complex Environments
Image and Video Processing on Parallel (GPU) and Heterogeneous Architectures (Multi-CPU/Multi-GPU)
26 Jan 2012
Sidi Ahmed Mahmoudi, Pierre Manneback
Computer Science Department, Faculty of Engineering, UMONS
2 Agenda 1. Introduction 2. Context 3. Image Processing on Parallel (GPU) and Heterogeneous Architectures 4. Video Processing on GPU 5. Proposed Framework for Multimedia Processing on Heterogeneous Architectures 6. Experimental Results 7. Conclusion Université de Mons 2
3 Introduction
«The number of transistors that can be placed on an integrated circuit would double every two years» (Moore's law). In practice, CPU performance doubled roughly every 18 months until 2008.
In recent years this no longer translates into faster single cores: for thermal reasons, CPU clock frequencies have been capped at around 4 GHz.
Solution: multiplying the computing units within the CPU (multi-core and many-core processors).
GPUs provide a large number of processing units, initially used for 3D rendering and video games.
Birth of GPGPU (General-Purpose computing on Graphics Processing Units): using the GPU to perform tasks usually performed by the CPU.
4 Context
Figure: CPU versus GPU architecture (control unit, cache, ALUs, DRAM).
Hardware: high computing power of GPUs; heterogeneous architectures (Multi-CPU/Multi-GPU).
Applications: intensive processing of multimedia objects (images, videos, etc.).
Platform: Multi-CPU/Multi-GPU.
High intensity: large volumes of multimedia objects (HD, Full HD, etc.).
Constraints: transfer time between CPU and GPU memories; appropriate selection of the computing units (CPU and/or GPU) for processing; complex management of heterogeneous architectures.
Objectives: efficient multimedia processing on heterogeneous architectures (Multi-CPU/Multi-GPU); efficient selection of the computing units depending on the type of media to process.
5 GPU Programming
BrookGPU: since …
ATI Stream: for ATI cards.
DirectX 11, OpenGL: GPGPU through shaders.
OpenCL: GPUs from multiple vendors (and multi-core CPUs).
CUDA: for NVIDIA cards.
Figures: ATI Radeon HD 4770; NVIDIA GTX 590.
6 GPU Programming: Runtimes for Heterogeneous Platforms
1. StarPU: developed at the LaBRI laboratory (Bordeaux, France). Exploits the full computing power of multi-CPU/multi-GPU machines, with efficient scheduling strategies.
2. StarSs: developed in Catalonia, Spain. A flexible programming model for multicores, based mainly on CPUSs (multicore programming) and GPUSs (multi-GPU programming).
3. Grand Central Dispatch: developed by Apple and released for Mac systems. Optimizes application support for systems with multi-core processors.
7 Image Processing on GPU
Image processing is a natural fit for data-parallel processing:
- pixels can be mapped directly to threads;
- much of the data is shared between neighbouring pixels;
- high-resolution images require intensive computation.
CUDA and pixel shaders are both well suited to image processing.
CUDA supports sharing image data with OpenGL and Direct3D applications.
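To make the pixel-to-thread mapping concrete, here is a minimal CPU-side C++ sketch, not the authors' code. `launch2D` is a hypothetical helper that calls a per-pixel function once for every (x, y), the way a 2D CUDA grid runs one thread per pixel; in an actual CUDA kernel, x and y would instead be derived from `blockIdx`, `blockDim` and `threadIdx`. The grayscale conversion and its BT.601 weights are only an illustrative choice of per-pixel operation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates a 2D CUDA grid on the CPU: each (x, y) call plays the role of one
// GPU thread. No call reads another call's output, so the iterations could
// run in any order or concurrently.
template <typename Kernel>
void launch2D(int width, int height, Kernel kernel) {
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            kernel(x, y);
}

// Converts an interleaved 8-bit RGB image to grayscale (BT.601 luma weights),
// computing exactly one output pixel per "thread".
std::vector<std::uint8_t> toGray(const std::vector<std::uint8_t>& rgb,
                                 int width, int height) {
    const std::size_t n = static_cast<std::size_t>(width) * height;
    assert(rgb.size() == 3 * n);
    std::vector<std::uint8_t> gray(n);
    launch2D(width, height, [&](int x, int y) {
        const std::size_t i = static_cast<std::size_t>(y) * width + x;
        const double luma = 0.299 * rgb[3 * i] + 0.587 * rgb[3 * i + 1] +
                            0.114 * rgb[3 * i + 2];
        gray[i] = static_cast<std::uint8_t>(luma + 0.5);  // round to nearest
    });
    return gray;
}
```

Because each output pixel is written by exactly one call and depends only on its own input pixel, the same body can be moved into a `__global__` kernel without synchronization.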
8 Image Processing on GPU
Case 1: single image: OpenGL visualization (the result is displayed directly from GPU memory, without a GPU-to-CPU transfer).
Case 2: set of images: results stored in CPU memory (with CPU-GPU transfers).
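The practical difference between the two cases is the cost of moving data across the bus. The following back-of-the-envelope C++ sketch estimates it, assuming uncompressed pixels and a hypothetical effective bandwidth; latency and pinned-memory effects are ignored, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of one uncompressed image.
std::size_t imageBytes(int width, int height, int channels, int bytesPerChannel = 1) {
    assert(width > 0 && height > 0 && channels > 0 && bytesPerChannel > 0);
    return static_cast<std::size_t>(width) * height * channels * bytesPerChannel;
}

// Estimated one-way host<->device copy time in milliseconds for an assumed
// effective bandwidth in bytes per second (latency ignored).
double transferMs(std::size_t bytes, double bandwidthBytesPerSec) {
    assert(bandwidthBytesPerSec > 0.0);
    return 1000.0 * static_cast<double>(bytes) / bandwidthBytesPerSec;
}

// Case 2: every image is uploaded and its result downloaded again.
double roundTripMsForSet(int images, std::size_t bytesIn, std::size_t bytesOut,
                         double bandwidthBytesPerSec) {
    assert(images >= 0);
    return images * (transferMs(bytesIn, bandwidthBytesPerSec) +
                     transferMs(bytesOut, bandwidthBytesPerSec));
}
```

For example, one 1476×1680 single-channel image is 2,479,680 bytes; at an assumed 5 GB/s this is roughly 0.5 ms per direction, paid in both directions for every image in Case 2.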
9 Image Processing on GPU
1. Classic image processing methods: geometrical transformations (rotation, translation, etc.) with parallel processing across image pixels. GPU acceleration ranges from 10x to 40x compared to the CPU.
2. Corner detection on GPU: a preliminary step for many computer vision algorithms. GPU implementation based on the Harris detector and Bouguet's technique. Efficiency: invariance to rotation and to brightness changes (though not to scale).
3. Edge detection on GPU: GPU implementation based on the Deriche-Canny method. Efficiency: robustness to noise, reduced number of operations, good quality of detected contours.
Figures: input image; detected corners on GPU; detected contours on GPU.
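The Harris detector can be summarized by its corner response R = det(M) - k·trace(M)², where M is the structure tensor of the image gradients. The C++ sketch below computes R on the CPU, assuming Sobel gradients, a 3×3 box window instead of the more usual Gaussian, and k = 0.04 (a commonly used value). It is not the authors' GPU implementation and omits Bouguet's refinement and non-maximum suppression; what it shows is that both passes depend only on a pixel and its neighbours, which is what allows one GPU thread per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major grayscale image with clamp-to-edge pixel access.
struct Image {
    int width;
    int height;
    std::vector<float> data;

    float at(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Harris response R = det(M) - k * trace(M)^2 at every pixel, where
// M = [[Sxx, Sxy], [Sxy, Syy]] sums Sobel gradient products over a 3x3 box.
// R > 0: corner, R < 0: edge, R close to 0: flat region.
std::vector<float> harrisResponse(const Image& img, float k = 0.04f) {
    const int w = img.width, h = img.height;
    const std::size_t n = static_cast<std::size_t>(w) * h;
    assert(img.data.size() == n);
    std::vector<float> ixx(n), iyy(n), ixy(n), r(n);

    // Pass 1: one "thread" per pixel computes Sobel gradients and their products.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const float gx = (img.at(x + 1, y - 1) + 2 * img.at(x + 1, y) + img.at(x + 1, y + 1))
                           - (img.at(x - 1, y - 1) + 2 * img.at(x - 1, y) + img.at(x - 1, y + 1));
            const float gy = (img.at(x - 1, y + 1) + 2 * img.at(x, y + 1) + img.at(x + 1, y + 1))
                           - (img.at(x - 1, y - 1) + 2 * img.at(x, y - 1) + img.at(x + 1, y - 1));
            const std::size_t i = static_cast<std::size_t>(y) * w + x;
            ixx[i] = gx * gx;
            iyy[i] = gy * gy;
            ixy[i] = gx * gy;
        }

    // Pass 2: one "thread" per pixel sums its window and evaluates R.
    auto clampIdx = [](int v, int hi) { return v < 0 ? 0 : (v > hi ? hi : v); };
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sxx = 0, syy = 0, sxy = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const std::size_t j =
                        static_cast<std::size_t>(clampIdx(y + dy, h - 1)) * w +
                        clampIdx(x + dx, w - 1);
                    sxx += ixx[j];
                    syy += iyy[j];
                    sxy += ixy[j];
                }
            const float det = sxx * syy - sxy * sxy;
            const float trace = sxx + syy;
            r[static_cast<std::size_t>(y) * w + x] = det - k * trace * trace;
        }
    return r;
}
```

On a bright square, the pixel at the square's corner gets a positive response, a pixel on its straight border a negative one, and the uniform background zero.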
10 Image Processing on GPU
Case 1: single image (OpenGL visualization)
Table: Corner detection using the Harris detector: CPU time (ms), GPU time (ms) and speedup for image resolutions from 512×… upward.
Table: Edge detection using the Deriche-Canny method: CPU time (ms), GPU time (ms) and speedup for the same resolutions.
Case 2: set of images (resolution 1476×1680)
Table: Corner detection using the Harris detector: GPU times of 0.48 s, 1.35 s, 2.60 s and 4.29 s as the number of images grows; speedup of 6.11 for the largest set.
Table: Edge detection using the Deriche-Canny method: GPU times of 0.40 s, 0.98 s, 1.80 s and 3.43 s as the number of images grows; speedup of 8.51 for the largest set.
11 Multiple Image Processing on Heterogeneous Platforms (Multi-CPU/Multi-GPU)
Optimization:
- GPU streaming technique: overlaps kernel execution with host-device memory copies.
- Streaming on multiple GPUs improved performance by about 25% when processing a medical image database (experimental results).
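Why streaming helps can be seen with a simple timing model. The C++ sketch below compares sequential processing with an idealized three-stage pipeline (upload, kernel, download), assuming all three stages can run concurrently, as on devices with separate copy engines for each direction; the stage times are hypothetical inputs. In CUDA itself the overlap is obtained with several `cudaStream_t` streams and `cudaMemcpyAsync` on page-locked host buffers. The model gives an upper bound on the gain, not a prediction of the measured 25%.

```cpp
#include <algorithm>
#include <cassert>

// Per-image stage durations (milliseconds) on one GPU.
struct StageTimes {
    double hostToDevice;
    double kernel;
    double deviceToHost;
};

// Without streams: each image is uploaded, processed and downloaded
// before work on the next image starts.
double sequentialMs(int images, const StageTimes& t) {
    assert(images >= 0);
    return images * (t.hostToDevice + t.kernel + t.deviceToHost);
}

// With streams (idealized): while image i runs stage s + 1, image i + 1 runs
// stage s, so once the first image has filled the pipeline, one image
// completes per "slowest stage" interval.
double pipelinedMs(int images, const StageTimes& t) {
    assert(images >= 0);
    if (images == 0) return 0.0;
    const double slowest = std::max({t.hostToDevice, t.kernel, t.deviceToHost});
    return (t.hostToDevice + t.kernel + t.deviceToHost) + (images - 1) * slowest;
}
```

With 2 ms copies and a 4 ms kernel, ten images take 80 ms sequentially but about 44 ms when pipelined. The gain shrinks as one stage dominates, which is one reason measured improvements are usually well below the ideal.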
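12 Video Processing on GPU in Real Time
Flowchart: for each image i (i ≤ N) of the real-time video, the image is loaded in CPU memory, copied to GPU memory, processed in parallel with CUDA and displayed with OpenGL; then i = i + 1. If the video has ended, processing stops (END); otherwise the loop continues with the next image.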
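The control flow of this flowchart can be written as a short loop. In the C++ sketch below, the four stages are hypothetical callbacks standing in for frame capture, the host-to-device copy, the CUDA kernel launch and the OpenGL display; only the loop structure is taken from the slide.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// One video frame as raw 8-bit pixels (placeholder type for this sketch).
using Frame = std::vector<unsigned char>;

// Hypothetical stages of the per-frame loop.
struct FrameStages {
    std::function<bool(Frame&)> grab;          // returns false once the video has ended
    std::function<void(const Frame&)> upload;  // CPU memory -> GPU memory
    std::function<void()> process;             // parallel processing on the GPU
    std::function<void()> display;             // visualization straight from GPU memory
};

// Image i is loaded on the CPU, copied to the GPU, processed and displayed,
// then i = i + 1 until the end of the video. Returns the number of frames handled.
int runVideoLoop(const FrameStages& s) {
    assert(s.grab && s.upload && s.process && s.display);
    Frame frame;
    int i = 0;
    while (s.grab(frame)) {
        s.upload(frame);
        s.process();
        s.display();
        ++i;
    }
    return i;
}
```

Because the processed frame is displayed from GPU memory, no result is copied back to the CPU inside the loop; only the incoming frame crosses the bus.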
13 Video Processing on GPU in Real Time
Background subtraction on GPU; point-of-interest detection on GPU.
Figures: performance of background subtraction on GPU; performance of corner detection on GPU.
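The slide does not say which background model is used. As an illustration, the C++ sketch below uses one common choice, a running-average background with a fixed difference threshold; the threshold and blending factor `alpha` are assumed parameters. Every pixel is handled independently, so a GPU version would assign one thread per pixel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-pixel background subtraction against a running-average model.
// Returns a mask (1 = foreground) and updates the background in place.
std::vector<std::uint8_t> subtractBackground(const std::vector<float>& frame,
                                             std::vector<float>& background,
                                             float threshold, float alpha) {
    assert(frame.size() == background.size());
    assert(alpha >= 0.0f && alpha <= 1.0f);
    std::vector<std::uint8_t> mask(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i) {
        // Foreground where the frame differs strongly from the model.
        mask[i] = std::fabs(frame[i] - background[i]) > threshold ? 1 : 0;
        // Blend the new frame into the model so slow lighting changes are absorbed.
        background[i] = (1.0f - alpha) * background[i] + alpha * frame[i];
    }
    return mask;
}
```

A small `alpha` makes the model adapt slowly, so briefly moving objects stay in the foreground mask; a large one absorbs them quickly.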
14 Video Processing on GPU in Real Time
Figure: Performance of optical flow computation on GPU: frames per second (scale up to 80 FPS) versus video resolution (up to …×1080) on a dual-core CPU, a GTX 280 GPU and a Tesla C2070 (Fermi) GPU.
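A real-time frame rate translates into a fixed time budget per frame that must cover transfer, computation and display. The small C++ sketch below makes this arithmetic explicit; the function names are illustrative. For instance, 80 FPS (the top of the chart's scale) leaves 12.5 ms per frame.

```cpp
#include <cassert>

// Time available per frame (ms) to sustain a target frame rate.
double frameBudgetMs(double targetFps) {
    assert(targetFps > 0.0);
    return 1000.0 / targetFps;
}

// Frame rate actually achieved when each frame takes perFrameMs end to end.
double achievedFps(double perFrameMs) {
    assert(perFrameMs > 0.0);
    return 1000.0 / perFrameMs;
}

// True if the per-frame processing time fits within the real-time budget.
bool isRealTime(double perFrameMs, double targetFps) {
    return perFrameMs <= frameBudgetMs(targetFps);
}
```

Since the pixel count grows with resolution, the per-frame time grows too, which is why frame rates fall as the resolution on the chart increases.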
15 Proposed Framework for Heterogeneous Multimedia Processing
For heterogeneous computing, we use a scheduling strategy that gives priority to the GPU for highly intensive tasks and to the CPU for less intensive tasks.
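The priority rule can be illustrated with a minimal C++ sketch. The intensity score and the threshold are hypothetical inputs here: the slides do not define how intensity is measured (its computation is listed as current work in the conclusion), and the CPU fallback when no GPU is present is an added assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Unit { CPU, GPU };

// A processing task with an estimated computational intensity
// (hypothetical score; how to compute it is left open here).
struct Task {
    std::string name;
    double intensity;
};

// Gives the GPU priority for highly intensive tasks and the CPU for the rest.
// If no GPU is available, every task falls back to the CPU.
std::vector<Unit> assignUnits(const std::vector<Task>& tasks, double threshold,
                              bool gpuAvailable = true) {
    assert(threshold >= 0.0);
    std::vector<Unit> units;
    units.reserve(tasks.size());
    for (const Task& t : tasks)
        units.push_back(gpuAvailable && t.intensity >= threshold ? Unit::GPU : Unit::CPU);
    return units;
}
```

A static threshold is the simplest way to express the rule; a runtime such as StarPU can instead choose units dynamically from measured execution times.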
16 Use Case 1: Vertebra Segmentation
Figure: Heterogeneous computing for vertebra segmentation. Processing chain: extraction of mean shape models of vertebrae (CPU processing); set of medical images → histogram equalization (contrast enhancement) → edge detection → corner detection (hybrid Multi-CPU/Multi-GPU processing); selection of vertebra corners and extraction of the actual vertebrae with ASM (Active Shape Models) (CPU processing).
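Histogram equalization, the first step of the hybrid part of this chain, remaps gray levels through the cumulative histogram so that they spread over the full range. The C++ sketch below implements the standard global variant, out(v) = round((cdf(v) - cdf_min) / (N - cdf_min) · 255); the slide does not specify which variant the authors use. The histogram is a reduction, while the final lookup is independent per pixel.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Global histogram equalization of an 8-bit grayscale image.
std::vector<std::uint8_t> equalizeHistogram(const std::vector<std::uint8_t>& img) {
    std::vector<std::uint8_t> out(img.size());
    if (img.empty()) return out;

    std::array<std::size_t, 256> hist{};  // zero-initialized
    for (std::uint8_t v : img) ++hist[v];

    // Cumulative histogram.
    std::array<std::size_t, 256> cdf{};
    std::size_t running = 0;
    for (int v = 0; v < 256; ++v) cdf[v] = (running += hist[v]);

    // First non-zero value of the cumulative histogram.
    std::size_t cdfMin = 0;
    for (std::size_t c : cdf)
        if (c != 0) { cdfMin = c; break; }
    const std::size_t n = img.size();
    assert(cdfMin > 0);
    if (n == cdfMin) return img;  // constant image: nothing to stretch

    // Lookup table, then one independent lookup per pixel.
    std::array<std::uint8_t, 256> lut{};
    for (int v = 0; v < 256; ++v) {
        if (cdf[v] < cdfMin) continue;  // gray levels below the minimum never occur
        const double scaled = static_cast<double>(cdf[v] - cdfMin) * 255.0 /
                              static_cast<double>(n - cdfMin);
        lut[v] = static_cast<std::uint8_t>(std::lround(scaled));
    }
    for (std::size_t i = 0; i < n; ++i) out[i] = lut[img[i]];
    return out;
}
```

For example, an image with gray levels {50, 50, 100, 200} is stretched to {0, 0, 128, 255}.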
17 Use Case 2: Video Indexation
VideoCycle: indexing of video sequences based on feature extraction: silhouettes, areas of movement, contours, Hu moments.
Figures: hybrid contour detection; Hu moments extracted from edges.
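Hu moments are shape descriptors built from normalized central image moments η_pq = μ_pq / μ00^(1+(p+q)/2). The first two are φ1 = η20 + η02 and φ2 = (η20 - η02)² + 4η11². The C++ sketch below computes them for an edge map or silhouette. It is a CPU illustration of the standard definitions; how VideoCycle extracts and uses the moments is not described on the slide.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// First two Hu moment invariants (phi1, phi2) of a row-major image whose
// pixel values act as weights (e.g. a binary silhouette or edge map).
std::array<double, 2> huMoments12(const std::vector<double>& img, int width, int height) {
    assert(img.size() == static_cast<std::size_t>(width) * height);

    // Raw moments m00, m10, m01 give the area and the centroid.
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const double v = img[static_cast<std::size_t>(y) * width + x];
            m00 += v;
            m10 += x * v;
            m01 += y * v;
        }
    assert(m00 > 0.0);
    const double cx = m10 / m00, cy = m01 / m00;

    // Second-order central moments: invariant to translation.
    double mu20 = 0.0, mu02 = 0.0, mu11 = 0.0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const double v = img[static_cast<std::size_t>(y) * width + x];
            const double dx = x - cx, dy = y - cy;
            mu20 += dx * dx * v;
            mu02 += dy * dy * v;
            mu11 += dx * dy * v;
        }

    // Normalization eta_pq = mu_pq / m00^(1 + (p + q) / 2); the exponent is 2
    // for p + q = 2. This adds scale invariance (up to discretization).
    const double norm = m00 * m00;
    const double e20 = mu20 / norm, e02 = mu02 / norm, e11 = mu11 / norm;

    // Hu's combinations are also invariant to rotation.
    const double phi1 = e20 + e02;
    const double phi2 = (e20 - e02) * (e20 - e02) + 4.0 * e11 * e11;
    return {phi1, phi2};
}
```

A shape and its translated or 90-degree-rotated copy yield the same (φ1, φ2), which is what makes these values usable as indexing features.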
18 EXPERIMENTAL RESULTS: PERFORMANCE
Figure: Performance of edge + corner detection on heterogeneous architectures: speedup versus image resolution (from 512×… to …×3936) for the configurations GPU, 1GPU-2CPU, 2GPU, 2GPU-4CPU, 4GPU and 4GPU-8CPU.
Note: using the GPU streaming technique improved performance by about 25%.
19 CONCLUSION
The proposed framework allows parallel processing of images at two levels:
- Low level: parallel processing of the pixels of an image on the GPU (intra-image parallelism).
- High level: simultaneous exploitation of both CPU and GPU cores (inter-image parallelism).
GPU processing of high-definition video in real time: CUDA processing and OpenGL visualization.
Use of the CUDA streaming technique to overlap transfers with computation.
Current work:
- Computing the intensity factor of each algorithm from several parameters (number of operations, number of memory accesses, dependency factor, etc.).
- Efficient selection of resources (CPU and/or GPU) to fully exploit heterogeneous architectures.
20 Future Work
- A general framework enabling automatic selection of resources (CPU and/or GPU) depending on the intensity of image (single or multiple) and video processing applications.
- Exploitation of SDI capture (input/output) cards for real-time video processing, using multiple outputs simultaneously.
Figures: Quadro SDI capture card; fully integrated GPU-based solution for real-time video processing.
21 PUBLICATIONS
Regular papers in journals
F. Lecron, S. A. Mahmoudi, M. Benjelloun, S. Mahmoudi and P. Manneback, "Heterogeneous Computing for Vertebra Detection and Segmentation in X-Ray Images", International Journal of Biomedical Imaging: Parallel Computation in Medical Imaging Applications, June.
S. A. Mahmoudi, P. Manneback, C. Augonnet, S. Thibault, "Traitement d'Images sur Architectures Parallèles et Hétérogènes", Revue des sciences et technologies de l'information. In submission (submitted on 16/09/2011).
International conferences and workshops
S. A. Mahmoudi, P. Manneback, C. Augonnet, S. Thibault, "Détection optimale des coins et contours dans des bases d'images volumineuses sur architectures multi-cœurs hétérogènes", 20e Rencontres Francophones du Parallélisme (RenPar'20), Saint-Malo, France, May.
S. A. Mahmoudi, S. Frémal, M. Bagein, P. Manneback, "Calcul intensif sur GPU : exemples en traitement d'images, en bio-informatique et en télécommunication", CIAE 2011: Colloque d'Informatique, Automatique et Electronique, Casablanca, Morocco, March.
S. A. Mahmoudi, F. Lecron, P. Manneback, M. Benjelloun, S. Mahmoudi, "GPU-Based Segmentation of Cervical Vertebra in X-Ray Images", Workshop HPCCE, IEEE International Conference on Cluster Computing, Crete, Greece, September.
S. A. Mahmoudi, P. Manneback, "Parallel Image Processing with CUDA and OpenGL", Open Network for High-Performance Computing on Complex Environments, COST Action IC 805, WG Meeting, Lisbon, Portugal, October.
S. A. Mahmoudi, P. Manneback, "Traitements d'images sur GPU sous CUDA et OpenGL : application à l'imagerie médicale", Journées CIGIL : Calcul Intensif et Grilles Informatiques à Lille, Lille, France, December.
S. A. Mahmoudi, P. Manneback, "Traitement d'objets multimédias sur GPU", Seconde journée scientifique du pôle hainuyer, Belgium, May.
Technical reports
S. Dupont, C. Frisson, S. A. Mahmoudi, X. Siebert, J. Urbain, T. Ravet, "MediaBlender: Interactive Multimedia Segmentation and Annotation", QPSR of the numediart research program, vol. 3, December.
M. Mancas, R. B. Madkhour, S. A. Mahmoudi, T. Ravet, "VirTrack: Tracking for Virtual Studios", QPSR of the numediart research program, vol. 3, no. 1, pp. 1-4, March.
M. Mancas, J. Tilmanne, R. Chessini, S. Hidot, C. Machy, S. A. Mahmoudi, T. Ravet, "MATRIX: Natural Interaction Between Real and Virtual Worlds", QPSR of the numediart research program, vol. 1, no. 5, January.
M. Mancas, M. Bagein, N. Guichard, S. Hidot, C. Machy, S. A. Mahmoudi, X. Siebert, "AVS: Augmented Virtual Studio", QPSR of the numediart research program, vol. 1, no. 4, December.
More informationCS516 Programming Languages and Compilers II
CS516 Programming Languages and Compilers II Zheng Zhang Spring 2015 Jan 22 Overview and GPU Programming I Rutgers University CS516 Course Information Staff Instructor: zheng zhang (eddy.zhengzhang@cs.rutgers.edu)
More informationComputing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany
Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been
More informationHeterogenous Computing
Heterogenous Computing Fall 2018 CS, SE - Freshman Seminar 11:00 a 11:50a Computer Architecture What are the components of a computer? How do these components work together to perform computations? How
More informationCUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav
CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture
More informationLecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1
Lecture 15: Introduction to GPU programming Lecture 15: Introduction to GPU programming p. 1 Overview Hardware features of GPGPU Principles of GPU programming A good reference: David B. Kirk and Wen-mei
More informationImage Processing Methods Optimization by Means of GPU Computing
Image Processing Methods Optimization by Means of GPU Computing SKORPIL, V.*, ZIDEK, K.**, KOUBEK, T.**, LANDA, J.**, ENDRLE, P.* *Faculty of Electrical Engineering and Communication Brno University of
More informationEvaluation and Exploration of Next Generation Systems for Applicability and Performance Volodymyr Kindratenko Guochun Shi
Evaluation and Exploration of Next Generation Systems for Applicability and Performance Volodymyr Kindratenko Guochun Shi National Center for Supercomputing Applications University of Illinois at Urbana-Champaign
More informationPerformance Estimation of Parallel Face Detection Algorithm on Multi-Core Platforms
Performance Estimation of Parallel Face Detection Algorithm on Multi-Core Platforms Subhi A. Bahudaila and Adel Sallam M. Haider Information Technology Department, Faculty of Engineering, Aden University.
More informationCMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN
CMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN Graphics Processing Unit Accelerate the creation of images in a frame buffer intended for the output
More informationGPGPU. Peter Laurens 1st-year PhD Student, NSC
GPGPU Peter Laurens 1st-year PhD Student, NSC Presentation Overview 1. What is it? 2. What can it do for me? 3. How can I get it to do that? 4. What s the catch? 5. What s the future? What is it? Introducing
More informationREDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS
BeBeC-2014-08 REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS Steffen Schmidt GFaI ev Volmerstraße 3, 12489, Berlin, Germany ABSTRACT Beamforming algorithms make high demands on the
More informationReal-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010
1 Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 Presentation by Henrik H. Knutsen for TDT24, fall 2012 Om du ønsker, kan du sette inn navn, tittel på foredraget, o.l.
More informationFrom Brook to CUDA. GPU Technology Conference
From Brook to CUDA GPU Technology Conference A 50 Second Tutorial on GPU Programming by Ian Buck Adding two vectors in C is pretty easy for (i=0; i
More informationCONSOLE ARCHITECTURE
CONSOLE ARCHITECTURE Introduction Part 1 What is a console? Console components Differences between consoles and PCs Benefits of console development The development environment Console game design What
More informationGeorgia Institute of Technology, August 17, Justin W. L. Wan. Canada Research Chair in Scientific Computing
Real-Time Rigid id 2D-3D Medical Image Registration ti Using RapidMind Multi-Core Platform Georgia Tech/AFRL Workshop on Computational Science Challenge Using Emerging & Massively Parallel Computer Architectures
More informationTesla GPU Computing A Revolution in High Performance Computing
Tesla GPU Computing A Revolution in High Performance Computing Gernot Ziegler, Developer Technology (Compute) (Material by Thomas Bradley) Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction
More informationXIV International PhD Workshop OWD 2012, October Optimal structure of face detection algorithm using GPU architecture
XIV International PhD Workshop OWD 2012, 20 23 October 2012 Optimal structure of face detection algorithm using GPU architecture Dmitry Pertsau, Belarusian State University of Informatics and Radioelectronics
More informationIntroduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model
Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA Part 1: Hardware design and programming model Dirk Ribbrock Faculty of Mathematics, TU dortmund 2016 Table of Contents Why parallel
More informationUSING THE GPU FOR FAST SYMMETRY-BASED DENSE STEREO MATCHING IN HIGH RESOLUTION IMAGES
USING THE GPU FOR FAST SYMMETRY-BASED DENSE STEREO MATCHING IN HIGH RESOLUTION IMAGES Vasco Mota Gabriel Falcao Michel Antunes Joao Barreto Urbano Nunes Institute of Systems and Robotics, Dept. of Electr.
More informationOn the Efficiency of Iterative Ordered Subset Reconstruction Algorithms for Acceleration on GPUs
On the Efficiency of Iterative Ordered Subset Reconstruction Algorithms for Acceleration on GPUs Fang Xu 1, Klaus Mueller 1, Mel Jones 2, Bettina Keszthelyi 2, John Sedat 2, David Agard 2 1 Center for
More informationUsing Graphics Chips for General Purpose Computation
White Paper Using Graphics Chips for General Purpose Computation Document Version 0.1 May 12, 2010 442 Northlake Blvd. Altamonte Springs, FL 32701 (407) 262-7100 TABLE OF CONTENTS 1. INTRODUCTION....1
More informationOn the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters
1 On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters N. P. Karunadasa & D. N. Ranasinghe University of Colombo School of Computing, Sri Lanka nishantha@opensource.lk, dnr@ucsc.cmb.ac.lk
More informationIntroduction to CUDA (1 of n*)
Agenda Introduction to CUDA (1 of n*) GPU architecture review CUDA First of two or three dedicated classes Joseph Kider University of Pennsylvania CIS 565 - Spring 2011 * Where n is 2 or 3 Acknowledgements
More informationad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors Weifeng Liu and Brian Vinter Niels Bohr Institute University of Copenhagen Denmark {weifeng, vinter}@nbi.dk March 1, 2014 Weifeng
More informationEfficient Computation of Histograms on the GPU
Efficient Computation of Histograms on the GPU Alexander Kubias University of Koblenz-Landau Frank Deinzer Siemens Medical Solutions Dietrich Paulus University of Koblenz-Landau Matthias Kreiser Siemens
More informationAutomatic Intra-Application Load Balancing for Heterogeneous Systems
Automatic Intra-Application Load Balancing for Heterogeneous Systems Michael Boyer, Shuai Che, and Kevin Skadron Department of Computer Science University of Virginia Jayanth Gummaraju and Nuwan Jayasena
More informationTechnology for a better society. SINTEF ICT, Applied Mathematics, Heterogeneous Computing Group
Technology for a better society SINTEF, Applied Mathematics, Heterogeneous Computing Group Trond Hagen GPU Computing Seminar, SINTEF Oslo, October 23, 2009 1 Agenda 12:30 Introduction and welcoming Trond
More information