A GPU-Enabled Rapid Image Processing Framework

1 A GPU-Enabled Rapid Image Processing Framework
Mark Davey, Lead HPC Engineer

2 The Foundry
We develop, market and sell:
- NUKE: compositing
- MARI: 3D texture painting
- HIERO: shot management
- KATANA: look development and lighting
- MODO: 3D modelling and rendering

3 Who uses our software?
A well-established global client base in film, animation, commercial advertising and broadcast, including: Aardman Animations, BaseFX Studios, BlueBolt, Blue Sky Studios, Brainstorm Digital, CoSA VFX, Digital Domain, Digital Fusion, Double Negative, DreamWorks, The Embassy, Fido, Framestore, Igloo VFX, Industrial Light and Magic, Jellyfish, Look Effects, Lucasfilm Entertainment, March Entertainment, Method Studios, Mikros Image, The Moving Picture Company, Mr. X, Passion Pictures, Pixomondo, Prime Focus, Rushes, Smoke and Mirrors, Sony Pictures Imageworks, Sony Pictures Animation, The Mill, Township, Unexpected, Union VFX, Walt Disney Animation, Warner Bros. Animation, Weta Digital, ZOIC Studios.

4 At the heart: 2D image processing
A fundamental component of our products, used in effects such as:
- Noise reduction
- Keying
- Motion and disparity estimation
- Colour correction
- 3D texture creation
We need to make it as fast as possible!

5 Moving to GPUs
- We have traditionally used the CPU for image processing, so there is a lot of legacy code.
- GPUs are great at image processing.
- Our customers often have GPUs, but not always (e.g. on render farms), so we still need a CPU path.
- We do not want to write the same code multiple times (debugging, maintenance, new hardware, etc.).

6 Solution: A Rapid Image Processing Framework (RIP)
- Image processing algorithms are expressed as kernels.
- Kernels are written in a C++-like, domain-specific language.
- Kernels run over an iteration space.
- Metadata expresses access patterns, image formats, boundary conditions, etc.
- Kernels are converted to an Abstract Syntax Tree (AST).
- The AST is translated into different languages.

7 Example Kernel

```
class GainImage : Kernel<eComponentWise> {
  param:
    Image<eRead, epoint> src;
    Image<eWrite, epoint> dst;
    float gain;

  local:
    void define() {
      defineparam(gain, "gain", 1.0f);
    }

    void kernel() {
      dst() = src() * gain;
    }
};
```

10 Kernel Types
- Iteration: run independently at every point in an iteration space. Example: gain.
- Rolling: run over the iteration space in order, with access to the result from the previous point along an axis. Example: box blur.
- Reduction: image data is reduced to a single value. Example: maximum pixel value.

11 Kernel Metadata
- Granularity: Component, Pixel
- Image access: Read, Write
- Memory access: Point, Ranged, Random
- Edge methods: None, Clamped, Constant

12 Current Language Support
- CUDA
- OpenCL
- C++ Scalar
- C++ SIMD (SSE2, SSE4.1, AVX)

13 Current RIP-Based Effects
- Denoise
- Video Retimer
- Convolution
- Depth-Based Blur
- Motion Blur
- Vector Generator

14 Run-time RIP
- Generate kernels at run time (JIT) for specific image formats and data types.
- Profile the machine to determine the best language options to use.
- CPU kernels are compiled using LLVM.
- GPU kernels are currently translated to OpenCL.

15 Example: Denoise
- Proprietary wavelet-based algorithm
- Requires 20+ kernels
- Tunable parameters for best results
- Must run at interactive speeds
- Legacy CPU plug-in too slow

16 Denoise: Tests
Test image: 1920x1080 (1080p) RGB, 32-bit float
GPUs:
- Quadro FX 3800M (4 SMs)
- Quadro K600 (1 SMX)
- Quadro … (… SMs)
- Quadro … (… SMs)
- Quadro K… (… SMX)
- GeForce GTX … (… SMs)
- Tesla K20 (13 SMX)
CPUs:
- Core i7…M (2 cores + HT)
- Core i7-3667U (2 cores + HT)
- Xeon E… (… cores)
- Xeon X… (… cores + HT)
- Xeon E… (… cores + HT)
- Xeon E… (… cores + HT)

19 The RIP Node: Fast R&D
- Develop kernels at run time within our software, using the RIP language.
- No other development tools are required.
- Parameter sliders are created automatically via kernel introspection.
- Use a graph of nodes to create complex algorithms.
- Great for rapid research and development.

20 Example: Directional Blur
CPU: Xeon X5550; GPU: Quadro 6000
- Legacy CPU: 227 s
- GPU RIP: 5.6 s (40 times faster)
- GPU RIP Pixel: 3.0 s (75 times faster)
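The quoted speedups are the legacy CPU time divided by each GPU time, rounded down. Note that they compare different hardware (Xeon X5550 versus Quadro 6000), so they are whole-system figures rather than a like-for-like comparison of one device:

```latex
S = \frac{t_{\text{legacy CPU}}}{t_{\text{GPU}}}, \qquad
S_{\text{RIP}} = \frac{227\,\text{s}}{5.6\,\text{s}} \approx 40.5, \qquad
S_{\text{RIP Pixel}} = \frac{227\,\text{s}}{3.0\,\text{s}} \approx 75.7
```

By the same arithmetic, the pixel-granularity kernel is about 5.6 / 3.0 ≈ 1.9 times faster than the component-wise GPU version.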

21 GPU Image Processing: Issues
- GPU memory is finite and limited, but our software supports very large images, so it is not always possible to process a whole image on the GPU. Point- and ranged-access processing is therefore tiled.
- Transfer times are relatively long, so we try to keep intermediate data on the GPU for as long as possible. We are working on better caching.

22 The Future: Highlights
- Beyond 2D image processing: 3D data, deep data, arrays of structures
- Heterogeneous computing: use all available devices, efficient scheduling, minimise data transfers, unified kernel results
- Greater GPU optimisation: better caching, better use of the Kepler architecture

23 Questions?


More information

Deep Image Nordic TDForum Presented by: Colin Doncaster

Deep Image Nordic TDForum Presented by: Colin Doncaster Deep Image Compositing @ Nordic TDForum 2011 Presented by: Colin Doncaster Introduction this course is meant to introduce the concepts of deep image compositing provide some background that will help when

More information

Abstract. Introduction. Kevin Todisco

Abstract. Introduction. Kevin Todisco - Kevin Todisco Figure 1: A large scale example of the simulation. The leftmost image shows the beginning of the test case, and shows how the fluid refracts the environment around it. The middle image

More information

ADVANCES IN EXTREME-SCALE APPLICATIONS ON GPU. Peng Wang HPC Developer Technology

ADVANCES IN EXTREME-SCALE APPLICATIONS ON GPU. Peng Wang HPC Developer Technology ADVANCES IN EXTREME-SCALE APPLICATIONS ON GPU Peng Wang HPC Developer Technology NVIDIA SuperPhones to SuperComputers Computers no longer get faster, just wider Architectural Features Common to All Processors

More information

Musemage. The Revolution of Image Processing

Musemage. The Revolution of Image Processing Musemage The Revolution of Image Processing Kaiyong Zhao Hong Kong Baptist University, Paraken Technology Co. Ltd. Yubo Zhang University of California Davis Outline Introduction of Musemage Why GPU based

More information

CUDA OPTIMIZATION WITH NVIDIA NSIGHT ECLIPSE EDITION

CUDA OPTIMIZATION WITH NVIDIA NSIGHT ECLIPSE EDITION CUDA OPTIMIZATION WITH NVIDIA NSIGHT ECLIPSE EDITION WHAT YOU WILL LEARN An iterative method to optimize your GPU code Some common bottlenecks to look out for Performance diagnostics with NVIDIA Nsight

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

General Purpose GPU Computing in Partial Wave Analysis

General Purpose GPU Computing in Partial Wave Analysis JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data

More information

Simon Hodgkiss CG Generalist & Character Animator

Simon Hodgkiss CG Generalist & Character Animator Career Synopsis Simon Hodgkiss CG Generalist & Character Animator I have been working as an independent CG Director and Animator for the last 17 years. Having worked in IT, Games and Post Production. As

More information

Threading Hardware in G80

Threading Hardware in G80 ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &

More information

PARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort

PARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort PARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort rob@cs.vu.nl Schedule 2 1. Introduction, performance metrics & analysis 2. Many-core hardware 3. Cuda class 1: basics 4. Cuda class

More information

Analysis and Visualization Algorithms in VMD

Analysis and Visualization Algorithms in VMD 1 Analysis and Visualization Algorithms in VMD David Hardy Research/~dhardy/ NAIS: State-of-the-Art Algorithms for Molecular Dynamics (Presenting the work of John Stone.) VMD Visual Molecular Dynamics

More information

High-performance image processing routines for video and film processing. Hannes Fassold

High-performance image processing routines for video and film processing. Hannes Fassold High-performance image processing routines for video and film processing Hannes Fassold 2018-03-28 2 Our research group GPU-accelerated algorithms / applications @ CCM Connected Computing research group,

More information

Accelerating Financial Applications on the GPU

Accelerating Financial Applications on the GPU Accelerating Financial Applications on the GPU Scott Grauer-Gray Robert Searles William Killian John Cavazos Department of Computer and Information Science University of Delaware Sixth Workshop on General

More information

GPGPU. Peter Laurens 1st-year PhD Student, NSC

GPGPU. Peter Laurens 1st-year PhD Student, NSC GPGPU Peter Laurens 1st-year PhD Student, NSC Presentation Overview 1. What is it? 2. What can it do for me? 3. How can I get it to do that? 4. What s the catch? 5. What s the future? What is it? Introducing

More information

INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES. Eli Savransky Principal Architect - CTO Office Mobile BU NVIDIA corp.

INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES. Eli Savransky Principal Architect - CTO Office Mobile BU NVIDIA corp. INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES Eli Savransky Principal Architect - CTO Office Mobile BU NVIDIA corp. Computer Vision in Mobile Tegra K1 It s time! AGENDA Use cases categories

More information

Optimizing CUDA for GPU Architecture. CSInParallel Project

Optimizing CUDA for GPU Architecture. CSInParallel Project Optimizing CUDA for GPU Architecture CSInParallel Project August 13, 2014 CONTENTS 1 CUDA Architecture 2 1.1 Physical Architecture........................................... 2 1.2 Virtual Architecture...........................................

More information

Lecture 1: an introduction to CUDA

Lecture 1: an introduction to CUDA Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Overview hardware view software view CUDA programming

More information

CUDA OPTIMIZATION WITH NVIDIA NSIGHT VISUAL STUDIO EDITION

CUDA OPTIMIZATION WITH NVIDIA NSIGHT VISUAL STUDIO EDITION April 4-7, 2016 Silicon Valley CUDA OPTIMIZATION WITH NVIDIA NSIGHT VISUAL STUDIO EDITION CHRISTOPH ANGERER, NVIDIA JAKOB PROGSCH, NVIDIA 1 WHAT YOU WILL LEARN An iterative method to optimize your GPU

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Execution Scheduling in CUDA Revisiting Memory Issues in CUDA February 17, 2011 Dan Negrut, 2011 ME964 UW-Madison Computers are useless. They

More information