OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data

Size: px
Start display at page:

Download "OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data"

Transcription

1

2 OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data Andrew Miller Computer Vision Group Research Developer

3 3-D TERRAIN RECONSTRUCTION FROM IMAGES 3 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

4 APPLICATIONS Augmented Reality Change Detection Geo Referencing Geo Positioning Visibility app Geo Reference the input image Geocoordinates (Long=?,lat=?, ele=?) Vantage Point 2-d Vantage point 3-d model 3-d Planar model Vantage point 4 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

5 GEO-POSITIONING Input Output Geo-coordinates (Long=?,lat=?, ele=?) 3-d model Planar model 5 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

6 APPLICATIONS Example: Change Detection: Change detection on planar model Change detection on 3-d model 6 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

7 WHY IS IT CHALLENGING? 7 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

8 EXISTING METHODS 3-d Method Pros Cons References Surface-Based Methods Output is a mesh. Cannot handle ambiguity, uncertainty, complex topology Cipolla et al. TPAMI Cipolla et al. TPAMI Zaharescu et al. ACCV Volumetric Methods Handles ambiguity and occlusion. Large amount of data and computationally expensive. Broadhurst et al. ICCV Pollard et al. CVPR 2007 Crispell 2009 Viola et al. ICCV 1999 Furukawa 2008 Crispell Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

9 VOLUMETRIC MODELING 9 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

10 VOLUMETRIC MODELING Basics Voxel data and probabilistic framework Ray tracing Updating Rendering Memory Footprint Spatial subdivisions Efficient spatial encoding and Data storage Refining GPU Implementation Ray Tracing Updating in parallel, with GPU optimizations 10 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

11 VOLUMETRIC MODEL Given a volume, divide into volumetric pixels, called voxels 11 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

12 VOLUMETRIC MODEL Given a volume, divide into volumetric pixels, called voxels 12 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

13 VOXEL DATA Two probability distributions stored in each voxel: Occupancy probability (alpha) Appearance model (Mixture of 3 Gaussians) P ( X S ) Probability that voxel X is in a surface I (intensity ) μ 0 σ 0 w 0 μ 1 σ 1 w 1 μ 2 σ 2 P ( I X S ) = i = 0 13 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, w i N ( μ, σ i 2 i )

14 MODEL MANIPULATION Divide volume into voxels that store two distributions Need to learn the correct (best) parameters for these two distributions Two main procedures to manipulate our model Update (model reconstruction) Given a set of observations and cameras, learn distributions in each voxel Render (expected images) Given a novel viewpoint, render expected image of the volume Both of these procedures are accomplished by casting rays 14 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

15 RAY CASTING Cast ray through the image plane into volume Sum relevant statistics as rays intersect voxels Camera Center Image Plane 15 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

16 MODEL ALONG A RAY Assume one voxel along a ray produces observed intensity Must be un-occluded, and have high surface probability How likely some voxel Xα is responsible for observation at r: P( V = X α ) = P( X α S ) [1 P( X α ' S )] r α ' < α Surface likelihood Visibility of voxel alpha 16 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

17 UPDATING Use an online algorithm to update surface probabilities: P N +1 ( X S ) = P N ( X S ) p N (I N +1 X S ) p N (I) Updated surface likelihood Previous surface likelihood Bayes update factor 17 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

18 UPDATING: Downtown Providence 18 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

19 UPDATING: Capitol Building in Providence 19 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

20 RENDERING Marginalize appearance across ray Calculate visibility of each cell along ray Keep running sum of the expected intensity: E[I r ] = α P(V r = X α )E[ p(i X S)] Expected image sequence of downtown Providence, RI 20 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

21 MEMORY FOOTPRINT 21 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

22 VOLUMETRIC MODELING Basics Voxel data and probabilistic framework Ray tracing Updating Rendering Memory Footprint Spatial subdivisions Efficient spatial encoding and Data storage Refining GPU Implementation Ray Tracing Updating in parallel, with GPU optimizations 22 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

23 MEMORY FOOTPRINT Existing scenes (downtown and capitol) are 1536 by 1536 by billion voxels Each voxel carries at least 12 bytes of information: 4 bytes for occupancy probability 8 bytes for appearance model Would require about 14 GB of information! Reducing Memory in two ways Spatial subdivision Efficiently encoded octree structure 23 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

24 SPATIAL SUBDIVISION Fixed grid of shallow octrees Trees have depth limit 4 (512 total voxels) Designed for optimal performance with GPU architecture An octree is a data structure in which each internal node has exactly eight children. Octrees are used to partition a three dimensional space by subdividing it into eight octants. 24 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

25 BIT-ENCODED OCTREE Represent tree structure as implicit octree String of bits, each bit indicates if cell has children 73 bits are sufficient for depth four tree (10 bytes) 4 reserved for data pointer (and align to 16) Tree bits [0-9] Data Ptr [10,11,12,13] [ ] [ ] Simple formulas to find child and parent bits: parent (i) = i 1 8 child (i) = 8 i +1 Tree structure can be stored in local memory (shared memory) per thread Accelerates tree traversal 25 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

26 BIT-ENCODED OCTREE Data stored in BFS order, tree points to root cell Simple sum calculate data index: Binary example: parent ( i ) 1 data _ index (i) = 8 tree _ bit ( j) + 1 j = 0 (a) (c) (b) a) Bits encoding octree found in (b). b) Binary tree where each node s data is a letter c) Location of each letter in memory 26 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

27 DATA STORAGE Modular data storage Voxel data stored separately from tree structure Minimizes data transfer for kernels that only need some data (i.e. rendering, update passes) Indexing identical across data buffers 27 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

28 REFINE Once the surface probability passes a certain threshold, cell is refined into 8 octants 28 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

29 REFINE Unrefined model 29 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

30 REFINE Refined to depth four 30 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

31 REFINE Refining one tree Surface Probability Buffer p p p 0 1 New data cells p p p 2 p 3 p 4 p 5 p Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

32 REFINE Difficult to parallelize Cannot refine all trees in parallel in place as race conditions may occur (data written over other new data) Refine in three steps: Compute and save new tree structure, size of each new tree Run a prefix sum on tree sizes to determine new location of each tree's data Allocate new, sufficiently large data buffer, swap data into new locations 32 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

33 REFINE Three parallel steps: 1. Compute new Tree Structure, sizes 2. Compute cumulative sum of size buffer using parallel prefix sum (determines new data indices) s 0 s 1 s 2 s 3 3. Swap data into new location (initializing new cells) α s 0 s 1 s 2 s 3 s 0 0 s i 1 i=0 2 i=0 s i α α 0 α 1 α 2 α 3 33 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

34 GPU IMPLEMENTATION 34 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

35 VOLUMETRIC MODELING Basics: Voxel data and probabilistic framework Ray tracing Updating Rendering Memory Footprint Spatial subdivisions Efficient spatial encoding and Data storage Refining GPU Implementation Ray Tracing Updating in parallel, with GPU optimizations 35 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

36 GPU IMPLEMENTATION Ray tracing is expensive on volumetric models Each voxel along the path of a ray needs to be intersected and examined Very parallel algorithm Can use GPU to process multiple rays simultaneously Note that now we are ray tracing a hybrid grid-octree structure 36 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

37 RAY CASTING REVISITED Workgroups are 8x8 patches of image (64 rays) Camera Center Image Plane 37 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

38 RAY CASTING Tracing through a fixed grid of octrees Two approaches: Two nested while loops (one traversing the grid, the other traversing the current octree) Single while loop, each ray caches it's own current tree 38 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

39 Double Loop Method Outer Loop: Iterating over regular-grid 39 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

40 Double Loop Method Outer Loop: Iterating over regular-grid 40 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

41 Double Loop Method Outer Loop: Iterating over regular-grid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 41 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

42 Double Loop Method Outer Loop: Iterating over regular-grid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 42 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

43 Double Loop Method Outer Loop: Iterating over regular-grid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 43 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

44 Double Loop Method Outer Loop: Iterating over regular-grid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 44 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

45 Double Loop Method Outer Loop: Iterating over regular-grid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 45 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

46 Double Loop Method Outer Loop: Iterating over regular-grid 46 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

47 Double Loop Method Outer Loop: Iterating over regular-grid New octree loaded into local memory Inner Loop: Iterating over Octree 47 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

48 Double Loop Method Outer Loop: Iterating over regular-grid New octree loaded into local memory Inner Loop: Iterating over Octree 48 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

49 Double Loop Method Outer Loop: Iterating over regular-grid New octree loaded into local memory Note how the two upper rays must wait for the lower ray to finish (due to OpenCL branching) Inner Loop: Iterating over Octree 49 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

50 Outer Loop: Iterating over regular-grid Note how the two upper rays must wait for the lower ray to finish (due to OpenCL branching) 50 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

51 Single Loop Method Rays traverse fixed grid and octrees 51 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

52 Single Loop Method Rays traverse fixed grid and octrees 52 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

53 Single Loop Method Rays traverse fixed grid and octrees Each ray reads octree (16 bytes) into local memory 53 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

54 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 54 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

55 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 55 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

56 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 56 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

57 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 57 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

58 RAY CASTING Single while loop tracing performs better Threads do not have to wait for neighbors to continue processing Global memory transfer time is hidden by scheduler Operations on tree in local memory are fast 58 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

59 UPDATING Complications with update: Multiple rays can pass through the same cell; creates race conditions Need atomic operations to update global memory Global writes (especially atomics) are very expensive Two rays (in red) update the same cell need atomic (serial) operations. 59 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

60 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per work-item (thread) Determine connected components in two passes: Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

61 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per work-item (thread) Determine connected components in two passes: 1)Each thread in the work group connects to the right Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

62 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per work-item (thread) Determine connected components in two passes: 1)Each thread in the work group connects to the right 2)Each tail looks at the next row, continues linked list Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

63 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per work-item (thread) The head of each list sums up their contribution to their cell The head of each list then makes an atomic call to update global memory Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

64 SPEED AND MEMORY ANALYSIS 64 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

65 SPEED AND MEMORY ANALYSIS Tested multiple processes: Render top down (fewer intersections), Render oblique (more intersections), Update empty scene, Update near converged scene, Refine Processes tested in OpenCL on: multi-core (Gulftown) CPU 200 series NVIDIA card Fermi generation NVIDIA card ATI 5870 series OpenCL code is currently optimized for NVIDIA Fermi generation cards Unrolled 3d rays/points in float4 vectors to three floats to save register space/boost occupancy Using 8x8 workgroup size 65 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

66 RENDER VS OCCUPANCY 900 Render time vs kernel occupancy for two generations of NVIDIA cards. In our ray trace implementation, lower occupancy yielded faster results, with a steeper climb for the previous generation hardware. Render Time (ms) GPU time for GTX280 GPU time for GTX Occupancy 66 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

67 UPDATE VS WORKSPACE SIZE Update time vs workgroup size Optimal workgroup size appears to be 8x8 for both current and previous generation NVIDIA cards. Time (ms) Gtx280 GTX x2 6x6 10x10 14x14 workgroup size 67 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

68 SPEED AND MEMORY ANALYSIS Render, update and refine benchmarks: Render Top down Render Oblique Update Empty CPU (intel 3.33 GHz) ATI 5870 NVIDIA GTX 280 NVIDIA GTX 480 NVIDIA GTX s.15s.118s.062s.039s 2.23s.27s.213s.114s.065ms 8.11s 4.72s 4.37s (cache) 13.99s 1.81s (cache).934s.795s (cache).790s Optimized OpenCL code for NVIDIA architecture. Benchmarks run out of the box on ATI card. ATI does vector ops in parallel, our implementation uses scalar operations. Update Converged 18.92s n/a n/a 1.363s 1.990s (cache).999s 68 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

69 FURTHER ANALYSIS Size of GPU memory currently restricts model size. Larger scenes stream blocks in and out of the GPU Current render processing is faster than 5 gb/s (PCIe bandwidth) Block streaming from host to device memory is a bottleneck Larger scenes would benefit from an integrated CPU/GPU with higher bandwidth 69 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

70 FUTURE WORK 70 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

71 FUTURE WORK Goals: More accurately model 3-d space Volumetric (cone) ray tracing. Casting 3-d cone rays, instead of 2-d lines Consider spatial continuity. Filtering voxels, accounting for neighboring joint distribution Model underlying physical properties (surface normals, textures, non Lambertian properties) to make models robust to many surface types and lighting conditions Build bigger models Multi resolution scenes (using octree) Streaming cache system (disk to RAM to GPU) for many blocks 71 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

72 CONE RAY TRACE Volumetric ray trace Goal is to cover the entire volume when updating Camera Center Image Plane 72 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

73 CONE RAY TRACE More ray coverage removes some aliasing effects: Cone Update Ray Update 73 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

74 DATA STREAMING Streaming models for larger scenes Begun implementing a streaming model Asynchronous transfers between three levels of memory 74 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

75 QUESTIONS? 75 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

76 CODE PUBLICLY AVAILABLE AT: 76 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

77 Disclaimer & Attribution The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes. NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied. 77 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

EXPLOITING ACCELERATOR-BASED HPC FOR ARMY APPLICATIONS

EXPLOITING ACCELERATOR-BASED HPC FOR ARMY APPLICATIONS EXPLOITING ACCELERATOR-BASED HPC FOR ARMY APPLICATIONS James Ross High Performance Technologies, Inc (HPTi) Computational Scientist Edward Carmack David Richie Song Park, Brian Henz and Dale Shires HPTi

More information

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD INTRODUCTION TO OPENCL TM A Beginner s Tutorial Udeepta Bordoloi AMD IT S A HETEROGENEOUS WORLD Heterogeneous computing The new normal CPU Many CPU s 2, 4, 8, Very many GPU processing elements 100 s Different

More information

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games Viewdle Inc. 1 Use cases Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games 2 Why OpenCL matter? OpenCL is going to bring such

More information

HIGHLY PARALLEL COMPUTING IN PHYSICS-BASED RENDERING OpenCL Raytracing Based. Thibaut PRADOS OPTIS Real-Time & Virtual Reality Manager

HIGHLY PARALLEL COMPUTING IN PHYSICS-BASED RENDERING OpenCL Raytracing Based. Thibaut PRADOS OPTIS Real-Time & Virtual Reality Manager HIGHLY PARALLEL COMPUTING IN PHYSICS-BASED RENDERING OpenCL Raytracing Based Thibaut PRADOS OPTIS Real-Time & Virtual Reality Manager INTRODUCTION WHO WE ARE 3 Highly Parallel Computing in Physics-based

More information

EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT

EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT JOSEPH L. GREATHOUSE, MAYANK DAGA AMD RESEARCH 11/20/2014 THIS TALK IN ONE SLIDE Demonstrate how to save space and time

More information

OPENCL TM APPLICATION ANALYSIS AND OPTIMIZATION MADE EASY WITH AMD APP PROFILER AND KERNELANALYZER

OPENCL TM APPLICATION ANALYSIS AND OPTIMIZATION MADE EASY WITH AMD APP PROFILER AND KERNELANALYZER OPENCL TM APPLICATION ANALYSIS AND OPTIMIZATION MADE EASY WITH AMD APP PROFILER AND KERNELANALYZER Budirijanto Purnomo AMD Technical Lead, GPU Compute Tools PRESENTATION OVERVIEW Motivation AMD APP Profiler

More information

SCALING DGEMM TO MULTIPLE CAYMAN GPUS AND INTERLAGOS MANY-CORE CPUS FOR HPL

SCALING DGEMM TO MULTIPLE CAYMAN GPUS AND INTERLAGOS MANY-CORE CPUS FOR HPL SCALING DGEMM TO MULTIPLE CAYMAN GPUS AND INTERLAGOS MANY-CORE CPUS FOR HPL Matthias Bach and David Rohr Frankfurt Institute for Advanced Studies Goethe University of Frankfurt I: INTRODUCTION 3 Scaling

More information

SIMULATOR AMD RESEARCH JUNE 14, 2015

SIMULATOR AMD RESEARCH JUNE 14, 2015 AMD'S gem5apu SIMULATOR AMD RESEARCH JUNE 14, 2015 OVERVIEW Introducing AMD s gem5 APU Simulator Extends gem5 with a GPU timing model Supports Heterogeneous System Architecture in SE mode Includes several

More information

Fusion Enabled Image Processing

Fusion Enabled Image Processing Fusion Enabled Image Processing I Jui (Ray) Sung, Mattieu Delahaye, Isaac Gelado, Curtis Davis MCW Strengths Complete Tools Port, Explore, Analyze, Tune Training World class R&D team Leading Algorithms

More information

Automatic Intra-Application Load Balancing for Heterogeneous Systems

Automatic Intra-Application Load Balancing for Heterogeneous Systems Automatic Intra-Application Load Balancing for Heterogeneous Systems Michael Boyer, Shuai Che, and Kevin Skadron Department of Computer Science University of Virginia Jayanth Gummaraju and Nuwan Jayasena

More information

viewdle! - machine vision experts

viewdle! - machine vision experts viewdle! - machine vision experts topic using algorithmic metadata creation and heterogeneous computing to build the personal content management system of the future Page 2 Page 3 video of basic recognition

More information

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to the features, functionality, availability, timing,

More information

Gestural and Cinematic Interfaces - DX11. David Brebner Unlimited Realities CTO

Gestural and Cinematic Interfaces - DX11. David Brebner Unlimited Realities CTO Gestural and Cinematic Interfaces - DX11 David Brebner Unlimited Realities CTO Gestural and Cinematic Interfaces DX11 Making an emotional connection with users 3 Unlimited Realities / Fingertapps About

More information

Panel Discussion: The Future of I/O From a CPU Architecture Perspective

Panel Discussion: The Future of I/O From a CPU Architecture Perspective Panel Discussion: The Future of I/O From a CPU Architecture Perspective Brad Benton AMD, Inc. #OFADevWorkshop Issues Move to Exascale involves more parallel processing across more processing elements GPUs,

More information

BIOMEDICAL DATA ANALYSIS ON HETEROGENEOUS PLATFORM. Dong Ping Zhang Heterogeneous System Architecture AMD

BIOMEDICAL DATA ANALYSIS ON HETEROGENEOUS PLATFORM. Dong Ping Zhang Heterogeneous System Architecture AMD BIOMEDICAL DATA ANALYSIS ON HETEROGENEOUS PLATFORM Dong Ping Zhang Heterogeneous System Architecture AMD VASCULATURE ENHANCEMENT 3 Biomedical data analysis on heterogeneous platform June, 2012 EXAMPLE:

More information

AMD Graphics Team Last Updated April 29, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview April 2013 Approved for public distribution

AMD Graphics Team Last Updated April 29, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview April 2013 Approved for public distribution AMD Graphics Team Last Updated April 29, 2013 APPROVED FOR PUBLIC DISTRIBUTION 1 3DMark Overview April 2013 Approved for public distribution 2 3DMark Overview April 2013 Approved for public distribution

More information

AMD IOMMU VERSION 2 How KVM will use it. Jörg Rödel August 16th, 2011

AMD IOMMU VERSION 2 How KVM will use it. Jörg Rödel August 16th, 2011 AMD IOMMU VERSION 2 How KVM will use it Jörg Rödel August 16th, 2011 AMD IOMMU VERSION 2 WHAT S NEW? 2 AMD IOMMU Version 2 Support in KVM August 16th, 2011 Public NEW FEATURES - OVERVIEW Two-level page

More information

Accelerating Applications. the art of maximum performance computing James Spooner Maxeler VP of Acceleration

Accelerating Applications. the art of maximum performance computing James Spooner Maxeler VP of Acceleration Accelerating Applications the art of maximum performance computing James Spooner Maxeler VP of Acceleration Introduction The Process The Tools Case Studies Summary What do we mean by acceleration? How

More information

ADVANCED RENDERING EFFECTS USING OPENCL TM AND APU Session Olivier Zegdoun AMD Sr. Software Engineer

ADVANCED RENDERING EFFECTS USING OPENCL TM AND APU Session Olivier Zegdoun AMD Sr. Software Engineer ADVANCED RENDERING EFFECTS USING OPENCL TM AND APU Session 2117 Olivier Zegdoun AMD Sr. Software Engineer CONTENTS Rendering Effects Before Fusion: single discrete GPU case Before Fusion: multiple discrete

More information

AMD Graphics Team Last Updated February 11, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview February 2013 Approved for public distribution

AMD Graphics Team Last Updated February 11, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview February 2013 Approved for public distribution AMD Graphics Team Last Updated February 11, 2013 APPROVED FOR PUBLIC DISTRIBUTION 1 3DMark Overview February 2013 Approved for public distribution 2 3DMark Overview February 2013 Approved for public distribution

More information

Understanding GPGPU Vector Register File Usage

Understanding GPGPU Vector Register File Usage Understanding GPGPU Vector Register File Usage Mark Wyse AMD Research, Advanced Micro Devices, Inc. Paul G. Allen School of Computer Science & Engineering, University of Washington AGENDA GPU Architecture

More information

GPGPU COMPUTE ON AMD. Udeepta Bordoloi April 6, 2011

GPGPU COMPUTE ON AMD. Udeepta Bordoloi April 6, 2011 GPGPU COMPUTE ON AMD Udeepta Bordoloi April 6, 2011 WHY USE GPU COMPUTE CPU: scalar processing + Latency + Optimized for sequential and branching algorithms + Runs existing applications very well - Throughput

More information

Multi-core processors are here, but how do you resolve data bottlenecks in native code?

Multi-core processors are here, but how do you resolve data bottlenecks in native code? Multi-core processors are here, but how do you resolve data bottlenecks in native code? hint: it s all about locality Michael Wall October, 2008 part I of II: System memory 2 PDC 2008 October 2008 Session

More information

AMD APU and Processor Comparisons. AMD Client Desktop Feb 2013 AMD

AMD APU and Processor Comparisons. AMD Client Desktop Feb 2013 AMD AMD APU and Processor Comparisons AMD Client Desktop Feb 2013 AMD SUMMARY 3DMark released Feb 4, 2013 Contains DirectX 9, DirectX 10, and DirectX 11 tests AMD s current product stack features DirectX 11

More information

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to AMD s strategy and focus, expected datacenter total

More information

FLASH MEMORY SUMMIT Adoption of Caching & Hybrid Solutions

FLASH MEMORY SUMMIT Adoption of Caching & Hybrid Solutions FLASH MEMORY SUMMIT 2011 Adoption of Caching & Hybrid Solutions Market Overview 2009 Flash production reached parity with all other existing solid state memories in terms of bites. 2010 Overall flash production

More information

AMD CORPORATE TEMPLATE AMD Radeon Open Compute Platform Felix Kuehling

AMD CORPORATE TEMPLATE AMD Radeon Open Compute Platform Felix Kuehling AMD Radeon Open Compute Platform Felix Kuehling ROCM PLATFORM ON LINUX Compiler Front End AMDGPU Driver Enabled with ROCm GCN Assembly Device LLVM Compiler (GCN) LLVM Opt Passes GCN Target Host LLVM Compiler

More information

HyperTransport Technology

HyperTransport Technology HyperTransport Technology in 2009 and Beyond Mike Uhler VP, Accelerated Computing, AMD President, HyperTransport Consortium February 11, 2009 Agenda AMD Roadmap Update Torrenza, Fusion, Stream Computing

More information

Designing Natural Interfaces

Designing Natural Interfaces Designing Natural Interfaces So what? Computers are everywhere C.T.D.L.L.C. Computers that don t look like computers. Computers that don t look like Computers Computers that don t look like Computers

More information

AMD RYZEN PROCESSOR WITH RADEON VEGA GRAPHICS CORPORATE BRAND GUIDELINES

AMD RYZEN PROCESSOR WITH RADEON VEGA GRAPHICS CORPORATE BRAND GUIDELINES AMD RYZEN PROCESSOR WITH RADEON VEGA GRAPHICS CORPORATE BRAND GUIDELINES VERSION 1 - FEBRUARY 2018 CONTACT Address Advanced Micro Devices, Inc 7171 Southwest Pkwy Austin, Texas 78735 United States Phone

More information

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to AMD s positioning in the datacenter market; expected

More information

STREAMING VIDEO DATA INTO 3D APPLICATIONS Session Christopher Mayer AMD Sr. Software Engineer

STREAMING VIDEO DATA INTO 3D APPLICATIONS Session Christopher Mayer AMD Sr. Software Engineer STREAMING VIDEO DATA INTO 3D APPLICATIONS Session 2116 Christopher Mayer AMD Sr. Software Engineer CONTENT Introduction Pinned Memory Streaming Video Data How does the APU change the game 3 Streaming Video

More information

MIGRATION OF LEGACY APPLICATIONS TO HETEROGENEOUS ARCHITECTURES Francois Bodin, CTO, CAPS Entreprise. June 2011

MIGRATION OF LEGACY APPLICATIONS TO HETEROGENEOUS ARCHITECTURES Francois Bodin, CTO, CAPS Entreprise. June 2011 MIGRATION OF LEGACY APPLICATIONS TO HETEROGENEOUS ARCHITECTURES Francois Bodin, CTO, CAPS Entreprise June 2011 FREE LUNCH IS OVER, CODES HAVE TO MIGRATE! Many existing legacy codes needs to migrate to

More information

INTERFERENCE FROM GPU SYSTEM SERVICE REQUESTS

INTERFERENCE FROM GPU SYSTEM SERVICE REQUESTS INTERFERENCE FROM GPU SYSTEM SERVICE REQUESTS ARKAPRAVA BASU, JOSEPH L. GREATHOUSE, GURU VENKATARAMANI, JÁN VESELÝ AMD RESEARCH, ADVANCED MICRO DEVICES, INC. MODERN SYSTEMS ARE POWERED BY HETEROGENEITY

More information

CAUTIONARY STATEMENT 1 AMD NEXT HORIZON NOVEMBER 6, 2018

CAUTIONARY STATEMENT 1 AMD NEXT HORIZON NOVEMBER 6, 2018 CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to AMD s positioning in the datacenter market; expected

More information

ACCELERATING MATRIX PROCESSING WITH GPUs. Nicholas Malaya, Shuai Che, Joseph Greathouse, Rene van Oostrum, and Michael Schulte AMD Research

ACCELERATING MATRIX PROCESSING WITH GPUs. Nicholas Malaya, Shuai Che, Joseph Greathouse, Rene van Oostrum, and Michael Schulte AMD Research ACCELERATING MATRIX PROCESSING WITH GPUs Nicholas Malaya, Shuai Che, Joseph Greathouse, Rene van Oostrum, and Michael Schulte AMD Research ACCELERATING MATRIX PROCESSING WITH GPUS MOTIVATION Matrix operations

More information

The Rise of Open Programming Frameworks. JC BARATAULT IWOCL May 2015

The Rise of Open Programming Frameworks. JC BARATAULT IWOCL May 2015 The Rise of Open Programming Frameworks JC BARATAULT IWOCL May 2015 1,000+ OpenCL projects SourceForge GitHub Google Code BitBucket 2 TUM.3D Virtual Wind Tunnel 10K C++ lines of code, 30 GPU kernels CUDA

More information

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE Haibo Xie, Ph.D. Chief HSA Evangelist AMD China OUTLINE: The Challenges with Computing Today Introducing Heterogeneous System Architecture (HSA)

More information

THE PROGRAMMER S GUIDE TO THE APU GALAXY. Phil Rogers, Corporate Fellow AMD

THE PROGRAMMER S GUIDE TO THE APU GALAXY. Phil Rogers, Corporate Fellow AMD THE PROGRAMMER S GUIDE TO THE APU GALAXY Phil Rogers, Corporate Fellow AMD THE OPPORTUNITY WE ARE SEIZING Make the unprecedented processing capability of the APU as accessible to programmers as the CPU

More information

RegMutex: Inter-Warp GPU Register Time-Sharing

RegMutex: Inter-Warp GPU Register Time-Sharing RegMutex: Inter-Warp GPU Register Time-Sharing Farzad Khorasani* Hodjat Asghari Esfeden Amin Farmahini-Farahani Nuwan Jayasena Vivek Sarkar *farkhor@gatech.edu The 45 th International Symposium on Computer

More information

1 Presentation Title Month ##, 2012

1 Presentation Title Month ##, 2012 1 Presentation Title Month ##, 2012 Malloc in OpenCL kernels Why and how? Roy Spliet Bsc. (r.spliet@student.tudelft.nl) Delft University of Technology Student Msc. Dr. A.L. Varbanescu Prof. Dr. Ir. H.J.

More information

Generic System Calls for GPUs

Generic System Calls for GPUs Generic System Calls for GPUs Ján Veselý*, Arkaprava Basu, Abhishek Bhattacharjee*, Gabriel H. Loh, Mark Oskin, Steven K. Reinhardt *Rutgers University, Indian Institute of Science, Advanced Micro Devices

More information

Sequential Consistency for Heterogeneous-Race-Free

Sequential Consistency for Heterogeneous-Race-Free Sequential Consistency for Heterogeneous-Race-Free DEREK R. HOWER, BRADFORD M. BECKMANN, BENEDICT R. GASTER, BLAKE A. HECHTMAN, MARK D. HILL, STEVEN K. REINHARDT, DAVID A. WOOD JUNE 12, 2013 EXECUTIVE

More information

MEASURING AND MODELING ON-CHIP INTERCONNECT POWER ON REAL HARDWARE

MEASURING AND MODELING ON-CHIP INTERCONNECT POWER ON REAL HARDWARE MEASURING AND MODELING ON-CHIP INTERCONNECT POWER ON REAL HARDWARE VIGNESH ADHINARAYANAN, INDRANI PAUL, JOSEPH L. GREATHOUSE, WEI HUANG, ASHUTOSH PATTNAIK, WU-CHUN FENG POWER AND ENERGY ARE FIRST-CLASS

More information

FUSION PROCESSORS AND HPC

FUSION PROCESSORS AND HPC FUSION PROCESSORS AND HPC Chuck Moore AMD Corporate Fellow & Technology Group CTO June 14, 2011 Fusion Processors and HPC Today: Multi-socket x86 CMPs + optional dgpu + high BW memory Fusion APUs (SPFP)

More information

AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016

AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016 AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING BILL.BRANTLEY@AMD.COM, FELLOW 3 OCTOBER 2016 AMD S VISION FOR EXASCALE COMPUTING EMBRACING HETEROGENEITY CHAMPIONING OPEN SOLUTIONS ENABLING LEADERSHIP

More information

Spatial Data Structures

Spatial Data Structures CSCI 420 Computer Graphics Lecture 17 Spatial Data Structures Jernej Barbic University of Southern California Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees [Angel Ch. 8] 1 Ray Tracing Acceleration

More information

HPG 2011 HIGH PERFORMANCE GRAPHICS HOT 3D

HPG 2011 HIGH PERFORMANCE GRAPHICS HOT 3D HPG 2011 HIGH PERFORMANCE GRAPHICS HOT 3D AMD GRAPHIC CORE NEXT Low Power High Performance Graphics & Parallel Compute Michael Mantor AMD Senior Fellow Architect Michael.mantor@amd.com Mike Houston AMD

More information

Spatial Data Structures

Spatial Data Structures CSCI 480 Computer Graphics Lecture 7 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids BSP Trees [Ch. 0.] March 8, 0 Jernej Barbic University of Southern California http://www-bcf.usc.edu/~jbarbic/cs480-s/

More information

ROCm: An open platform for GPU computing exploration

ROCm: An open platform for GPU computing exploration UCX-ROCm: ROCm Integration into UCX {Khaled Hamidouche, Brad Benton}@AMD Research ROCm: An open platform for GPU computing exploration 1 JUNE, 2018 ISC ROCm Software Platform An Open Source foundation

More information

The Road to the AMD. Fiji GPU. Featuring Die Stacking and HBM Technology 1 THE ROAD TO THE AMD FIJI GPU ECTC 2016 MAY 2015

The Road to the AMD. Fiji GPU. Featuring Die Stacking and HBM Technology 1 THE ROAD TO THE AMD FIJI GPU ECTC 2016 MAY 2015 The Road to the AMD Fiji GPU Featuring Die Stacking and HBM Technology 1 THE ROAD TO THE AMD FIJI GPU ECTC 2016 MAY 2015 Fiji Chip DETAILED LOOK 4GB High-Bandwidth Memory 4096-bit wide interface 512 GB/s

More information

Desktop Telepresence Arrived! Sudha Valluru ViVu CEO

Desktop Telepresence Arrived! Sudha Valluru ViVu CEO Desktop Telepresence Arrived! Sudha Valluru ViVu CEO 3 Desktop Telepresence Arrived! Video Collaboration market Telepresence Telepresence Cost Expensive Expensive HW HW Legacy Apps Interactivity ViVu CONFIDENTIAL

More information

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

AMD S X86 OPEN64 COMPILER. Michael Lai AMD AMD S X86 OPEN64 COMPILER Michael Lai AMD CONTENTS Brief History AMD and Open64 Compiler Overview Major Components of Compiler Important Optimizations Recent Releases Performance Applications and Libraries

More information

Spatial Data Structures

Spatial Data Structures 15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) April 1, 2003 [Angel 9.10] Frank Pfenning Carnegie

More information

Cilk Plus: Multicore extensions for C and C++

Cilk Plus: Multicore extensions for C and C++ Cilk Plus: Multicore extensions for C and C++ Matteo Frigo 1 June 6, 2011 1 Some slides courtesy of Prof. Charles E. Leiserson of MIT. Intel R Cilk TM Plus What is it? C/C++ language extensions supporting

More information

AMD HD3D Technology. Setup Guide. 1 AMD HD3D TECHNOLOGY: Setup Guide

AMD HD3D Technology. Setup Guide. 1 AMD HD3D TECHNOLOGY: Setup Guide AMD HD3D Technology Setup Guide 1 AMD HD3D TECHNOLOGY: Setup Guide Contents AMD HD3D Technology... 3 Frame Sequential Displays... 4 Supported 3D Display Hardware... 5 AMD Display Drivers... 5 Configuration

More information

Maximizing Six-Core AMD Opteron Processor Performance with RHEL

Maximizing Six-Core AMD Opteron Processor Performance with RHEL Maximizing Six-Core AMD Opteron Processor Performance with RHEL Bhavna Sarathy Red Hat Technical Lead, AMD Sanjay Rao Senior Software Engineer, Red Hat Sept 4, 2009 1 Agenda Six-Core AMD Opteron processor

More information

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016 Ray Tracing Computer Graphics CMU 15-462/15-662, Fall 2016 Primitive-partitioning vs. space-partitioning acceleration structures Primitive partitioning (bounding volume hierarchy): partitions node s primitives

More information

L10 Layered Depth Normal Images. Introduction Related Work Structured Point Representation Boolean Operations Conclusion

L10 Layered Depth Normal Images. Introduction Related Work Structured Point Representation Boolean Operations Conclusion L10 Layered Depth Normal Images Introduction Related Work Structured Point Representation Boolean Operations Conclusion 1 Introduction Purpose: using the computational power on GPU to speed up solid modeling

More information

Heterogeneous Computing

Heterogeneous Computing Heterogeneous Computing Featured Speaker Ben Sander Senior Fellow Advanced Micro Devices (AMD) DR. DOBB S: GPU AND CPU PROGRAMMING WITH HETEROGENEOUS SYSTEM ARCHITECTURE Ben Sander AMD Senior Fellow APU:

More information

Spatial Data Structures

Spatial Data Structures 15-462 Computer Graphics I Lecture 17 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) March 28, 2002 [Angel 8.9] Frank Pfenning Carnegie

More information

1 HiPEAC January, 2012 Public TASKS, FUTURES AND ASYNCHRONOUS PROGRAMMING

1 HiPEAC January, 2012 Public TASKS, FUTURES AND ASYNCHRONOUS PROGRAMMING 1 HiPEAC January, 2012 Public TASKS, FUTURES AND ASYNCHRONOUS PROGRAMMING TASK-PARALLELISM OpenCL, CUDA, OpenMP (traditionally) and the like are largely data-parallel models Their core unit of parallelism

More information

AMD Radeon ProRender plug-in for Unreal Engine. Installation Guide

AMD Radeon ProRender plug-in for Unreal Engine. Installation Guide AMD Radeon ProRender plug-in for Unreal Engine Installation Guide This document is a guide on how to install and configure AMD Radeon ProRender plug-in for Unreal Engine. DISCLAIMER The information contained

More information

3D Numerical Analysis of Two-Phase Immersion Cooling for Electronic Components

3D Numerical Analysis of Two-Phase Immersion Cooling for Electronic Components 3D Numerical Analysis of Two-Phase Immersion Cooling for Electronic Components Xudong An, Manish Arora, Wei Huang, William C. Brantley, Joseph L. Greathouse AMD Research Advanced Micro Devices, Inc. MOTIVATION

More information

Real-Time Voxelization for Global Illumination

Real-Time Voxelization for Global Illumination Lecture 26: Real-Time Voxelization for Global Illumination Visual Computing Systems Voxelization to regular grid Input: scene triangles Output: surface information at each voxel in 3D grid - Simple case:

More information

Run Anywhere. The Hardware Platform Perspective. Ben Pollan, AMD Java Labs October 28, 2008

Run Anywhere. The Hardware Platform Perspective. Ben Pollan, AMD Java Labs October 28, 2008 Run Anywhere The Hardware Platform Perspective Ben Pollan, AMD Java Labs October 28, 2008 Agenda Java Labs Introduction Community Collaboration Performance Optimization Recommendations Leveraging the Latest

More information

SOLUTION TO SHADER RECOMPILES IN RADEONSI SEPTEMBER 2015

SOLUTION TO SHADER RECOMPILES IN RADEONSI SEPTEMBER 2015 SOLUTION TO SHADER RECOMPILES IN RADEONSI SEPTEMBER 2015 PROBLEM Shaders are compiled in draw calls Emulating certain features in shaders Drivers keep shaders in some intermediate representation And insert

More information

KVM CPU MODEL IN SYSCALL EMULATION MODE ALEXANDRU DUTU, JOHN SLICE JUNE 14, 2015

KVM CPU MODEL IN SYSCALL EMULATION MODE ALEXANDRU DUTU, JOHN SLICE JUNE 14, 2015 KVM CPU MODEL IN SYSCALL EMULATION MODE ALEXANDRU DUTU, JOHN SLICE JUNE 14, 2015 AGENDA Background & Motivation Challenges Native Page Tables Emulating the OS Kernel 2 KVM CPU MODEL IN SYSCALL EMULATION

More information

NEXT-GENERATION MATRIX 3D IMMERSIVE USER INTERFACE [ M3D-IUI ] H Raghavendra Swamy AMD Senior Software Engineer

NEXT-GENERATION MATRIX 3D IMMERSIVE USER INTERFACE [ M3D-IUI ] H Raghavendra Swamy AMD Senior Software Engineer NEXT-GENERATION MATRIX 3D IMMERSIVE USER INTERFACE [ M3D-IUI ] H Raghavendra Swamy AMD Senior Software Engineer SESSION AGENDA Quick Keywords Abstract and Scope Introduction Current User Interface [ UI

More information

Direct Rendering of Trimmed NURBS Surfaces

Direct Rendering of Trimmed NURBS Surfaces Direct Rendering of Trimmed NURBS Surfaces Hardware Graphics Pipeline 2/ 81 Hardware Graphics Pipeline GPU Video Memory CPU Vertex Processor Raster Unit Fragment Processor Render Target Screen Extended

More information

Multi-view Stereo. Ivo Boyadzhiev CS7670: September 13, 2011

Multi-view Stereo. Ivo Boyadzhiev CS7670: September 13, 2011 Multi-view Stereo Ivo Boyadzhiev CS7670: September 13, 2011 What is stereo vision? Generic problem formulation: given several images of the same object or scene, compute a representation of its 3D shape

More information

Pattern-based analytics to estimate and track yield risk of designs down to 7nm

Pattern-based analytics to estimate and track yield risk of designs down to 7nm DAC 2017 Pattern-based analytics to estimate and track yield risk of designs down to 7nm JASON CAIN, MOUTAZ FAKHRY (AMD) PIYUSH PATHAK, JASON SWEIS, PHILIPPE HURAT, YA-CHIEH LAI (CADENCE) INTRODUCTION

More information

DR. LISA SU

DR. LISA SU CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to AMD s strategy and focus, expected datacenter total

More information

Anatomy of AMD s TeraScale Graphics Engine

Anatomy of AMD s TeraScale Graphics Engine Anatomy of AMD s TeraScale Graphics Engine Mike Houston Design Goals Focus on Efficiency f(perf/watt, Perf/$) Scale up processing power and AA performance Target >2x previous generation Enhance stream

More information

clarmor: A DYNAMIC BUFFER OVERFLOW DETECTOR FOR OPENCL KERNELS CHRIS ERB, JOE GREATHOUSE, MAY 16, 2018

clarmor: A DYNAMIC BUFFER OVERFLOW DETECTOR FOR OPENCL KERNELS CHRIS ERB, JOE GREATHOUSE, MAY 16, 2018 clarmor: A DYNAMIC BUFFER OVERFLOW DETECTOR FOR OPENCL KERNELS CHRIS ERB, JOE GREATHOUSE, MAY 16, 2018 ANECDOTE DISCOVERING A BUFFER OVERFLOW CPU GPU MEMORY MEMORY Data Data Data Data Data 2 clarmor: A

More information

Fast BVH Construction on GPUs

Fast BVH Construction on GPUs Fast BVH Construction on GPUs Published in EUROGRAGHICS, (2009) C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, D. Manocha University of North Carolina at Chapel Hill NVIDIA University of California

More information

Spatial Data Structures

Spatial Data Structures Spatial Data Structures Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees Constructive Solid Geometry (CSG) [Angel 9.10] Outline Ray tracing review what rays matter? Ray tracing speedup faster

More information

Volumetric Scene Reconstruction from Multiple Views

Volumetric Scene Reconstruction from Multiple Views Volumetric Scene Reconstruction from Multiple Views Chuck Dyer University of Wisconsin dyer@cs cs.wisc.edu www.cs cs.wisc.edu/~dyer Image-Based Scene Reconstruction Goal Automatic construction of photo-realistic

More information

Announcements. Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday

Announcements. Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday Announcements Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday 1 Spatial Data Structures Hierarchical Bounding Volumes Grids Octrees BSP Trees 11/7/02 Speeding Up Computations

More information

1401 HETEROGENEOUS HPC How Fusion Designs Can Advance Science

1401 HETEROGENEOUS HPC How Fusion Designs Can Advance Science 1401 HETEROGENEOUS HPC How Fusion Designs Can Advance Science Ben Bergen Los Alamos National Laboratory Research Scientist Marcus Daniels Los Alamos National Laboratory Research Scientist VPIC Team Brian

More information

Graphics Hardware 2008

Graphics Hardware 2008 AMD Smarter Choice Graphics Hardware 2008 Mike Mantor AMD Fellow Architect michael.mantor@amd.com GPUs vs. Multi-core CPUs On a Converging Course or Fundamentally Different? Many Cores Disruptive Change

More information

Vulkan (including Vulkan Fast Paths)

Vulkan (including Vulkan Fast Paths) Vulkan (including Vulkan Fast Paths) Łukasz Migas Software Development Engineer WS Graphics Let s talk about OpenGL (a bit) History 1.0-1992 1.3-2001 multitexturing 1.5-2003 vertex buffer object 2.0-2004

More information

PROTECTING VM REGISTER STATE WITH AMD SEV-ES DAVID KAPLAN LSS 2017

PROTECTING VM REGISTER STATE WITH AMD SEV-ES DAVID KAPLAN LSS 2017 PROTECTING VM REGISTER STATE WITH AMD SEV-ES DAVID KAPLAN LSS 2017 BACKGROUND-- HARDWARE MEMORY ENCRYPTION AMD Secure Memory Encryption (SME) / AMD Secure Encrypted Virtualization (SEV) Hardware AES engine

More information

AMD EPYC CORPORATE BRAND GUIDELINES

AMD EPYC CORPORATE BRAND GUIDELINES AMD EPYC CORPORATE BRAND GUIDELINES VERSION 1 MAY 2017 CONTACT Address Advanced Micro Devices, Inc 7171 Southwest Pkwy Austin, Texas 78735 United States Phone 1-512-602-1000 Online Email: Brand.Team@amd.com

More information

D3D12 & Vulkan: Lessons learned. Dr. Matthäus G. Chajdas Developer Technology Engineer, AMD

D3D12 & Vulkan: Lessons learned. Dr. Matthäus G. Chajdas Developer Technology Engineer, AMD D3D12 & Vulkan: Lessons learned Dr. Matthäus G. Chajdas Developer Technology Engineer, AMD D3D12 What s new? DXIL DXGI & UWP updates Root Signature 1.1 Shader cache GPU validation PIX D3D12 / DXIL DXBC

More information

Introduction to Mobile Robotics Techniques for 3D Mapping

Introduction to Mobile Robotics Techniques for 3D Mapping Introduction to Mobile Robotics Techniques for 3D Mapping Wolfram Burgard, Michael Ruhnke, Bastian Steder 1 Why 3D Representations Robots live in the 3D world. 2D maps have been applied successfully for

More information

Ray Tracing Acceleration Data Structures

Ray Tracing Acceleration Data Structures Ray Tracing Acceleration Data Structures Sumair Ahmed October 29, 2009 Ray Tracing is very time-consuming because of the ray-object intersection calculations. With the brute force method, each ray has

More information

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Department of Computer Science and Engineering Sogang University, Korea Improving Memory

More information

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo Computer Graphics Bing-Yu Chen National Taiwan University The University of Tokyo Hidden-Surface Removal Back-Face Culling The Depth-Sort Algorithm Binary Space-Partitioning Trees The z-buffer Algorithm

More information

Geometric Representations. Stelian Coros

Geometric Representations. Stelian Coros Geometric Representations Stelian Coros Geometric Representations Languages for describing shape Boundary representations Polygonal meshes Subdivision surfaces Implicit surfaces Volumetric models Parametric

More information

Identifying those parts of a scene that are visible from a chosen viewing position, and only process (scan convert) those parts

Identifying those parts of a scene that are visible from a chosen viewing position, and only process (scan convert) those parts Visible Surface Detection Identifying those parts of a scene that are visible from a chosen viewing position, and only process (scan convert) those parts Two approaches: 1. Object space methods 2. Image

More information

Interactive Isosurface Ray Tracing of Large Octree Volumes

Interactive Isosurface Ray Tracing of Large Octree Volumes Interactive Isosurface Ray Tracing of Large Octree Volumes Aaron Knoll, Ingo Wald, Steven Parker, and Charles Hansen Scientific Computing and Imaging Institute University of Utah 2006 IEEE Symposium on

More information

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management Next-Generation Mobile Computing: Balancing Performance and Power Efficiency HOT CHIPS 19 Jonathan Owen, AMD Agenda The mobile computing evolution The Griffin architecture Memory enhancements Power management

More information

Solid Modeling. Thomas Funkhouser Princeton University C0S 426, Fall Represent solid interiors of objects

Solid Modeling. Thomas Funkhouser Princeton University C0S 426, Fall Represent solid interiors of objects Solid Modeling Thomas Funkhouser Princeton University C0S 426, Fall 2000 Solid Modeling Represent solid interiors of objects Surface may not be described explicitly Visible Human (National Library of Medicine)

More information

Ray Tracing III. Wen-Chieh (Steve) Lin National Chiao-Tung University

Ray Tracing III. Wen-Chieh (Steve) Lin National Chiao-Tung University Ray Tracing III Wen-Chieh (Steve) Lin National Chiao-Tung University Shirley, Fundamentals of Computer Graphics, Chap 10 Doug James CG slides, I-Chen Lin s CG slides Ray-tracing Review For each pixel,

More information

Row Tracing with Hierarchical Occlusion Maps

Row Tracing with Hierarchical Occlusion Maps Row Tracing with Hierarchical Occlusion Maps Ravi P. Kammaje, Benjamin Mora August 9, 2008 Page 2 Row Tracing with Hierarchical Occlusion Maps Outline August 9, 2008 Introduction Related Work Row Tracing

More information

Computer Graphics. Bing-Yu Chen National Taiwan University

Computer Graphics. Bing-Yu Chen National Taiwan University Computer Graphics Bing-Yu Chen National Taiwan University Visible-Surface Determination Back-Face Culling The Depth-Sort Algorithm Binary Space-Partitioning Trees The z-buffer Algorithm Scan-Line Algorithm

More information

Multiview Reconstruction

Multiview Reconstruction Multiview Reconstruction Why More Than 2 Views? Baseline Too short low accuracy Too long matching becomes hard Why More Than 2 Views? Ambiguity with 2 views Camera 1 Camera 2 Camera 3 Trinocular Stereo

More information

Multi-view stereo. Many slides adapted from S. Seitz

Multi-view stereo. Many slides adapted from S. Seitz Multi-view stereo Many slides adapted from S. Seitz Beyond two-view stereo The third eye can be used for verification Multiple-baseline stereo Pick a reference image, and slide the corresponding window

More information

Lecture 17: Solid Modeling.... a cubit on the one side, and a cubit on the other side Exodus 26:13

Lecture 17: Solid Modeling.... a cubit on the one side, and a cubit on the other side Exodus 26:13 Lecture 17: Solid Modeling... a cubit on the one side, and a cubit on the other side Exodus 26:13 Who is on the LORD's side? Exodus 32:26 1. Solid Representations A solid is a 3-dimensional shape with

More information