Bulk-Synchronous Communication Mechanisms in Diderot
1 Bulk-Synchronous Communication Mechanisms in Diderot
John Reppy, Lamont Samuels
University of Chicago
October 26th, 2015
2 Diderot overview: Diderot
Diderot is a parallel domain-specific language designed for biomedical image-analysis and visualization algorithms.
Its design models the algorithmic structure of its application domain [1]:
- independent strands computing over continuous tensor fields,
- which are reconstructed from discrete data using a separable convolution kernel h: $F = V \circledast h$
[Figure: discrete image data V convolved with kernel h gives the continuous field F]
[1] Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis. G Kindlmann, C Chiw, N Seltzer, L Samuels, and J Reppy. (to appear, VIS 2015).
October 26th, 2015, AGERE '15
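As a rough illustration of the reconstruction F = V ⊛ h, the Python sketch below (not Diderot code) evaluates a 1-D continuous field from discrete samples. The tent kernel, the names `tent` and `probe`, and the sample values are assumptions made for illustration; the slides do not say which kernel is used.

```python
import math

def tent(t):
    """Tent (linear interpolation) kernel with support [-1, 1]."""
    return max(0.0, 1.0 - abs(t))

def probe(samples, x, kernel=tent, support=1):
    """Evaluate F(x) = sum_i V[i] * h(x - i) at a continuous position x."""
    lo = math.floor(x) - support + 1
    hi = math.floor(x) + support
    total = 0.0
    for i in range(lo, hi + 1):
        if 0 <= i < len(samples):      # samples outside the data contribute nothing
            total += samples[i] * kernel(x - i)
    return total

# Hypothetical discrete data V; probing between samples interpolates them.
V = [0.0, 2.0, 4.0, 1.0]
halfway = probe(V, 1.5)
```

A separable kernel extends this to 2-D or 3-D data by multiplying one such 1-D weight per axis.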
4 Diderot overview: Diderot Program Structure
Square roots of integers using Heron's method.
    // global definitions
    input int N = 1000;
    input real eps = ;
    // strand definition
    strand SqRoot (real val) {
        output real root = val;
        update {
            root = (root + val/root) / 2.0;
            if (|root^2 - val| / val < eps)
                stabilize;
        }
    }
    // initialization
    initially [ SqRoot(real(i)) | i in 1..N ];
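The execution of this program can be approximated in plain Python. The sketch below is not Diderot and is not how the compiler runs strands; it only mimics the step-by-step behaviour, with every active strand updating once per step and leaving the active set when it stabilizes. The tolerance `eps=1e-6` and the `max_steps` guard are assumptions, since the slide's value for eps did not survive transcription.

```python
def run_sqroot(n=5, eps=1e-6, max_steps=100):
    """Simulate bulk-synchronous strands that each refine one square root."""
    vals = [float(i) for i in range(1, n + 1)]
    roots = list(vals)                 # output state, initialised to val
    active = set(range(n))             # strands that have not stabilized
    steps = 0
    while active and steps < max_steps:
        for s in list(active):         # one step: every active strand updates once
            roots[s] = (roots[s] + vals[s] / roots[s]) / 2.0
            if abs(roots[s] ** 2 - vals[s]) / vals[s] < eps:
                active.discard(s)      # corresponds to `stabilize`
        steps += 1
    return roots, steps

roots, steps = run_sqroot()
```

Because each strand touches only its own state, the order in which strands run inside a step does not affect the result.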
5 Diderot overview: Diderot Parallelism Model
Bulk-synchronous parallel with deterministic semantics.
[Figure: strand state across execution steps; labels: update, stabilize, die, idle strands]
6 Diderot overview: Diderot Parallelism Model (Cont'd)
- This model only supports applications in which strands act independently of each other, such as direct volume rendering or fiber tractography.
- Diderot is missing features needed for other algorithms of interest, such as particle systems.
8 Communication Design: Spatial Design
- Strands interact with each other based on their world coordinates.
- A strand uses special query functions to identify neighboring strands.
- The queried sequence of strands is processed using a new mechanism, the foreach statement.
- A neighbor's strand state is accessed using the selection operator (for example, neighbor.pos).
[Figure: circle query of radius r centered on strand Q, with strands A, B, C and D nearby]
    strand Particle (vec2 initpos) {
        vec2 pos = initpos;
        update {
            foreach (Particle neighbor in sphere(5.0)) {
                real dist = |pos - neighbor.pos|;
                ...
            }
        }
    }
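The query-then-iterate pattern can be sketched in Python as follows. This is an illustration rather than Diderot's implementation: the brute-force search, the decision to exclude the querying strand from its own result, and the inclusive radius test are all assumptions.

```python
import math

class Particle:
    def __init__(self, pos):
        self.pos = pos                 # (x, y) world coordinates

def sphere(me, strands, radius):
    """Return the other strands whose position lies within `radius` of `me`."""
    return [s for s in strands
            if s is not me and math.dist(me.pos, s.pos) <= radius]

def neighbor_distances(me, strands, radius):
    """Mimic a foreach over the query result, reading each neighbor's pos."""
    return [math.dist(me.pos, nb.pos) for nb in sphere(me, strands, radius)]

# Hypothetical strands: only `a` lies within 5.0 of `q`.
q = Particle((0.0, 0.0))
a = Particle((3.0, 4.0))
b = Particle((10.0, 0.0))
strands = [q, a, b]
```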
13 Communication Design: Global Design
- Strands can share state information on a larger scale within the system.
- Global communication is performed using common reduction operations.
- The reduction operations reside in a new definition block called global.
- A reduction is performed across a set of strands: Particle.all, Particle.active, or Particle.stable.
Reductions: mean, max, min, all, exists, variance, sum, product
[Figure: legend distinguishing strands, active strands, and stable strands]
    real gmean = 0.0;
    strand Particle (real initenergy) {
        real energy = initenergy;
        update {
            if (energy < gmean) {
                // e.g., increase energy
            }
        }
    }
    global {
        gmean = mean{ P.energy | P in Particle.all };
    }
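A minimal Python sketch of this pattern is given below. It is not Diderot code: the rule that a strand gains 1.0 energy or otherwise stabilizes is hypothetical, standing in for the elided "increase energy" body. What it illustrates is that all strands read the same global value during a step and that the reduction runs once between steps.

```python
class Particle:
    def __init__(self, energy):
        self.energy = energy
        self.stable = False

def update(p, gmean):
    """Strand update: reads the global computed after the previous step."""
    if p.energy < gmean:
        p.energy += 1.0                # hypothetical "increase energy" rule
    else:
        p.stable = True                # hypothetical stabilization rule

def global_phase(particles):
    """Mean reduction over all strands, run between steps."""
    return sum(p.energy for p in particles) / len(particles)

def step(particles, gmean):
    for p in particles:
        if not p.stable:
            update(p, gmean)           # every strand sees the same gmean
    return global_phase(particles)

ps = [Particle(1.0), Particle(3.0)]
g0 = global_phase(ps)
g1 = step(ps, g0)
```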
17 Communication Implementation: Spatial Implementation
- Spatial communication is implemented using a tree-based spatial partitioning scheme (k-d tree).
- The tree is built using a strand's pos variable and is rebuilt before each iteration.
[Figure: k-d tree over the strand positions (12,4), (2,22), (3,15), (17,20), (10,19), (17,18), (9,17), (14,13) and (16,7), drawn on x and y axes]
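For readers unfamiliar with k-d trees, here is a compact Python sketch of building one over 2-D positions and answering a radius query. It is not the Diderot runtime's code: the median split, the tuple-based node layout and the function names are assumptions. The points are the coordinates that appear in the figure.

```python
import math

def build(points, depth=0):
    """Build a 2-D k-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % 2                   # alternate between x and y
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def within(node, center, radius, depth=0, found=None):
    """Collect all points within `radius` of `center`."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    axis = depth % 2
    if math.dist(point, center) <= radius:
        found.append(point)
    delta = center[axis] - point[axis]
    # Always search the side containing the center; search the other side
    # only if the query circle crosses the splitting line.
    near, far = (left, right) if delta < 0 else (right, left)
    within(near, center, radius, depth + 1, found)
    if abs(delta) <= radius:
        within(far, center, radius, depth + 1, found)
    return found

pts = [(12, 4), (2, 22), (3, 15), (17, 20), (10, 19),
       (17, 18), (9, 17), (14, 13), (16, 7)]
tree = build(pts)
hits = sorted(within(tree, (10, 18), 3.0))
```

Rebuilding the tree before each iteration, as the slide describes, keeps the structure consistent with positions that strands changed in the previous step.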
18 Communication Implementation: Global Implementation
The global computation phase groups reductions into execution phases. This process is done in two steps: lifting and phase insertion.
Global Block (original):
    a = mean(...);
    b = sum(...) + a;
    c = sum(...) * 10;
    d = mean(...) / b;
    e = product(...) + 3;
Lifting moves the reductions into a Reduction Block, grouped by phase:
    phase 0: r1 = mean(...); r3 = sum(...); r5 = product(...);
    phase 1: r2 = sum(...);
    phase 2: r4 = mean(...);
Variable  Phase
a         0
b         1
c         0
d         2
e         0
Global Block (transformed, after phase insertion):
    phase(0);
    a = r1;
    phase(1);
    b = r2 + a;
    c = r3 * 10;
    phase(2);
    d = r4 / b;
    e = r5 + 3;
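One way to arrive at such a phase assignment is sketched below in Python. This is a guess at the idea, not the compiler's algorithm. The slide elides the reduction arguments, so the `reads` table is an assumption: it supposes that the argument of r2 reads a and the argument of r4 reads b, which is consistent with the phases shown. A reduction with no such dependency goes in phase 0; otherwise it goes one phase after the latest reduction it depends on.

```python
def assign_phases(reads, defined_by):
    """reads: reduction -> globals read inside its argument (assumed acyclic);
    defined_by: global -> the reduction whose result it uses.
    Returns reduction -> phase number."""
    phase = {}

    def phase_of(red):
        if red not in phase:
            deps = [phase_of(defined_by[g]) for g in reads[red]]
            phase[red] = 1 + max(deps) if deps else 0
        return phase[red]

    for red in reads:
        phase_of(red)
    return phase

# Hypothetical dependencies chosen to match the slide's example.
reads = {"r1": [], "r2": ["a"], "r3": [], "r4": ["b"], "r5": []}
defined_by = {"a": "r1", "b": "r2", "c": "r3", "d": "r4", "e": "r5"}
phases = assign_phases(reads, defined_by)
```

Reductions that land in the same phase can then be evaluated together in a single pass over the strands.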
19 Dynamic allocation: Strand Allocation
- Strands are dynamically allocated by the new statement.
- New strands begin executing in the next iteration.
    new Particle (pos + d);
[Figure: a strand created in iteration i becomes a running strand in iteration i + 1]
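The deferred start of new strands can be mimicked in Python as shown below. This is a sketch under assumptions, not the Diderot runtime: the 1-D position, the `has_spawned` flag and the rule that each strand spawns exactly once are hypothetical. The point is that strands created during a step are collected separately and join the strand list only at the step boundary.

```python
class Particle:
    def __init__(self, pos):
        self.pos = pos
        self.has_spawned = False

def superstep(active, d=1.0):
    """Run one step and return the strand list for the next step."""
    created = []
    for p in active:                   # only strands present at step start run
        if not p.has_spawned:
            created.append(Particle(p.pos + d))   # like `new Particle(pos + d)`
            p.has_spawned = True
    return active + created            # new strands start in the next step

step0 = [Particle(0.0)]
step1 = superstep(step0)
step2 = superstep(step1)
```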
21 Evaluation: Live Demo (Particle-Based Isocontour Sampler)
- Sampling Implicit Surfaces
- Sampling Valley Lines
22 Summary
Conclusion
- Added new communication mechanisms to the Diderot programming model.
- These mechanisms allow more algorithms to be implemented within Diderot.
Future Work
- Additional query functions
- Implementation of the communication mechanisms on GPUs
[Figure: extended execution model; labels: execution step, strand state, update, new, stabilize, die, idle, global computation, read, spawn strands]
26 Conclusion: Acknowledgements
The Diderot Project
- Charisee Chiw, University of Chicago
- Gordon Kindlmann, University of Chicago
- John Reppy, University of Chicago
- Nick Seltzer, University of Chicago
27 Conclusion: Questions?
More informationModal Logic: Implications for Design of a Language for Distributed Computation p.1/53
Modal Logic: Implications for Design of a Language for Distributed Computation Jonathan Moody (with Frank Pfenning) Department of Computer Science Carnegie Mellon University Modal Logic: Implications for
More informationIntroduction to Multicore Programming
Introduction to Multicore Programming Minsoo Ryu Department of Computer Science and Engineering 2 1 Multithreaded Programming 2 Synchronization 3 Automatic Parallelization and OpenMP 4 GPGPU 5 Q& A 2 Multithreaded
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More informationPoint Lattices in Computer Graphics and Visualization how signal processing may help computer graphics
Point Lattices in Computer Graphics and Visualization how signal processing may help computer graphics Dimitri Van De Ville Ecole Polytechnique Fédérale de Lausanne Biomedical Imaging Group dimitri.vandeville@epfl.ch
More informationIntroduction to Multicore Programming
Introduction to Multicore Programming Minsoo Ryu Department of Computer Science and Engineering 2 1 Multithreaded Programming 2 Automatic Parallelization and OpenMP 3 GPGPU 2 Multithreaded Programming
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationCS490D: Introduction to Data Mining Prof. Chris Clifton
CS490D: Introduction to Data Mining Prof. Chris Clifton April 5, 2004 Mining of Time Series Data Time-series database Mining Time-Series and Sequence Data Consists of sequences of values or events changing
More informationCS420: Operating Systems
Threading Issues James Moscola Department of Engineering & Computer Science York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Threading Issues
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 CS 347 Notes 12 5 Web Search Engine Crawling
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 Web Search Engine Crawling Indexing Computing
More informationHands-On Introduction to. LabVIEW. for Scientists and Engineers. Second Edition. John Essick. Reed College OXFORD UNIVERSITY PRESS
Hands-On Introduction to LabVIEW for Scientists and Engineers Second Edition John Essick Reed College New York Oxford OXFORD UNIVERSITY PRESS Contents. Preface xiii 1. THE WHILE LOOP AND WAVEFORM CHART
More informationImplementation Details of GPU-based Out-of-Core Many-Lights Rendering
Implementation Details of GPU-based Out-of-Core Many-Lights Rendering Rui Wang, Yuchi Huo, Yazhen Yuan, Kun Zhou, Wei Hua, Hujun Bao State Key Lab of CAD&CG, Zhejiang University In this document, we provide
More informationOverview of Traditional Surface Tracking Methods
Liquid Simulation With Mesh-Based Surface Tracking Overview of Traditional Surface Tracking Methods Matthias Müller Introduction Research lead of NVIDIA PhysX team PhysX GPU acc. Game physics engine www.nvidia.com\physx
More informationImage Processing Tricks in OpenGL. Simon Green NVIDIA Corporation
Image Processing Tricks in OpenGL Simon Green NVIDIA Corporation Overview Image Processing in Games Histograms Recursive filters JPEG Discrete Cosine Transform Image Processing in Games Image processing
More informationUberFlow: A GPU-Based Particle Engine
UberFlow: A GPU-Based Particle Engine Peter Kipfer Mark Segal Rüdiger Westermann Technische Universität München ATI Research Technische Universität München Motivation Want to create, modify and render
More informationAliasing. Can t draw smooth lines on discrete raster device get staircased lines ( jaggies ):
(Anti)Aliasing and Image Manipulation for (y = 0; y < Size; y++) { for (x = 0; x < Size; x++) { Image[x][y] = 7 + 8 * sin((sqr(x Size) + SQR(y Size)) / 3.0); } } // Size = Size / ; Aliasing Can t draw
More informationParallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville
Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information
More informationExtraction of Level Sets from 2D Images
Extraction of Level Sets from 2D Images David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International License.
More information