Bulk-Synchronous Communication Mechanisms in Diderot

Size: px
Start display at page:

Download "Bulk-Synchronous Communication Mechanisms in Diderot"

Transcription

1 Bulk-Synchronous Communication Mechanisms in Diderot John Reppy, Lamont Samuels University of Chicago October 26th, 2015

2 Diderot Diderot overview Diderot is a parallel domain-specific language designed for biomedical image-analysis and visualization algorithms. Its design models the algorithmic structure of its application domain 1 : independent strands computing over a continuous tensor field, which are reconstructed from discrete data using a separable convolution kernel h: F = V h 1 Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis. G Kindlmann, C Chiw, N Seltzer, L Samuels, and J Reppy. (to appear VIS 2015). October 26th, 2015 AGERE 15 - Diderot 2

3 Diderot overview Diderot Diderot is a parallel domain-specific language designed for biomedical image-analysis and visualization algorithms. Its design models the algorithmic structure of its application domain 1 : independent strands computing over a continuous tensor field, which are reconstructed from discrete data using a separable convolution kernel h: F = V h V h F Discrete image data Continuous field 1 Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis. G Kindlmann, C Chiw, N Seltzer, L Samuels, and J Reppy. (to appear VIS 2015). October 26th, 2015 AGERE 15 - Diderot 2

4 Diderot overview Diderot Program Structure Square roots of integers using Heron s method. // global definitions input int N = 1000; input real eps = ; // strand definition strand SqRoot (real val) { output real root = val; update { root = (root + val/root) / 2.0; if ( rootˆ2 - val /val < eps) stabilize; // initialization initially [ SqRoot(real(i)) i in 1..N ] October 26th, 2015 AGERE 15 - Diderot 3

5 Diderot overview Diderot Parallelism Model Bulk-synchronous parallel with deterministic semantics. stabilize execution step strand state update die idle strands October 26th, 2015 AGERE 15 - Diderot 4

6 Diderot overview Diderot Parallelism Model (Cont d) This model only supports applications that allow strands to act independently of each other, such as direct volume rendering or fiber tractography. Diderot is missing features needed for algorithms of interest, such as particle systems. October 26th, 2015 AGERE 15 - Diderot 5

7 Diderot overview Diderot Parallelism Model (Cont d) This model only supports applications that allow strands to act independently of each other, such as direct volume rendering or fiber tractography. Diderot is missing features needed for algorithms of interest, such as particle systems. October 26th, 2015 AGERE 15 - Diderot 5

8 Spatial Design Communication Design Strands interact with each other based on their world coordinates. Strand uses special query functions to identify neighboring strands. Processing the queried sequence of strands is performed using a new mechanism called the foreach statement. Strand state is accessed using the selection operator. October 26th, 2015 AGERE 15 - Diderot 6

9 Spatial Design Communication Design Strands interact with each other based on their world coordinates. Strand uses special query functions to identify neighboring strands. Processing the queried sequence of strands is performed using a new mechanism called the foreach statement. Strand state is accessed using the selection operator. B D A Q C October 26th, 2015 AGERE 15 - Diderot 6

10 Spatial Design Communication Design Strands interact with each other based on their world coordinates. Strand uses special query functions to identify neighboring strands. Processing the queried sequence of strands is performed using a new mechanism called the foreach statement. Strand state is accessed using the selection operator. Circle Query B D A Q r r C October 26th, 2015 AGERE 15 - Diderot 6

11 Spatial Design Communication Design Strands interact with each other based on their world coordinates. Strand uses special query functions to identify neighboring strands. Processing the queried sequence of strands is performed using a new mechanism called the foreach statement. Strand state is accessed using the selection operator. Circle Query A Q r B r C D strand Particle (vec2 initpos) { vec2 pos = initpos; update { foreach (Particle neighbor in sphere(5.0)) {... October 26th, 2015 AGERE 15 - Diderot 6

12 Spatial Design Communication Design Strands interact with each other based on their world coordinates. Strand uses special query functions to identify neighboring strands. Processing the queried sequence of strands is performed using a new mechanism called the foreach statement. Strand state is accessed using the selection operator. Circle Query A Q r B r C D strand Particle (vec2 initpos) { vec2 pos = initpos; update { foreach (Particle neighbor in sphere(5.0)) { real dist = pos - neighbor.pos ;... October 26th, 2015 AGERE 15 - Diderot 6

13 Global Design Communication Design Strands can share state information on a larger scale within the system. Global communication is performed by using common reduction operations. The reduction operations reside in a new definition block called global. A reduction is performed across a set of strands. Strand Active Strand Stable Strand October 26th, 2015 AGERE 15 - Diderot 7

14 Global Design Communication Design Strands can share state information on a larger scale within the system. Global communication is performed by using common reduction operations. The reduction operations reside in a new definition block called global. A reduction is performed across a set of strands. Strand Active Strand Stable Strand Reductions: mean, max, min, all, exists, variance, sum, product October 26th, 2015 AGERE 15 - Diderot 7

15 Global Design Communication Design Strands can share state information on a larger scale within the system. Global communication is performed by using common reduction operations. The reduction operations reside in a new definition block called global. A reduction is performed across a set of strands. Strand Active Strand Stable Strand real gmean = 0; strand Particle (real initenergy) { vec2 initenergy; update { if ( energy < gmean) { // e.g., increase energy global { gmean = mean{ P.energy P in Particle.all Reductions: mean, max, min, all, exists, variance, sum, product October 26th, 2015 AGERE 15 - Diderot 7

16 Global Design Communication Design Strands can share state information on a larger scale within the system. Global communication is performed by using common reduction operations. The reduction operations reside in a new definition block called global. A reduction is performed across a set of strands. Strand Active Strand Stable Strand real gmean = 0; strand Particle (real initenergy) { vec2 initenergy; update { if ( energy < gmean) { Particle.active // e.g., increase energy Partical.stable global { gmean = mean{ P.energy P in Particle.all Reductions: mean, max, min, all, exists, variance, sum, product October 26th, 2015 AGERE 15 - Diderot 7

17 Spatial Implementation Communication Implementation Spatial communication is implemented using a tree-based spatial partitioning scheme (k-d tree). The tree is built using a strand s pos variable and is rebuilt before each iteration. y 24 X (12,4) A (2,22) F (3,15) I (17,20) J (10,19) H (17,18) G (9,17) D (14,13) Y (10,19) (17,18) 12 8 B (16,7) X (9,17) (2,22) (16,7) (17,20) 4 2 0,0 T (12,4) x Y (3,15) (14,13) October 26th, 2015 AGERE 15 - Diderot 8

18 Communication Implementation Global Implementation The global computation phase groups reductions into execution phases. This process is done in two steps: lifting and phase insertion. phase 0 r1 = mean(...); r3 = sum(...); r5 = product(...); Reduction Block phase 1 r2 = sum(...); Variable Phase phase 2 r4 = mean(...); a 0 lift b 1 c 0 d 2 e 0 Global Block a = mean(...); b = sum(...) + a; c = sum(...) * 10; d = mean(...)/ b; e = product(...) + 3; transform phase(0); a = r1; phase(1); b = r2 + a; c = r3 * 10; phase(2); d = r4 /b; e = r5 + 3; October 26th, 2015 AGERE 15 - Diderot 9

19 Dynamic allocation Strand Allocation Strands are dynamically allocated by the new statement. New strands will begin executing in the next iteration. New Strand Iteration i new Particle (pos + d); Iteration i + 1 October 26th, 2015 AGERE 15 - Diderot 10

20 Dynamic allocation Strand Allocation Strands are dynamically allocated by the new statement. New strands will begin executing in the next iteration. New Strand Iteration i new Particle (pos + d); Iteration i + 1 October 26th, 2015 AGERE 15 - Diderot 10

21 Evaluation Live Demo (Particle-Based Isocontour Sampler) Sampling Implicit Surfaces Sampling Valley Lines October 26th, 2015 AGERE 15 - Diderot 11

22 Summary Conclusion Provided new mechanisms to our programming model. Allows for more algorithms to be implemented within Diderot. Future Work Additional query functions Implementation of communication mechanisms on GPUs stabilize new execution step global computation strand state die update idle global computation read spawn strands October 26th, 2015 AGERE 15 - Diderot 12

23 Summary Conclusion Provided new mechanisms to our programming model. Allows for more algorithms to be implemented within Diderot. Future Work Additional query functions Implementation of communication mechanisms on GPUs stabilize new execution step global computation strand state die update idle global computation read spawn strands October 26th, 2015 AGERE 15 - Diderot 12

24 Summary Conclusion Provided new mechanisms to our programming model. Allows for more algorithms to be implemented within Diderot. Future Work Additional query functions Implementation of communication mechanisms on GPUs stabilize new execution step global computation strand state die update idle global computation read spawn strands October 26th, 2015 AGERE 15 - Diderot 12

25 Summary Conclusion Provided new mechanisms to our programming model. Allows for more algorithms to be implemented within Diderot. Future Work Additional query functions Implementation of communication mechanisms on GPUs stabilize new execution step global computation strand state die update idle global computation read spawn strands October 26th, 2015 AGERE 15 - Diderot 12

26 Conclusion Acknowledgements The Diderot Project Charisee Chiw Gordon Kindlmann John Reppy Nick Seltzer University of Chicago University of Chicago University of Chicago University of Chicago October 26th, 2015 AGERE 15 - Diderot 13

27 Conclusion Questions? October 26th, 2015 AGERE 15 - Diderot 14

Specific Language for Scientific Image Analysis and Visualization.

Specific Language for Scientific Image Analysis and Visualization. Diderot: Gordon A Parallel Domain- Specific Language for Scientific Image Analysis and Visualization Kindlmann glk@uchicago.edu Joint work with: Prof. John Reppy, Nicholas Seltzer, Charisee Chiw, and Lamont

More information

Bulk-Synchronous Communication Mechanisms in Diderot

Bulk-Synchronous Communication Mechanisms in Diderot 1 Bulk-Synchronous Communication Mechanisms in Diderot John Reppy, Lamont Samuels University of Chicago {jhr, lamonts}@cs.uchicago.edu Abstract Diderot is a parallel domain-specific language designed to

More information

Bulk-Synchronous Communication Mechanisms in Diderot

Bulk-Synchronous Communication Mechanisms in Diderot Bulk-Synchronous Communication Mechanisms in Diderot John Reppy University of Chicago jhr@cs.uchicago.edu Lamont Samuels University of Chicago lamonts@cs.uchicago.edu Abstract Diderot is a parallel domain-specific

More information

Discovering Stable Features in Scientific Images (and a bit about Diderot)

Discovering Stable Features in Scientific Images (and a bit about Diderot) Discovering Stable Features in Scientific Images (and a bit about Diderot) Gordon Kindlmann http://people.cs.uchicago.edu/~glk/ glk@uchicago.edu Context Imaging Visualization Specimen 3D Image Data Analysis

More information

Particle systems for visualizing the connection between math and anatomy. Gordon L. Kindlmann

Particle systems for visualizing the connection between math and anatomy. Gordon L. Kindlmann Particle systems for visualizing the connection between math and anatomy Gordon L. Kindlmann Context Visualization Imaging Analysis Anatomy 3D Image Data Goal: geometric models of anatomic features for

More information

Diderot: A Parallel DSL for Image Analysis and Visualization

Diderot: A Parallel DSL for Image Analysis and Visualization Diderot: A Parallel DSL for Image Analysis and Visualization Charisee Chiw Gordon Kindlmann John Reppy Lamont Samuels Nick Seltzer University of Chicago {cchiw,glk,jhr,lamonts,nseltzer}@cs.uchicago.edu

More information

Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis

Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis Oct 29 2015 Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis Gordon Kindlmann, Charisee Chiw, Nicholas Seltzer, Lamont Samuels, John Reppy Scientific

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming Pablo Brubeck Department of Physics Tecnologico de Monterrey October 14, 2016 Student Chapter Tecnológico de Monterrey Tecnológico de Monterrey Student Chapter Outline

More information

Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis

Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis Gordon Kindlmann Charisee Chiw Nicholas Seltzer Lamont Samuels John Reppy Department of Computer Science,

More information

Discrete Element Method

Discrete Element Method Discrete Element Method Midterm Project: Option 2 1 Motivation NASA, loyalkng.com, cowboymining.com Industries: Mining Food & Pharmaceutics Film & Game etc. Problem examples: Collapsing Silos Mars Rover

More information

GPGPU LAB. Case study: Finite-Difference Time- Domain Method on CUDA

GPGPU LAB. Case study: Finite-Difference Time- Domain Method on CUDA GPGPU LAB Case study: Finite-Difference Time- Domain Method on CUDA Ana Balevic IPVS 1 Finite-Difference Time-Domain Method Numerical computation of solutions to partial differential equations Explicit

More information

GPU-based Distributed Behavior Models with CUDA

GPU-based Distributed Behavior Models with CUDA GPU-based Distributed Behavior Models with CUDA Courtesy: YouTube, ISIS Lab, Universita degli Studi di Salerno Bradly Alicea Introduction Flocking: Reynolds boids algorithm. * models simple local behaviors

More information

CUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions.

CUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions. Schedule CUDA Digging further into the programming manual Application Programming Interface (API) text only part, sorry Image utilities (simple CUDA examples) Performace considerations Matrix multiplication

More information

CS4961 Parallel Programming. Lecture 4: Data and Task Parallelism 9/3/09. Administrative. Mary Hall September 3, Going over Homework 1

CS4961 Parallel Programming. Lecture 4: Data and Task Parallelism 9/3/09. Administrative. Mary Hall September 3, Going over Homework 1 CS4961 Parallel Programming Lecture 4: Data and Task Parallelism Administrative Homework 2 posted, due September 10 before class - Use the handin program on the CADE machines - Use the following command:

More information

F# for Scientists. Jon Harrop Flying Frog Consultancy Ltd. Foreword by Don Syme A JOHN WILEY & SONS, INC., PUBLICATION WILEY

F# for Scientists. Jon Harrop Flying Frog Consultancy Ltd. Foreword by Don Syme A JOHN WILEY & SONS, INC., PUBLICATION WILEY F# for Scientists Jon Harrop Flying Frog Consultancy Ltd. Foreword by Don Syme WILEY A JOHN WILEY & SONS, INC., PUBLICATION Preface Acknowledgments List of Figi ares List of Tables Acronyms 1 Introduction

More information

OpenMP 4.0/4.5. Mark Bull, EPCC

OpenMP 4.0/4.5. Mark Bull, EPCC OpenMP 4.0/4.5 Mark Bull, EPCC OpenMP 4.0/4.5 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all

More information

Context Tokens for Parallel Algorithms

Context Tokens for Parallel Algorithms Doc No: P0335R1 Date: 2018-10-07 Audience: SG1 (parallelism & concurrency) Authors: Pablo Halpern, Intel Corp. phalpern@halpernwightsoftware.com Context Tokens for Parallel Algorithms Contents 1 Abstract...

More information

vec4f a =...; vec4f b; mat4f M =...; b = M * a; Figure 1: Transforming vector a by matrix M and storing the result in b. 2.2 Stream Functions The elem

vec4f a =...; vec4f b; mat4f M =...; b = M * a; Figure 1: Transforming vector a by matrix M and storing the result in b. 2.2 Stream Functions The elem Brook: A Streaming Programming Language I. Buck 8 October 2001 1 Motivation 1.1 Why design our own language? ffl Modern processors can be limited by communication rather than computation. However, communication

More information

Diffusion Imaging Visualization

Diffusion Imaging Visualization Diffusion Imaging Visualization Thomas Schultz URL: http://cg.cs.uni-bonn.de/schultz/ E-Mail: schultz@cs.uni-bonn.de 1 Outline Introduction to Diffusion Imaging Basic techniques Glyph-based Visualization

More information

Compute Shaders. Christian Hafner. Institute of Computer Graphics and Algorithms Vienna University of Technology

Compute Shaders. Christian Hafner. Institute of Computer Graphics and Algorithms Vienna University of Technology Compute Shaders Christian Hafner Institute of Computer Graphics and Algorithms Vienna University of Technology Overview Introduction Thread Hierarchy Memory Resources Shared Memory & Synchronization Christian

More information

CS4961 Parallel Programming. Lecture 5: Data and Task Parallelism, cont. 9/8/09. Administrative. Mary Hall September 8, 2009.

CS4961 Parallel Programming. Lecture 5: Data and Task Parallelism, cont. 9/8/09. Administrative. Mary Hall September 8, 2009. CS4961 Parallel Programming Lecture 5: Data and Task Parallelism, cont. Administrative Homework 2 posted, due September 10 before class - Use the handin program on the CADE machines - Use the following

More information

OpenMP 4.0. Mark Bull, EPCC

OpenMP 4.0. Mark Bull, EPCC OpenMP 4.0 Mark Bull, EPCC OpenMP 4.0 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all devices!

More information

Scalable Shared Memory Programing

Scalable Shared Memory Programing Scalable Shared Memory Programing Marc Snir www.parallel.illinois.edu What is (my definition of) Shared Memory Global name space (global references) Implicit data movement Caching: User gets good memory

More information

To Do. Advanced Computer Graphics. Discrete Convolution. Outline. Outline. Implementing Discrete Convolution

To Do. Advanced Computer Graphics. Discrete Convolution. Outline. Outline. Implementing Discrete Convolution Advanced Computer Graphics CSE 163 [Spring 2018], Lecture 4 Ravi Ramamoorthi http://www.cs.ucsd.edu/~ravir To Do Assignment 1, Due Apr 27. Please START EARLY This lecture completes all the material you

More information

Automatic translation from CUDA to C++ Luca Atzori, Vincenzo Innocente, Felice Pantaleo, Danilo Piparo

Automatic translation from CUDA to C++ Luca Atzori, Vincenzo Innocente, Felice Pantaleo, Danilo Piparo Automatic translation from CUDA to C++ Luca Atzori, Vincenzo Innocente, Felice Pantaleo, Danilo Piparo 31 August, 2015 Goals Running CUDA code on CPUs. Why? Performance portability! A major challenge faced

More information

Fast Third-Order Texture Filtering

Fast Third-Order Texture Filtering Chapter 20 Fast Third-Order Texture Filtering Christian Sigg ETH Zurich Markus Hadwiger VRVis Research Center Current programmable graphics hardware makes it possible to implement general convolution filters

More information

Overview and Introduction to Scientific Visualization. Texas Advanced Computing Center The University of Texas at Austin

Overview and Introduction to Scientific Visualization. Texas Advanced Computing Center The University of Texas at Austin Overview and Introduction to Scientific Visualization Texas Advanced Computing Center The University of Texas at Austin Scientific Visualization The purpose of computing is insight not numbers. -- R. W.

More information

Brook: A Streaming Programming Language

Brook: A Streaming Programming Language Brook: A Streaming Programming Language I. Buck 8 October 2001 1 Motivation 1.1 Why design our own language? Modern processors can be limited by communication rather than computation. However, communication

More information

Cilk Plus GETTING STARTED

Cilk Plus GETTING STARTED Cilk Plus GETTING STARTED Overview Fundamentals of Cilk Plus Hyperobjects Compiler Support Case Study 3/17/2015 CHRIS SZALWINSKI 2 Fundamentals of Cilk Plus Terminology Execution Model Language Extensions

More information

KSTAR tokamak. /

KSTAR tokamak. / KSTAR tokamak / spinhalf@nfri.re.kr !!! Data parallelism CUDA programming python! pycuda GPU Development tools Python 2.6+ Scientific libraries as python package for interactive computing (Numpy, Scipy..)

More information

Scalable Multi Agent Simulation on the GPU. Avi Bleiweiss NVIDIA Corporation San Jose, 2009

Scalable Multi Agent Simulation on the GPU. Avi Bleiweiss NVIDIA Corporation San Jose, 2009 Scalable Multi Agent Simulation on the GPU Avi Bleiweiss NVIDIA Corporation San Jose, 2009 Reasoning Explicit State machine, serial Implicit Compute intensive Fits SIMT well Collision avoidance Motivation

More information

Atomic Operations. Atomic operations, fast reduction. GPU Programming. Szénási Sándor.

Atomic Operations. Atomic operations, fast reduction. GPU Programming.   Szénási Sándor. Atomic Operations Atomic operations, fast reduction GPU Programming http://cuda.nik.uni-obuda.hu Szénási Sándor szenasi.sandor@nik.uni-obuda.hu GPU Education Center of Óbuda University ATOMIC OPERATIONS

More information

The GPGPU Programming Model

The GPGPU Programming Model The Programming Model Institute for Data Analysis and Visualization University of California, Davis Overview Data-parallel programming basics The GPU as a data-parallel computer Hello World Example Programming

More information

Tag a Tiny Aggregation Service for Ad-Hoc Sensor Networks. Samuel Madden, Michael Franklin, Joseph Hellerstein,Wei Hong UC Berkeley Usinex OSDI 02

Tag a Tiny Aggregation Service for Ad-Hoc Sensor Networks. Samuel Madden, Michael Franklin, Joseph Hellerstein,Wei Hong UC Berkeley Usinex OSDI 02 Tag a Tiny Aggregation Service for Ad-Hoc Sensor Networks Samuel Madden, Michael Franklin, Joseph Hellerstein,Wei Hong UC Berkeley Usinex OSDI 02 Outline Introduction The Tiny AGgregation Approach Aggregate

More information

Parallel Sorting Algorithms

Parallel Sorting Algorithms CSC 391/691: GPU Programming Fall 015 Parallel Sorting Algorithms Copyright 015 Samuel S. Cho Sorting Algorithms Review Bubble Sort: O(n ) Insertion Sort: O(n ) Quick Sort: O(n log n) Heap Sort: O(n log

More information

GAME PROGRAMMING ON HYBRID CPU-GPU ARCHITECTURES TAKAHIRO HARADA, AMD DESTRUCTION FOR GAMES ERWIN COUMANS, AMD

GAME PROGRAMMING ON HYBRID CPU-GPU ARCHITECTURES TAKAHIRO HARADA, AMD DESTRUCTION FOR GAMES ERWIN COUMANS, AMD GAME PROGRAMMING ON HYBRID CPU-GPU ARCHITECTURES TAKAHIRO HARADA, AMD DESTRUCTION FOR GAMES ERWIN COUMANS, AMD GAME PROGRAMMING ON HYBRID CPU-GPU ARCHITECTURES Jason Yang, Takahiro Harada AMD HYBRID CPU-GPU

More information

3D Environment Reconstruction

3D Environment Reconstruction 3D Environment Reconstruction Using Modified Color ICP Algorithm by Fusion of a Camera and a 3D Laser Range Finder The 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems October 11-15,

More information

CUDA. Fluid simulation Lattice Boltzmann Models Cellular Automata

CUDA. Fluid simulation Lattice Boltzmann Models Cellular Automata CUDA Fluid simulation Lattice Boltzmann Models Cellular Automata Please excuse my layout of slides for the remaining part of the talk! Fluid Simulation Navier Stokes equations for incompressible fluids

More information

Abstract Memory The TupleSpace

Abstract Memory The TupleSpace Abstract Memory The TupleSpace Linda, JavaSpaces, Tspaces and PastSet Tuple Space Main idea: Byte ordered memory is a product of hardware development, not programmers needs Remodel memory to support the

More information

L15: Putting it together: N-body (Ch. 6)!

L15: Putting it together: N-body (Ch. 6)! Outline L15: Putting it together: N-body (Ch. 6)! October 30, 2012! Review MPI Communication - Blocking - Non-Blocking - One-Sided - Point-to-Point vs. Collective Chapter 6 shows two algorithms (N-body

More information

Parallel Techniques. Embarrassingly Parallel Computations. Partitioning and Divide-and-Conquer Strategies

Parallel Techniques. Embarrassingly Parallel Computations. Partitioning and Divide-and-Conquer Strategies slides3-1 Parallel Techniques Embarrassingly Parallel Computations Partitioning and Divide-and-Conquer Strategies Pipelined Computations Synchronous Computations Asynchronous Computations Load Balancing

More information

PHYSICALLY BASED ANIMATION

PHYSICALLY BASED ANIMATION PHYSICALLY BASED ANIMATION CS148 Introduction to Computer Graphics and Imaging David Hyde August 2 nd, 2016 WHAT IS PHYSICS? the study of everything? WHAT IS COMPUTATION? the study of everything? OUTLINE

More information

Optimisation Myths and Facts as Seen in Statistical Physics

Optimisation Myths and Facts as Seen in Statistical Physics Optimisation Myths and Facts as Seen in Statistical Physics Massimo Bernaschi Institute for Applied Computing National Research Council & Computer Science Department University La Sapienza Rome - ITALY

More information

HW DUE Floating point

HW DUE Floating point Numerical and Scientific Computing with Applications David F. Gleich CS 314, Purdue In this class: Understand the need for floating point arithmetic and some alternatives. Understand how the computer represents

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 2 Sajjad Haider Spring 2010 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric

More information

Unsupervised Learning Partitioning Methods

Unsupervised Learning Partitioning Methods Unsupervised Learning Partitioning Methods Road Map 1. Basic Concepts 2. K-Means 3. K-Medoids 4. CLARA & CLARANS Cluster Analysis Unsupervised learning (i.e., Class label is unknown) Group data to form

More information

Source Coding Techniques

Source Coding Techniques Source Coding Techniques Source coding is based on changing the content of the original signal. Also called semantic-based coding. Compression rates may be higher but at a price of loss of information.

More information

Optimal 3D Lattices in Scientific Visualization and Computer Graphics

Optimal 3D Lattices in Scientific Visualization and Computer Graphics Optimal 3D Lattices in Scientific Visualization and Computer Graphics School of Computing Science Simon Fraser University Jan, 2007 Outline Visualization and Graphics 1 Visualization and Graphics 2 3 Further

More information

Particle-based Fluid Simulation

Particle-based Fluid Simulation Simulation in Computer Graphics Particle-based Fluid Simulation Matthias Teschner Computer Science Department University of Freiburg Application (with Pixar) 10 million fluid + 4 million rigid particles,

More information

Systems Design and Implementation I.4 Naming in a Multiserver OS

Systems Design and Implementation I.4 Naming in a Multiserver OS Systems Design and Implementation I.4 Naming in a Multiserver OS System, SS 2009 University of Karlsruhe 06.5.2009 Jan Stoess University of Karlsruhe The Issue 2 The Issue In system construction we combine

More information

CS 261 Data Structures. Priority Queue ADT & Heaps

CS 261 Data Structures. Priority Queue ADT & Heaps CS 61 Data Structures Priority Queue ADT & Heaps Priority Queue ADT Associates a priority with each object: First element has the highest priority (typically, lowest value) Examples of priority queues:

More information

Introduction to parallel Computing

Introduction to parallel Computing Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts

More information

Recovering dual-level rough surface parameters from simple lighting. Graphics Seminar, Fall 2010

Recovering dual-level rough surface parameters from simple lighting. Graphics Seminar, Fall 2010 Recovering dual-level rough surface parameters from simple lighting Chun-Po Wang Noah Snavely Steve Marschner Graphics Seminar, Fall 2010 Motivation and Goal Many surfaces are rough Motivation and Goal

More information

GPGPU Programming in Haskell with Accelerate

GPGPU Programming in Haskell with Accelerate GPGPU Programming in Haskell with Accelerate Trevor L. McDonell University of New South Wales @tlmcdonell tmcdonell@cse.unsw.edu.au https://github.com/acceleratehs Preliminaries http://hackage.haskell.org/package/accelerate

More information

Scheduling Image Processing Pipelines

Scheduling Image Processing Pipelines Lecture 15: Scheduling Image Processing Pipelines Visual Computing Systems Simple image processing kernel int WIDTH = 1024; int HEIGHT = 1024; float input[width * HEIGHT]; float output[width * HEIGHT];

More information

Overview of research activities Toward portability of performance

Overview of research activities Toward portability of performance Overview of research activities Toward portability of performance Do dynamically what can t be done statically Understand evolution of architectures Enable new programming models Put intelligence into

More information

L3: Data Partitioning and Memory Organization, Synchronization

L3: Data Partitioning and Memory Organization, Synchronization Outline More on CUDA L3: Data Partitioning and Organization, Synchronization January 21, 2009 1 First assignment, due Jan. 30 (later in lecture) Error checking mechanisms Synchronization More on data partitioning

More information

May 8-11, 2017 Silicon Valley CUDA 9 AND BEYOND. Mark Harris, May 10, 2017

May 8-11, 2017 Silicon Valley CUDA 9 AND BEYOND. Mark Harris, May 10, 2017 May 8-11, 2017 Silicon Valley CUDA 9 AND BEYOND Mark Harris, May 10, 2017 INTRODUCING CUDA 9 BUILT FOR VOLTA FASTER LIBRARIES Tesla V100 New GPU Architecture Tensor Cores NVLink Independent Thread Scheduling

More information

Isosurface Visualization of Data with Nonparametric Models for Uncertainty

Isosurface Visualization of Data with Nonparametric Models for Uncertainty Isosurface Visualization of Data with Nonparametric Models for Uncertainty Tushar Athawale, Elham Sakhaee, and Alireza Entezari Department of Computer & Information Science & Engineering University of

More information

PATTERN CLASSIFICATION AND SCENE ANALYSIS

PATTERN CLASSIFICATION AND SCENE ANALYSIS PATTERN CLASSIFICATION AND SCENE ANALYSIS RICHARD O. DUDA PETER E. HART Stanford Research Institute, Menlo Park, California A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS New York Chichester Brisbane

More information

Ray Tracing III. Wen-Chieh (Steve) Lin National Chiao-Tung University

Ray Tracing III. Wen-Chieh (Steve) Lin National Chiao-Tung University Ray Tracing III Wen-Chieh (Steve) Lin National Chiao-Tung University Shirley, Fundamentals of Computer Graphics, Chap 10 Doug James CG slides, I-Chen Lin s CG slides Ray-tracing Review For each pixel,

More information

Welcome to the "Concurrent programming made easy" demonstration video. Let's begin with the first demo, which shows many actors of the same type,

Welcome to the Concurrent programming made easy demonstration video. Let's begin with the first demo, which shows many actors of the same type, Modernize common concurrency patterns with actors Barry Feigenbaum Transcript 0:01 Hello, my name is Barry Feigenbaum. 0:03 Welcome to the "Concurrent programming made easy" demonstration video. 0:11 Let's

More information

Exercises C-Programming

Exercises C-Programming Exercises C-Programming Claude Fuhrer (claude.fuhrer@bfh.ch) 0 November 016 Contents 1 Serie 1 1 Min function.................................. Triangle surface 1............................... 3 Triangle

More information

Exploiting Depth Camera for 3D Spatial Relationship Interpretation

Exploiting Depth Camera for 3D Spatial Relationship Interpretation Exploiting Depth Camera for 3D Spatial Relationship Interpretation Jun Ye Kien A. Hua Data Systems Group, University of Central Florida Mar 1, 2013 Jun Ye and Kien A. Hua (UCF) 3D directional spatial relationships

More information

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. 1 2 Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. Crowd rendering in large environments presents a number of challenges,

More information

GPU programming basics. Prof. Marco Bertini

GPU programming basics. Prof. Marco Bertini GPU programming basics Prof. Marco Bertini CUDA: atomic operations, privatization, algorithms Atomic operations The basics atomic operation in hardware is something like a read-modify-write operation performed

More information

OPENCL. Episode 6 - Shared Memory Kernel Optimization

OPENCL. Episode 6 - Shared Memory Kernel Optimization OPENCL Episode 6 - Shared Memory Kernel Optimization David W. Gohara, Ph.D. Center for Computational Biology Washington University School of Medicine, St. Louis email: sdg0919@gmail.com THANK YOU SHARED

More information

Basic Communication Operations Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

Basic Communication Operations Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar Basic Communication Operations Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003 Topic Overview One-to-All Broadcast

More information

CS 179: GPU Computing. Recitation 2: Synchronization, Shared memory, Matrix Transpose

CS 179: GPU Computing. Recitation 2: Synchronization, Shared memory, Matrix Transpose CS 179: GPU Computing Recitation 2: Synchronization, Shared memory, Matrix Transpose Synchronization Ideal case for parallelism: no resources shared between threads no communication between threads Many

More information

Volume Ray Casting Neslisah Torosdagli

Volume Ray Casting Neslisah Torosdagli Volume Ray Casting Neslisah Torosdagli Overview Light Transfer Optical Models Math behind Direct Volume Ray Casting Demonstration Transfer Functions Details of our Application References What is Volume

More information

Brook+ Data Types. Basic Data Types

Brook+ Data Types. Basic Data Types Brook+ Data Types Important for all data representations in Brook+ Streams Constants Temporary variables Brook+ Supports Basic Types Short Vector Types User-Defined Types 29 Basic Data Types Basic data

More information

Highly Efficient Compensationbased Parallelism for Wavefront Loops on GPUs

Highly Efficient Compensationbased Parallelism for Wavefront Loops on GPUs Highly Efficient Compensationbased Parallelism for Wavefront Loops on GPUs Kaixi Hou, Hao Wang, Wu chun Feng {kaixihou, hwang121, wfeng}@vt.edu Jeffrey S. Vetter, Seyong Lee vetter@computer.org, lees2@ornl.gov

More information

Accelerating CFD with Graphics Hardware

Accelerating CFD with Graphics Hardware Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery

More information

Arizona Academic Standards

Arizona Academic Standards Arizona Academic Standards This chart correlates the Grade 8 performance objectives from the mathematics standard of the Arizona Academic Standards to the lessons in Review, Practice, and Mastery. Lesson

More information

Module 12 Floating-Point Considerations

Module 12 Floating-Point Considerations GPU Teaching Kit Accelerated Computing Module 12 Floating-Point Considerations Lecture 12.1 - Floating-Point Precision and Accuracy Objective To understand the fundamentals of floating-point representation

More information

Realtime Water Simulation on GPU. Nuttapong Chentanez NVIDIA Research

Realtime Water Simulation on GPU. Nuttapong Chentanez NVIDIA Research 1 Realtime Water Simulation on GPU Nuttapong Chentanez NVIDIA Research 2 3 Overview Approaches to realtime water simulation Hybrid shallow water solver + particles Hybrid 3D tall cell water solver + particles

More information

CS240: Programming in C

CS240: Programming in C CS240: Programming in C Lecture 5: Functions. Scope of variables. Program structure. Cristina Nita-Rotaru Lecture 5/ Fall 2013 1 Functions: Explicit declaration Declaration, definition, use, order matters.

More information

Image Processing. Traitement d images. Yuliya Tarabalka Tel.

Image Processing. Traitement d images. Yuliya Tarabalka  Tel. Traitement d images Yuliya Tarabalka yuliya.tarabalka@hyperinet.eu yuliya.tarabalka@gipsa-lab.grenoble-inp.fr Tel. 04 76 82 62 68 Noise reduction Image restoration Restoration attempts to reconstruct an

More information

Applications of MPI I

Applications of MPI I Applications of MPI I N-Dimensional Integration Using Monte Carlo Techniques Gürsan ÇOBAN 4.06.202 Outline Numerical Integration Techniques Monte Carlo Techniques for Numerical Integration Some MPI Examples

More information

2.7 Cloth Animation. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter 2 123

2.7 Cloth Animation. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter 2 123 2.7 Cloth Animation 320491: Advanced Graphics - Chapter 2 123 Example: Cloth draping Image Michael Kass 320491: Advanced Graphics - Chapter 2 124 Cloth using mass-spring model Network of masses and springs

More information

A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography

A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography 1 A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography He Huang, Liqiang Wang, Po Chen(University of Wyoming) John Dennis (NCAR) 2 LSQR in Seismic Tomography

More information

C E N T E R A T H O U S T O N S C H O O L of H E A L T H I N F O R M A T I O N S C I E N C E S. Image Operations II

C E N T E R A T H O U S T O N S C H O O L of H E A L T H I N F O R M A T I O N S C I E N C E S. Image Operations II T H E U N I V E R S I T Y of T E X A S H E A L T H S C I E N C E C E N T E R A T H O U S T O N S C H O O L of H E A L T H I N F O R M A T I O N S C I E N C E S Image Operations II For students of HI 5323

More information

Modal Logic: Implications for Design of a Language for Distributed Computation p.1/53

Modal Logic: Implications for Design of a Language for Distributed Computation p.1/53 Modal Logic: Implications for Design of a Language for Distributed Computation Jonathan Moody (with Frank Pfenning) Department of Computer Science Carnegie Mellon University Modal Logic: Implications for

More information

Introduction to Multicore Programming

Introduction to Multicore Programming Introduction to Multicore Programming Minsoo Ryu Department of Computer Science and Engineering 2 1 Multithreaded Programming 2 Synchronization 3 Automatic Parallelization and OpenMP 4 GPGPU 5 Q& A 2 Multithreaded

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class

More information

Point Lattices in Computer Graphics and Visualization how signal processing may help computer graphics

Point Lattices in Computer Graphics and Visualization how signal processing may help computer graphics Point Lattices in Computer Graphics and Visualization how signal processing may help computer graphics Dimitri Van De Ville Ecole Polytechnique Fédérale de Lausanne Biomedical Imaging Group dimitri.vandeville@epfl.ch

More information

Introduction to Multicore Programming

Introduction to Multicore Programming Introduction to Multicore Programming Minsoo Ryu Department of Computer Science and Engineering 2 1 Multithreaded Programming 2 Automatic Parallelization and OpenMP 3 GPGPU 2 Multithreaded Programming

More information

ECE 574 Cluster Computing Lecture 10

ECE 574 Cluster Computing Lecture 10 ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular

More information

CS490D: Introduction to Data Mining Prof. Chris Clifton

CS490D: Introduction to Data Mining Prof. Chris Clifton CS490D: Introduction to Data Mining Prof. Chris Clifton April 5, 2004 Mining of Time Series Data Time-series database Mining Time-Series and Sequence Data Consists of sequences of values or events changing

More information

CS420: Operating Systems

CS420: Operating Systems Threading Issues James Moscola Department of Engineering & Computer Science York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Threading Issues

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 CS 347 Notes 12 5 Web Search Engine Crawling

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 Web Search Engine Crawling Indexing Computing

More information

Hands-On Introduction to. LabVIEW. for Scientists and Engineers. Second Edition. John Essick. Reed College OXFORD UNIVERSITY PRESS

Hands-On Introduction to. LabVIEW. for Scientists and Engineers. Second Edition. John Essick. Reed College OXFORD UNIVERSITY PRESS Hands-On Introduction to LabVIEW for Scientists and Engineers Second Edition John Essick Reed College New York Oxford OXFORD UNIVERSITY PRESS Contents. Preface xiii 1. THE WHILE LOOP AND WAVEFORM CHART

More information

Implementation Details of GPU-based Out-of-Core Many-Lights Rendering

Implementation Details of GPU-based Out-of-Core Many-Lights Rendering Implementation Details of GPU-based Out-of-Core Many-Lights Rendering Rui Wang, Yuchi Huo, Yazhen Yuan, Kun Zhou, Wei Hua, Hujun Bao State Key Lab of CAD&CG, Zhejiang University In this document, we provide

More information

Overview of Traditional Surface Tracking Methods

Overview of Traditional Surface Tracking Methods Liquid Simulation With Mesh-Based Surface Tracking Overview of Traditional Surface Tracking Methods Matthias Müller Introduction Research lead of NVIDIA PhysX team PhysX GPU acc. Game physics engine www.nvidia.com\physx

More information

Image Processing Tricks in OpenGL. Simon Green NVIDIA Corporation

Image Processing Tricks in OpenGL. Simon Green NVIDIA Corporation Image Processing Tricks in OpenGL Simon Green NVIDIA Corporation Overview Image Processing in Games Histograms Recursive filters JPEG Discrete Cosine Transform Image Processing in Games Image processing

More information

UberFlow: A GPU-Based Particle Engine

UberFlow: A GPU-Based Particle Engine UberFlow: A GPU-Based Particle Engine Peter Kipfer Mark Segal Rüdiger Westermann Technische Universität München ATI Research Technische Universität München Motivation Want to create, modify and render

More information

Aliasing. Can t draw smooth lines on discrete raster device get staircased lines ( jaggies ):

Aliasing. Can t draw smooth lines on discrete raster device get staircased lines ( jaggies ): (Anti)Aliasing and Image Manipulation for (y = 0; y < Size; y++) { for (x = 0; x < Size; x++) { Image[x][y] = 7 + 8 * sin((sqr(x Size) + SQR(y Size)) / 3.0); } } // Size = Size / ; Aliasing Can t draw

More information

Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville

Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information

More information

Extraction of Level Sets from 2D Images

Extraction of Level Sets from 2D Images Extraction of Level Sets from 2D Images David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International License.

More information