
I. Course Title
Parallel Computing 2

II. Course Description
Students study parallel programming and visualization in a variety of contexts with an emphasis on underlying and experimental technologies. Topics include orbital mechanics and the N-body problem, graphics rendering via ray tracing, and relaxation methods toward a steady state. The programming language is C using both MPI and 3-D OpenGL. Additional tools and environments include OpenMP, pthreads, sockets, and Nvidia's CUDA for GPGPU.

III. Performance Indicators
TJ Specific Performance Indicators

Standard 1
The student will investigate and understand that parallelism must scale properly (and efficiently) in the case of large 3-D rendering problems, for example using recursive ray tracing to map a defined geometry onto an output bitmap. Ray tracing models a 3-D geometry involving an eye, a screen, and a set of objects. Vector calculations determine which object is visible and also whether it is in shadow. Recursive calculations determine reflections for those objects that have a reflective material property.

Benchmark 1.a Investigate and Understand Designated Lab Techniques
The student will investigate and understand designated lab techniques.

Indicator 1.a.1 Demonstrate basic lab techniques
Demonstrate the following basic lab techniques: output a 2-D bitmap image file, solve a quadratic equation in the rendering code to determine sphere-line intersection, calculate a dot product to determine a gradient color value from a single point-light source.

Benchmark 1.b Investigate and Understand Graphics Rendering Techniques
The student will investigate and understand graphics rendering techniques.

Indicator 1.b.1 Graphics Rendering
Demonstrate the following technique: construct a scene containing spheres and infinite planes (axis aligned, but also with a checkerboard pattern), include shadow calculations and reflection, as well as recursive rendering therein.

Indicator 1.b.2 Triangulated Geometry
Demonstrate the following technique: determine the point-of-intersection for a line and a triangle, such that any geometry whose surface has been triangulated can then be rendered (e.g., teapot, rabbit, pyramid, elephant).

Indicator 1.b.3 Animated Output Movie
Demonstrate the following technique: loop the rendering function where at least one parameter of the scene is changing, outputting for each particular value of that parameter a single frame of a movie. After the run, as a post-processing step, combine those frames into an animated movie file.

Benchmark 1.c Investigate and Understand Texture Mapping
The student will investigate and understand texture mapping.

Indicator 1.c.1 Texture Mapping
Rather than define a solid color for a particular geometric object (e.g., the floor, or a sphere), the student will map the calculated point-of-intersection on that object to an image file and from that image file determine a color for that point. This technique may be used to map geographic data onto a sphere to produce a globe, or photographic data to show a person's face or the image of an animal, without changing the actual geometry of the object.

Standard 2
The student will investigate and understand that fine-grain parallelism (i.e., not the decomposition of a coarse space, grid or otherwise) may be used for classic algorithms to improve runtime. A summation algorithm may be coded either in a loop or in a parallel tree. More sophisticated parallel tree code involves both up-and-down passes. The merge sort algorithm may then be coded to run in sub-linear time.

Benchmark 2.a Investigate and Understand Designated Lab Techniques
The student will investigate and understand designated lab techniques.

Indicator 2.a.1 Demonstrate basic lab techniques
Demonstrate the following basic lab techniques: launch the XMT simulator either directly or by first converting the source code to an OpenMP version, analyze the performance of an XMT-C code in terms of both work and time.
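
As one illustration of what analyzing a code "in terms of both work and time" can mean, the following is a minimal sketch in plain C (not XMT-C) that sums a list by repeated pairwise addition and counts two things: the number of levels, which stands in for parallel time, and the number of additions, which is the work. The names TreeSumStats and tree_sum are hypothetical, and the inner loop runs serially here; it only marks where the independent operations of one parallel step would go.

```c
#include <assert.h>
#include <stddef.h>

/* Result of the summation plus the two quantities being analyzed. */
typedef struct {
    double sum;        /* the overall sum */
    size_t levels;     /* parallel steps needed: a proxy for time */
    size_t additions;  /* total operations performed: the work */
} TreeSumStats;

/* Sums v[0..n) in place by pairwise addition, level by level.
   Note that v is overwritten with partial sums. */
TreeSumStats tree_sum(double *v, size_t n)
{
    assert(v != NULL && n > 0);
    TreeSumStats s = {0.0, 0, 0};

    /* Serial loop over levels: the stride doubles each time. */
    for (size_t stride = 1; stride < n; stride *= 2) {
        /* Every iteration of this inner loop touches a different pair,
           so on parallel hardware all of them could run at once. */
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            v[i] += v[i + stride];
            s.additions++;
        }
        s.levels++;
    }
    s.sum = v[0];
    return s;
}
```

For a list of 8 values this performs 7 additions over 3 levels, which matches the O(N) work and O(log_2 N) time expected of a binary-tree summation.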

Benchmark 2.b Investigate and Understand the Use of Fine-Grain Parallel Code to Calculate a Summation
The student will investigate and understand the use of fine-grain parallel code to calculate a summation, by using a binary tree structure rather than a simple 1-D list of values, and a loop over parallel spawns rather than a serial loop.

Indicator 2.b.1 Investigate and Understand the Use of a Parallel Spawn for Simultaneous Execution
The student will investigate and understand the use of a parallel spawn, a feature of the XMT-C language, which acts essentially like massively multi-threaded code, only with highly efficient hardware and a much simpler coding interface. This spawn command can be used to execute a series of pair-wise sums on a list of data. The sums in one spawn happen simultaneously, in O(1) time and at most O(N) work, and halve the number of values that still need to be summed. After O(log_2 N) levels of such spawns, done in a loop, we arrive at the overall sum. Total time is O(log_2 N) and total work is still O(N) operations. Work cannot be reduced further, even in theory, because all N values must be seen.

Indicator 2.b.2 Investigate and Understand the Use of Up-and-Down Passes in a Parallel Binary Tree
The student will investigate and understand the use of up-and-down passes in a parallel binary tree in order to code more sophisticated algorithms such as prefix-sum (widely used in general) and prefix-min. In these cases the result of our parallel process is not a single value (i.e., the sum) but rather a list of values (i.e., all of the prefix-sums).

Indicator 2.b.3 Investigate and Understand the Parallel Rank Operation on Two Sorted Sub-Lists
The student will investigate and understand the parallel rank operation on two sorted sub-lists. The ultimate goal is a parallel merge sort. A first step toward that goal is to determine which slot a given value would occupy if it were in the other sorted list instead. Its rank in its own list is already known (it is the index), and since the second list is sorted, the rank in that list can be determined with a binary search in O(log_2 N) time and work. Since all binary searches for all values in both lists can be performed in parallel, the total time is also O(log_2 N), but the total work is O(N log_2 N), worse than a serial zipper-merge, which requires only O(N) work (i.e., total operations).
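
The rank operation can be sketched in plain C as follows. This is an illustrative sketch, not XMT-C: the function names rank_in and rank_merge are hypothetical, and the two loops in rank_merge run serially here, although every iteration is independent and could be handed to its own spawned thread. The tie-breaking rule (equal values from the first list are placed before those from the second) is an assumption added so that duplicates land in distinct slots.

```c
#include <assert.h>
#include <stddef.h>

/* Rank of `value` in sorted array a[0..n): the number of elements that are
   strictly less than value, or less than or equal to it when `inclusive`
   is nonzero. Binary search, O(log n) time. */
static size_t rank_in(const int *a, size_t n, int value, int inclusive)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < value || (inclusive && a[mid] == value))
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Merges sorted a[0..na) and sorted b[0..nb) into out[0..na+nb).
   The final slot of a value is its index in its own list plus its rank in
   the other list. No iteration depends on any other iteration. */
void rank_merge(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < na; i++)   /* conceptually one thread per i */
        out[i + rank_in(b, nb, a[i], 0)] = a[i];
    for (size_t j = 0; j < nb; j++)   /* conceptually one thread per j */
        out[j + rank_in(a, na, b[j], 1)] = b[j];
}
```

Each value costs one binary search, so the merge performs O(N log_2 N) work in total, while the longest chain of dependent steps is a single search of O(log_2 N).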

Indicator 2.b.4 Implement a Parallel Merge Sort that Runs in Sub-Linear Time
Implement a parallel merge sort using the parallel rank operation described above. There are O(log_2 N) levels in the recursive sort. As described, the work on each level would be O(N log_2 N) rather than O(N), so the total work is O(N (log_2 N)^2) instead of O(N log_2 N). Each level takes O(log_2 N) time, so the total time is O((log_2 N)^2) instead of O(N log_2 N). One goal is to maintain the significant time improvement while reducing total work back down to the serial level.

Standard 3
The student will investigate and understand that all-pairs communication may be required in a parallel code for problems involving a highly-coupled calculation, such as when physical forces act at any distance. Applications such as gravity simulations use highly-coupled calculations. The simulation progresses by calculating forces and then updating positions. Theoretical scaling of such codes is realized in practice on computing clusters.

Benchmark 3.a
The student will investigate and understand the construction and analysis of an all-pairs simulation, assuming parallel code with a standard communication protocol on a modern parallel system.

Indicator 3.a.1 Investigate and Understand the Construction of an All-Pairs Simulation
The student will investigate and understand the construction of an all-pairs simulation. Students should write code to build a working version of such a simulation. For instance, if celestial bodies are modeled where the interactions are based solely on gravity (i.e., no charged particles, no collisions), then each body will influence every other body, but perhaps by only a very small amount. Two loops are required, one over all the bodies and then an inner loop over all the other bodies. Forces are accumulated for each body in the loops, after which a single loop updates all positions.
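
The loop structure described above can be sketched as follows. Several things here are assumptions made to keep the sketch short: the bodies move along a single axis (a 2-D or 3-D version would add components and a square root for the distance), the update is a simple Euler-style step, and the names Body and nbody_step are hypothetical. The code is serial; it shows only the all-pairs accumulation followed by a separate update loop.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double x;     /* position along the single axis */
    double v;     /* velocity */
    double f;     /* net force accumulated during the current step */
    double mass;
} Body;

/* Advances all n bodies by one time step dt under mutual gravity.
   Assumes no two bodies share a position (no collisions). */
void nbody_step(Body *b, size_t n, double G, double dt)
{
    assert(b != NULL || n == 0);

    /* Outer loop over every body, inner loop over all the other bodies. */
    for (size_t i = 0; i < n; i++) {
        b[i].f = 0.0;
        for (size_t j = 0; j < n; j++) {
            if (j == i)
                continue;
            double dx = b[j].x - b[i].x;
            double mag = G * b[i].mass * b[j].mass / (dx * dx);
            b[i].f += (dx > 0.0) ? mag : -mag;  /* pull toward body j */
        }
    }

    /* Positions change only after every force is known, so each body sees
       a consistent snapshot of the others during accumulation. */
    for (size_t i = 0; i < n; i++) {
        b[i].v += (b[i].f / b[i].mass) * dt;
        b[i].x += b[i].v * dt;
    }
}
```

Because the outer loop writes only to b[i], its iterations are independent of one another, which is what makes it a natural candidate for distribution across processes or threads.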
Indicator 3.a.2 Investigate and Understand the Scaling of an All-Pairs Simulation
The student will investigate and understand the scaling of an all-pairs simulation. On the one hand, theoretical results using Amdahl's Law may determine a bound on the expected speed-up of a parallel code, based on the fraction of the overall code that remains serial. On the other hand, an actual implementation of running code in MPI or OpenMP or pthreads or any other system will show measurable improvement when deployed on an actual parallel system, a dedicated cluster or otherwise. The observed results can then be compared to theory for a variety of cases.

Indicator 3.a.3 Orally present the results of an investigation
Orally present the results.

Standard 4
The student will investigate and understand the use of massively parallel multithreaded systems such as many-core chips, large supercomputers, and general-purpose computing on commodity graphics cards. Rather than decompose a problem across nodes, one can use threads instead. Threads have access to a shared memory space that sub-processes do not. The potential of appliance-like parallelism involves careful planning for the future.

Benchmark 4.a
The student will investigate and understand the use of threads, rather than processes, for parallel codes, where typically all sub-tasks are processed within a single machine or even within a single graphics card.

Indicator 4.a.1 Investigate and Understand the Use of a Multi-Threaded Code
The student will investigate and understand the use of a multi-threaded code. Options include the standard pthread library, scaled XMT-C spawn blocks, and graphics card programming such as for Nvidia's CUDA system. Typically a list of data is decomposed in an embarrassingly parallel way so that individual threads can then compute on a sub-portion of that list, using their thread ID numbers as a convenient instrument for mapping onto a non-overlapping region of the shared list.

Indicator 4.a.2 Investigate and Understand the Various Applications of a Multi-Threaded Code
The student will investigate and understand various applications of a multi-threaded code. For instance, the discrete cosine transform (DCT) used in signal processing can be applied to form a JPEG image, where a large matrix of pixel color values is decomposed into smaller 8-by-8 pixel blocks. These smaller blocks are then handled by separate threads in order to calculate the DCT and perform other operations required by this particular compression scheme. Other examples of similar calculations include matrix operations to solve linear systems and also recursive ray tracing.

Indicator 4.a.3 Investigate and Understand the Potential for Wide-Scale Use of Thread-Based Parallelism
The student will investigate and understand the potential for the wide-scale use of thread-based parallelism, most obviously as a result of the deployment, through current and next-generation commodity graphics cards, of massively parallel multi-threaded systems to personal computers, in particular for gaming and entertainment purposes. This mass-market effect is driving the rapid deployment of high-end parallel systems, and that in turn opens the door for large-scale scientific and other technical applications, because powerful systems are now so widely accessible.
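
The thread-ID mapping mentioned under Indicator 4.a.1 can be sketched in C as follows. Only the index arithmetic and the per-thread work are shown; creating the threads is left out, and the names thread_range and square_slice are hypothetical. The rule that the first n % nthreads threads take one extra item is one common choice, assumed here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Computes the half-open range [*begin, *end) of a shared list of n items
   that belongs to thread `tid` out of `nthreads`. The ranges of different
   threads never overlap and together cover the whole list. */
void thread_range(size_t tid, size_t nthreads, size_t n,
                  size_t *begin, size_t *end)
{
    assert(nthreads > 0 && tid < nthreads);
    size_t base = n / nthreads;    /* items every thread receives */
    size_t extra = n % nthreads;   /* leftovers, one each to the first threads */
    *begin = tid * base + (tid < extra ? tid : extra);
    *end = *begin + base + (tid < extra ? 1 : 0);
}

/* The work of a single thread: square its own slice of the shared list.
   No locking is needed because no other thread touches this slice. */
void square_slice(double *shared, size_t n, size_t tid, size_t nthreads)
{
    size_t begin, end;
    thread_range(tid, nthreads, n, &begin, &end);
    for (size_t i = begin; i < end; i++)
        shared[i] *= shared[i];
}
```

With pthreads, each thread's start routine would call square_slice with its own ID; in a CUDA kernel the ID would instead be derived from the block and thread indices.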