Administrivia. CIS 565: GPU Programming and Architecture.

CIS 565: GPU Programming and Architecture
Original slides by Suresh Venkatasubramanian; updates by Joseph Kider and Patrick Cozzi.

Meeting: Monday and Wednesday, 6:00-7:30pm, Moore 212. Recorded lectures upon request. Website: http://www.seas.upenn.edu/~cis565/

Instructor: Patrick Cozzi. 99-03: Penn State. 00: Intel. 02: IBM. 03: IBM. 04-Present: AGI. 06-08: Penn. 11: 3D Engine Design for Virtual Globes. Email: pjcozzi _at_ siggraph.org (include [CIS565] in the subject). Office hours: Monday and Wednesday, 7:30-9:00pm. Location: SIG Lab.

Teaching Assistant: Jonathan McCaffrey (jmccaf _at_ seas.upenn.edu). Office hours: Tuesday and Thursday, 3-4:30pm. Office location: SIG Lab.

Prerequisites
CIS 460: Introduction to Computer Graphics. CIS 501: Computer Architecture. Most important: C/C++ and OpenGL.

CIS 534: Multicore Programming and Architecture — course description: This course is a pragmatic examination of multicore programming and the hardware architecture of modern multicore processors. Unlike the sequential single-core processors of the past, utilizing a multicore processor requires programmers to identify parallelism and write explicitly parallel code. Topics covered include: the relevant architectural trends and aspects of multicores, approaches for writing multicore software by extracting data parallelism (vectors and SIMD), thread-level parallelism, and task-based parallelism, efficient synchronization, and program profiling and performance tuning. The course focuses primarily on mainstream shared-memory multicores with some coverage of graphics processing units (GPUs). Cluster-based supercomputing is not a focus of this course. Several programming assignments and a course project will provide students first-hand experience with programming, experimentally analyzing, and tuning multicore software. Students are expected to have a solid understanding of computer architecture and strong programming skills (including experience with C/C++). We will not overlap very much.

What is GPU (Parallel) Computing?
Parallel computing: using multiple processors to more quickly perform a computation, or to perform a larger computation in the same time. The PROGRAMMER expresses the parallelism. Clusters of computers (MPI, networks, cloud computing) and shared-memory multiprocessors — called multicore when on the same chip — are covered by CIS 534, not here. The focus of CIS 565 is hands-on work with GPUs: graphics processing units.

Course Overview
System and GPU architecture. Real-time graphics programming with OpenGL and GLSL. General-purpose programming with CUDA and OpenCL. Problem domain: up to you.

Slide courtesy of Milo Martin.

Goals
Program massively parallel processors: high performance, functionality and maintainability, scalability. Gain knowledge: parallel programming principles and patterns, processor architecture features and constraints, programming APIs, tools, and techniques.

Grading
Homeworks (4-5): 40%. Paper presentation: 10%. Final project: 40% + 5%. Final: 10%.

Bonus days: five per person — a no-questions-asked one-day extension. Multiple bonus days can be used on the same assignment. Can be used for most, but not all, assignments. Add a readme when using bonus days. Strict late policy — not turned in by 11:59pm of the due date: 25% deduction; 2 days late: 50%; 3 days late: 75%; 4 or more days: 100%.

Academic Honesty
Discussion with other students, past or present, is encouraged. Any reference to assignments from previous terms or web postings is unacceptable. Any copying of non-trivial code is unacceptable; non-trivial means more than a line or so, and includes reading someone else's code and then going off to write your own.

Academic Honesty (continued)
Penalties for academic dishonesty: zero on the assignment for the first occasion; automatic failure of the course for repeat offenses.

Textbook: none. Related graphics books: Graphics Shaders, OpenGL Shading Language, GPU Gems 1-3. Related general GPU books: Programming Massively Parallel Processors, Patterns for Parallel Programming.

Do I need a GPU?
Yes: NVIDIA GeForce 8 series or higher. Demo: What GPU do I have? Demo: What version of OpenGL/CUDA/OpenCL does it support? No: Moore 100b has NVIDIA GeForce 9800s; the SIG Lab has NVIDIA GeForce 8800s, two GeForce 480s, and one Fermi Tesla.

Aside: This class is about 3 things
PERFORMANCE, PERFORMANCE, PERFORMANCE. Parallel sorting exercise. OK, not really — it is also about correctness, the other -abilities, etc. Nitty-gritty, real-world, wall-clock performance. No proofs! Slide courtesy of Milo Martin.

Credits
David Kirk (NVIDIA), Wen-mei Hwu (UIUC), David Luebke, Wolfgang Engel, etc.

What is a GPU?
GPU: Graphics Processing Unit — the processor that resides on your graphics card. GPUs allow us to achieve the unprecedented graphics capabilities now available in games.

Why Program the GPU?
Demo: NVIDIA GTX 400. Demo: triangle throughput. Chart from: http://ixbtlabs.com/articles3/video/cuda-1-p1.html

Compute: Intel Core i7 — 4 cores, 100 GFLOPS; NVIDIA GTX 280 — 240 cores, 1 TFLOPS. Memory bandwidth: system memory — 60 GB/s; NVIDIA GT200 — 150 GB/s. Install base: over 200 million NVIDIA G80s shipped.

How did this happen? Games demand advanced shading; fast GPUs mean better shading; the need for speed drives continued innovation. The gaming industry has overtaken the defense, finance, oil, and healthcare industries as the main driving factor for high-performance processors.

GPU = Fast co-processor?
GPU speed is increasing at a cubed Moore's Law — a consequence of the data-parallel, streaming aspects of the GPU. GPUs are cheap: put a couple together and you can get a supercomputer. So can we use the GPU for general-purpose computing? NYT, May 26, 2003: "TECHNOLOGY; From PlayStation to Supercomputer for $50,000" — the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign built a supercomputer using 70 individual Sony PlayStation 2 machines; the project required no hardware engineering other than mounting the PlayStations in a rack and connecting them with a high-speed network switch.

Yes! A wealth of applications: data analysis, motion planning, Voronoi diagrams, geometric optimization, physical simulation, matrix multiplication, conjugate gradient, finance, optimization, planning, image processing, signal processing, particle systems, force-field simulation, molecular dynamics, graph drawing, database queries, radar, sonar, oil exploration, sorting and searching, range queries — and graphics too!

When does GPU = fast co-processor work? Real-time visualization of complex phenomena: the GPU (like a fast parallel processor) can simulate physical processes such as fluid flow, n-body systems, and molecular dynamics. Interactive data analysis: for effective visualization of data, interactivity is key. In general: massively parallel tasks.

When does GPU = fast co-processor work? Rendering complex scenes: procedural shaders can offload much of the expensive rendering work to the GPU. Still not the Holy Grail of 80 million triangles at 30 frames/sec (Alvy Ray Smith, Pixar), but it helps. (Note: the NVIDIA Quadro 5000 is calculated to push 950 million triangles per second: http://www.nvidia.com/object/product-quadro-5000-us.html)

Stream Programming
A stream is a sequence of data (could be numbers, colors, RGBA vectors, ...). A kernel is a (fragment) program that runs on each element of a stream, generating an output stream (pixel buffer). Kernel = vertex/fragment shader. Input stream = stream of vertices, primitives, or fragments. Output stream = frame buffer or other buffer (transform feedback). Multiple kernels = a multi-pass rendering sequence on the GPU. To program the GPU, one must think of it as a (parallel) stream processor.

What is the cost of a stream program?
Number of kernels: readbacks from the GPU to main memory are expensive, and so is transferring data to the GPU. Complexity of kernel: more complexity takes longer to move data through the rendering pipeline. Number of memory accesses: non-local memory access is expensive. Number of branches: divergent branches are expensive.

What will this course cover?
1. Stream programming principles: the OpenGL programmable pipeline, the principles of stream hardware, and how we program with streams.
2. Shaders and effects: how do we compute the complex effects found in today's games? Examples: parallax mapping, reflections, skin and hair, particle systems, deformable mesh, morphing, animation.

3. GPGPU / GPU computing: how do we use the GPU as a fast co-processor? GPGPU languages: CUDA and OpenCL. High-performance computing — numerical methods and linear algebra: inner products, matrix-vector operations, matrix-matrix operations, sorting, fluid simulations, fast Fourier transforms, graph algorithms, and more. At what point does the GPU become faster than the CPU for matrix operations? For other operations?
4. Optimizations: how do we use the full potential of the GPU? What tools are there to analyze the performance of our algorithms?

What we want you to get out of this course!
1. Understanding of the GPU as a graphics pipeline
2. Understanding of the GPU as a high-performance compute device
3. Understanding of GPU architectures
4. Programming in GLSL, CUDA, and OpenCL
5. Exposure to many core graphics effects performed on GPUs
6. Exposure to many core parallel algorithms performed on GPUs