What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others

Similar documents
CS 316: Multicore/GPUs

What does the Future Hold? Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University

CS 316: Multicore/GPUs-II

Computer Graphics. Hardware Pipeline. Visual Imaging in the Electronic Age Prof. Donald P. Greenberg October 23, 2014 Lecture 16

Computer Graphics Software & Hardware. NBAY 6120 Lecture 6 Donald P. Greenberg March 16, 2016

Computer Graphics Software & Hardware. NBAY 6120 Lecture 6 Donald P. Greenberg March 16, 2017

Prof. Kavita Bala and Prof. Hakim Weatherspoon CS 3410, Spring 2014 Computer Science Cornell University

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

GPU Architecture. Michael Doggett Department of Computer Science Lund university

Multi-Processors and GPU

Outline Marquette University

CS 3410: Computer System Organization and Programming

CONSOLE ARCHITECTURE

CS427 Multicore Architecture and Parallel Computing

Computer Architecture!

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University

Computer Architecture

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

CS Computer Architecture Spring Lecture 01: Introduction

Spring 2011 Prof. Hyesoon Kim

Computer Systems Architecture I. CSE 560M Lecture 19 Prof. Patrick Crowley

Threading Hardware in G80

Portland State University ECE 588/688. Graphics Processors

IT 252 Computer Organization and Architecture. Introduction. Chia-Chi Teng

Spring 2009 Prof. Hyesoon Kim

Computer Architecture!

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

Antonio R. Miele Marco D. Santambrogio

Current Trends in Computer Graphics Hardware

Computer Architecture!

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

GRAPHICS HARDWARE. Niels Joubert, 4th August 2010, CS147

Multimedia in Mobile Phones. Architectures and Trends Lund

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

CMPSCI 201: Architecture and Assembly Language

Standard Graphics Pipeline

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university

A Data-Parallel Genealogy: The GPU Family Tree. John Owens University of California, Davis

GPU-Based Volume Rendering of. Unstructured Grids. João L. D. Comba. Fábio F. Bernardon UFRGS

Computer Architecture Computer Architecture. Computer Architecture. What is Computer Architecture? Grading

Graphics Processing Unit Architecture (GPU Arch)

Computer Organization and Design, 5th Edition: The Hardware/Software Interface

Accelerating CFD with Graphics Hardware

Mattan Erez. The University of Texas at Austin

GPU Architecture. Alan Gray EPCC The University of Edinburgh

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS130 : Computer Graphics. Tamar Shinar Computer Science & Engineering UC Riverside

GPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU.

Lecture 1: Gentle Introduction to GPUs

Computer Architecture

CMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN

CS 3410: Intro to Computer System Organization and Programming

Graphics and Imaging Architectures

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms

Computer Graphics. Hardware Pipeline. Visual Imaging in the Electronic Age Prof. Donald P. Greenberg October 13, 2016 Lecture 15

1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7.

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

Accelerator cards are typically PCIx cards that supplement a host processor, which they require to operate Today, the most common accelerators include

Introduction. Summary. Why computer architecture? Technology trends Cost issues

NVidia s GPU Microarchitectures. By Stephen Lucas and Gerald Kotas

Computer Architecture

1/26/09. Administrative. L4: Hardware Execution Model and Overview. Recall Execution Model. Outline. First assignment out, due Friday at 5PM

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

GPUs and GPGPUs. Greg Blanton John T. Lubia

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

Mattan Erez. The University of Texas at Austin

Parallel Computing: Parallel Architectures Jin, Hai

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

CS 3410 Computer System Organization and Programming

Mattan Erez. The University of Texas at Austin

CS 3410 Computer System Organization and Programming

CS4961 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/30/11. Administrative UPDATE. Mary Hall August 30, 2011

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CME 213 S PRING Eric Darve

Windowing System on a 3D Pipeline. February 2005

MANY-CORE COMPUTING. 7-Oct Ana Lucia Varbanescu, UvA. Original slides: Rob van Nieuwpoort, escience Center

3D Computer Games Technology and History. Markus Hadwiger VRVis Research Center

Introduction to CUDA (1 of n*)

Lecture 7: The Programmable GPU Core. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

CS516 Programming Languages and Compilers II

GPU Architecture and Function. Michael Foster and Ian Frasch

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

ECE 571 Advanced Microprocessor-Based Design Lecture 18

INF5063: Programming heterogeneous multi-core processors Introduction

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

PARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort

Lecture 1: Introduction and Basics

Computer Architecture. Fall Dongkun Shin, SKKU

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

How to Write Fast Code , spring th Lecture, Mar. 31 st

COMP 605: Introduction to Parallel Computing Lecture : GPU Architecture

The NVIDIA GeForce 8800 GPU

Real-Time Buffer Compression. Michael Doggett Department of Computer Science Lund university

The Rasterization Pipeline

Transcription:

What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University * slides thanks to Kavita Bala & many others

Final Project Demo Sign-Up: Will be posted outside my office after lecture today. 10 time slots on Wednesday, May 19 24 time slots on Thursday, May 20 12 time slots on Friday, May 21 CMS submission due: Afternoon of Friday, May 21 2

Moore s Law introduced in 1965 Number of transistors that can be integrated on a single die would double every 18 to 24 months (i.e., grow exponentially with time). Amazingly visionary 2300 transistors, 1 MHz clock (Intel 4004) - 1971 16 Million transistors (Ultra Sparc III) 42 Million transistors, 2 GHz clock (Intel Xeon) 2001 55 Million transistors, 3 GHz, 130nm technology, 250mm2 die (Intel Pentium 4) 2004 290+ Million transistors, 3 GHz (Intel Core 2 Duo) 2007 Moore s Law 3

Performance (SPEC Int) Processor Performance Increase 10000 1000 100 Slope ~1.7x/year Intel Pentium 4/3000 Intel Xeon/2000 DEC Alpha 21264A/667 DEC Alpha 21264/600 DEC Alpha 5/500 DEC Alpha 5/300 DEC Alpha 4/266 DEC AXP/500 IBM POWER 100 10 1 IBM RS6000 MIPS M2000 MIPS M/120 SUN-4/260 HP 9000/750 1987 1989 1991 1993 1995 1997 1999 2001 2003 Year 4

5

Brief History Display The dark ages (early-mid 1990 s), when there were only frame buffers for normal PC s. Rasterization Some accelerators were no more than a simple chip that sped up linear interpolation along a single span, so increasing fill rate. Projection & Clipping This is where pipelines start for PC commodity graphics, prior to Fall of 1999. Transform & Lighting This part of the pipeline reaches the consumer level with the introduction of the NVIDIA GeForce256. Application Hardware today is moving traditional application processing (surface generation, occlusion culling) into the graphics accelerator. 6

FIGURE A.2.1 Historical PC. VGA controller drives graphics display from framebuffer memory. Copyright 2009 Elsevier, Inc. All rights reserved. 7

8

Peak Performance ( 's/sec) One-pixel polygons (~10M polygons @ 30Hz) Faster than Moore s Law 10 9 Slope ~2.4x/year (Moore's Law ~ 1.7x/year) nvidia G70 10 8 10 7 10 6 10 5 UNC Pxpl4 HP CRX 10 4 Flat shading SGI Iris HP VRX SGI GT SGI VGX UNC Pxpl5 SGI SkyWriter HP TVRX Stellar GS1000 Gouraud shading SGI RE1 E&S F300 SGI RE2 Antialiasing Year UNC/HP PixelFlow Division Pxpl6 Textures Megatek E&S Freedom 86 88 90 92 94 96 98 00 SGI IR Accel/VSIS Division VPX E&S Harmony Graph courtesy of Professor John Poulton (from Eric Haines) Voodoo SGI R-Monster Nvidia TNT GeForce 3DLabs SGI Glint Cobalt PC Graphics ATI Radeon 256 9

NVidia Tesla Architecture 10

Why are GPUs so fast? FIGURE A.3.1 Direct3D 10 graphics pipeline. Each logical pipeline stage maps to GPU hardware or to a GPU processor. Programmable shader stages are blue, fixed-function blocks are white, and memory objects are grey. Each stage processes a vertex, geometric primitive, or pixel in a streaming dataflow fashion. Copyright 2009 Elsevier, Inc. All rights reserved. Pipelined and parallel Very, very parallel: 128 to 1000 cores 11

FIGURE A.2.5 Basic unified GPU architecture. Example GPU with 112 streaming processor (SP) cores organized in 14 streaming multiprocessors (SMs); the cores are highly multithreaded. It has the basic Tesla architecture of an NVIDIA GeForce 8800. The processors connect with four 64-bit-wide DRAM partitions via an interconnection network. Each SM has eight SP cores, two special function units (SFUs), instruction and constant caches, a multithreaded instruction unit, and a shared memory. Copyright 2009 Elsevier, Inc. All rights reserved. 12

Can we use these for general computation? Scientific Computing MATLAB codes Convex hulls Molecular Dynamics Etc. General computing with GPUs NVIDIA s answer: Compute Unified Device Architecture (CUDA) MATLAB/Fortran/etc. C for CUDA GPU Codes 13

AMD s Answer: Hybrid CPU/GPU AMDs Hybrid CPU/GPU 14

IBM/Sony/Toshiba Cell Sony Playstation 3 PPE SPEs (synergestic) 15

Must exploit parallelism for performance Lots of parallelism in graphics applications Lots of parallelism in scientific computing SIMD: single instruction, multiple data Perform same operation in parallel on many data items Data parallelism MIMD: multiple instruction, multiple data Run separate programs in parallel (on different data) Task parallelism Parallelism 16

Millions of Computers Where is the Market? 1200 1000 800 Embedded Desktop Servers 892 862 1122 600 488 400 290 200 0 1998 1999 2000 2001 2002 17

Millions of Computers Where is the Market? 1200 1000 800 Embedded Desktop Servers 892 862 1122 600 488 400 200 0 290 93 114 135 129 131 1998 1999 2000 2001 2002 18

Millions of Computers Where is the Market? 1200 1000 800 Embedded Desktop Servers 892 862 1122 600 488 400 200 0 290 93 114 135 129 131 3 3 4 4 5 1998 1999 2000 2001 2002 19

20

Smart Dust. Where to? 21

Smart Cards Security? 22

Cryptography and security TPM 1.2 IBM 4758 Secure Cryptoprocessor 23

Games Embedded Computing Smart Dust & Sensor Networks Graphics Scientific Computing Cryptography Security Quantum Computing? 24

How useful is this class, in all seriousness, for a computer scientist going into software engineering, meaning not low-level stuff? Survey Questions How much of computer architecture do software engineers actually have to deal with? What are the most important aspects of computer architecture that a software engineer should keep in mind while programming? 25

These days, programs run on hardware... more than ever before Why? Google Chrome Operating Systems Multi-Core & Hyper-Threading Datapath Pipelines, Caches, MMUs, I/O & DMA Busses, Logic, & State machines Gates Transistors Silicon Electrons 26

CS 3110: Better concurrent programming Where to? CS 4410: The Operating System! CS 4450: Networking CS 4620: Graphics CS 4821: Quantum Computing And many more 27

Thank you! If you want to make an apple pie from scratch, you must first create the universe. Carl Sagan 28