Introduction to Parallel Scientific Computing
|
|
- Susanna Hodge
- 6 years ago
- Views:
Transcription
1 Introduction to Parallel Scientific Computing TDT Lect. 3 & 4 Anne C. Elster Dept. of Computer & Info. Sci. (IDI) Norwegian Univ. of Science Technology Trondheim, Norway 8-11 Aug Anne C. Elster NTNU/IDI 1
2 KDS IDI 2011 Studieprogram vs. seksjoner Komplekse Datasystemer
3 TDT 4200 Fall 2011 Instructor: Dr. Anne C. Elster Support staff: Vit.ass.: TBA Und. ass.: Ruben Spaans Web page: Lectures: Wednesdays 10:15-12:00 in F3 (may move due to class size?) Thursdays 113:1 5-14:00 in F3 Recitation (øvingstimer): Thursdays 14:15-16:00 in F3 It s Learning!
4 Courses Taught by Dr. Elster: Beregningsvitenskap. TDT4200 Parallel Computing (Parallel programming with MPI & threads) TDT24 Parallell environments & Numerical Computing - 2-day IBM CellBE Course (Fall 2007) - GPU & Thread programming TDT 4205 Compilers DTD 8117 Grid and Heterogeneous Computing
5 HPC History: Personal perspective 1980 s: Concurrent and Parallel Pascal 1986: Intel ipsc Hypercube CMI (Bergen) and Cornell (Cray at NTNU) 1987: Cluster of 4 IBM 3090s : Intel hypercubes Some on BBN : KSR (MPI1 & 2) : SGI systems (some IBM SP) 2001-current: Clusters 2006: IBM NTNU (Njord, 7+ TFLOPS, proprietory switch) GPU programming (Cg) 2008: Quadcore Supercomputer at UiTø (Stallo) HPC-LAB at IDI/NTNU opens with several NVIDIA donation Several quad-core machines (1-2 donated by Schlumberger) 2009: More NVIDIA donations: NVIDIA Tesla s1070 and two Quadro FX 5800 cards (jan 09) 8-11 Anne C. Elster NTNU/IDI 5
6 Hypercubes Distributed processor systems with log n processor connection pattern Intel ipsc Connection Machine Maps tree structures well using Gray codes (each dimension in cube maps to a level in the tree) 8-11 Anne C. Elster NTNU/IDI 6
7 Shared vs. Distributed Memory Processors Processors Shared Mem Distrubuted Memory
8 Replicated vs. Distributed grids Pros: - Easier to implement - Load balanced Con: - Does not scale well (e.g. need to sum grids) Pros: - Scales much better, especially for large problems Con: - More complicated message passing - May need load-balancing of particles 8
9 9 COT -based SUPERCOMPUTER HARDWARE TRENDS: Intel ipsc (mid-1980 s) The first ipsc had no separate communication processor... Specialized OS nodes Today s PC clusters Fast Ethernet or better (more expensive interconnect) Linux OS 32-bit cheapest, but many 64-bit cluster vendors Top500 supercomputers Today s GPU farms entering Top500 list!! 9 Anne C. Elster TDT 4200 Parallel Computing intro Lect 3-4, IDI/NTNU, Aug 2011
10 10 HPC Hardware Trends at IDI Clustis3 (Quad-core cluster) Beregningsvitenskap Installed Spring (9) stk ProLiant DL160GS Server with: 2 stk E5405 2,0GHz Quad-Core 9GB FB-dimm memory 160GB SATA disc 2 stk GbE network cards 10 TDT 4200 Parallel Computing -- Anne C. Elster
11 11 HPC Hardware Trends at IDI Beregningsvitenskap NVIDA 280 Tesla card Unpacking NVIDA s1070 and Quadro FX 5800 cards 11 TDT 4200 Parallel Computing -- Anne C. Elster
12 New architectural features to consider: Register usage On-chip memory Cache / stack manipulation Precision effects Multiplication time addition time Note: Recursion benefits from on-chip registers available for stack 8-11 TDT 4200 Parallel Computing -- Anne C. Elster 12
13 13 Memory Hierachy Registers Cache ( Level 1-3) On-chip RAM RAM Disks - SSD Disks - HD Slower Tapes (Robot storage) 8-11 TDT 4200 Parallel Computing Anne C. Elster
14 14 Memory Access Floating point optimization: Factor 2 (in-cache) Memory access optimizations: Factor 10 or more!! (out of cache data) Much more for RAM vs. disk!! 8-11 TDT 4200 Parallel Computing Anne C. Elster
15 Main HW/SW Challenges: Slow interconnects (improving, but at a cost...) Slow protocols (TCP/IP VIA/new technologies) MEMORY BANDWIDTH!!! 15 TDT 4200 Anne C. Elster
16 16 Multi-level Caching Access 2-3D cells for large grid in same cacheline can give large performance improvements E.g. 128 byte cache-line (max bit float.pt. no.s or 4x4 grid) Traditional: 2*16 cache hits = 32 Cell-caching: ((3*3)*1)+ ((3*3)*2) + 4 = 25 cache hits 25% improvement! 8-11 TDT 4200 Parallel Computing Anne C. Elster
17 Cluster technologies for HPC Advantage: Very cost-effective hardware since uses COTS (Commercial Of-The-Shelf) parts BUT: Typically much slower processor interconnects than traditional HPC systems What about usability? 17 TDT 4200 Parallel Computing Anne C. Elster
18 MESSAGE PASSING CAVEAT: Global operations have more severe impact on cluster performance than traditional supercomputers since communication between processors takes relatively more of the total execution time 18 TDT 4200 Parallel Computing Anne C. Elster
19 HARDWARE TRENDS CONTIN.: 32-bit 64-bit architectures 1 CPU multiple CPUs (2-4) THE WAL-MART EFFECT: game stations (e.g. Playstation-2 farm at UIUC) graphics cards Low-power COTS devices?? 19 TDT 4200 Parallel Computing Anne C. Elster
20 The Ideal Cluster -- Hardware High-bandwidth network Low-latency network Low Operating System overhead (TCP causes slow start ) Great floating-point performance (64-bit processors or more?) 20 TDT 4200 Parallel Computing Anne C. Elster
21 The Ideal Cluster -- Software Compiler that is: Portable Optimizing Do extra work to save communication Self-tuning /Load -balanced Automatic selection of best algorithm One-sided communication support? Optimized middleware 21 TDT 4200 Parallel Computing Anne C. Elster
22 22 The Wal-Mart Effect (PARA02) Wal-Mart bigger than Sears, K-mart and JC Penney s combined predicted to influence $40 billion of IT investments (MIT Review) has much more impact than Microsoft and Cisco could ever hope for Not driven my latest technology, but by business model bad news for HPC? Game market --> HPC market Future high-performance chips and systems --> NVIDIA Tesla! 22 TDT 4200 Parallel Computing -- Anne C. Elster
23 23 COT -based SUPERCOMPUTER HARDWARE TRENDS: Intel ipsc (mid-1980 s) The first ipsc had no separate communication processor... Specialized OS nodes Today s PC clusters Fast Ethernet or better (more expensive interconnect) Linux OS 32-bit cheapest, but many 64-bit cluster vendors Top500 supercomputers Today s GPU farms entering Top500 list.. 23 TDT 4200 Parallel Computing -- Anne C. Elster
24 24 Main HW/SW Challenges: Slow interconnects (improving, but at a cost...) Slow protocols (TCP/IP VIA/new technologies) MEMORY BANDWIDTH!!! 24 TDT 4200 Parallel Computing -- Anne C. Elster
25 25 MPI (Message Passing Interface) < Communication routines standard developed for multiprocessor systems and clusters of workstations Orginally targeted Fortran and C Now also C++ Newer strains: OpenMPI and MPI-Java 8-11 Anne C. Elster 25 NTNU/IDI TDT 4200 Parallel Computing -- Anne C. Elster 25
26 26 What is MPI? -- continued Message passing model Standard (specification) Many implementations MPICH was first most widely used OpenMPI currently most used impl.? Two phases: MPI 1: Traditional message-passing MPI 2: Remote memory (one-sided communications), parallel I/O, and dynamic processes TDT 4200 Parallel Computing -- Anne C. Elster
27 27 Notes on black board re. MPI basics From A User s Guide to MPI by Peter Pacheco 1. Intro 2. Greetings! 3. Collective Communication 4. Grouping Data for Communication 27 TDT 4200 Parallel Computing -- Anne C. Elster
28 28 LIBRARIES: MPICH (ANL) public domain (working with LBNL on VIA version) MPI LAM (more MPI-2 features) public domain MPI-FM (UIUC/UCSD) public domain MPICH built on top of Fast Messages MPI/Pro (MPI Technologies, Inc) commercial (working on VIA version) PaTENT MPI 4.0 (Genias GmbH) commercial MPI for Windows NT (PaTENT = Tool Envirnonment for NT) SCALI, Norway commercial MPI from MESH Technologies (Brian Vinter) commercial Threaded MPI (Penti Hutnanen, others) OpenMP for clusters (B. Champman), Hybrid OpenMP/MPI 28 TDT 4200 Parallel Computing -- Anne C. Elster
29 29 GPUs: Graphical Processor Units HISTORY: Late 70 s/ Early 80 s: Grafic drawing calculations on CPUs Xerox Alto computer: first special bit block transfer instruct Comodore Amiga: first mass-market video accelerator able to draw fills shapes & animations in HW. Graphics sub-system w/ several chips, incl. Dedicated to bit blk xfer Early 90 s: 2D accelleration Ca. 1995: VIDEO GAMES! --> 3D GPUs 29 TDT 4200 Parallel Computing -- Anne C. Elster
30 30 GPU History continued: : 3D rasterization (converting simple 3D geometric primitives (e.g. lines, triangles, rectangles) to 2D screen pixels) Texture mapping (mapping 2D texture image to planar 3D surface) : 3D translation, rotation & scaling Towards 2000: GPUs more configurable, 2001 and beyond: programmable individual pixels) (ability to change 30 TDT 4200 Parallel Computing -- Anne C. Elster
31 31 Limitations Branching usually not a good idea GPU cache is different from CPU cache Optimized for 2D locality Random memory access problematic Floating point precision No integers or booleans (also currently no bit-wise operators, but Cg reerved symbols for these) 31 TDT 4200 Parallel Computing -- Anne C. Elster
32 32 GPU: general programming view Programmable MIMD processor: the vertex processor (one vector & once scalar/clock cycle Rasterizer: pass thru or interpolate values (e.g. passing 4 coordinates to draw rectangle leads to interpolation of pixel coordinates of vertices Programmable SIMD processor (fragment processor w/ up to 32 ops/cycle) Simple blending unit (serial) - z-compares and sends to memory
33 33 GPU -- Outside view Memory Programmable MIMD proc Rasterization Programmable SIMD processor Blend output
34 34 GPU Internal Structure
35 35 General programming on GPUs Rendering = executing GPU textures = CPU arrays Fragment shader programs = inner loops Rendering to texture memory = feedback Vertex coordinates = computational range Texture coordinates = Computational domain Now have NVIDIA s CUDA library! (BLAS & FFT)
36 36 Limitations Branching usually not a good idea GPU cache is different from CPU cache Optimized for 2D locality Random memory access problematic Floating point precision No integers or booleans (also currently no bit-wise operators, but Cg reerved symbols for these) 36 TDT 4200 Parallel Computing -- Anne C. Elster
37 37 SNOW SIMULATION DEMO! Robin Eidissen (Teaching Assitant) 37 TDT 4200 Parallel Computing -- Anne C. Elster
38 38 38 TDT 4200 Parallel Computing -- Anne C. Elster
39 Modularizing Large Codes Split large codes into separate independent modules (e.g. Initializer, solvers, trackers, etc.) Easer to maintain and debug Allows use of external packages (BLAS, LAPACK, PETSc) Can use code as test-bed for part of future codes 8-11 Anne C. Elster NTNU/IDI 39
Fra superdatamaskiner til grafikkprosessorer og
Fra superdatamaskiner til grafikkprosessorer og Brødtekst maskinlæring Prof. Anne C. Elster IDI HPC/Lab Parallel Computing: Personal perspective 1980 s: Concurrent and Parallel Pascal 1986: Intel ipsc
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationReal-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010
1 Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 Presentation by Henrik H. Knutsen for TDT24, fall 2012 Om du ønsker, kan du sette inn navn, tittel på foredraget, o.l.
More informationGPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS
GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS Agenda Forming a GPGPU WG 1 st meeting Future meetings Activities Forming a GPGPU WG To raise needs and enhance information sharing A platform for knowledge
More informationGeneral Purpose GPU Computing in Partial Wave Analysis
JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data
More informationComputing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany
Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationTechnology for a better society. hetcomp.com
Technology for a better society hetcomp.com 1 J. Seland, C. Dyken, T. R. Hagen, A. R. Brodtkorb, J. Hjelmervik,E Bjønnes GPU Computing USIT Course Week 16th November 2011 hetcomp.com 2 9:30 10:15 Introduction
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware
More informationCS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology
CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367
More informationHeadline in Arial Bold 30pt. Visualisation using the Grid Jeff Adie Principal Systems Engineer, SAPK July 2008
Headline in Arial Bold 30pt Visualisation using the Grid Jeff Adie Principal Systems Engineer, SAPK July 2008 Agenda Visualisation Today User Trends Technology Trends Grid Viz Nodes Software Ecosystem
More informationLecture 25: Board Notes: Threads and GPUs
Lecture 25: Board Notes: Threads and GPUs Announcements: - Reminder: HW 7 due today - Reminder: Submit project idea via (plain text) email by 11/24 Recap: - Slide 4: Lecture 23: Introduction to Parallel
More informationGPU Architecture. Alan Gray EPCC The University of Edinburgh
GPU Architecture Alan Gray EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? Architectural reasons for accelerator performance advantages Latest GPU Products From
More informationAdministrivia. Administrivia. Administrivia. CIS 565: GPU Programming and Architecture. Meeting
CIS 565: GPU Programming and Architecture Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider and Patrick Cozzi Meeting Monday and Wednesday 6:00 7:30pm Moore 212 Recorded lectures upon
More informationECE 574 Cluster Computing Lecture 16
ECE 574 Cluster Computing Lecture 16 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 26 March 2019 Announcements HW#7 posted HW#6 and HW#5 returned Don t forget project topics
More informationPortland State University ECE 588/688. Graphics Processors
Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly
More informationGPUs and GPGPUs. Greg Blanton John T. Lubia
GPUs and GPGPUs Greg Blanton John T. Lubia PROCESSOR ARCHITECTURAL ROADMAP Design CPU Optimized for sequential performance ILP increasingly difficult to extract from instruction stream Control hardware
More informationAccelerating CFD with Graphics Hardware
Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationSpring 2011 Prof. Hyesoon Kim
Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationHPC Architectures. Types of resource currently in use
HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationParallel Computing: Parallel Architectures Jin, Hai
Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer
More informationWhat Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others
What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University * slides thanks to Kavita Bala & many others Final Project Demo Sign-Up: Will be posted outside my office after lecture today.
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationSpring 2009 Prof. Hyesoon Kim
Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationDuksu Kim. Professional Experience Senior researcher, KISTI High performance visualization
Duksu Kim Assistant professor, KORATEHC Education Ph.D. Computer Science, KAIST Parallel Proximity Computation on Heterogeneous Computing Systems for Graphics Applications Professional Experience Senior
More informationLecture 3: Intro to parallel machines and models
Lecture 3: Intro to parallel machines and models David Bindel 1 Sep 2011 Logistics Remember: http://www.cs.cornell.edu/~bindel/class/cs5220-f11/ http://www.piazza.com/cornell/cs5220 Note: the entire class
More informationGraphics Processing Unit Architecture (GPU Arch)
Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics
More informationExperts in Application Acceleration Synective Labs AB
Experts in Application Acceleration 1 2009 Synective Labs AB Magnus Peterson Synective Labs Synective Labs quick facts Expert company within software acceleration Based in Sweden with offices in Gothenburg
More informationAbout Phoenix FD PLUGIN FOR 3DS MAX AND MAYA. SIMULATING AND RENDERING BOTH LIQUIDS AND FIRE/SMOKE. USED IN MOVIES, GAMES AND COMMERCIALS.
About Phoenix FD PLUGIN FOR 3DS MAX AND MAYA. SIMULATING AND RENDERING BOTH LIQUIDS AND FIRE/SMOKE. USED IN MOVIES, GAMES AND COMMERCIALS. Phoenix FD core SIMULATION & RENDERING. SIMULATION CORE - GRID-BASED
More informationApproaches to Parallel Computing
Approaches to Parallel Computing K. Cooper 1 1 Department of Mathematics Washington State University 2019 Paradigms Concept Many hands make light work... Set several processors to work on separate aspects
More informationGPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC
GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of
More informationN-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo
N-Body Simulation using CUDA CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo Project plan Develop a program to simulate gravitational
More informationSplotch: High Performance Visualization using MPI, OpenMP and CUDA
Splotch: High Performance Visualization using MPI, OpenMP and CUDA Klaus Dolag (Munich University Observatory) Martin Reinecke (MPA, Garching) Claudio Gheller (CSCS, Switzerland), Marzia Rivi (CINECA,
More informationWhat are Clusters? Why Clusters? - a Short History
What are Clusters? Our definition : A parallel machine built of commodity components and running commodity software Cluster consists of nodes with one or more processors (CPUs), memory that is shared by
More informationCornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary
Cornell University CS 569: Interactive Computer Graphics Introduction Lecture 1 [John C. Stone, UIUC] 2008 Steve Marschner 1 2008 Steve Marschner 2 NASA University of Calgary 2008 Steve Marschner 3 2008
More informationGraphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics
Why GPU? Chapter 1 Graphics Hardware Graphics Processing Unit (GPU) is a Subsidiary hardware With massively multi-threaded many-core Dedicated to 2D and 3D graphics Special purpose low functionality, high
More informationNVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield
NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host
More informationFast Tridiagonal Solvers on GPU
Fast Tridiagonal Solvers on GPU Yao Zhang John Owens UC Davis Jonathan Cohen NVIDIA GPU Technology Conference 2009 Outline Introduction Algorithms Design algorithms for GPU architecture Performance Bottleneck-based
More informationCS427 Multicore Architecture and Parallel Computing
CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:
More informationCS550. TA: TBA Office: xxx Office hours: TBA. Blackboard:
CS550 Advanced Operating Systems (Distributed Operating Systems) Instructor: Xian-He Sun Email: sun@iit.edu, Phone: (312) 567-5260 Office hours: 1:30pm-2:30pm Tuesday, Thursday at SB229C, or by appointment
More informationReal Parallel Computers
Real Parallel Computers Modular data centers Background Information Recent trends in the marketplace of high performance computing Strohmaier, Dongarra, Meuer, Simon Parallel Computing 2005 Short history
More informationECE 571 Advanced Microprocessor-Based Design Lecture 20
ECE 571 Advanced Microprocessor-Based Design Lecture 20 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 12 April 2016 Project/HW Reminder Homework #9 was posted 1 Raspberry Pi
More informationBlue-Steel Ray Tracer
MIT 6.189 IAP 2007 Student Project Blue-Steel Ray Tracer Natalia Chernenko Michael D'Ambrosio Scott Fisher Russel Ryan Brian Sweatt Leevar Williams Game Developers Conference March 7 2007 1 Imperative
More informationParallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)
Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Today Finishing up from last time Brief discussion of graphics workload metrics
More informationGPU Fundamentals Jeff Larkin November 14, 2016
GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate
More informationTop500 Supercomputer list
Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity
More informationMultiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed
Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448 1 The Greed for Speed Two general approaches to making computers faster Faster uniprocessor All the techniques we ve been looking
More informationTrends in HPC (hardware complexity and software challenges)
Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18
More informationParallel Computing Using OpenMP/MPI. Presented by - Jyotsna 29/01/2008
Parallel Computing Using OpenMP/MPI Presented by - Jyotsna 29/01/2008 Serial Computing Serially solving a problem Parallel Computing Parallelly solving a problem Parallel Computer Memory Architecture Shared
More informationA Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004
A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into
More information! Readings! ! Room-level, on-chip! vs.!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads
More informationProf. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University Project3 Cache Race Games night Monday, May 4 th, 5pm Come, eat, drink, have fun and be merry! Location: B17 Upson Hall
More informationBİL 542 Parallel Computing
BİL 542 Parallel Computing 1 Chapter 1 Parallel Programming 2 Why Use Parallel Computing? Main Reasons: Save time and/or money: In theory, throwing more resources at a task will shorten its time to completion,
More informationPedraforca: a First ARM + GPU Cluster for HPC
www.bsc.es Pedraforca: a First ARM + GPU Cluster for HPC Nikola Puzovic, Alex Ramirez We ve hit the power wall ALL computers are limited by power consumption Energy-efficient approaches Multi-core Fujitsu
More informationFrom Brook to CUDA. GPU Technology Conference
From Brook to CUDA GPU Technology Conference A 50 Second Tutorial on GPU Programming by Ian Buck Adding two vectors in C is pretty easy for (i=0; i
More informationECE 574 Cluster Computing Lecture 1
ECE 574 Cluster Computing Lecture 1 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 22 January 2019 ECE574 Distribute and go over syllabus http://web.eece.maine.edu/~vweaver/classes/ece574/ece574_2019s.pdf
More information6.1 Multiprocessor Computing Environment
6 Parallel Computing 6.1 Multiprocessor Computing Environment The high-performance computing environment used in this book for optimization of very large building structures is the Origin 2000 multiprocessor,
More informationIntroduction to CELL B.E. and GPU Programming. Agenda
Introduction to CELL B.E. and GPU Programming Department of Electrical & Computer Engineering Rutgers University Agenda Background CELL B.E. Architecture Overview CELL B.E. Programming Environment GPU
More informationCS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST
CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter
More informationPart IV. Review of hardware-trends for real-time ray tracing
Part IV Review of hardware-trends for real-time ray tracing Hardware Trends For Real-time Ray Tracing Philipp Slusallek Saarland University, Germany Large Model Visualization at Boeing CATIA Model of Boeing
More informationCS 475: Parallel Programming Introduction
CS 475: Parallel Programming Introduction Wim Bohm, Sanjay Rajopadhye Colorado State University Fall 2014 Course Organization n Let s make a tour of the course website. n Main pages Home, front page. Syllabus.
More informationChapter 1. Introduction: Part I. Jens Saak Scientific Computing II 7/348
Chapter 1 Introduction: Part I Jens Saak Scientific Computing II 7/348 Why Parallel Computing? 1. Problem size exceeds desktop capabilities. Jens Saak Scientific Computing II 8/348 Why Parallel Computing?
More informationGPU 101. Mike Bailey. Oregon State University. Oregon State University. Computer Graphics gpu101.pptx. mjb April 23, 2017
1 GPU 101 Mike Bailey mjb@cs.oregonstate.edu gpu101.pptx Why do we care about GPU Programming? A History of GPU Performance vs. CPU Performance 2 Source: NVIDIA How Can You Gain Access to GPU Power? 3
More informationGPU 101. Mike Bailey. Oregon State University
1 GPU 101 Mike Bailey mjb@cs.oregonstate.edu gpu101.pptx Why do we care about GPU Programming? A History of GPU Performance vs. CPU Performance 2 Source: NVIDIA 1 How Can You Gain Access to GPU Power?
More informationGPUs and Emerging Architectures
GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs
More informationIntroduction to Parallel Computing
Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen
More informationGPU for HPC. October 2010
GPU for HPC Simone Melchionna Jonas Latt Francis Lapique October 2010 EPFL/ EDMX EPFL/EDMX EPFL/DIT simone.melchionna@epfl.ch jonas.latt@epfl.ch francis.lapique@epfl.ch 1 Moore s law: in the old days,
More informationWindowing System on a 3D Pipeline. February 2005
Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April
More informationGraphics Processor Acceleration and YOU
Graphics Processor Acceleration and YOU James Phillips Research/gpu/ Goals of Lecture After this talk the audience will: Understand how GPUs differ from CPUs Understand the limits of GPU acceleration Have
More informationAuto-tunable GPU BLAS
Auto-tunable GPU BLAS Jarle Erdal Steinsland Master of Science in Computer Science Submission date: June 2011 Supervisor: Anne Cathrine Elster, IDI Norwegian University of Science and Technology Department
More informationParticle-in-Cell Simulations on Modern Computing Platforms. Viktor K. Decyk and Tajendra V. Singh UCLA
Particle-in-Cell Simulations on Modern Computing Platforms Viktor K. Decyk and Tajendra V. Singh UCLA Outline of Presentation Abstraction of future computer hardware PIC on GPUs OpenCL and Cuda Fortran
More informationIntroduction II. Overview
Introduction II Overview Today we will introduce multicore hardware (we will introduce many-core hardware prior to learning OpenCL) We will also consider the relationship between computer hardware and
More informationCME 213 S PRING Eric Darve
CME 213 S PRING 2017 Eric Darve Summary of previous lectures Pthreads: low-level multi-threaded programming OpenMP: simplified interface based on #pragma, adapted to scientific computing OpenMP for and
More informationGraphics Hardware. Instructor Stephen J. Guy
Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!
More informationIntroduction to MPI. EAS 520 High Performance Scientific Computing. University of Massachusetts Dartmouth. Spring 2014
Introduction to MPI EAS 520 High Performance Scientific Computing University of Massachusetts Dartmouth Spring 2014 References This presentation is almost an exact copy of Dartmouth College's Introduction
More informationHigh Performance Computing with Accelerators
High Performance Computing with Accelerators Volodymyr Kindratenko Innovative Systems Laboratory @ NCSA Institute for Advanced Computing Applications and Technologies (IACAT) National Center for Supercomputing
More informationNVidia s GPU Microarchitectures. By Stephen Lucas and Gerald Kotas
NVidia s GPU Microarchitectures By Stephen Lucas and Gerald Kotas Intro Discussion Points - Difference between CPU and GPU - Use s of GPUS - Brie f History - Te sla Archite cture - Fermi Architecture -
More informationHigh Performance Computing (HPC) Introduction
High Performance Computing (HPC) Introduction Ontario Summer School on High Performance Computing Scott Northrup SciNet HPC Consortium Compute Canada June 25th, 2012 Outline 1 HPC Overview 2 Parallel Computing
More informationCSCI-GA Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs
CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com History of GPUs
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More informationGPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU.
Basics of s Basics Introduction to Why vs CPU S. Sundar and Computing architecture August 9, 2014 1 / 70 Outline Basics of s Why vs CPU Computing architecture 1 2 3 of s 4 5 Why 6 vs CPU 7 Computing 8
More informationPowerVR Hardware. Architecture Overview for Developers
Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.
More informationAnalyzing the Performance of IWAVE on a Cluster using HPCToolkit
Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,
More informationHigh performance Computing and O&G Challenges
High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating
More informationGPGPU introduction and network applications. PacketShaders, SSLShader
GPGPU introduction and network applications PacketShaders, SSLShader Agenda GPGPU Introduction Computer graphics background GPGPUs past, present and future PacketShader A GPU-Accelerated Software Router
More informationCOMP Preliminaries Jan. 6, 2015
Lecture 1 Computer graphics, broadly defined, is a set of methods for using computers to create and manipulate images. There are many applications of computer graphics including entertainment (games, cinema,
More informationCloth Simulation on the GPU. Cyril Zeller NVIDIA Corporation
Cloth Simulation on the GPU Cyril Zeller NVIDIA Corporation Overview A method to simulate cloth on any GPU supporting Shader Model 3 (Quadro FX 4500, 4400, 3400, 1400, 540, GeForce 6 and above) Takes advantage
More informationTurbostream: A CFD solver for manycore
Turbostream: A CFD solver for manycore processors Tobias Brandvik Whittle Laboratory University of Cambridge Aim To produce an order of magnitude reduction in the run-time of CFD solvers for the same hardware
More informationCUDA GPGPU Workshop 2012
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
More informationParallel and High Performance Computing CSE 745
Parallel and High Performance Computing CSE 745 1 Outline Introduction to HPC computing Overview Parallel Computer Memory Architectures Parallel Programming Models Designing Parallel Programs Parallel
More informationSpring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University
18-742 Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core Prof. Onur Mutlu Carnegie Mellon University Research Project Project proposal due: Jan 31 Project topics Does everyone have a topic?
More informationCourse Recap + 3D Graphics on Mobile GPUs
Lecture 18: Course Recap + 3D Graphics on Mobile GPUs Interactive Computer Graphics Q. What is a big concern in mobile computing? A. Power Two reasons to save power Run at higher performance for a fixed
More informationCurrent Trends in Computer Graphics Hardware
Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)
More informationCOSC 6385 Computer Architecture - Multi Processor Systems
COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:
More informationOur new HPC-Cluster An overview
Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization
More informationBlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationHigh Performance Computing Course Notes Course Administration
High Performance Computing Course Notes 2009-2010 2010 Course Administration Contacts details Dr. Ligang He Home page: http://www.dcs.warwick.ac.uk/~liganghe Email: liganghe@dcs.warwick.ac.uk Office hours:
More informationParallel & Cluster Computing. cs 6260 professor: elise de doncker by: lina hussein
Parallel & Cluster Computing cs 6260 professor: elise de doncker by: lina hussein 1 Topics Covered : Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster
More information