Introduction to High-Performance Computing
1 Introduction to High-Performance Computing Simon D. Levy BIOL November 2010 Chapter 12
2 12.1: Concurrent Processing
3 High-Performance Computing: A fancy term for computers significantly faster than your average desktop machine (Dell, Mac). For most computational modelling, High Productivity Computing (C. Moler) is more important, because human time is more costly than machine time. But there will always be applications (like GROMACS) for super-fast computers, so HPC is worth knowing about.
4 Background: Moore's Law. Moore's Law: computing power (number of transistors, or switches, the basic unit of computation) available at a given price doubles roughly every 18 months. [Photo: Morgan Sparks with an early transistor]
5 Background: Moore's Law
6 Computer Architecture Basics. "Architecture" is used in two different senses in computer science: 1. Processor architecture (Pentium architecture, RISC architecture, etc.): the basic instruction set (operations) provided by a given chip. 2. Layout of CPU & memory (& disk). We will use the latter (more common) sense.
7 Computer Architecture Basics [Figure: central processing unit, (random access) memory, and disk, arranged along axes labelled Cost per Byte and Access Speed]
8 Spreadsheet Example. Double-click on (open) document: loads spreadsheet data and program (Excel) from disk into memory. Type a formula (= A1*B3 > C2) and hit return: 1. Numbers are loaded into the CPU's registers from memory. 2. The CPU performs arithmetic & logic to compute the answer (ALU = Arithmetic / Logic Unit). 3. The answer is copied out to memory (& displayed). Frequently accessed memory areas may be stored in the CPU's cache. Hit Save: memory is copied back to disk.
9 Sequential Processing. From an HPC perspective, the important things are the CPU, the memory, and how they are connected. The standard desktop machine was (until recently!) sequential: one CPU, one memory, one task at a time. [Diagram: CPU connected to Memory]
10 Concurrent Processing. The dream has always been to break through the von Neumann bottleneck and do more than one computation at a given time. Two basic varieties: Parallel processing: several CPUs inside the same hardware box. Distributed processing: multiple CPUs connected over a network.
11 Parallel Processing: A Brief History. In general, the lesson is that it is nearly impossible to make money from special-purpose parallel hardware boxes. 1980s to ...: Yesterday's HPC is tomorrow's doorstop: Connection Machine, MasPar, Japan's Fifth Generation. The revenge of Moore's Law: by the time you finish building the $$$$ supercomputer, the $$ computer is fast enough (though there was always a market for supercomputers like the Cray).
12 Supercomputers of Yesteryear: Connection Machine CM-1 (1985), Cray Y-MP (1988), MasPar MP-1 (1990)
13 Distributed Processing: A Brief(er) History. 1990s to ...: Age of the cluster. Beowulf: lots of commodity (inexpensive) desktop machines (Dell) wired together in a rack with fast connections, running Linux (free, open-source OS). Cloud computing: the internet is the computer (like Gmail, but for computing services).
14 Today: Back to Parallel Processing. Clusters take up lots of room, require lots of air conditioning, and require experts to build, maintain, & program. Cloud computing has been sabotaged by industry hype (cf. Larry Ellison rant). Sustaining Moore's Law requires increasingly sophisticated advances in semiconductor physics.
15 Today: Back to Parallel Processing. Two basic directions: Multicore / multiprocessor machines: lots of little CPUs inside your desktop/laptop computer. Inexpensive special-purpose hardware like Graphics Processing Units (GPUs).
16 Multiprocessor Architectures. Two basic designs: Shared-memory multiprocessor: all processors can access all memory modules. Message-passing multiprocessor: each CPU has its own memory, and CPUs pass messages around to request/provide computation.
17 Shared Memory Multiprocessor [Diagram: three CPUs linked through a connecting network to three memory modules]
18 Message-Passing Multiprocessor [Diagram: a connecting network linking three CPUs, each with its own memory]
19 MPI: A Message-Passing Interface. Interface: a specification for how to use the software. Implemented mainly in C and FORTRAN. Example invocation on X processes: mpirun -np X mdrun_mpi_d -deffnm md_0_1
20 Scalability is Everything. Which is better: $1000 today, or $100 today plus a way of making $100 more every day in the future? Scalability is the central question not just for hardware, but also for software and algorithms (think "economy of scale").
21 Processes & Streams. Process: an executing instance of a program (J. Plank). Instruction stream: sequence of instructions coming from a single process. Data stream: sequence of data items on which to perform computation.
22 Flynn's Four-Way Classification. 1. SISD: Single Instruction stream, Single Data stream. You rarely hear this term, because it's the default (though this is changing). 2. MIMD: Multiple Instruction streams, Multiple Data streams. Thread (of execution): lightweight process executing on some part of a multiprocessor. GPU is probably the best current exemplar.
23 Flynn's Four-Way Classification. 3. SIMD: Single Instruction stream, Multiple Data streams: the same operation on all data at once (like Matlab, though it's not (yet) truly SIMD). 4. MISD: Disagreement exists on whether this category has any systems. Pipelining is perhaps an example: think of breaking weekly laundry into two loads, drying the first load while washing the second.
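The laundry analogy for pipelining can be put into numbers. The sketch below is only an illustration: the function names and the 30/45-minute timings are assumptions, not part of the slides, and it assumes one washer, one dryer, and at least one load.

```python
def sequential_time(loads, wash, dry):
    """Total time when each load is washed and dried before the next one starts."""
    return loads * (wash + dry)


def pipelined_time(loads, wash, dry):
    """Total time when the dryer handles one load while the washer handles the next."""
    # The first wash fills the pipeline, the slower stage then sets the pace
    # for the remaining loads, and the final dry drains the pipeline.
    return wash + (loads - 1) * max(wash, dry) + dry


# Two loads, 30-minute wash, 45-minute dry (hypothetical timings).
print(sequential_time(2, 30, 45))  # 150
print(pipelined_time(2, 30, 45))   # 120
```

With only two stages the saving is modest; the gain grows with the number of loads, because every extra load costs only the time of the slower stage.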
24 Communication. "Pure parallelism": like physics without friction. It's useful as a first approximation to pretend that processors don't have to communicate results. But then you have to deal with the real issues.
25 Granularity & Speedup. Granularity: ratio of computation time to communication time. Lots of tiny little computers (grains) means small granularity (because they have to communicate a lot). Speedup: how much faster is it to execute the program on n processors vs. 1 processor?
26 Linear Speedup. In principle, maximum speedup is linear: n times faster on n processors. This gives a decaying k/n curve (a hyperbola, not an exponential) of execution time vs. number of processors. Super-linear speedup is sometimes possible, if each of the processors can access memory more efficiently than a single processor (recall the cache concept).
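The definitions of granularity and speedup, and the k/n shape of the ideal execution-time curve, can be sketched as follows. The function names and the value k = 120 are hypothetical, chosen only for illustration.

```python
def granularity(computation_time, communication_time):
    """Ratio of computation time to communication time."""
    return computation_time / communication_time


def speedup(time_on_one, time_on_n):
    """How many times faster the program runs on n processors than on one."""
    return time_on_one / time_on_n


def ideal_time(k, n):
    """Execution time under perfectly linear speedup: k time units shared by n processors."""
    return k / n


k = 120  # hypothetical single-processor run time
for n in (1, 2, 4, 8):
    t = ideal_time(k, n)
    print(n, t, speedup(k, t))
```

Each doubling of n halves the ideal time, so the curve flattens quickly: going from 4 to 8 processors saves far less wall-clock time than going from 1 to 2.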
27 12.2: Parallel Algorithms
28 Some Problems Are Embarrassingly Parallel. Embarrassingly parallel (C. Moler): a problem in which the same operation is applied to all elements (e.g., of a grid), with little or no communication among elements.
29 The Parallel Data Partition (Master/Slave) Approach. The master process communicates with the user and with the slaves: 1. Partition the n data items into p chunks and send each chunk to one of the p slave processes. 2. Receive partial answers from the slaves. 3. Put the partial answers together into a single answer (grid, sum, etc.) and report it to the user.
30 The Parallel Data Partition (Master/Slave) Approach. Slave processes communicate with the master: 1. Receive one data chunk from the master. 2. Run the algorithm (grid initialization, update, sum, etc.) on the chunk. 3. Send the result back to the master.
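The master and slave steps above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the names `partition`, `worker`, and `master` are hypothetical, summation stands in for "the algorithm", and threads from Python's standard library stand in for separate processes so the example stays self-contained (a real run would more likely use separate processes or MPI).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, p):
    """Split data into p contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), p)
    chunks, start = [], 0
    for i in range(p):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def worker(chunk):
    """Slave side: run the algorithm (here, a sum) on one chunk and return the result."""
    return sum(chunk)


def master(data, p):
    """Master side: hand out chunks, collect partial answers, combine them."""
    chunks = partition(data, p)
    with ThreadPoolExecutor(max_workers=p) as pool:
        partials = list(pool.map(worker, chunks))
    return sum(partials)


print(master(list(range(256)), 8))  # 32640
```

Note that the master does the partitioning and the final combination by itself, which is the bottleneck the later slides address.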
31 Parallel Data Partition: Speedup. Ignoring communication time, the theoretical speedup for n data items running on p slave processes is [formula missing from the transcription]. As n grows large, this value approaches p: speedup is linear in the number of slaves.
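The limiting behaviour can be checked with a simple model. The model here is an assumption and not necessarily the formula on the original slide: each data item costs one time unit, communication is free, and the run takes as long as the largest chunk, which holds ceil(n/p) items.

```python
import math


def partition_speedup(n, p):
    """Speedup of p workers over one, assuming unit cost per item and no communication."""
    serial_time = n                    # one process handles every item
    parallel_time = math.ceil(n / p)   # the largest chunk determines the finish time
    return serial_time / parallel_time


for n in (10, 100, 1000, 1001):
    print(n, partition_speedup(n, 4))
```

Under this model the speedup never exceeds p, equals p exactly when p divides n, and gets closer to p as n grows, which matches the slide's conclusion.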
32 Simple Parallel Data Partition Is Inefficient. Communication time: the master has to send out and receive p messages one at a time, like dealing a deck of cards. Idle time: the master and some slaves sit idle until all slaves have computed their results.
33 The Divide-and-Conquer Approach. We can arrange things so that each process sends/receives only two messages: less communication, less idling. Think of dividing a pile of coins in half, then dividing each pile in half, till each pile has only one coin (or some fixed minimum # of coins). Example: summing 256 values with 8 processors...
34 Divide Phase
p0: x_0..x_255
p0: x_0..x_127; p4: x_128..x_255
p0: x_0..x_63; p2: x_64..x_127; p4: x_128..x_191; p6: x_192..x_255
p0: x_0..x_31; p1: x_32..x_63; p2: x_64..x_95; p3: x_96..x_127; p4: x_128..x_159; p5: x_160..x_191; p6: x_192..x_223; p7: x_224..x_255
35 Conquer Phase
p0: x_0..x_31; p1: x_32..x_63; p2: x_64..x_95; p3: x_96..x_127; p4: x_128..x_159; p5: x_160..x_191; p6: x_192..x_223; p7: x_224..x_255
p0: x_0..x_63; p2: x_64..x_127; p4: x_128..x_191; p6: x_192..x_255
p0: x_0..x_127; p4: x_128..x_255
p0: x_0..x_255
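The conquer phase for the 256-value, 8-processor example can be simulated sequentially. This is a sketch under stated assumptions: `tree_sum` is a hypothetical name, p is a power of two that divides the number of values, and list slots stand in for processes, so an in-place addition plays the role of one message.

```python
def tree_sum(values, p):
    """Simulate divide-and-conquer summation; returns (total, number of combining rounds)."""
    chunk = len(values) // p
    # End of the divide phase: process i holds the sum of its own contiguous block.
    partial = [sum(values[i * chunk:(i + 1) * chunk]) for i in range(p)]
    # Conquer phase: in each round, process i + stride sends its partial sum to process i.
    stride, rounds = 1, 0
    while stride < p:
        for i in range(0, p, 2 * stride):
            partial[i] += partial[i + stride]
        stride *= 2
        rounds += 1
    return partial[0], rounds


print(tree_sum(list(range(256)), 8))  # (32640, 3)
```

Eight processes need only three combining rounds (p1 into p0, p3 into p2, and so on; then p2 into p0 and p6 into p4; then p4 into p0), compared with eight one-at-a-time receives at a single master.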
36 The N-Body Problem. Consider a large number N of bodies interacting with each other through gravity (e.g., a galaxy of stars). As time progresses, each body moves based on the gravitational forces acting on it from all the others: $F = G \frac{m_1 m_2}{r^2}$, where $m_1$, $m_2$ are the masses of two objects, $r$ is the distance between them, and $G$ is Newton's gravitational constant.
37 The N-Body Problem. Problem: for each of the N bodies, we must compute the force between it and the other $N-1$ bodies. This is $N(N-1)/2 = (N^2 - N)/2$ computations, which is proportional to $N^2$ as N grows large. Even with perfect parallelism, we still perform $\frac{1}{p} N^2$ computations; i.e., still proportional to $N^2$.
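The pair count can be seen directly in a brute-force force calculation. The sketch below is illustrative only: the body list is made up, positions are 2D, and `pairwise_forces` is a hypothetical helper that returns force magnitudes from Newton's law of gravitation.

```python
import math
from itertools import combinations

G = 6.674e-11  # Newton's gravitational constant in SI units (rounded)


def pairwise_forces(bodies):
    """bodies: list of (mass, x, y). Returns [((i, j), force magnitude)] for every pair."""
    forces = []
    for (i, (m1, x1, y1)), (j, (m2, x2, y2)) in combinations(enumerate(bodies), 2):
        r = math.hypot(x2 - x1, y2 - y1)   # distance between body i and body j
        forces.append(((i, j), G * m1 * m2 / r ** 2))
    return forces


# Four hypothetical bodies: 4 * 3 / 2 = 6 pairs.
bodies = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 2.0), (3.0, 3.0, 3.0)]
print(len(pairwise_forces(bodies)))  # 6
```

Doubling the number of bodies roughly quadruples the number of pairs, which is the quadratic growth the slide describes.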
38 The N-Body Problem: Barnes-Hut Solution. Division by $r^2$ means that bodies distant from each other have relatively low mutual force. So we can apply the formula within small clusters of stars, and then treat each cluster as a single body acting on other clusters. This is another instance of divide-and-conquer. We will treat space as 2D for illustration purposes.
41 Divide
42 Divide
43-45 Build Labelled Hierarchy of Clusters [Figures: clusters labelled a through e are introduced step by step around the numbered bodies]
46 Produces a Quad-Tree [Figure: quad-tree whose nodes are labelled a, 1, b, c, d, e]. Each node (circle) stores the total mass and center-of-mass coordinates for its members. If two nodes (e.g., 1, 5) are more than some predetermined distance apart, we use their clusters instead (1, e).
47 Barnes-Hut Solution: Speedup. Each of the N bodies is on average compared with $\log N$ other bodies, so instead of $N^2$ we have $N \log N$. Each processor can do the movement of its cluster in parallel with the others (no communication).
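The quad-tree that stores total mass and center of mass per node can be sketched as below. This covers only tree construction, not the force approximation, and it is an illustration rather than the slides' code: the `Node` class, its fields, and the three sample bodies are all hypothetical, and bodies are assumed to have distinct positions inside the root square.

```python
class Node:
    """A square region of 2D space holding either one body or four child quadrants."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # centre and half-width of the square
        self.mass = 0.0            # total mass of all bodies inside this square
        self.mx = self.my = 0.0    # mass-weighted sums of x and y coordinates
        self.body = None           # (m, x, y) if this is a leaf holding one body
        self.children = None       # four sub-quadrants once the square is split

    def center_of_mass(self):
        return (self.mx / self.mass, self.my / self.mass)

    def _split(self):
        h = self.half / 2
        # Order: lower-left, lower-right, upper-left, upper-right.
        self.children = [Node(self.cx + dx * h, self.cy + dy * h, h)
                         for dy in (-1, 1) for dx in (-1, 1)]

    def _child_for(self, x, y):
        index = (1 if x >= self.cx else 0) + (2 if y >= self.cy else 0)
        return self.children[index]

    def insert(self, m, x, y):
        if self.children is None and self.body is None:
            self.body = (m, x, y)              # empty leaf: store the body here
        else:
            if self.children is None:          # occupied leaf: split, push old body down
                self._split()
                old, self.body = self.body, None
                self._child_for(old[1], old[2]).insert(*old)
            self._child_for(x, y).insert(m, x, y)
        # Every node on the path accumulates the new body's mass and position.
        self.mass += m
        self.mx += m * x
        self.my += m * y


# Root square covers [0, 4] x [0, 4]; three hypothetical bodies (mass, x, y).
root = Node(2.0, 2.0, 2.0)
for body in [(1.0, 1.0, 1.0), (1.0, 3.0, 1.0), (2.0, 1.0, 3.0)]:
    root.insert(*body)
print(root.mass, root.center_of_mass())  # 4.0 (1.5, 2.0)
```

A force calculation would then walk this tree from the root and, for any node far enough from the body in question, use the node's `mass` and `center_of_mass()` in place of its individual members.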
More informationHigh Performance Computing
The Need for Parallelism High Performance Computing David McCaughan, HPC Analyst SHARCNET, University of Guelph dbm@sharcnet.ca Scientific investigation traditionally takes two forms theoretical empirical
More informationParallel Systems. Introduction. Principles of Parallel Programming, Calvin Lin & Lawrence Snyder, Chapters 1 & 2
Parallel Systems Introduction Principles of Parallel Programming, Calvin Lin & Lawrence Snyder, Chapters 1 & 2 Jan Lemeire Parallel Systems September - December 2010 Goals of course Understand architecture
More informationWhat are Clusters? Why Clusters? - a Short History
What are Clusters? Our definition : A parallel machine built of commodity components and running commodity software Cluster consists of nodes with one or more processors (CPUs), memory that is shared by
More informationTools and techniques for optimization and debugging. Fabio Affinito October 2015
Tools and techniques for optimization and debugging Fabio Affinito October 2015 Fundamentals of computer architecture Serial architectures Introducing the CPU It s a complex, modular object, made of different
More informationParallel Computing: Parallel Architectures Jin, Hai
Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationChapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance
Chapter 7 Multicores, Multiprocessors, and Clusters Introduction Goal: connecting multiple computers to get higher performance Multiprocessors Scalability, availability, power efficiency Job-level (process-level)
More informationMulticores, Multiprocessors, and Clusters
1 / 12 Multicores, Multiprocessors, and Clusters P. A. Wilsey Univ of Cincinnati 2 / 12 Classification of Parallelism Classification from Textbook Software Sequential Concurrent Serial Some problem written
More informationCSCI-GA Multicore Processors: Architecture & Programming Lecture 3: The Memory System You Can t Ignore it!
CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 3: The Memory System You Can t Ignore it! Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Memory Computer Technology
More informationWorkloads Programmierung Paralleler und Verteilter Systeme (PPV)
Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Workloads 2 Hardware / software execution environment
More informationChapter Seven. Idea: create powerful computers by connecting many smaller ones
Chapter Seven Multiprocessors Idea: create powerful computers by connecting many smaller ones good news: works for timesharing (better than supercomputer) vector processing may be coming back bad news:
More informationSchool of Parallel Programming & Parallel Architecture for HPC ICTP October, Intro to HPC Architecture. Instructor: Ekpe Okorafor
School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014 Intro to HPC Architecture Instructor: Ekpe Okorafor A little about me! PhD Computer Engineering Texas A&M University Computer
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationRISC Processors and Parallel Processing. Section and 3.3.6
RISC Processors and Parallel Processing Section 3.3.5 and 3.3.6 The Control Unit When a program is being executed it is actually the CPU receiving and executing a sequence of machine code instructions.
More information10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems
1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase
More information