Chapter 1: Perspectives

Size: px
Start display at page:

Download "Chapter 1: Perspectives"

Transcription

1 Chapter 1: Perspectives Yan Solihin Copyright notice: No part of this publication may be reproduced, stored in a retrieval system, or transmitted by any means (electronic, mechanical, photocopying, recording, or otherwise) without the prior written permission of the author. An exception is granted for academic lectures at universities and colleges, provided that the following text is included in such copy: Source: Yan Solihin, Fundamentals of Parallel Computer Architecture, 2008.

2 Outline for Lecture 1 Introduction Types of parallelism Architectural trends Why parallel computers? Scope of CSC/ECE 506 Fundamentals of Computer Architecture - Chapter 1 2

3 Evolution in Microprocessors QuickTime and a decompressor are needed to see this picture. Fundamentals of Computer Architecture - Chapter 1 3

4 Key Points More and more components can be integrated on a single chip Speed of integration tracks Moore s law, doubling every months. Exercise: Look up how the number of transistors per chip has changed, esp. since Submit here. Until recently, performance tracked speed of integration At the architectural level, two techniques facilitated this: Instruction-level parallelism Cache memory Performance gain from uniprocessor system was high enough that multiprocessor systems were not viable for most uses. Fundamentals of Computer Architecture - Chapter 1 4

5 Illustration 100-processor system with perfect speedup Compared to a single processor system Year 1: 100x faster Year 2: 62.5x faster Year 3: 39x faster Year 10: 0.9x faster Single-processor performance catches up in just a few years! Even worse It takes longer to develop a multiprocessor system Low volume means prices must be very high High prices delay adoption Perfect speedup is unattainable Fundamentals of Computer Architecture - Chapter 1 5

6 Why did uniprocessor performance grow so fast? half from circuit improvement (smaller transistors, faster clock, etc.) half from architecture/organization: Instruction-level parallelism (ILP) Pipelining: RISC, CISC with RISC back-end Superscalar Out-of-order execution Memory hierarchy (caches) Exploit spatial and temporal locality Multiple cache levels Fundamentals of Computer Architecture - Chapter 1 6

7 But uniprocessor perf. growth is stalling Source of uniprocessor performance growth: ILP Parallel execution of independent instructions from a single thread ILP growth has slowed abruptly Memory wall: Processor speed grows at 55%/year, memory speed grows at 7% per year ILP wall: achieving higher ILP requires quadratically increasing complexity (and power) Power efficiency Thermal packaging limit vs. cost Fundamentals of Computer Architecture - Chapter 1 7

8 Types of parallelism Instruction level (cf. ECE 521) Pipelining A (a load) IF ID EX MEM WB B IF ID EX MEM WB C IF ID EX MEM WB Fundamentals of Computer Architecture - Chapter 1 8

9 Types of parallelism, cont. Superscalar/ VLIW Original: LD F0, 34(R2) ADDD F4, F0, F2 LD F7, 45(R3) ADDD F8, F7, F6 Schedule as: LD F0, 34(R2) LD F7, 45(R3) ADDD F4, F0, F2 ADDD F8, F0, F6 + Moderate degree of parallelism (sometimes 50) Requires fast communication (register level) Fundamentals of Computer Architecture - Chapter 1 9

10 Why ILP is slowing Branch-prediction accuracy is already > 90% Hard to improve it even more Number of pipeline stages is already deep ( stages) But critical dependence loops do not change Memory latency requires more clock cycles to satisfy Processor width is already high Quadratically increasing complexity to increase the width Cache size Effective, but also shows diminishing returns In general, the size must be doubled to reduce miss rate by a half Fundamentals of Computer Architecture - Chapter 1 10

11 Current trends: multicore and manycore Aspect Intel Clovertown AMD Barcelona IBM Cell # cores Clock frequency 2.66 GHz 2.3 GHz 3.2 GHz Core type OOO Superscalar OOO Superscalar 2-issue SIMD Caches 2x4MB L2 512KB L2 (private), 2MB L3 (shd) 256KB local store Chip power 120 watts 95 watts 100 watts Exercise: Browse the Web for information on more recent processors, and for each processor, fill out this form. (You can view the submissions here.) Fundamentals of Computer Architecture - Chapter 1 11

12 Historical perspectives 80s early 90s: Prime time for parallel architecture research A microprocessor cannot fit on a chip, so naturally need multiple chips (and processors) J-machine, M-machine, Alewife, Tera, HEP, DASH, etc. Exercise: Pick one of these machines, and identify a major innovation that it introduced. Submit here. 90s: At the low end, uniprocessor systems speed grows much faster than parallel systems speed A microprocessor fits on a chip. So do branch predictor, multiple functional units, large caches, etc.! Microprocessor also exploits parallelism (pipelining, multiple-issue, VLIW) parallelisms originally invented for multiprocessors Many parallel computer vendors went bankrupt Prestigious but small high-performance computing market Fundamentals of Computer Architecture - Chapter 1 12

13 If the automobile industry advanced as rapidly as the semiconductor industry, a Rolls Royce would get a half million miles per gallon and it would be cheaper to throw it away than to park it. Gordon Moore, Intel Corporation, 1998 Fundamentals of Computer Architecture - Chapter 1 13

14 Historical perspectives, cont. 90s: Emergence of distributed (vs. parallel) machines Progress in network technologies: Network bandwidth grows faster than Moore s law Fast interconnection networks getting cheap Connects cheap uniprocessor systems into a large distributed machine Network of Workstations, Clusters, Grid 00s: Parallel architectures are back! Transistors per chip >> microprocessor transistors Harder to get more performance from a uniprocessor SMT (Simultaneous multithreading), CMP (Chip Multi- Processor), ultimately Massive CMP E.g. Intel Pentium D, Core Duo, AMD Dual Core, IBM Power5, Sun Niagara. Fundamentals of Computer Architecture - Chapter 1 14

15 Parallel computers A parallel computer is a collection of processing elements that can communicate and cooperate to solve a large problem fast. [Almasi & Gottlieb] collection of processing elements How many? How powerful each? Scalability? Few very powerful (e.g., Altix) vs. many small ones (BlueGene) that can communicate How do PEs communicate? (shared memory vs. messagepassing) Interconnection network (bus, multistage, crossbar, ) Metrics: cost, latency, throughput, scalability, fault tolerance Fundamentals of Computer Architecture - Chapter 1 15

16 Parallel computers, cont. and cooperate Issues: granularity, synchronization, and autonomy Synchronization allows sequencing of operations to ensure correctness Granularity up parallelism down, communication down, overhead down Statement/instruction level: 2 10 instructions (ECE 521) Loop level: 10 1K instructions Task level: 1K 1M instructions Program level: > 1M instructions Autonomy SIMD (single instruction stream) vs. MIMD (multiple instruction streams) Fundamentals of Computer Architecture - Chapter 1 16

17 Parallel computers, cont. solve a large problem fast General- vs. special-purpose machine? Any machine can solve certain problems well What domains? Highly (embarassingly) parallel apps Many scientific codes Medium parallel apps Many engineering apps (finite-element, VLSI-CAD) Not parallel apps Compilers, editors (do we care?) Fundamentals of Computer Architecture - Chapter 1 17

18 Why parallel computers? Absolute performance: Can we afford to wait? Folding of a single protein takes years to simulate on the most advanced microprocessor. It only takes days on a parallel computer Weather forecast: timeliness is crucial Cost/performance Harder to improve performance on a single processor Bigger monolithic processor vs. many, simple processors Power/performance Reliability and availability Key enabling technologies Advances in microprocessor and interconnect technology Advances in software technology Fundamentals of Computer Architecture - Chapter 1 18

19 Scope of CSC/ECE 506 Parallelism Loop-level and task-level parallelism Flynn taxonomy: SIMD (vector architecture) MIMD Shared memory machines (SMP and DSM) Clusters Programming Model: Shared memory Message-passing Hybrid Fundamentals of Computer Architecture - Chapter 1 19

20 Loop-level parallelism Sometimes each iteration can be performed independently for (i=0; i<8; i++) a[i] = b[i] + c[i]; Sometimes iterations cannot be performed independently no loop-level parallelism for (i=0; i<8; i++) a[i] = b[i] + a[i-1]; + Very high parallelism > 1K + Often easy to achieve load balance Some loops are not parallel Some apps do not have many loops Fundamentals of Computer Architecture - Chapter 1 20

21 Task-level parallelism Arbitrary code segments in a single program Across loops: Subroutines: Threads: e.g. editor: GUI, printing, parsing + Larger granularity => low overheads, communication Low degree of parallelism Hard to balance for (i=0; i<n; i++) sum = sum + a[i]; for (i=0; i<n; i++) prod = prod * a[i]; Cost = getcost(); A = computesum(); B = A + Cost; Fundamentals of Computer Architecture - Chapter 1 21

22 Program-level parallelism Various independent programs execute together gmake: gcc c code1.c // assign to proc1 gcc c code2.c // assign to proc2 gcc c main.c // assign to proc3 gcc main.o code1.o code2.o + No communication Hard to balance Few opportunities Fundamentals of Computer Architecture - Chapter 1 22

23 Scope of CSC/ECE 506 Parallelism Loop-level and task-level parallelism Flynn taxonomy: SIMD (vector architecture) MIMD *Shared memory machines (SMP and DSM) Clusters Programming Model: Shared memory Message-passing Hybrid Fundamentals of Computer Architecture - Chapter 1 23

24 Taxonomy of parallel computers The Flynn taxonomy: Single or multiple instruction streams. Single or multiple data streams. 1. SISD machine (Most desktops, laptops) Only one instruction fetch stream Most of today s workstations or desktops Control unit Instruction stream ALU Data stream Fundamentals of Computer Architecture - Chapter 1 24

25 SIMD Examples: Vector processors, SIMD extensions (MMX) A single instruction operates on multiple data items. SISD: for (i=0; i<8; i++) a[i] = b[i] + c[i]; SIMD: a = b + c; // vector addition ALU 1 Data stream 1 Control unit Instruction stream ALU 2 Data stream 2 ALU n Data stream n Fundamentals of Computer Architecture - Chapter 1 25

26 MISD Example: CMU Warp Systolic arrays Data stream Control unit 1 Instruction stream 1 ALU 1 Control unit 2 Instruction stream 2 ALU 2 Control unit n Instruction stream n ALU n Fundamentals of Computer Architecture - Chapter 1 26

27 Systolic arrays (contd.) Example: Systolic array for 1-D convolution y(i) = w1 x(i) + w2 x(i + 1) + w3 x(i + 2) + w4 x(i + 3) x8 x7 x6 x5 x4 x3 x2 x1 y3 y2 y1 w4 w3 w2 w1 xin yin x w xout yout xout = x x = xin yout = yin + w xin Practical realizations (e.g. iwarp) use quite general processors Enable variety of algorithms on same hardware But dedicated interconnect channels Data transfer directly from register to register across channel Specialized, and same problems as SIMD General purpose systems work well for same algorithms (locality etc.) Fundamentals of Computer Architecture - Chapter 1 27

28 MIMD Independent processors connected together to form a multiprocessor system. Physical organization: Determines which memory hierarchy level is shared Programming abstraction: Shared Memory: on a chip: Chip Multiprocessor (CMP) Interconnected by a bus: Symmetric multiprocessors (SMP) Point-to-point interconnection: Distributed Shared Memory (DSM) Distributed Memory: Clusters, Grid Fundamentals of Computer Architecture - Chapter 1 28

29 MIMD Physical Organization P P caches M Shared-cache architecture: - CMP (or Simultaneous Multi-Threading) - e.g.: Pentium 4 chip, IBM Power4 chip, Sun Niagara, Pentium D, etc. - Implies shared-memory hardware P caches Network M P caches UMA (Uniform Memory Access) Shared Memory : - Pentium Pro Quad, Sun Enterprise, etc. - What interconnection network? - Bus - Multistage - Crossbar - etc. - Implies shared-memory hardware Fundamentals of Computer Architecture - Chapter 1 29

30 MIMD Physical Organization (2) P P caches caches M M Network NUMA (Non-Uniform Memory Access) Shared Memory : - SGI Origin, Altix, IBM p690, AMD Hammer-based system - What interconnection network? - Crossbar - Mesh - Hypercube - etc. - Also referred to as Distributed Shared Memory Fundamentals of Computer Architecture - Chapter 1 30

31 MIMD Physical Organization (3) P caches M P caches M Distributed System/Memory: - Also called clusters, grid - Don t confuse it with distributed shared memory I/O I/O Network Fundamentals of Computer Architecture - Chapter 1 31

32 Scope of CSC/ECE 506 Parallelism Loop-level and task-level parallelism Flynn taxonomy: MIMD Shared memory machines (SMP and DSM) Programming Model: Shared memory Message-passing Hybrid (e.g., UPC) Data parallel Fundamentals of Computer Architecture - Chapter 1 32

33 Programming models: shared memory Shared Memory / Shared Address Space: Each processor can see the entire memory P P P Shared M Programming model = thread programming in uniprocessor systems Fundamentals of Computer Architecture - Chapter 1 33

34 Programming models: message-passing Distributed Memory / Message Passing / Multiple Address Space: a processor can only directly access its own local memory. All communication happens by explicit messages. P M P M P M P M Fundamentals of Computer Architecture - Chapter 1 34

35 Programming models: data parallel Programming model Operations performed in parallel on each element of data structure Logically single thread of control, performs sequential or parallel steps Conceptually, a processor associated with each data element Architectural model Array of many simple, cheap processing elements (PEs) with little memory each Processing elements don t sequence through instructions PEs are attached to a control processor that issues instructions Specialized and general communication, cheap global synchronization Original motivation Matches simple differential equation solvers Centralize high cost of instruction fetch/sequencing Fundamentals of Computer Architecture - Chapter 1 35

36 Top 500 supercomputers Let s look at the Earth Simulator Was #1 in 2004, now #10 in 2006 Hardware: 5,120 (640 8-way nodes) 500 MHz NEC CPUs 8 GFLOPS per CPU (41 TFLOPS total) 30s TFLOPS sustained performance! 2 GB (4 512 MB FPLRAM modules) per CPU (10 TB total) Shared memory inside the node 10 TB total memory crossbar switch between the nodes 16 GB/s inter-node bandwidth 20 kva power consumption per node Fundamentals of Computer Architecture - Chapter 1 36

37 Fundamentals of Computer Architecture - Chapter 1 37

38 Exercise Go to and look at the Statistics menu near the right-hand side. Click on one of the statistics, e.g., Vendors, Processor Architecture, and examine what kind of systems are prevalent. Then do the same for earlier lists, and report on the trend. You may find interesting results by clicking on the Development tab. For example, if you choose Processor Architecture, the current list, will tell you how many vector and scalar architectures there are. Change the 34 to 32 to get last year s list. Change it to a lower number to get an earlier year s list. You can go all the way back to the first list from Submit your results here. Fundamentals of Computer Architecture - Chapter 1 38

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor Multiprocessing Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast. Almasi and Gottlieb, Highly Parallel

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of

More information

Convergence of Parallel Architecture

Convergence of Parallel Architecture Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty

More information

Chap. 4 Multiprocessors and Thread-Level Parallelism

Chap. 4 Multiprocessors and Thread-Level Parallelism Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

Lecture 9: MIMD Architecture

Lecture 9: MIMD Architecture Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

Fundamentals of Computer Design

Fundamentals of Computer Design Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

Lect. 2: Types of Parallelism

Lect. 2: Types of Parallelism Lect. 2: Types of Parallelism Parallelism in Hardware (Uniprocessor) Parallelism in a Uniprocessor Pipelining Superscalar, VLIW etc. SIMD instructions, Vector processors, GPUs Multiprocessor Symmetric

More information

Multiprocessors - Flynn s Taxonomy (1966)

Multiprocessors - Flynn s Taxonomy (1966) Multiprocessors - Flynn s Taxonomy (1966) Single Instruction stream, Single Data stream (SISD) Conventional uniprocessor Although ILP is exploited Single Program Counter -> Single Instruction stream The

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University 18-742 Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core Prof. Onur Mutlu Carnegie Mellon University Research Project Project proposal due: Jan 31 Project topics Does everyone have a topic?

More information

Online Course Evaluation. What we will do in the last week?

Online Course Evaluation. What we will do in the last week? Online Course Evaluation Please fill in the online form The link will expire on April 30 (next Monday) So far 10 students have filled in the online form Thank you if you completed it. 1 What we will do

More information

Lecture 7: Parallel Processing

Lecture 7: Parallel Processing Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Lecture 7: Parallel Processing

Lecture 7: Parallel Processing Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction

More information

Multiprocessors & Thread Level Parallelism

Multiprocessors & Thread Level Parallelism Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction

More information

Introduction. CSCI 4850/5850 High-Performance Computing Spring 2018

Introduction. CSCI 4850/5850 High-Performance Computing Spring 2018 Introduction CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University What is Parallel

More information

CMSC 611: Advanced. Parallel Systems

CMSC 611: Advanced. Parallel Systems CMSC 611: Advanced Computer Architecture Parallel Systems Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems

More information

Lecture 9: MIMD Architectures

Lecture 9: MIMD Architectures Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Lecture 8: RISC & Parallel Computers. Parallel computers

Lecture 8: RISC & Parallel Computers. Parallel computers Lecture 8: RISC & Parallel Computers RISC vs CISC computers Parallel computers Final remarks Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation in computer

More information

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems 1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase

More information

Fundamentals of Computers Design

Fundamentals of Computers Design Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2

More information

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 19: Multiprocessing Shuai Wang Department of Computer Science and Technology Nanjing University [Slides adapted from CSE 502 Stony Brook University] Getting More

More information

COSC4201. Multiprocessors and Thread Level Parallelism. Prof. Mokhtar Aboelaze York University

COSC4201. Multiprocessors and Thread Level Parallelism. Prof. Mokhtar Aboelaze York University COSC4201 Multiprocessors and Thread Level Parallelism Prof. Mokhtar Aboelaze York University COSC 4201 1 Introduction Why multiprocessor The turning away from the conventional organization came in the

More information

Parallel Architecture. Hwansoo Han

Parallel Architecture. Hwansoo Han Parallel Architecture Hwansoo Han Performance Curve 2 Unicore Limitations Performance scaling stopped due to: Power Wire delay DRAM latency Limitation in ILP 3 Power Consumption (watts) 4 Wire Delay Range

More information

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

COSC 6385 Computer Architecture - Thread Level Parallelism (I) COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

The Art of Parallel Processing

The Art of Parallel Processing The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a

More information

Parallel Architectures

Parallel Architectures Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Introduction Goal: connecting multiple computers to get higher performance

More information

Computer Architecture. Fall Dongkun Shin, SKKU

Computer Architecture. Fall Dongkun Shin, SKKU Computer Architecture Fall 2018 1 Syllabus Instructors: Dongkun Shin Office : Room 85470 E-mail : dongkun@skku.edu Office Hours: Wed. 15:00-17:30 or by appointment Lecture notes nyx.skku.ac.kr Courses

More information

Parallel Computer Architecture

Parallel Computer Architecture Parallel Computer Architecture What is Parallel Architecture? A parallel computer is a collection of processing elements that cooperate to solve large problems fast Some broad issues: Resource Allocation:»

More information

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

COSC 6374 Parallel Computation. Parallel Computer Architectures

COSC 6374 Parallel Computation. Parallel Computer Architectures OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Spring 2010 Flynn s Taxonomy SISD:

More information

Unit 9 : Fundamentals of Parallel Processing

Unit 9 : Fundamentals of Parallel Processing Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing

More information

COSC 6374 Parallel Computation. Parallel Computer Architectures

COSC 6374 Parallel Computation. Parallel Computer Architectures OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Edgar Gabriel Fall 2015 Flynn s Taxonomy

More information

BİL 542 Parallel Computing

BİL 542 Parallel Computing BİL 542 Parallel Computing 1 Chapter 1 Parallel Programming 2 Why Use Parallel Computing? Main Reasons: Save time and/or money: In theory, throwing more resources at a task will shorten its time to completion,

More information

What is a parallel computer?

What is a parallel computer? 7.5 credit points Power 2 CPU L 2 $ IBM SP-2 node Instructor: Sally A. McKee General interconnection network formed from 8-port switches Memory bus Memory 4-way interleaved controller DRAM MicroChannel

More information

Issues in Parallel Processing. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Issues in Parallel Processing. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Issues in Parallel Processing Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Introduction Goal: connecting multiple computers to get higher performance

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processor Architectures Flynn s taxonomy from 1972 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing (Sta09 Fig 17.1) 2 Parallel

More information

COSC 6385 Computer Architecture - Multi Processor Systems

COSC 6385 Computer Architecture - Multi Processor Systems COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:

More information

Parallel Processing & Multicore computers

Parallel Processing & Multicore computers Lecture 11 Parallel Processing & Multicore computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1)

More information

Lecture 9: MIMD Architectures

Lecture 9: MIMD Architectures Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1) Computer

More information

BlueGene/L (No. 4 in the Latest Top500 List)

BlueGene/L (No. 4 in the Latest Top500 List) BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside

More information

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Module 18: TLP on Chip: HT/SMT and CMP Lecture 39: Simultaneous Multithreading and Chip-multiprocessing TLP on Chip: HT/SMT and CMP SMT TLP on Chip: HT/SMT and CMP SMT Multi-threading Problems of SMT CMP Why CMP? Moore s law Power consumption? Clustered arch. ABCs of CMP Shared cache design Hierarchical MP file:///e /parallel_com_arch/lecture39/39_1.htm[6/13/2012

More information

3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes:

3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes: BIT 325 PARALLEL PROCESSING ASSESSMENT CA 40% TESTS 30% PRESENTATIONS 10% EXAM 60% CLASS TIME TABLE SYLLUBUS & RECOMMENDED BOOKS Parallel processing Overview Clarification of parallel machines Some General

More information

Dr. Joe Zhang PDC-3: Parallel Platforms

Dr. Joe Zhang PDC-3: Parallel Platforms CSC630/CSC730: arallel & Distributed Computing arallel Computing latforms Chapter 2 (2.3) 1 Content Communication models of Logical organization (a programmer s view) Control structure Communication model

More information

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering

More information

Parallel Systems I The GPU architecture. Jan Lemeire

Parallel Systems I The GPU architecture. Jan Lemeire Parallel Systems I The GPU architecture Jan Lemeire 2012-2013 Sequential program CPU pipeline Sequential pipelined execution Instruction-level parallelism (ILP): superscalar pipeline out-of-order execution

More information

COSC4201 Multiprocessors

COSC4201 Multiprocessors COSC4201 Multiprocessors Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Multiprocessing We are dedicating all of our future product development to multicore

More information

Computer Architecture

Computer Architecture Computer Architecture Chapter 7 Parallel Processing 1 Parallelism Instruction-level parallelism (Ch.6) pipeline superscalar latency issues hazards Processor-level parallelism (Ch.7) array/vector of processors

More information

ECE 588/688 Advanced Computer Architecture II

ECE 588/688 Advanced Computer Architecture II ECE 588/688 Advanced Computer Architecture II Instructor: Alaa Alameldeen alaa@ece.pdx.edu Fall 2009 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2009 1 When and Where? When:

More information

Multi-core Programming - Introduction

Multi-core Programming - Introduction Multi-core Programming - Introduction Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

SMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems

SMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems Reference Papers on SMP/NUMA Systems: EE 657, Lecture 5 September 14, 2007 SMP and ccnuma Multiprocessor Systems Professor Kai Hwang USC Internet and Grid Computing Laboratory Email: kaihwang@usc.edu [1]

More information

Computer parallelism Flynn s categories

Computer parallelism Flynn s categories 04 Multi-processors 04.01-04.02 Taxonomy and communication Parallelism Taxonomy Communication alessandro bogliolo isti information science and technology institute 1/9 Computer parallelism Flynn s categories

More information

CS4230 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/28/12. Homework 1: Parallel Programming Basics

CS4230 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/28/12. Homework 1: Parallel Programming Basics CS4230 Parallel Programming Lecture 3: Introduction to Parallel Architectures Mary Hall August 28, 2012 Homework 1: Parallel Programming Basics Due before class, Thursday, August 30 Turn in electronically

More information

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor

More information

Lec 25: Parallel Processors. Announcements

Lec 25: Parallel Processors. Announcements Lec 25: Parallel Processors Kavita Bala CS 340, Fall 2008 Computer Science Cornell University PA 3 out Hack n Seek Announcements The goal is to have fun with it Recitations today will talk about it Pizza

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors. CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Multi-{Socket,,Thread} Getting More Performance Keep pushing IPC and/or frequenecy Design complexity (time to market) Cooling (cost) Power delivery (cost) Possible, but too

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1&2, today s CMPs message

More information

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor

More information

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

CS 770G - Parallel Algorithms in Scientific Computing Parallel Architectures. May 7, 2001 Lecture 2

CS 770G - Parallel Algorithms in Scientific Computing Parallel Architectures. May 7, 2001 Lecture 2 CS 770G - arallel Algorithms in Scientific Computing arallel Architectures May 7, 2001 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan Kaufmann

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11

More information

Introduction II. Overview

Introduction II. Overview Introduction II Overview Today we will introduce multicore hardware (we will introduce many-core hardware prior to learning OpenCL) We will also consider the relationship between computer hardware and

More information

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence Parallel Computer Architecture Spring 2018 Shared Memory Multiprocessors Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture

More information

THREAD LEVEL PARALLELISM

THREAD LEVEL PARALLELISM THREAD LEVEL PARALLELISM Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 4 is due on Dec. 11 th This lecture

More information

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache. Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : III/VI Section : CSE-1 & CSE-2 Subject Code : CS2354 Subject Name : Advanced Computer Architecture Degree & Branch : B.E C.S.E. UNIT-1 1.

More information

EECS4201 Computer Architecture

EECS4201 Computer Architecture Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis These slides are based on the slides provided by the publisher. The slides will be

More information

CS Parallel Algorithms in Scientific Computing

CS Parallel Algorithms in Scientific Computing CS 775 - arallel Algorithms in Scientific Computing arallel Architectures January 2, 2004 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan

More information

Chapter 2 Parallel Computer Architecture

Chapter 2 Parallel Computer Architecture Chapter 2 Parallel Computer Architecture The possibility for a parallel execution of computations strongly depends on the architecture of the execution platform. This chapter gives an overview of the general

More information

Objective. We will study software systems that permit applications programs to exploit the power of modern high-performance computers.

Objective. We will study software systems that permit applications programs to exploit the power of modern high-performance computers. CS 612 Software Design for High-performance Architectures 1 computers. CS 412 is desirable but not high-performance essential. Course Organization Lecturer:Paul Stodghill, stodghil@cs.cornell.edu, Rhodes

More information

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Moore s Law Moore, Cramming more components onto integrated circuits, Electronics,

More information

Module 5 Introduction to Parallel Processing Systems

Module 5 Introduction to Parallel Processing Systems Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this

More information

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner CS104 Computer Organization and rogramming Lecture 20: Superscalar processors, Multiprocessors Robert Wagner Faster and faster rocessors So much to do, so little time... How can we make computers that

More information

Top500 Supercomputer list

Top500 Supercomputer list Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity

More information

SMD149 - Operating Systems - Multiprocessing

SMD149 - Operating Systems - Multiprocessing SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Overview Introduction Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55 Introduction

More information

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy Overview SMD149 - Operating Systems - Multiprocessing Roland Parviainen Multiprocessor systems Multiprocessor, operating system and memory organizations December 1, 2005 1/55 2/55 Multiprocessor system

More information

Static Compiler Optimization Techniques

Static Compiler Optimization Techniques Static Compiler Optimization Techniques We examined the following static ISA/compiler techniques aimed at improving pipelined CPU performance: Static pipeline scheduling. Loop unrolling. Static branch

More information

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization Parallel Computer Architecture A parallel computer is a collection of processing elements that cooperate to solve large problems fast Broad issues involved: Resource Allocation: Number of processing elements

More information

Overview. CS 472 Concurrent & Parallel Programming University of Evansville

Overview. CS 472 Concurrent & Parallel Programming University of Evansville Overview CS 472 Concurrent & Parallel Programming University of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information Science, University

More information

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Moore s Law Moore, Cramming more components onto integrated circuits, Electronics, 1965. 2 3 Multi-Core Idea:

More information

CS/COE1541: Intro. to Computer Architecture

CS/COE1541: Intro. to Computer Architecture CS/COE1541: Intro. to Computer Architecture Multiprocessors Sangyeun Cho Computer Science Department Tilera TILE64 IBM BlueGene/L nvidia GPGPU Intel Core 2 Duo 2 Why multiprocessors? For improved latency

More information

Overview. Processor organizations Types of parallel machines. Real machines

Overview. Processor organizations Types of parallel machines. Real machines Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments

More information