Parallel Architectures


Parallel Architectures
Instructor: Tsung-Che Chiang
Department of Computer Science and Information Engineering
National Taiwan Normal University

Introduction
In the roughly three decades between the 1960s and the mid-1990s, scientists and engineers explored a wide variety of parallel computer architectures. Experts passionately debated whether the dominant parallel computer systems would contain at most a few dozen high-performance processors or thousands of less-powerful processors. Today, most contemporary parallel computers are constructed out of commodity CPUs.

Outline
- Interconnection Networks
- Processor Arrays
- Multiprocessors
- Multicomputers
- Flynn's Taxonomy
- Summary

Interconnection Networks: Shared Medium
A shared medium allows only one message at a time. Each processor listens to every message and receives the ones for which it is the destination. Ethernet is a well-known example. Message collisions can significantly degrade the performance of a heavily utilized shared medium.

Interconnection Networks: Switched Medium
A switched medium supports point-to-point messages among pairs of processors; each processor has its own communication path to the switch. It has two advantages over a shared medium:
- support of concurrent message transmission
- support of network scaling

Switch Network Topologies
A switch network can be represented by a graph.
- nodes: processors and switches. Each processor is connected to one switch; switches connect processors and/or other switches.
- edges: communication paths
Direct vs. indirect topology:
- Direct: the ratio of switch nodes to processor nodes is 1:1.
- Indirect: the ratio is greater than 1:1.

Switch Network Topologies: Evaluation Criteria
- Diameter: the largest distance between two switch nodes (low is better).
- Bisection width: the minimum number of edges between switch nodes that must be removed to divide the network into two halves (high is better).
- Edges per switch node: it is best if this value is a constant independent of the network size (better scalability).
- Constant edge length: it is best if the nodes and edges of the network can be laid out in 3-D space so that the maximum edge length is a constant independent of the network size.

In the following, we discuss six switch network topologies: 2-D mesh, binary tree, hypertree, butterfly, hypercube, and shuffle-exchange.

2-D Mesh Network
Properties (direct topology), assuming n switch nodes and no wraparound connections:
- minimum diameter: 2(n^(1/2) - 1)
- maximum bisection width: n^(1/2)
- edges/node: 4
- constant edge length

Binary Tree Network
Properties (indirect topology), assuming n = 2^d processors (with 2n - 1 switches):
- diameter: 2 log n
- bisection width: 1
- edges/node: 3
- non-constant edge length
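
To make the mesh diameter formula concrete, here is a minimal C sketch (not from the slides; the 8x8 mesh size is an arbitrary choice) that measures the largest distance from a corner node by breadth-first search. In a grid, a corner realizes the diameter, so the measured value matches 2(n^(1/2) - 1) = 14 for n = 64.

    /* Check the 2-D mesh diameter formula 2(sqrt(n) - 1) by BFS. */
    #include <stdio.h>
    #include <string.h>

    #define SIDE 8                 /* mesh is SIDE x SIDE, n = 64 nodes */
    #define N    (SIDE * SIDE)

    static int id(int r, int c) { return r * SIDE + c; }

    int main(void)
    {
        int dist[N], queue[N], head = 0, tail = 0;
        memset(dist, -1, sizeof dist);
        dist[id(0, 0)] = 0;              /* BFS from one corner */
        queue[tail++] = id(0, 0);

        while (head < tail) {
            int v = queue[head++], r = v / SIDE, c = v % SIDE;
            const int dr[4] = { -1, 1, 0, 0 }, dc[4] = { 0, 0, -1, 1 };
            for (int k = 0; k < 4; k++) {
                int nr = r + dr[k], nc = c + dc[k];
                if (nr < 0 || nr >= SIDE || nc < 0 || nc >= SIDE) continue;
                if (dist[id(nr, nc)] < 0) {
                    dist[id(nr, nc)] = dist[v] + 1;
                    queue[tail++] = id(nr, nc);
                }
            }
        }

        int ecc = 0;                     /* eccentricity of the corner */
        for (int v = 0; v < N; v++)
            if (dist[v] > ecc) ecc = dist[v];

        /* For SIDE = 8: prints 14 on both sides. */
        printf("measured = %d, formula = %d\n", ecc, 2 * (SIDE - 1));
        return 0;
    }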

Hypertree Network (1/3)
A hypertree shares the low diameter of the binary tree but has an improved bisection width. For a hypertree of degree k and depth d:
- From the front, it looks like a complete k-ary tree.
- From the side, it looks like an upside-down binary tree.
(Figure: front view and side view of a hypertree with k = 4, d = 2.)

Hypertree Network (2/3)
(Figure: the complete hypertree network of degree 4 and depth 2.)

Hypertree Network (3/3)
Properties (indirect topology), assuming k = 4, n = 4^d processors, and 2^d(2^(d+1) - 1) switches:
- diameter: 2d (i.e., log n)
- bisection width: 2^(d+1)
- edges/node: no more than 6
- non-constant edge length

Butterfly Network (1/6)
A butterfly network connects n = 2^d processors through ranks 0 through d of switch nodes, with n switch nodes (i, j) per rank i.
(Figure: an 8-processor butterfly, with switch nodes (0,0) through (0,7) on rank 0 down to (3,0) through (3,7) on rank d = 3.)

Butterfly Network (2/6)
Each switch node (i, j) on rank i > 0 is connected to node (i - 1, j), the node in the same column on the rank above.
(Figure: the straight rank-to-rank edges of an 8-processor butterfly.)

Butterfly Network (3/6)
Each switch node (i, j) is also connected to node (i - 1, m), where m is obtained by inverting the i-th most significant bit in the binary representation of j.
(Figure: the cross edges of an 8-processor butterfly.)

Butterfly Network (4/6)
Where the "butterfly" is: as the rank number decreases, the widths of the wings of the butterflies increase exponentially. (Hence, non-constant edge length.)
(Figure: butterfly-shaped edge patterns across the ranks.)

Butterfly Network (5/6)
Message routing: the destination address travels with the message, and each switch node picks off the lead bit, routing 0 to the left and 1 to the right, as the sketch below shows.
(Figure: a message routed rank by rank toward its destination.)
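
A minimal C sketch (not from the slides; the destination value is an arbitrary choice) of this destination-based routing: at each rank the switch examines the next most significant bit of the destination address.

    /* Butterfly routing: peel off the lead bit at each rank. */
    #include <stdio.h>

    static void butterfly_route(unsigned dest, int d)
    {
        printf("routing to %u:", dest);
        for (int rank = 1; rank <= d; rank++) {
            /* lead bit = bit (d - rank), examined most significant first */
            int bit = (dest >> (d - rank)) & 1;
            printf(" rank %d -> %s", rank, bit ? "right" : "left");
        }
        printf("\n");
    }

    int main(void)
    {
        butterfly_route(5, 3);   /* binary 101: right, left, right */
        return 0;
    }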

Butterfly Network (6/6)
Properties (indirect topology), assuming n = 2^d processors, n(log n + 1) switches, and that the switch nodes on ranks 0 and log n are the same:
- diameter: log n
- bisection width: n/2
- edges/node: 4
- non-constant edge length

Hypercube Network (1/4)
A hypercube network, also called a binary n-cube, is a butterfly in which each column of switch nodes is collapsed into a single node.
(Figure: an 8-processor butterfly collapsed into a 3-D hypercube.)

Hypercube Network (2/4)
The processors and their associated switches are labeled 0, 1, ..., 2^d - 1; two switches are adjacent if their binary labels differ in exactly one bit position (see the check below).
(Figure: hypercubes for two values of d.)

Hypercube Network (3/4)
Properties (direct topology), assuming n = 2^d processors:
- diameter: log n
- bisection width: n/2
- edges/node: log n
- non-constant edge length
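
The adjacency rule reduces to one line of C: two labels are neighbors exactly when the XOR of the labels has a population count of 1. A minimal sketch (not from the slides; the labels tested are arbitrary choices, and __builtin_popcount is a GCC/Clang builtin):

    #include <stdio.h>

    /* Adjacent iff the labels differ in exactly one bit position. */
    static int adjacent(unsigned a, unsigned b)
    {
        return __builtin_popcount(a ^ b) == 1;
    }

    int main(void)
    {
        printf("%d\n", adjacent(5, 7));  /* 1: 0101 vs 0111, one bit  */
        printf("%d\n", adjacent(5, 3));  /* 0: 0101 vs 0011, two bits */
        return 0;
    }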

Hypercube Network (4/4)
Message routing: edges always connect switches whose addresses differ in exactly one bit position, so a message can be routed by flipping, one at a time, the bits in which the source and destination addresses differ (see the routing sketch below).
Example: send a message from 0101 to 0011. Two possible paths:
- Path 1: 0101 -> 0111 -> 0011
- Path 2: 0101 -> 0001 -> 0011

Shuffle-Exchange Network (1/5)
The perfect shuffle is analogous to shuffling a deck of cards: dividing it exactly in half and interleaving the two halves perfectly.
(Figure: the perfect-shuffle permutation on 8 items.)
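
Before turning to the shuffle-exchange network, here is a minimal C sketch (not from the slides) of the hypercube routing example above. Correcting the differing bits lowest-order first is one common convention (so-called e-cube routing); the bit order is an arbitrary choice, and either order yields a shortest path.

    /* Hypercube routing: flip differing address bits one at a time. */
    #include <stdio.h>

    static void hypercube_route(unsigned src, unsigned dst, int d)
    {
        printf("%u", src);
        for (int bit = 0; bit < d; bit++) {
            if (((src ^ dst) >> bit) & 1) {   /* addresses differ here */
                src ^= 1u << bit;             /* hop across that dimension */
                printf(" -> %u", src);
            }
        }
        printf("\n");
    }

    int main(void)
    {
        hypercube_route(5, 3, 4);  /* 0101 -> 0011: prints 5 -> 7 -> 3 */
        return 0;
    }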

Shuffle-Exchange Network (2/5)
Perfect shuffle: the new position of an item can be calculated by performing a left cyclic rotation of its binary index.

Shuffle-Exchange Network (3/5)
Connections:
- exchange: links switches whose numbers differ in their least significant bit
- shuffle: links switch i to switch j, where j is the result of cycling the bits of i left one position
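
A minimal C sketch (not from the slides; the function names and the d = 3 test values are arbitrary choices) of the two connections for n = 2^d switches:

    #include <stdio.h>

    /* Exchange: flip the least significant bit. */
    static unsigned exchange(unsigned i) { return i ^ 1u; }

    /* Shuffle: left cyclic rotation of the d-bit label. */
    static unsigned shuffle(unsigned i, int d)
    {
        unsigned msb = (i >> (d - 1)) & 1u;   /* bit rotated out on the left */
        return ((i << 1) | msb) & ((1u << d) - 1u);
    }

    int main(void)
    {
        /* With d = 3: shuffle(3) = 6 (011 -> 110), exchange(3) = 2. */
        printf("shuffle(3) = %u, exchange(3) = %u\n", shuffle(3, 3), exchange(3));
        return 0;
    }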

Shuffle-Exchange Network (4/5)
Properties (direct topology), assuming n = 2^d processors:
- diameter: 2 log n - 1
- bisection width: n / log n
- edges/node: 2
- non-constant edge length

Shuffle-Exchange Network (5/5)
Message routing (E = exchange, S = shuffle): the worst-case scenario is routing a message from switch 0 to switch n - 1 (or vice versa).
- From 0000 to 1111: E S E S E S E
- From 0011 to 0101: E S E S S
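
One routing scheme consistent with both examples above (derived here, not taken from the slides): use d - 1 shuffles, and exchange before each shuffle (and after the last one) whenever the bit currently at the front of the rotation must change. Since d - 1 left shuffles rotate a label right by one position, the bits to fix are those of rotr(src) XOR dst, corrected high bit first. A minimal C sketch:

    #include <stdio.h>

    static void se_route(unsigned src, unsigned dst, int d)
    {
        unsigned mask = (1u << d) - 1u;
        unsigned rotr = ((src >> 1) | (src << (d - 1))) & mask;
        unsigned diff = rotr ^ dst;          /* bits to flip along the way */

        for (int i = 0; i < d; i++) {
            if ((diff >> (d - 1 - i)) & 1u) printf("E ");
            if (i < d - 1)                  printf("S ");
        }
        printf("\n");
    }

    int main(void)
    {
        se_route(0x0, 0xF, 4);   /* 0000 -> 1111: prints E S E S E S E */
        se_route(0x3, 0x5, 4);   /* 0011 -> 0101: prints E S E S S     */
        return 0;
    }

The worst case (all d bits need fixing) uses d exchanges and d - 1 shuffles, i.e. 2 log n - 1 hops, matching the diameter above.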

Interconnection Networks
No network can be optimal in every regard.

  Network           #Processors  #Switches     Diameter        Bisection width  Edges/node  Constant edge len.
  2-D mesh          n = d^2      n             2(n^(1/2) - 1)  n^(1/2)          4           Yes
  Binary tree       n = 2^d      2n - 1        2 log n         1                3           No
  4-ary hypertree   n = 4^d      2n - n^(1/2)  log n           2 n^(1/2)        6           No
  Butterfly         n = 2^d      n(log n + 1)  log n           n / 2            4           No
  Hypercube         n = 2^d      n             log n           n / 2            log n       No
  Shuffle-exchange  n = 2^d      n             2 log n - 1     n / log n        2           No

Processor Arrays (1/11)
Vector computer: a computer whose instruction set includes operations on vectors as well as scalars. There are two general ways of implementation:
- pipelined vector processor: streams vectors from memory to the CPU, where pipelined arithmetic units manipulate them. Early supercomputers (e.g., the Cray-1) are well-known examples.
- processor array: a set of identical, synchronized processing elements capable of simultaneously performing the same operation on different data. Motivation: the high price of a control unit, and data parallelism.

Processor Arrays (2/11)
Architecture.
(Figure: a front-end computer with memory and I/O processors, connected over a memory bus and a global result bus to the processor array; an instruction broadcast bus feeds the processing elements P, each with its own memory M, joined by an interconnection network to parallel I/O devices.)

Processor Arrays (3/11)
Performance: the amount of work accomplished per time unit; it depends on the utilization of the processors.
Example 2.1: a processor array with 1024 processors, each adding two integers in 1 µsec. Performance when adding two integer vectors of length 1024, assuming each vector is allocated to the processors in a balanced fashion:

  Performance = 1024 operations / 1 µsec = 1.024 x 10^9 operations/second

Processor Arrays (4/11)
Example 2.2: a processor array with 512 processors, each adding two integers in 1 µsec. Performance when adding two integer vectors of length 600, assuming each vector is allocated to the processors in a balanced fashion (reproduced in the sketch below):

  Performance = 600 operations / 2 µsec = 3 x 10^8 operations/second

88 processors add two pairs of integers; the others add only one pair and sit idle while those 88 processors add their second integer pair.

Processor Arrays (5/11)
Interconnection network: it is used to bring together operands stored in the memories of different processors. The most popular interconnection network for processor arrays is the 2-D mesh, which has the advantage of a relatively straightforward implementation in VLSI, where a single chip may contain a large number of processors.
(Figure: 4x4 and 8x12 mesh-connected processor arrays.)
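
A minimal C sketch (not from the slides) of the calculation behind Examples 2.1 and 2.2: with p processors and a vector of length len, the busiest processor performs ceil(len / p) additions, and that many 1-µsec steps bound the elapsed time.

    #include <stdio.h>

    static double ops_per_second(long p, long len, double step_us)
    {
        long steps = (len + p - 1) / p;      /* ceil(len / p) time steps */
        return len / (steps * step_us * 1e-6);
    }

    int main(void)
    {
        printf("%.3e\n", ops_per_second(1024, 1024, 1.0));  /* 1.024e+09 */
        printf("%.3e\n", ops_per_second(512,  600,  1.0));  /* 3.000e+08 */
        return 0;
    }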

Processor Arrays (6/11)
Enabling & disabling processors: by masking, it is possible for only a subset of the processors to perform an instruction. This is
- useful when the number of data items is not an exact multiple of the size of the processor array
- useful to support conditionally executed parallel operations

Processor Arrays (7/11)
Example (Fig. 2.12), with the masked-out (inactive) processors indicated in the figure (a sketch of masked execution follows below):

  if (a[i] != 0) a[i] = 1; else a[i] = -1;

First the processors with a[i] != 0 are active and execute the "then" branch; then the mask is complemented and the remaining processors execute the "else" branch.
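
A minimal C sketch (not from the slides; the array contents are arbitrary) of how mask bits might gate a broadcast instruction. Each loop models one broadcast step over the processing elements; only PEs whose mask bit is set perform the operation.

    #include <stdio.h>

    #define PES 8

    int main(void)
    {
        int a[PES] = { 4, 0, -3, 0, 7, 1, 0, 2 };
        int mask[PES];

        for (int i = 0; i < PES; i++) mask[i] = (a[i] != 0);  /* set masks */

        for (int i = 0; i < PES; i++)        /* broadcast: a[i] = 1  */
            if (mask[i]) a[i] = 1;

        for (int i = 0; i < PES; i++) mask[i] = !mask[i];     /* complement */

        for (int i = 0; i < PES; i++)        /* broadcast: a[i] = -1 */
            if (mask[i]) a[i] = -1;

        for (int i = 0; i < PES; i++) printf("%d ", a[i]);    /* 1 -1 1 ... */
        printf("\n");
        return 0;
    }

Note the cost structure the next slide describes: both branches take a full broadcast step, even though each PE does useful work in only one of them.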

Processor Arrays (8/11)
Enabling & disabling processors: the efficiency of the processor array can drop rapidly when a program enters conditionally executed code.
- There is the additional overhead of performing the tests to set the mask bits.
- There is the inefficiency caused by having to work through the different branches of control structures sequentially.
In the previous example, the performance is less than 50% of the performance achieved when operating across the entire processor array, once the additional overhead is considered.

Processor Arrays (9/11)
Additional architecture features.
(Figure: the processor-array architecture again, highlighting the memory bus and the global result bus between the front-end computer and the processor array.)

Processor Arrays (10/11)
Memory bus: it allows particular elements of parallel variables to be used or defined in sequential code. In this way, the processor array can be viewed as an extension of the memory space of the front-end.
Global result bus: it enables values from the processor array to be combined and returned to the front end. The ability to compute a global AND is valuable.

Processor Arrays (11/11)
Shortcomings:
1. Not all problems map well into a strict data-parallel solution.
2. The efficiency drops when entering conditionally executed parallel code.
3. They do not easily accommodate multiple users.
4. They do not scale down well due to the cost of high-bandwidth communication networks.
5. They are built using custom VLSI, thus losing the cost-effectiveness of commodity CPUs.
6. The original motivation, the relatively high cost of control units, is no longer valid with today's CPUs.
Processor arrays are no longer considered a viable option for general-purpose parallel computers.

Multiprocessors
A multiprocessor is a multi-CPU computer with shared memory: the same address on two different CPUs refers to the same memory location. Compared with processor arrays, multiprocessors can be built out of commodity CPUs, they naturally support multiple users, and they do not lose efficiency when encountering conditionally executed parallel code.

We discuss two fundamental types of multiprocessors:
- centralized multiprocessors, in which all the primary memory is in one place
- distributed multiprocessors, in which the primary memory is distributed among the processors

Centralized Multiprocessors (1/5)
A centralized multiprocessor is a straightforward extension of the uniprocessor. It is also called a uniform memory access (UMA) multiprocessor or a symmetric multiprocessor (SMP). The presence of large and efficient caches is what makes multiprocessors practical; still, memory bus bandwidth typically limits the number of processors that can be employed to a few dozen.
(Figure: several CPUs, each with a cache, sharing a bus to primary memory and I/O devices.)

Centralized Multiprocessors (2/5)
Data can be
- private: used only by a single processor, or
- shared: used by multiple processors.
Designers of centralized multiprocessors must address two problems associated with shared data:
- the cache coherence problem
- processor synchronization

Centralized Multiprocessors (3/5)
Cache coherence problem.
(Figure: CPUs A and B both cache memory location X, which holds 7; after one CPU updates X to 2, the other's cached copy of 7 is stale.)

Centralized Multiprocessors (4/5)
Snooping protocols are typically used to maintain cache coherence on centralized multiprocessors. Each CPU's cache controller monitors the bus to identify which cache blocks are being requested by other CPUs. Before a write occurs, all copies of the data item cached by other processors are invalidated. If two processors simultaneously try to write to the same memory location, only one of them wins the race.
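
A minimal C sketch (not from the slides) of the write-invalidate idea behind snooping. State is reduced to one valid bit and one value per cache, and the memory copy is not modeled; a real protocol tracks full block states and bus transactions.

    #include <stdio.h>

    #define CPUS 4

    static int valid[CPUS], value[CPUS];  /* each CPU's cached copy of X */

    static void cpu_write(int cpu, int v)
    {
        for (int c = 0; c < CPUS; c++)    /* snoop: invalidate other copies */
            if (c != cpu) valid[c] = 0;
        valid[cpu] = 1;
        value[cpu] = v;
    }

    static void cpu_read(int cpu, int memory_x)
    {
        if (!valid[cpu]) {                /* miss: fetch over the bus */
            valid[cpu] = 1;
            value[cpu] = memory_x;
        }
    }

    int main(void)
    {
        cpu_read(0, 7);                   /* CPU 0 caches X = 7 */
        cpu_read(1, 7);                   /* CPU 1 caches X = 7 */
        cpu_write(1, 2);                  /* CPU 1 writes; CPU 0 invalidated */
        printf("CPU0 valid=%d  CPU1 value=%d\n", valid[0], value[1]);  /* 0, 2 */
        return 0;
    }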

Centralized Multiprocessors (5/5)
Processor synchronization (a pthreads sketch appears below):
- mutual exclusion: a situation in which at most one process can be engaged in a specified activity at any time
- barrier synchronization: guarantees that no process will proceed beyond a designated point in the program until every process has reached the barrier

Distributed Multiprocessors (1/10)
Architecture.
(Figure: in contrast to the centralized design, where CPUs share one bus to primary memory and I/O devices, each CPU in a distributed multiprocessor has its own local memory and I/O devices, and the nodes are joined by an interconnection network.)
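
A minimal C sketch (not from the slides) of both primitives using POSIX threads: a mutex for mutual exclusion and a barrier that no thread passes until all threads have arrived. It assumes a system providing pthread_barrier_t (POSIX.1-2001 barriers); compile with the -pthread flag.

    #include <pthread.h>
    #include <stdio.h>

    #define THREADS 4

    static pthread_mutex_t   lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_barrier_t barrier;
    static long counter = 0;

    static void *worker(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);       /* mutual exclusion around update */
        counter++;
        pthread_mutex_unlock(&lock);

        pthread_barrier_wait(&barrier);  /* nobody proceeds until all arrive */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[THREADS];
        pthread_barrier_init(&barrier, NULL, THREADS);
        for (int i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);   /* always prints THREADS */
        pthread_barrier_destroy(&barrier);
        return 0;
    }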

Distributed Multiprocessors (2/10)
Rationale and advantages: because of spatial and temporal locality, most memory references are between a processor and its local memory. This yields higher aggregate memory bandwidth and lower memory access time, which allows a higher processor count. Distributing I/O, too, can improve scalability.

Distributed Multiprocessors (3/10)
The same address on different processors still refers to the same memory location, but memory access time varies considerably, depending upon whether the address being referenced is in that processor's local memory. Thus, it is also called a nonuniform memory access (NUMA) multiprocessor.

Distributed Multiprocessors (4/10)
Cache coherence.
- Alternative 1: store only instructions and private data in a processor's cache. This gives poor performance, due to the huge time difference between a local cache access and a nonlocal memory access.
- Alternative 2: snooping. Snooping methods do not scale well as the number of processors grows, because there is no single shared memory bus for a cache controller to snoop on, and a more complicated protocol is needed.

Distributed Multiprocessors (5/10)
- Alternative 3: a directory-based protocol. A directory contains sharing information about every memory block that may be cached. The status of a memory block is one of (see the directory sketch below):
  - uncached: not currently in any processor's cache
  - shared: cached by one or more processors, and the copy in memory is correct
  - exclusive: cached by exactly one processor that has written the block, so that the copy in memory is obsolete
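
A minimal C sketch (not from the slides) of a directory entry for one memory block: a status field plus one presence bit per CPU. The two handlers show how reads and writes drive the uncached / shared / exclusive transitions; a real protocol also moves the data itself and handles races.

    #include <stdio.h>

    enum status { UNCACHED, SHARED, EXCLUSIVE };

    struct dir_entry {
        enum status st;
        unsigned    sharers;             /* presence bits, one per CPU */
    };

    static void dir_read(struct dir_entry *e, int cpu)
    {
        /* An exclusive (dirty) copy must be fetched before others read. */
        if (e->st == EXCLUSIVE) printf("fetch dirty block from owner\n");
        e->sharers |= 1u << cpu;
        e->st = SHARED;
    }

    static void dir_write(struct dir_entry *e, int cpu)
    {
        unsigned others = e->sharers & ~(1u << cpu);
        if (others) printf("invalidate copies: bits %x\n", others);
        e->sharers = 1u << cpu;
        e->st = EXCLUSIVE;
    }

    int main(void)
    {
        struct dir_entry x = { UNCACHED, 0 };
        dir_read(&x, 0);                 /* uncached -> shared, sharers 001 */
        dir_read(&x, 2);                 /* shared, sharers 101 */
        dir_write(&x, 2);                /* invalidate CPU 0 -> exclusive */
        printf("status=%d sharers=%x\n", x.st, x.sharers);   /* 2, 4 */
        return 0;
    }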

Distributed Multiprocessors (6/10)
In addition to the block status, we also need to keep track of which processors have copies of a cache block, so that these copies can be invalidated when one processor writes a value to that block. To prevent accesses to the cache directory from becoming a performance bottleneck, the directory itself should be distributed among the computer's local memories; the information about a particular memory block is kept in exactly one location.

Distributed Multiprocessors (7/10)
(Figure: a walkthrough of directory entries, written as a status letter plus presence bits. Block X starts uncached (U000); a read by one CPU makes it shared (S100) with X = 7 in that cache; a read by another CPU makes it S101; a write then invalidates the other copy and makes the block exclusive (E100), with X = 6 in the owner's cache and the memory copy of 7 out-of-date.)

Distributed Multiprocessors (8/10)
(Figure: the walkthrough continues. A read by another CPU forces the owner's dirty value X = 6 to be written back, and the entry changes from exclusive (E100) to shared (S110); subsequent writes by other CPUs invalidate the shared copies and move exclusive ownership, with X updated from 6 to 5 (E001, then E100).)

Distributed Multiprocessors (9/10)
(Figure: when the block leaves the owner's cache, the dirty value X = 5 is flushed back to memory and the directory entry returns to uncached (U000).)

Multicomputers
A multicomputer is another example of a distributed-memory, multiple-CPU computer. Unlike a NUMA multiprocessor, a multicomputer has disjoint local address spaces: the same address on different processors refers to different physical memory locations, and each processor only has direct access to its local memory. Processors interact with each other by passing messages, and there is no cache coherence problem.

Commercial multicomputers vs. commodity multicomputers:
- custom vs. mass-produced computers
- low-latency vs. high-latency interconnects
- expensive vs. cheap

Multicomputer Designs
(Figure: an asymmetrical multicomputer, where users reach the back-end nodes through a front-end computer, vs. a symmetrical multicomputer, where every node attaches directly to the interconnection network and the file server alike.)

Asymmetrical Multicomputers
Advantages:
- Back-end processors are used exclusively for executing parallel programs, and may run a primitive OS; it is easier for the manufacturer to develop such a primitive OS.
- Without other processes occupying cycles or sending messages, it is easier to understand, model, and tune the performance of a parallel application.

Asymmetrical Multicomputers
Disadvantages. Users log into the front-end computer, which executes a full, multiprogrammed OS and provides all functions needed for program development. This makes the front-end
- a single point of failure, and
- a limit on scalability.
Using multiple front-end computers raises its own questions: How do users know which front-end computer to log into? How will the workload be balanced? How are back-end nodes assigned to front-end processors? And underutilization conditions might be frustrating.

32 Symmetrical Multicomputers The difficulty of debugging parallel programs is a strong incentive to provide full-featured I/O facilities on back-end nodes. A straightforward way is to run a multiprogrammed OS on the back-end processors, too. In a symmetrical multicomputer, every computer executes the same OS and has identical functionality. Users may log into any computer to edit and compile their programs. Any or all of the computers may be involved in the execution of a particular parallel program. 63 Symmetrical Multicomputers Advantages over asymmetrical multicomputers They alleviate the performance bottleneck caused by the front-end computer. Support for debugging is better since every computer runs a full-fledged OS. They also eliminate the front-end/back-end programming problem. Every processor executes the same problem. The if statement can be used to select partial processors. 64

Symmetrical Multicomputers
Advantages over asymmetrical multicomputers:
- They alleviate the performance bottleneck caused by the front-end computer.
- Support for debugging is better, since every computer runs a full-fledged OS.
- They eliminate the front-end/back-end programming problem: every processor executes the same program, and an if statement can be used to restrict a section of code to a subset of the processors.

Disadvantages:
- It is more difficult to maintain the illusion of a single parallel computer.
- There is no simple way to balance the program-development workload among all the processors.
- It is more difficult to achieve high performance when parallel processes must compete with other processes for cycles, cache space, and memory bandwidth.

34 Commodity Cluster vs. Networks of Workstation A network of workstation It is a dispersed collection of computers, typically located on users desks. It is to serve the needs of the person using it. Individual workstations may have different OS and executable programs. Commodity cluster It is a co-located collocation of mass-produced computers. The computers are usually accessible only via the network. Some of the computers may not allow users to log in. The networking medium should have high speed. Latency Bandwidth Cost/node Fast Ethernet 100 µsec 100 Mbit/sec < $100 Gigabit Ethernet 100 µsec 1,000 Mbit/sec < $1,000 Myrinet 7 µsec 1,920 Mbit/sec < $2, Flynn s Taxonomy Data stream Single Multiple SISD SIMD Instruction stream Single Multiple Uniprocessors MISD Systolic arrays Processor arrays Pipelined vector processors MIMD Multiprocessors Multicomputers 68

Flynn's Taxonomy
A systolic array is an example of an MISD computer.
(Figure: a primitive sorting element. In the first phase it takes in three values a, b, c; in the second phase it outputs min(a, b, c), med(a, b, c), and max(a, b, c) on separate ports.)

Systolic Array
(Figure: the host inserts values, e.g., 7 and then 4, into a linear systolic array; a sketch of the insert/extract behavior follows below.)
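
A minimal C sketch (not from the slides; the cell count, sentinel, and inserted values are arbitrary choices) of a linear systolic array used as a priority queue. On insert, each cell keeps the smaller of its stored value and the incoming one and passes the larger to the right; on extract-minimum, the leftmost value leaves and the remaining values shift toward the host. A real systolic array overlaps these ripples across cells in lock-step pulses; this model runs them sequentially.

    #include <stdio.h>

    #define CELLS 8
    #define EMPTY 0x7fffffff             /* sentinel: cell unoccupied */

    static int cell[CELLS];

    static void insert(int v)
    {
        for (int i = 0; i < CELLS; i++) {        /* v ripples rightward */
            if (cell[i] > v) { int t = cell[i]; cell[i] = v; v = t; }
        }
    }

    static int extract_min(void)
    {
        int min = cell[0];
        for (int i = 0; i + 1 < CELLS; i++)      /* shift toward the host */
            cell[i] = cell[i + 1];
        cell[CELLS - 1] = EMPTY;
        return min;
    }

    int main(void)
    {
        for (int i = 0; i < CELLS; i++) cell[i] = EMPTY;
        insert(7); insert(4); insert(8); insert(5);
        printf("%d %d\n", extract_min(), extract_min());   /* prints 4 5 */
        return 0;
    }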

Systolic Array
(Figure: two successive extract-minimum steps. The host removes the smallest value from the leftmost cell while the remaining values shift toward the host.)

Summary: Processor Arrays
1. Not all problems map well into a strict data-parallel solution.
2. The efficiency drops when entering conditionally executed parallel code.
3. They do not easily accommodate multiple users.
4. They do not scale down well due to the cost of high-bandwidth communication networks.
5. They are built using custom VLSI, thus losing the cost-effectiveness of commodity CPUs.
6. The original motivation, the relatively high cost of control units, is no longer valid with today's CPUs.
Processor arrays are no longer considered a viable option for general-purpose parallel computers.

Summary: Centralized Multiprocessors
- Cache coherence problem: handled by a snooping, write-invalidation protocol.
- Synchronization: mutual exclusion and barriers, relying upon hardware instructions that have the net effect of atomically reading and updating a memory location.
- Small number of CPUs, limited by the shared memory bus.

Summary: Distributed Multiprocessors
- a single global address space
- cache coherence is more difficult: a directory-based scheme is used

Summary: Multicomputers
- multiple disjoint address spaces
- no cache coherence problem: whether a copy of a data item is up-to-date depends entirely upon the programmer
- may be organized as symmetrical or asymmetrical multicomputers
