Chapter 18. Parallel Processing. Yonsei University

1 Chapter 18 Parallel Processing

2 Contents Multiple Processor Organizations Symmetric Multiprocessors Cache Coherence and the MESI Protocol Clusters Nonuniform Memory Access Vector Computation 18-2

3 Types of Parallel Processor Systems Multiple Processor Organizations Single instruction, single data stream - SISD Single instruction, multiple data stream - SIMD Multiple instruction, single data stream - MISD Multiple instruction, multiple data stream - MIMD 18-3

4 Taxonomy of Parallel Processors Multiple Processor Organizations 18-4

5 SISD Single processor Single instruction stream Data stored in single memory Uni-processor Multiple Processor Organizations 18-5

6 SIMD Multiple Processor Organizations Single machine instruction Controls simultaneous execution Number of processing elements Lockstep basis Each processing element has associated data memory Each instruction executed on different set of data by different processors Vector and array processors 18-6

7 SIMD With distributed memory Multiple Processor Organizations 18-7

8 MISD Multiple Processor Organizations Sequence of data Transmitted to set of processors Each processor executes different instruction sequence Never been implemented 18-8

9 MIMD Multiple Processor Organizations Set of processors Simultaneously execute different instruction sequences Different sets of data SMPs, clusters and NUMA systems 18-9

10 MIMD With shared memory Multiple Processor Organizations 18-10

11 MIMD With distributed memory Multiple Processor Organizations 18-11

12 MIMD - Overview Multiple Processor Organizations General purpose processors Each can process all instructions necessary Further classified by method of processor communication 18-12

13 Tightly Coupled - SMP Multiple Processor Organizations Processors share memory Communicate via that shared memory Symmetric Multiprocessor (SMP) Share single memory or pool Shared bus to access memory Memory access time to given area of memory is approximately the same for each processor 18-13

14 Tightly Coupled - NUMA Multiple Processor Organizations Nonuniform memory access Access times to different regions of memory may differ 18-14

15 Loosely Coupled - Clusters Multiple Processor Organizations Collection of independent uniprocessors or SMPs Interconnected to form a cluster Communication via fixed path or network connections 18-15

16 Design Issues Multiple Processor Organizations Physical Organization Interconnection Structures Interprocessor Communication Operating System Design Application Software Techniques 18-16

17 SMP Characteristics Symmetric Multiprocessors Two or more similar processors Share the same main memory and I/O facilities All processors share access to I/O devices All processors can perform the same functions The system is controlled by an integrated OS Providing interaction between processors Interaction at job, task, file and data element levels 18-17

18 SMP Advantages Symmetric Multiprocessors Performance If some work can be done in parallel Availability Since all processors can perform the same functions, failure of a single processor does not halt the system Incremental growth User can enhance performance by adding additional processors Scaling Vendors can offer range of products based on number of processors 18-18

19 Multiprogramming & Multiprocessing Interleaving (multiprogramming) Symmetric Multiprocessors 18-19

20 Multiprogramming & Multiprocessing Interleaving and overlapping (multiprocessing) Symmetric Multiprocessors 18-20

21 Tightly Coupled Symmetric Multiprocessors Time-shared or common bus Multiport memory Central control unit 18-21

22 Time-Shared Bus Symmetric Multiprocessors To facilitate DMA transfers from I/O processors, the following features are provided Addressing Distinguish modules on the bus to determine the source and destination of data Arbitration Any module, including an I/O module, can temporarily function as bus master Time sharing When one module is controlling the bus, other modules are locked out and must, if necessary, suspend operation until bus access is achieved 18-22
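
A minimal sketch of the time-sharing rule in Python (illustrative only: the Lock stands in for the bus arbiter, and the module names are made up). While one module holds the bus, the others block until it is released.

import threading, time

# Time-shared bus modeled as a mutex: only one module (processor or I/O)
# can be bus master at a time; the others are locked out until release.
bus = threading.Lock()

def module(name, transfers):
    for i in range(transfers):
        with bus:                      # arbitration: block until the bus is granted
            print(f"{name} is bus master (transfer {i})")
            time.sleep(0.01)           # hold the bus for one data transfer

threads = [threading.Thread(target=module, args=(n, 2))
           for n in ("CPU-0", "CPU-1", "IO-0")]
for t in threads: t.start()
for t in threads: t.join()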

23 Time Shared Bus Symmetric Multiprocessors 18-23

24 Time Shared Bus : Advantages Symmetric Multiprocessors Simplicity The simplest approach to multiprocessor organization Flexibility Easy to expand the system by attaching more processors to the bus Reliability The bus is essentially a passive medium 18-24

25 Time Shared Bus : Drawbacks Symmetric Multiprocessors Performance The speed of the system is limited by the bus cycle time To improve performance, each processor should be equipped with a local cache Reduces the number of bus accesses But leads to problems with cache coherence Solved in hardware - see later 18-25

26 Multiport Memory Symmetric Multiprocessors The multiport memory allows the direct, independent access of main memory modules by each processor and I/O module Logic is required to resolve conflicts Little or no modification to processors or modules is required 18-26

27 Multiport Memory Symmetric Multiprocessors Advantages Little or no modification is needed for either processor or I/O modules to accommodate multiport memory Better performance than bus approach Security To configure portions of memory as private to one or more processors and/or I/O module Disadvantage Logic associated with memory is required for resolving conflicts More complex than the bus approach 18-27

28 Central Control Unit Symmetric Multiprocessors To separate data streams back and forth between independent modules: processor, memory, I/O module The controller can buffer requests and perform arbitration and timing functions To pass status and control messages between processors To perform cache update alerting Advantages flexibility and simplicity of interfacing of the bus approach Disadvantages the control unit is quite complex A potential performance bottleneck 18-28

29 Multiprocessor OS Design Symmetric Multiprocessors An SMP OS manages processor and other computer resources so that the user perceives a single OS controlling system resources The key design issues Simultaneous concurrent processes Scheduling Synchronization Memory management Reliability and fault tolerance 18-29

30 S/390 SMP Symmetric Multiprocessors 18-30

31 S/390 SMP Components Symmetric Multiprocessors PU CISC microprocessor with a 64-KB L1 cache L2 cache 384 KB, arranged in clusters of two Each cluster supports three PUs Bus-switching network adapter (BSN) to interconnect the L2 caches and the main memory Includes a level 3 (L3) cache (2 MB) Memory card 8 GB per card 18-31

32 Switched Interconnection Symmetric Multiprocessors The single bus becomes a bottleneck affecting the scalability The S/390 copes with this problem in two ways Main memory is split into four separate cards, each with its own storage controller that can handle memory accesses at high speeds The connection from processors to a single memory card is not in the form of a shared bus but rather point to point links, where each link connects a group of three processors via an L2 cache to a BSN 18-32

33 Shared L2 Caches Symmetric Multiprocessors Needs for shared L2 caches In moving from G3 (generation 3) to G4, IBM doubled the speed of the microprocessors If the G3 organization were retained, a significant increase in bus traffic would occur At the same time, it was desired to reuse as many G3 components as possible Without a significant bus upgrade, the BSNs would become a bottleneck Analysis of typical S/390 workloads revealed a large degree of sharing of instructions and data among processors The solution: one or more L2 caches, each shared by multiple processors 18-33

34 L3 Cache Symmetric Multiprocessors Each L3 cache provides a buffer between the L2 caches and one memory card Provides the data much more quickly than a main memory access if the requested cache line is already shared by other processors but was not recently used by the requesting processor
Memory subsystem | Access penalty (PU cycles) | Cache size | Hit rate (%)
L1 cache | 1 | 32 KB | 89
L2 cache | 5 | 384 KB | 5
L3 cache | 14 | 2 MB | 3
Memory | 32 | 8 GB | 3
18-34
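
As a worked check on the table, the average access penalty is the hit-rate-weighted sum of the per-level penalties; a few lines of Python, with the values copied from the table above:

# Effective access penalty for the S/390 memory hierarchy:
# sum over levels of (hit rate x access penalty).
levels = [
    ("L1 cache",  1, 0.89),
    ("L2 cache",  5, 0.05),
    ("L3 cache", 14, 0.03),
    ("Memory",   32, 0.03),
]
average = sum(penalty * rate for _, penalty, rate in levels)
print(f"Average access penalty: {average:.2f} PU cycles")   # about 2.52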

35 Cache Coherence Problem Cache Coherence and the MESI Protocol Problem - multiple copies of same data in different caches Can result in an inconsistent view of memory Write back Write operations are usually made only to the cache Main memory is only updated when the corresponding cache line is flushed from the cache Write through All write operations are made to main memory as well as to the cache, ensuring that main memory is always valid 18-35
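
A minimal sketch of the problem, assuming a toy write-back Cache class (all names here are illustrative, not a protocol implementation): two processors cache the same line, one writes, and the other keeps seeing the stale value until the dirty line is flushed back.

# Two private write-back caches over one shared memory.
memory = {0x100: 7}

class Cache:
    def __init__(self):
        self.lines = {}                      # addr -> (value, dirty)
    def read(self, addr):
        if addr not in self.lines:           # miss: fill from main memory
            self.lines[addr] = (memory[addr], False)
        return self.lines[addr][0]
    def write(self, addr, value):
        self.lines[addr] = (value, True)     # write-back: memory not updated
    def flush(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:
            memory[addr] = value             # memory updated only on flush

c0, c1 = Cache(), Cache()
print(c0.read(0x100), c1.read(0x100))   # 7 7   both caches hold the line
c0.write(0x100, 42)                     # processor 0 updates its private copy
print(c1.read(0x100))                   # 7     stale: inconsistent view of memory
c0.flush(0x100)
print(memory[0x100])                    # 42    memory valid only after the flush

With write-through, main memory stays valid, but the other cache's copy is still stale, which is why a coherence protocol is needed either way.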

36 Software Solutions Cache Coherence and the MESI Protocol Compiler-based coherence mechanisms: compiler and operating system deal with the problem Overhead transferred to compile time Design complexity transferred from hardware to software Simplest approach Prevent any shared data variables from being cached More efficient approach Analyze the code to determine safe periods for shared variables 18-36

37 Hardware Solutions Cache Coherence and the MESI Protocol Cache coherence protocols Dynamic recognition at run time of potential inconsistency conditions More efficient use of cache Hardware schemes can be divided into two categories Directory protocols Snoopy protocols 18-37

38 Directory Protocols Cache Coherence and the MESI Protocol Collect and maintain information about copies of data in cache Directory stored in main memory Requests are checked against directory Appropriate transfers are performed Creates central bottleneck Effective in large scale systems with complex interconnection schemes 18-38

39 Snoopy Protocols Cache Coherence and the MESI Protocol Distribute cache coherence responsibility among cache controllers Cache recognizes that a line is shared Updates announced to other caches Suited to bus based multiprocessor Increases bus traffic 18-39

40 Snoopy Protocols Cache Coherence and the MESI Protocol Two approaches to the snoopy protocol Write-invalidate Write-update (write-broadcast) Performance depends on the number of local caches and the pattern of memory reads and writes Some systems use adaptive protocols that employ both approaches 18-40

41 Write Invalidate Cache Coherence and the MESI Protocol Multiple readers, one writer When a write is required, all other caches of the line are invalidated Writing processor then has exclusive (cheap) access until line required by another processor Used in Pentium II and PowerPC systems State of every line is marked as modified, exclusive, shared or invalid MESI 18-41

42 Write Update Cache Coherence and the MESI Protocol Multiple readers and writers When a processor wishes to update a shared line, the word to be updated is distributed to all others and caches containing that line can update it 18-42

43 MESI Protocol Cache Coherence and the MESI Protocol Modified : The line in the cache has been modified and is available only in this cache Exclusive : The line in the cache is the same as that in main memory and is not present in any other cache Shared : The line in the cache is the same as that in main memory and may be present in another cache Invalid : The line in the cache does not contain valid data 18-43

44 MESI State Transition Diagram Cache Coherence and the MESI Protocol 18-44

45 MESI State Transition Diagram (a) Line in cache at initiating processor Cache Coherence and the MESI Protocol 18-45

46 MESI State Transition Diagram (b) Line in snooping cache Cache Coherence and the MESI Protocol 18-46

47 Read Miss Cache Coherence and the MESI Protocol When a read miss occurs in the local cache, the processor initiates a memory read to read the line of main memory containing the missing address The processor inserts a signal on the bus that alerts all other processor/cache units to snoop the transaction 18-47

48 Read Hit Cache Coherence and the MESI Protocol When a read hit occurs on a line currently in the local cache, the processor simply reads the required item There is no state change The state remains modified, shared, or exclusive 18-48

49 Write Miss Cache Coherence and the MESI Protocol When a write miss occurs in the local cache, the processor initiates a memory read to read the line of main memory containing the missing address RWITM(read-with-intent-to-modify) When the line is loaded, it is immediately marked modified 18-49

50 Write Hit Cache Coherence and the MESI Protocol When a write hit occurs on a line currently in the local cache, the effect depends on the current state of that line in the local cache: Shared Exclusive Modified 18-50
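
The four cases above can be condensed into a transition table. Below is a simplified, illustrative Python model of MESI for a single cache line (it ignores bus timing and write-back details; the event names follow the slides, including RWITM):

# Local events are reads/writes by this processor; bus events are snooped
# transactions from other processors. States: M, E, S, I.
LOCAL = {
    ("S", "read"):  "S",   # read hit: no state change
    ("E", "read"):  "E",
    ("M", "read"):  "M",
    ("S", "write"): "M",   # write hit on shared line: invalidate other copies
    ("E", "write"): "M",   # write hit on exclusive line: silent upgrade
    ("M", "write"): "M",
    ("I", "write"): "M",   # write miss: RWITM, line loaded and marked modified
}
SNOOP = {
    ("M", "bus_read"):  "S",   # supply the dirty line, drop to shared
    ("E", "bus_read"):  "S",
    ("S", "bus_read"):  "S",
    ("M", "bus_rwitm"): "I",   # another processor writes: invalidate
    ("E", "bus_rwitm"): "I",
    ("S", "bus_rwitm"): "I",
}

def local(state, event, others_have_copy=False):
    if (state, event) == ("I", "read"):           # read miss
        return "S" if others_have_copy else "E"   # shared if another cache has it
    return LOCAL[(state, event)]

s = local("I", "read")        # read miss, no other cache holds the line
print(s)                      # E
s = local(s, "write")         # write hit on an exclusive line
print(s)                      # M
s = SNOOP[(s, "bus_read")]    # another processor reads the line
print(s)                      # S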

51 L1-L2 Cache Consistency Cache Coherence and the MESI Protocol Need to maintain data integrity across both levels of cache and across all caches in the SMP configuration Extend the MESI protocol to the L1 caches Each line in the L1 cache includes bits to indicate the state Adopt the write-through policy in the L1 cache (a write-back L1 would be more complex) The write-through is to the L2 cache, not to main memory 18-51

52 Clustering Clusters Clustering is an alternative to symmetric multiprocessing (SMP) Cluster A group of interconnected, whole computers working together as a unified computing resource that can create the illusion of being one machine Whole computer A system that can run on its own, apart from the cluster Benefits Absolute scalability Incremental scalability High availability Superior price/performance 18-52

53 Cluster Configurations (a) Standby Server with No Shared Disk Clusters 18-53

54 Cluster Configurations (b) Shared Disk Clusters 18-54

55 Clustering Methods Clusters
Clustering Method | Description | Benefits | Limitations
Passive standby | A secondary server takes over in case of primary server failure | Easy to implement | High cost because the secondary server is unavailable for other processing tasks
Active secondary | The secondary server is also used for processing tasks | Reduced cost because secondary servers can be used for processing | Increased complexity
Separate servers | Separate servers have their own disks; data is continuously copied from primary to secondary server | High availability | High network and server overhead due to copying operations
Servers connected to disks | Servers are cabled to the same disks, but each server owns its disks; if one server fails, its disks are taken over by the other server | Reduced network and server overhead due to elimination of copying operations | Usually requires disk mirroring or RAID technology to compensate for risk of disk failure
Servers share disks | Multiple servers simultaneously share access to disks | Low network and server overhead; reduced risk of downtime caused by disk failure | Requires lock manager software; usually used with disk mirroring or RAID technology
18-55

56 Operating System Design Issues Clusters Failure Management Failover Failback Load Balancing Clusters versus SMP 18-56

57 Failure Management Clusters A highly available cluster A high probability that all resources will be in service If a failure does occur, the queries in progress are lost Any lost query, if retried, will be serviced by a different computer in the cluster A fault-tolerant cluster All resources are always available Achieved by the use of redundant shared disks and mechanisms for backing out uncommitted transactions Failover: switching resources over from a failed to an alternative system Failback: restoring resources to the original system 18-57
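
A failover sketch under simple assumptions (heartbeat timeout only; no fencing or quorum, and all names are hypothetical):

import time

class Node:
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        self.last_heartbeat = time.monotonic()   # called periodically while alive

def active_node(primary, standby, timeout=1.0):
    # Failover: if the primary's heartbeat is stale, the standby serves.
    # Failback is the reverse decision once the primary resumes heartbeats.
    if time.monotonic() - primary.last_heartbeat > timeout:
        return standby
    return primary

primary, standby = Node("server-A"), Node("server-B")
print(active_node(primary, standby).name)   # server-A
time.sleep(1.1)                             # primary crashes: heartbeats stop
print(active_node(primary, standby).name)   # server-B (failover)
primary.heartbeat()                         # primary recovers
print(active_node(primary, standby).name)   # server-A (failback)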

58 Load Balancing Clusters Incremental scalability Automatically include new computers in scheduling Middleware needs to recognise that processes may switch between machines 18-58
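
A least-loaded dispatch sketch (illustrative node names): new computers join scheduling simply by being pushed into the pool, which is the incremental-scalability point above.

import heapq

nodes = [(0, "node-1"), (0, "node-2")]   # (current load, name)
heapq.heapify(nodes)

def dispatch(job_cost=1):
    load, name = heapq.heappop(nodes)              # pick the least-loaded node
    heapq.heappush(nodes, (load + job_cost, name))
    return name

print([dispatch() for _ in range(4)])    # jobs alternate between the two nodes
heapq.heappush(nodes, (0, "node-3"))     # add a computer to the cluster
print(dispatch())                        # node-3 is scheduled immediately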

59 Parallelizing Clusters Single application executing in parallel on a number of machines in the cluster Compiler Determines at compile time which parts can be executed in parallel Split off for different computers Application Application written from scratch to be parallel Message passing to move data between nodes Hard to program Best end result Parametric computing If a problem is repeated execution of an algorithm on different sets of data, e.g. simulation using different scenarios Needs effective tools to organize and run 18-59
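
A parametric-computing sketch: the same (toy) simulation run over different parameter sets, one scenario per worker process. The model and parameters are made up for illustration.

from multiprocessing import Pool

def simulate(params):
    rate, steps = params
    value = 1.0
    for _ in range(steps):        # stand-in for one simulation scenario
        value *= 1.0 + rate
    return params, value

if __name__ == "__main__":
    scenarios = [(0.01, 100), (0.02, 100), (0.03, 100)]   # different data sets
    with Pool() as pool:
        for params, result in pool.map(simulate, scenarios):
            print(params, "->", round(result, 3))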

60 Cluster Computer Architecture Clusters 18-60

61 Cluster Middleware Unified image to user Single system image Single point of entry Single file hierarchy Single control point Single virtual networking Single memory space Single job management system Single user interface Single I/O space Single process space Checkpointing Process migration Clusters 18-61

62 Clusters vs SMP Both provide multiprocessor support to high demand applications Both available commercially SMP for longer The advantages of the SMP approach SMP is easier to manage and configure than a cluster SMP usually takes up less physical space and draws less power than a cluster The advantages of the cluster approach Likely to result in clusters dominating the high-performance server market Superior in terms of incremental and absolute scalability Superior in terms of availability All components of the system can readily be made highly redundant Clusters 18-62

63 Nonuniform Memory Access (NUMA) Nonuniform Memory Access Alternative to SMP & clustering Uniform memory access All processors have access to all parts of memory using load & store Access time to all regions of memory is the same Access time to memory for different processors same As used by SMP Nonuniform memory access All processors have access to all parts of memory using load & store Access time of processor differs depending on region of memory Different processors access different regions of memory at different speeds Cache coherent NUMA Cache coherence is maintained among the caches of the various processors Significantly different from SMP and clusters 18-63

64 Motivation Nonuniform Memory Access SMP has a practical limit to the number of processors Bus traffic limits it to between 16 and 64 processors In clusters each node has its own memory Apps do not see a large global memory Coherence maintained by software not hardware NUMA retains SMP flavour while giving large-scale multiprocessing e.g. Silicon Graphics Origin NUMA: 1024 MIPS R10000 processors The objective with NUMA is to maintain a transparent system-wide memory while permitting multiple multiprocessor nodes, each with its own bus or other internal interconnect system 18-64

65 CC-NUMA Organization Nonuniform Memory Access 18-65

66 CC-NUMA Operation Each processor has own L1 and L2 cache Each node has own main memory Nodes connected by some networking facility Each processor sees single addressable memory space Memory request order: L1 cache (local to processor) L2 cache (local to processor) Main memory (local to node) Remote memory Delivered to requesting (local to processor) cache Automatic and transparent Nonuniform Memory Access
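
The lookup order can be written down directly; the sketch below uses invented latencies purely to show why locality matters on a CC-NUMA machine.

# CC-NUMA read path: search local levels first, then fetch remotely.
LATENCY = {"L1": 1, "L2": 5, "local_mem": 50, "remote_mem": 300}   # made-up cycles

remote_mem = {0x798: "data"}    # memory that lives on another node

def load(addr, l1, l2, local_mem):
    """Return (value, cost in cycles) following the order above."""
    if addr in l1:
        return l1[addr], LATENCY["L1"]
    if addr in l2:
        return l2[addr], LATENCY["L2"]
    if addr in local_mem:
        return local_mem[addr], LATENCY["local_mem"]
    return remote_mem[addr], LATENCY["remote_mem"]   # transparent remote fetch

print(load(0x798, l1={}, l2={}, local_mem={}))   # ('data', 300): nonuniform cost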

67 Organization Nonuniform Memory Access Example Suppose that P2-3 requests memory location 798, which is in the memory of node 1: 1. P2-3 issues a read request on the snoopy bus of node 2 2. The directory on node 2 sees the request and recognizes that the location is in node 1 3. Node 2 s directory sends a request to node 1 4. Node 1 s directory requests the contents of 798 5. Node 1 s main memory responds by putting the requested data on the bus 6. Node 1 s directory picks up the data from the bus 7. The value is transferred back to node 2 s directory 8. Node 2 s directory places the data back on node 2 s bus 9. The value is picked up and placed in P2-3 s cache and delivered to P2-3 18-67

68 Cache Coherence Nonuniform Memory Access Node 1 directory keeps note that node 2 has a copy of the data If data is modified in a cache, this is broadcast to other nodes Local directories monitor and purge the local cache if necessary Local directory monitors changes to local data in remote caches and marks memory invalid until writeback Local directory forces writeback if the memory location is requested by another processor 18-68

69 NUMA Pros and Cons Nonuniform Memory Access The advantage of a CC-NUMA system Can deliver effective performance at higher levels of parallelism than SMP, without requiring major software changes The disadvantages of a CC-NUMA system Performance can break down if there is too much access to remote memory Can be avoided by: L1 & L2 cache design reducing all memory accesses Needs good temporal and spatial locality of software Virtual memory management moving pages to nodes that are using them most Does not transparently look like an SMP Software changes will be required to move an operating system and applications from an SMP to a CC-NUMA system Availability 18-69

70 Vector Computation Vector Computation Maths problems involving physical processes present different difficulties for computation Aerodynamics, seismology, meteorology Continuous field simulation High precision Repeated floating point calculations on large arrays of numbers 18-70

71 Vector Computation Vector Computation Supercomputers handle these types of problem Hundreds of millions of flops $10-15 million Optimised for calculation rather than multitasking and I/O Limited market Research, government agencies, meteorology Array processor Alternative to supercomputer Configured as peripherals to mainframe & mini Just run vector portion of problems 18-71

72 Example of Vector Addition Vector Computation 18-72

73 Matrix Multiplication (C = A x B) Vector Computation 18-73

74 Approaches Vector Computation General-purpose computers rely on iteration to do vector calculations In the example above this needs six calculations Vector processing Assumes it is possible to operate on a one-dimensional vector of data All elements in a particular row can be calculated in parallel Parallel processing Independent processors functioning in parallel Use FORK N to start an individual process at location N JOIN N causes N independent processes to join and merge following JOIN The O/S coordinates JOINs Execution is blocked until all N processes have reached JOIN 18-74
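
The three approaches side by side for C = A + B, sketched in Python: a plain loop, NumPy standing in for a vector instruction, and a process pool standing in for FORK/JOIN (the chunk boundaries are arbitrary).

from concurrent.futures import ProcessPoolExecutor
import numpy as np

A = np.arange(6, dtype=float)
B = np.ones(6)

def add_chunk(bounds):
    lo, hi = bounds
    return A[lo:hi] + B[lo:hi]

if __name__ == "__main__":
    # 1. General-purpose iteration: one element per step (six calculations)
    C = np.empty(6)
    for i in range(6):
        C[i] = A[i] + B[i]

    # 2. Vector processing: one operation over the whole one-dimensional vector
    C_vec = A + B

    # 3. Parallel processing: FORK one task per chunk, JOIN merges the results
    with ProcessPoolExecutor() as pool:                      # FORK
        parts = list(pool.map(add_chunk, [(0, 3), (3, 6)]))  # JOIN on completion
    C_par = np.concatenate(parts)

    assert (C == C_vec).all() and (C == C_par).all()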

75 Processor Designs Vector Computation Pipelined ALU Within operations Across operations Parallel ALUs Parallel processors 18-75

76 Approaches to Vector Computation (a) Pipelined ALU Vector Computation 18-76

77 Approaches to Vector Computation (b) Parallel ALUs Vector Computation 18-77

78 Pipelined Processing Vector Computation 18-78

79 Ex. Computing C=(s x A) + B Vector Computation 1. Vector load A->Vector Register(VR1) 2. Vector load B->VR2 3. Vector multiply s x VR1->VR3 4. Vector add VR3 + VR2 -> VR4 5. Vector store VR4->C 18-79
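
The same five steps, emulated with NumPy arrays standing in for the vector registers VR1-VR4 (illustrative data only):

import numpy as np

A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 5.0, 6.0])
s = 2.0

VR1 = A.copy()     # 1. vector load  A -> VR1
VR2 = B.copy()     # 2. vector load  B -> VR2
VR3 = s * VR1      # 3. vector multiply  s x VR1 -> VR3
VR4 = VR3 + VR2    # 4. vector add  VR3 + VR2 -> VR4
C = VR4.copy()     # 5. vector store  VR4 -> C
print(C)           # [ 6.  9. 12.]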

80 Chaining Vector Computation Cray Supercomputers Vector operation may start as soon as first element of operand vector available and functional unit is free Result from one functional unit is fed immediately into another If vector registers used, intermediate results do not have to be stored in memory 18-80
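
A back-of-the-envelope cycle model of what chaining buys (the pipeline depth and vector lengths are assumed numbers, not Cray figures): two dependent pipelined operations either run back to back or overlap element by element.

def unchained(n, depth):
    # the second functional unit waits for the entire first result vector
    return (depth + n) + (depth + n)

def chained(n, depth):
    # the second unit starts as soon as the first element is available
    return depth + depth + n

for n in (64, 256):
    print(n, unchained(n, depth=6), chained(n, depth=6))
# chaining roughly halves the time for long dependent vector operations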

81 Taxonomy of Computer Organization Vector Computation 18-81

82 Taxonomy of Computer Organization Vector Computation Parallel processors The multiple processors can function cooperatively on a given task Vector processor Often equated with the pipelined ALU organization But may also be designed with a parallel ALU organization or as a parallel processor Array processing Usually refers to a parallel ALU, though any of the three organizations optimized for the processing of arrays Sometimes refers to an auxiliary processor attached to a general-purpose processor and used to perform vector computation The pipelined ALU organization dominates the marketplace Less complex 18-82

83 IBM 3090 Vector Facility Vector Computation The IBM facility makes use of a number of vector registers Each register is actually a bank of scalar registers To compute the vector sum C=A+B, the vectors A and B are loaded into two vector registers The data from these registers are passed through the ALU as fast as possible The overlap of computation, and the loading of the input data into the registers in a block, result in a significant speedup over an ordinary ALU operation 18-83

84 Organization Vector Computation The fixed and predetermined structure of vector data permits housekeeping instructions inside the loop to be replaced by faster internal machine operations Data-access and arithmetic operations on several successive vector elements can proceed concurrently by overlapping such operations in a pipelined design or by performing multiple-element operations in parallel The use of vector registers for intermediate results avoids additional storage references 18-84

85 IBM 3090 with Vector Facility Vector Computation 18-85

86 Registers Vector Computation The IBM organization is referred to as register-to-register, because the vector operands, both input and output, can be stored in vector registers The main disadvantage of the use of vector registers is that the programmer or compiler must take them into account 18-86

87 Alternative Programs Vector Computation 18-87

88 Alternative Programs (a) Storage to Storage Vector Computation 18-88

89 Alternative Programs Vector Computation (b) Register to Register 18-89

90 Alternative Programs (c) Storage to Register Vector Computation 18-90

91 Alternative Programs (d) Compound Instructions Vector Computation 18-91

92 Registers Vector Computation The vector registers can also be coupled in pairs to form eight 64-bit vector registers The architecture specifies that each register contains from 8 to 512 scalar elements Three additional registers are needed by the vector facility 18-92

93 Registers of the IBM 3090 Vector Computation 18-93

94 Compound Instruction Vector Computation Instruction execution can be overlapped using chaining to improve performance The designers of the IBM vector facility chose not to include this capability for several reasons Compound instructions do not require the use of additional registers for temporary storage of intermediate results, and they require one less register access 18-94

95 The Instruction Set Vector Computation There are memory-to-register load and register-to-memory store instructions Many of the instructions use a three-operand format 18-95

96 IBM 3090 Vector Facility Vector Computation 18-96
