Module 5 Introduction to Parallel Processing Systems

Size: px
Start display at page:

Download "Module 5 Introduction to Parallel Processing Systems"

Transcription

1 Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this can be achieved using mutliprocessors, coding in a way that at a time different parts of program could be executed simultaneously..for example threading etc. Pipelining is a particular form of parallelism. Pipelining is a particular arrangement of functions so that different portions of an operation flow through a particular set of sub-functions, with the sub-functions happening in parallel. For example, you have an equation a=b*c+d*e+f*g. step 1: computer shall multiply b*c(multiplier) and forward it to adder. step 2: while previous calculation now is added to zero (adder)...d*e is being calculated in multiplier which will then forward it to adder. Thus, various operations are pipelined so as to work parallely! 2. What is pipelining? Explain with suitable example the difference between serial instruction execution and pipelined instruction execution. Also prove that why pipelined instruction execution is faster as compared to serial instruction execution. 3. What is pipelining? Show that K stage pipelined processor has K times speed up compared to a non pipelined systems. Pipelining is used by virtually all modern microprocessors to enhance performance by overlapping the execution of instructions. If the stages of a pipeline are not balanced and one stage is slower than another, the entire throughput of the pipeline is affected. In terms of a pipeline within a CPU, each instruction is broken up into different stages. Ideally if each stage is balanced (all stages are ready to start at the same time and take an equal amount of time to execute.) the time taken per instruction (pipelined) is defined as: Time per instruction (unpipelined) / Number of stages 10

2 Module 5 : Introduction to Parallel Processing System The previous expression is ideal. We will see later that there are many ways in which a pipeline cannot function in a perfectly balanced fashion. In terms of a CPU, the implementation of pipelining has the effect of reducing the average instruction time, therefore reducing the average CPI. EX: If each instruction in a microprocessor takes 5 clock cycles (unpipelined) and we have a 4 stage pipeline, the ideal average CPI with the pipeline will be Pipelining increases throughput, but not latency Answer available every 200ps, BUT A single computation still takes 1ns Limitations: Computations must be divisible into stage size Pipeline registers add overhead 11

3 4. What is the difference between pipelining and parallelism? Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination. A parallel computing system is a computer with more than one processor for parallel processing. In the past, each processor of a multiprocessing system always came in its own processor packaging, but recentlyintroduced multicore processors contain multiple logical processors in a single package. There are many different kinds of parallel computers. They are distinguished by the kind of interconnection between processors (known as "processing elements" or PEs) and memory. Pipelining is a method of increasing system performance and throughput. It takes advantage of the inherent parallelism in instructions. Instructions are divided into 5 stages : IF, ID, EX, MEM, WB. In pipelining, we try to execute 2 or more instructions at the same time thereby increasing the throughput. With regards to Silicon area and speed, Pipelining takes up more Silicon area. This is because in hardware pipelining we duplicate the few units. In terms of speed, pipelining increases system performance and speed. 5. Explain the Flynn's Classification in detail? Flynn s classifications A taxonomy first introduced by Flynn is still the most common way of categorizing systems with parallel processing capability. Flynn proposed the following categories of computer systems: Single instruction, single data (SISD) stream: A single processor executes a single instruction stream to operate on data stored in a single memory. uniprocessors fall into this category. Single instruction, multiple data (SIMD) stream: A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis. Each processing element has an associated data memory, so that each instruction is executed on a different set of data by the different processors. Vector and array processors fall into this category. 12

4 Module 5 : Introduction to Parallel Processing System Multiple instruction, single data (MISD) stream: A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence. This structure is not commercially implemented. Multiple instruction, multiple data (MIMD) stream: A set of processors simultaneously execute different instruction sequences on different data sets. SMPs, clusters, and NUMA systems fit into this category. Fig. A taxonomy of Parallel Processor Architectures With the MIMD organization, the processors are general purpose; each is able to process all of the instructions necessary to perform the appropriate data transformation. MIMDs can be further subdivided by the means in which the processors communicate (fig.6.1). If the processors share a common memory, and processors communicate with each other via that memory. The most common form of such system is known as a symmetric Multiprocessor (SMP). In an SMP, multiple processors share a single memory or pool of memory by means of a shared bus or other interconnection mechanism: a distinguishing feature is that the memory access time to any region of memory is approximately the same for each processor. A more recent development is the nonuniform memory access (NUMA) organization. As the name suggests, the memory access time to different regions of memory may differ for a NUMA processor. SISD SISD machines executes a single instruction on individual data values using a single processor. Even if the processor incorporates internal parallelism, such as an instruction pipeline, the computer would still be classified as SISD Uniprocessors 13

5 Fig (a) SISD SIMD As its name implies, an SIMD machine executes a single instruction on multiple data values simultaneously using many processors. Since there is only one instruction, each processor does not have to fetch and decode each instruction. Instead a single control unit handles this task for all processors within the SIMD computer Examples: Illiac-IV, CM-2 Simple programming model Low overhead Flexibility All custom integrated circuits Fig 5.2 SIMD (with distributed memory) MISD: This classification is not practical to implement. So, no significant MISD computers have ever been built. It is included for completeness of the classification. MIMD: Systems referred to as multiprocessors or multicomputer are usually MIMD. It may execute multiple instructions simultaneously, unlike SIMD. So, each processor must include its own control unit. MIMD machines are well suited for general purpose use. Examples: Sun Enterprise 5000, Cray T3D, SGI Origin Flexible 14

6 Module 5 : Introduction to Parallel Processing System Fig. MIMD (with distributed memory) 6. Comparison of pipeline processor and array processor SIMD MIMD 1. It I also called as Array Processor It is also called as Multiprocessor 2. Here, single stream of instruction is fetched Here multiple stream of instructions are fetched 3. In SIMD (single instruction multiple data stream) the instruction stream is fetched by shared memory. In MIMD (multiple instruction multiple data stream) the instruction stream is fetched by control unit. 4. Here instruction is broadcasted to multiple processing elements (PEs), which will operate on different data sets (i.e. data streams are taken from shared memory modules). Hence the name SIMD. Here instruction streams are decoded to get multiple decoded instruction streams (IS), which operate on multiple data stream (DS) taken from shared memory modules. 7. Comparison of loosely coupled and tightly coupled multiprocessor organizations Loosely coupled Multiprocessor 1. In loosely coupled multiprocessor organization communication between computer modules has taken place through message Transfer system Tightly coupled Multiprocessor In tightly coupled multiprocessor organization communication between processor s can be taken place through P MIN. 15

7 2. Diagram is as follows I S I N Disks R LM CA S I D R LM CA S I D P 1 P n P M I N m I O P N 1 Unmapped local Memory (stores kernel) Memory map Message Transfer System (Time Shared BUS) (MTS) 3. In loosely coupled system, the processor do not share memory, and each processor has its own local memory. M 1 M 2 M 3 Shared memory modules In, tight coupled system, there is a single system-wise primary memory that is shared by all the processors. LM CPU LM CPU LM CPU LM CPU CPU CPU System-wise LM shared CPU memory CP U Communication N/W Interconnection hardware 4. In these systems, all physical communication between the processors is done by passing message across the network that interconnects the processors. 5. Loosely coupled systems, are referred as distributed computing Systems. 6. In this system, the interaction between various processors and computer modules is very less. 7. In this, (As channel and Arbiter Switch) is used to connect the computer module to a message transfer system. In these systems, communication between the processors usually takes place through the shared memory. In this system, the interaction between various processors is very high. In this system, the interaction between various processors is very high. In this, shared memory modules can communicate through PMIN. 16

8 Module 5 : Introduction to Parallel Processing System 8. In this, the processor is directly connected to IO devices. 9. Distributed memory multiprocessors are sometimes referred as loosely coupled system. 10. There are no unmapped local memories in loosely coupled system. In this, processor connected to IO device through IOPIN. Shared memory multiprocessor are sometimes referred as tightly coupled system. In this tightly coupled system, every processor has small unmapped local memories which is used to store Kernel code. 8. What is instruction pipelining? Instruction processing is subdivided into Fetch/ Execute instruction Pipeline has two independent stages: 1 st Stage Fetch an instruction and buffers it. 2 nd Stage Temporarily free until first stage passes it the buffered instruction. While the second stage is executing the instruction, the first stage fetches and buffers the next instruction. Instruction prefetch or fetch overlap. 17

9 Clock Cycle Instruction I1 F1 E1 I2 F2 E2 I3 F3 E3 9. What is 5 stage instruction pipelining?(risc Pipeline) The five clock cycles will be broken up into the following steps: Instruction Fetch Cycle Instruction Decode/Register Fetch Cycle Execution Cycle Memory Access Cycle Write-Back Cycle When a CPU executes an instruction, it transitions through number of stages to complete. These basic stages (or phases) are: instruction fetch, instruction decode, register fetch, execution/effective-address computation, memory access and write back to registers. Instruction fetch (IF) use the current program counter (PC) to index into memory and fetch the instruction. The act of fetching from memory may cause an exception and other complicated interactions with the hardware memory optimizations. Instruction decode/register fetch (ID) this phase combines decoding of the next instruction with the latching of the current register values from the register file with the interpretation of the operation to be executed. A suitably designed instruction-set architecture it is possible to determine the 18

10 Module 5 : Introduction to Parallel Processing System registers operated on by the command independently of what the actual instruction is, which allows the register fetch to occur in parallel with the instruction decode. Execution/effective-address computation (EXE) the arithmetic logic unit (ALU) is central to instruction execution. Given the decoded operation and latched operands, the ALU performs the required operation, which can be either register arithmetic or an effective-address computation. The result is then latched for the next phase of execution. Memory access (MEM) either store a latched operand into memory indexed by the effective-address(ea) computed in the previous phase or load and latch a value from the computed EA. Write back to registers (WBR) Transfer the latched output of the memory access or execution phases to the required register. Some instructions do not need a WBR phase. A simple CPU implementation would perform each phase in a clock cycle and every instruction would take up to 5 clock cycles to complete. Not all instructions require all phases, for instance a simple register indirect store instruction does not need to perform the execution or the write back phases and can complete in 3 cycles. Early computer designers realized that this implementation leaves most of the CPU idle for most of the time and does not use the resources efficiently. In 1960 the designers of the IBM-7030 (Stretch because it was supposed to be a stretch in the state of the art) introduced the concept of an instruction pipeline. As figure 6.3 below shows, a pipeline can overlap the execution phases of an instruction with previous instructions. 10. Explain Four stage CPU instruction pipeline. 19

11 Explain Six stage CPU instruction pipeline

12 Module 5 : Introduction to Parallel Processing System Advantages Multiple instructions are being processed at same time This works because stages are isolated by registers Best case speedup of N Certain preventions Instructions interfere with each other - hazards Example: different instructions may need the same piece of hardware (e.g., memory) in same clock cycle 21

13 Example: instruction may require a result produced by an earlier instruction that is not yet complete 12. Explain various types of hazards in pipelined processors with example. Also propose solution each type. 22

14 Module 5 : Introduction to Parallel Processing System 23

15 24

16 Module 5 : Introduction to Parallel Processing System 25

17 26

18 Module 5 : Introduction to Parallel Processing System 27

19 13. Explain Multiprocessor architectures. Multiprocessor architecture : This is nothing but MIMD architecture. These are classified into two categories : i) Loosely coupled multiprocessor 28

20 ii) Closely (tightly) coupled multiprocessor. Module 5 : Introduction to Parallel Processing System Loosely Coupled multiprocessor : This system contains a number of computer modules each containing a processor, local memory and I/O devices. A switch called as CAS is used to contact the computer module to a message transfer system (MTS) which is generally a time shared bus. It is used for message transfer between two computer modules. The diagram is as follows : Computer Module 1 Computer Module N LM IO L M I O P P CAS CA S Message Transfer System (MTS) (Time Shared Bus) Fig : Loosely coupled multiprocessor P : Processor LM : Local Memory IO : I/O devices CAS : Channel and arbiter Switch Here work load is distributed among various computer modules. One computer module and multiprocessor operating system is responsible for distribution of the work load. Each of that computer will carry out the assigned work local in a distributed / independent fashion. Hence it is sometimes called a distributed system. There are vary less amount of interaction between various processors / computer modules through time shared bus, hence it is called as Loosely Coupled System. The performance of loosely coupled system (LCS) depends on following factors : 1. Bandwidth of time shared bus which defines the speed of message transfer. 29

21 30 2. Message length 3. Message arrival rate, which define the degree of interaction / coupling between various computer modulus. Symmetric Multiprocessors An SMP can be defined as a standalone computer system belonging to tightly coupled multiprocessing system and has the following characteristics: 1. There are two or more similar processors of comparable capability 2. These processors share the sane main memory and I/O facilities and are interconnected by a bus or other internal connection scheme, such that memory access time is approximately the same for each processor. 3. All processors share access to I/O devices, either through the same channels or through different channels that provide paths to the same device. 4. All processors can perform the same functions (hence the term symmetric) 5. The system is controlled by an integrated operating system that provides interaction between processors and their programs at the job, task, file, and data element levels. Clusters Clustering is an alternative to symmetric multiprocessing as an approach to providing high performance and high availability and is particularly attractive for server applications. Here the physical unit of interaction is usually a message or complete file. Whereas in SMP, individual data elements can constitute the level of interaction, and there can be a high degree of cooperation between processes. Clusters belong to loosely coupled multiprocessing system. A cluster can be defined as a group of interconnected, whole computers working together as a unified computing resource that can create the illusion of being one machine. The term whole computer means a system that can run on its own, apart from the cluster. Nonuniform Memory Access (NUMA) Here, all processors have access to all parts of main memory using loads and stores. The memory access time of a processor differs depending on which region of main memory is accessed. The last is true for all processors; however, for different processors, which memory regions are slower and which are faster differ. NUMA belongs to tightly coupled multiple-processor system. A special kind of NUMA is CC-NUMA (Cache-coherent NUMA) in which cache coherence is maintained among the caches of the various processors.

Multi-Processor / Parallel Processing

Multi-Processor / Parallel Processing Parallel Processing: Multi-Processor / Parallel Processing Originally, the computer has been viewed as a sequential machine. Most computer programming languages require the programmer to specify algorithms

More information

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of

More information

Chapter 18 Parallel Processing

Chapter 18 Parallel Processing Chapter 18 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data stream - SIMD Multiple instruction, single data stream - MISD

More information

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems 1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase

More information

Computer Organization. Chapter 16

Computer Organization. Chapter 16 William Stallings Computer Organization and Architecture t Chapter 16 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

Organisasi Sistem Komputer

Organisasi Sistem Komputer LOGO Organisasi Sistem Komputer OSK 14 Parallel Processing Pendidikan Teknik Elektronika FT UNY Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple

More information

Processor Architecture and Interconnect

Processor Architecture and Interconnect Processor Architecture and Interconnect What is Parallelism? Parallel processing is a term used to denote simultaneous computation in CPU for the purpose of measuring its computation speeds. Parallel Processing

More information

What is Pipelining. work is done at each stage. The work is not finished until it has passed through all stages.

What is Pipelining. work is done at each stage. The work is not finished until it has passed through all stages. PIPELINING What is Pipelining A technique used in advanced microprocessors where the microprocessor begins executing a second instruction before the first has been completed. - A Pipeline is a series of

More information

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,

More information

Unit 9 : Fundamentals of Parallel Processing

Unit 9 : Fundamentals of Parallel Processing Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing

More information

Chapter 18. Parallel Processing. Yonsei University

Chapter 18. Parallel Processing. Yonsei University Chapter 18 Parallel Processing Contents Multiple Processor Organizations Symmetric Multiprocessors Cache Coherence and the MESI Protocol Clusters Nonuniform Memory Access Vector Computation 18-2 Types

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes:

3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes: BIT 325 PARALLEL PROCESSING ASSESSMENT CA 40% TESTS 30% PRESENTATIONS 10% EXAM 60% CLASS TIME TABLE SYLLUBUS & RECOMMENDED BOOKS Parallel processing Overview Clarification of parallel machines Some General

More information

ARCHITECTURAL CLASSIFICATION. Mariam A. Salih

ARCHITECTURAL CLASSIFICATION. Mariam A. Salih ARCHITECTURAL CLASSIFICATION Mariam A. Salih Basic types of architectural classification FLYNN S TAXONOMY OF COMPUTER ARCHITECTURE FENG S CLASSIFICATION Handler Classification Other types of architectural

More information

Lecture 7: Parallel Processing

Lecture 7: Parallel Processing Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction

More information

CSCI 4717 Computer Architecture

CSCI 4717 Computer Architecture CSCI 4717/5717 Computer Architecture Topic: Symmetric Multiprocessors & Clusters Reading: Stallings, Sections 18.1 through 18.4 Classifications of Parallel Processing M. Flynn classified types of parallel

More information

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor

More information

Computer Architecture

Computer Architecture Computer Architecture Chapter 7 Parallel Processing 1 Parallelism Instruction-level parallelism (Ch.6) pipeline superscalar latency issues hazards Processor-level parallelism (Ch.7) array/vector of processors

More information

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to

More information

UNIT I (Two Marks Questions & Answers)

UNIT I (Two Marks Questions & Answers) UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-

More information

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Classification of Parallel Architecture Designs

Classification of Parallel Architecture Designs Classification of Parallel Architecture Designs Level of Parallelism Job level between jobs between phases of a job Program level between parts of a program within do-loops between different function invocations

More information

Dr. Joe Zhang PDC-3: Parallel Platforms

Dr. Joe Zhang PDC-3: Parallel Platforms CSC630/CSC730: arallel & Distributed Computing arallel Computing latforms Chapter 2 (2.3) 1 Content Communication models of Logical organization (a programmer s view) Control structure Communication model

More information

Computer parallelism Flynn s categories

Computer parallelism Flynn s categories 04 Multi-processors 04.01-04.02 Taxonomy and communication Parallelism Taxonomy Communication alessandro bogliolo isti information science and technology institute 1/9 Computer parallelism Flynn s categories

More information

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? RISC remainder (our assumptions) What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Flynn s Classification

Flynn s Classification Flynn s Classification Guang R. Gao ACM Fellow and IEEE Fellow Endowed Distinguished Professor Electrical & Computer Engineering University of Delaware ggao@capsl.udel.edu 652-14F-PXM-intro 1 Execution

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

PIPELINE AND VECTOR PROCESSING

PIPELINE AND VECTOR PROCESSING PIPELINE AND VECTOR PROCESSING PIPELINING: Pipelining is a technique of decomposing a sequential process into sub operations, with each sub process being executed in a special dedicated segment that operates

More information

FLYNN S TAXONOMY OF COMPUTER ARCHITECTURE

FLYNN S TAXONOMY OF COMPUTER ARCHITECTURE FLYNN S TAXONOMY OF COMPUTER ARCHITECTURE The most popular taxonomy of computer architecture was defined by Flynn in 1966. Flynn s classification scheme is based on the notion of a stream of information.

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors? Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing

More information

Architecture of parallel processing in computer organization

Architecture of parallel processing in computer organization American Journal of Computer Science and Engineering 2014; 1(2): 12-17 Published online August 20, 2014 (http://www.openscienceonline.com/journal/ajcse) Architecture of parallel processing in computer

More information

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor Multiprocessing Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast. Almasi and Gottlieb, Highly Parallel

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

Advanced Parallel Architecture. Annalisa Massini /2017

Advanced Parallel Architecture. Annalisa Massini /2017 Advanced Parallel Architecture Annalisa Massini - 2016/2017 References Advanced Computer Architecture and Parallel Processing H. El-Rewini, M. Abd-El-Barr, John Wiley and Sons, 2005 Parallel computing

More information

Chapter 11. Introduction to Multiprocessors

Chapter 11. Introduction to Multiprocessors Chapter 11 Introduction to Multiprocessors 11.1 Introduction A multiple processor system consists of two or more processors that are connected in a manner that allows them to share the simultaneous (parallel)

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Memory Systems in Pipelined Processors

Memory Systems in Pipelined Processors Advanced Computer Architecture (0630561) Lecture 12 Memory Systems in Pipelined Processors Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Interleaved Memory: In a pipelined processor data is required every

More information

Parallel Processors. Session 1 Introduction

Parallel Processors. Session 1 Introduction Parallel Processors Session 1 Introduction Applications of Parallel Processors Structural Analysis Weather Forecasting Petroleum Exploration Fusion Energy Research Medical Diagnosis Aerodynamics Simulations

More information

Lecture 8: RISC & Parallel Computers. Parallel computers

Lecture 8: RISC & Parallel Computers. Parallel computers Lecture 8: RISC & Parallel Computers RISC vs CISC computers Parallel computers Final remarks Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation in computer

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

BlueGene/L (No. 4 in the Latest Top500 List)

BlueGene/L (No. 4 in the Latest Top500 List) BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside

More information

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors. CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says

More information

Chap. 4 Multiprocessors and Thread-Level Parallelism

Chap. 4 Multiprocessors and Thread-Level Parallelism Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,

More information

Lecture 24: Virtual Memory, Multiprocessors

Lecture 24: Virtual Memory, Multiprocessors Lecture 24: Virtual Memory, Multiprocessors Today s topics: Virtual memory Multiprocessors, cache coherence 1 Virtual Memory Processes deal with virtual memory they have the illusion that a very large

More information

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

Lecture 7: Parallel Processing

Lecture 7: Parallel Processing Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence Parallel Computer Architecture Spring 2018 Shared Memory Multiprocessors Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture

More information

A Multiprocessor system generally means that more than one instruction stream is being executed in parallel.

A Multiprocessor system generally means that more than one instruction stream is being executed in parallel. Multiprocessor Systems A Multiprocessor system generally means that more than one instruction stream is being executed in parallel. However, Flynn s SIMD machine classification, also called an array processor,

More information

Chapter 17 - Parallel Processing

Chapter 17 - Parallel Processing Chapter 17 - Parallel Processing Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ Luis Tarrataca Chapter 17 - Parallel Processing 1 / 71 Table of Contents I 1 Motivation 2 Parallel Processing Categories

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

Handout 3 Multiprocessor and thread level parallelism

Handout 3 Multiprocessor and thread level parallelism Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed

More information

COSC4201 Multiprocessors

COSC4201 Multiprocessors COSC4201 Multiprocessors Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Multiprocessing We are dedicating all of our future product development to multicore

More information

Introduction to parallel computing

Introduction to parallel computing Introduction to parallel computing 2. Parallel Hardware Zhiao Shi (modifications by Will French) Advanced Computing Center for Education & Research Vanderbilt University Motherboard Processor https://sites.google.com/

More information

CDA3101 Recitation Section 13

CDA3101 Recitation Section 13 CDA3101 Recitation Section 13 Storage + Bus + Multicore and some exam tips Hard Disks Traditional disk performance is limited by the moving parts. Some disk terms Disk Performance Platters - the surfaces

More information

Multiprocessors & Thread Level Parallelism

Multiprocessors & Thread Level Parallelism Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture

More information

Parallel Architectures

Parallel Architectures Parallel Architectures CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Parallel Architectures Spring 2018 1 / 36 Outline 1 Parallel Computer Classification Flynn s

More information

Objectives of the Course

Objectives of the Course Objectives of the Course Parallel Systems: Understanding the current state-of-the-art in parallel programming technology Getting familiar with existing algorithms for number of application areas Distributed

More information

RISC Processors and Parallel Processing. Section and 3.3.6

RISC Processors and Parallel Processing. Section and 3.3.6 RISC Processors and Parallel Processing Section 3.3.5 and 3.3.6 The Control Unit When a program is being executed it is actually the CPU receiving and executing a sequence of machine code instructions.

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Instruction Pipelining Review

Instruction Pipelining Review Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number

More information

SMD149 - Operating Systems - Multiprocessing

SMD149 - Operating Systems - Multiprocessing SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Overview Introduction Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55 Introduction

More information

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy Overview SMD149 - Operating Systems - Multiprocessing Roland Parviainen Multiprocessor systems Multiprocessor, operating system and memory organizations December 1, 2005 1/55 2/55 Multiprocessor system

More information

A taxonomy of computer architectures

A taxonomy of computer architectures A taxonomy of computer architectures 53 We have considered different types of architectures, and it is worth considering some way to classify them. Indeed, there exists a famous taxonomy of the various

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1&2, today s CMPs message

More information

Pipeline and Vector Processing 1. Parallel Processing SISD SIMD MISD & MIMD

Pipeline and Vector Processing 1. Parallel Processing SISD SIMD MISD & MIMD Pipeline and Vector Processing 1. Parallel Processing Parallel processing is a term used to denote a large class of techniques that are used to provide simultaneous data-processing tasks for the purpose

More information

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration I/O MultiProcessor Summary 2 Virtual memory benifits Using physical memory efficiently

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing explicit sends & receives Which execution model control parallel

More information

COSC 6385 Computer Architecture - Multi Processor Systems

COSC 6385 Computer Architecture - Multi Processor Systems COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:

More information

COSC4201. Multiprocessors and Thread Level Parallelism. Prof. Mokhtar Aboelaze York University

COSC4201. Multiprocessors and Thread Level Parallelism. Prof. Mokhtar Aboelaze York University COSC4201 Multiprocessors and Thread Level Parallelism Prof. Mokhtar Aboelaze York University COSC 4201 1 Introduction Why multiprocessor The turning away from the conventional organization came in the

More information

Aleksandar Milenkovich 1

Aleksandar Milenkovich 1 Parallel Computers Lecture 8: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming David Lifka lifka@cac.cornell.edu May 23, 2011 5/23/2011 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor or computer to complete

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

High Performance Computing in C and C++

High Performance Computing in C and C++ High Performance Computing in C and C++ Rita Borgo Computer Science Department, Swansea University Announcement No change in lecture schedule: Timetable remains the same: Monday 1 to 2 Glyndwr C Friday

More information

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor

More information

PARALLEL COMPUTER ARCHITECTURES

PARALLEL COMPUTER ARCHITECTURES 8 ARALLEL COMUTER ARCHITECTURES 1 CU Shared memory (a) (b) Figure 8-1. (a) A multiprocessor with 16 CUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different

More information

Distributed Operating System Shilpa Yadav; Tanushree & Yashika Arora

Distributed Operating System Shilpa Yadav; Tanushree & Yashika Arora Distributed Operating System Shilpa Yadav; Tanushree & Yashika Arora A Distributed operating system is software over collection of communicating, networked, independent and with physically separate computational

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache. Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network

More information

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory Pipelining the Idea Similar to assembly line in a factory Divide instruction into smaller tasks Each task is performed on subset of resources Overlap the execution of multiple instructions by completing

More information

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization Parallel Computer Architecture A parallel computer is a collection of processing elements that cooperate to solve large problems fast Broad issues involved: Resource Allocation: Number of processing elements

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processor Architectures Flynn s taxonomy from 1972 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing (Sta09 Fig 17.1) 2 Parallel

More information

Multiprocessors - Flynn s Taxonomy (1966)

Multiprocessors - Flynn s Taxonomy (1966) Multiprocessors - Flynn s Taxonomy (1966) Single Instruction stream, Single Data stream (SISD) Conventional uniprocessor Although ILP is exploited Single Program Counter -> Single Instruction stream The

More information

Parallel Processing & Multicore computers

Parallel Processing & Multicore computers Lecture 11 Parallel Processing & Multicore computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1)

More information

Computer organization by G. Naveen kumar, Asst Prof, C.S.E Department 1

Computer organization by G. Naveen kumar, Asst Prof, C.S.E Department 1 Pipelining and Vector Processing Parallel Processing: The term parallel processing indicates that the system is able to perform several operations in a single time. Now we will elaborate the scenario,

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

5 Computer Organization

5 Computer Organization 5 Computer Organization 5.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List the three subsystems of a computer. Describe the

More information

Parallel Numerics, WT 2013/ Introduction

Parallel Numerics, WT 2013/ Introduction Parallel Numerics, WT 2013/2014 1 Introduction page 1 of 122 Scope Revise standard numerical methods considering parallel computations! Required knowledge Numerics Parallel Programming Graphs Literature

More information

ECEC 355: Pipelining

ECEC 355: Pipelining ECEC 355: Pipelining November 8, 2007 What is Pipelining Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. A pipeline is similar in concept to an assembly

More information

18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013

18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 18-447: Computer Architecture Lecture 30B: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 Readings: Multiprocessing Required Amdahl, Validity of the single processor

More information

Aleksandar Milenkovic, Electrical and Computer Engineering University of Alabama in Huntsville

Aleksandar Milenkovic, Electrical and Computer Engineering University of Alabama in Huntsville Lecture 18: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Parallel Computers Definition: A parallel computer is a collection

More information