Lecture 7: Parallel Processing
- Ashley Rose
- 6 years ago
Zebo Peng, IDA, LiTH

- Introduction and motivation
- Architecture classification
- Performance evaluation
- Interconnection network

Performance Improvement
- Reduction of instruction execution time:
  - Increased clock frequency through fast circuit technology.
  - Simplified instructions (RISC).
- Parallelism within a processor:
  - Pipelining.
  - Parallel execution of instructions (ILP): superscalar processors, VLIW architectures.
- Parallel processing:
  - Many processors are implemented in a single computer.
  - A huge degree of parallelism is possible.

Why Parallel Processing?
Traditional computers cannot meet the high-performance requirements of many applications:
- Simulation of large, complex systems in physics, economics, biology, ...
- Distributed databases with search functions.
- Artificial intelligence and autonomous systems.
- Computer-aided design.
- Visualization and multimedia.
- Multi-tasking and multi-user systems (e.g., supercomputers).
Such applications are often characterized by a very large amount of numerical computation and/or a large quantity of input data.
To deliver sufficient performance for such applications, many processors should be implemented in a single computer.

Why Parallel Processing? (Cont'd)
- Technology development: hardware and silicon technology make it possible to build machines with a huge degree of parallelism cost-effectively.
  - It started with mainframes and supercomputers.
  - Now even file servers and regular PCs are often implemented as parallel machines.
- Parallel processing (PP) also has the potential to be more reliable: if one processor fails, the system continues to work, only with lower performance.
- PP also provides a flexible platform for building scalable systems with different performance levels and capabilities.

Parallel Computer
Parallel computers are architectures in which many CPUs run in parallel to implement a given application or a set of applications.
Such computers can be implemented in different ways, depending on several key parameters:
- number and complexity of the individual CPUs;
- availability of common (shared) memory;
- interconnection technology and topology;
- performance of the interconnection network;
- I/O devices.

Parallel Program
- To fully utilize a parallel computer, a problem should be decomposed into sub-problems that can be solved in parallel.
- The results of the sub-problems may have to be combined to get the final result of the main problem.
- Because of data dependencies among the sub-problems, some problems are not easy to decompose with a large degree of parallelism.
- Because of data dependencies, the processors may also have to communicate with each other.
- The time taken for communication may be very high compared with the processing time.
- The communication mechanism must therefore be very well designed in order to get good performance.

Parallel Program Example (1)
Matrix computations: $C = A + B$, where $c_{ij} = a_{ij} + b_{ij}$ for $1 \le i \le N$ and $1 \le j \le M$.
Vector computation with vectors of m elements:
    for i := 1 to n do
        C[i, 1:m] := A[i, 1:m] + B[i, 1:m];
    end for;

Parallel Program Example (2)
A vector dot product is common in filtering: $Y = \sum_{i=1}^{N} a(i)\, x(i)$
Parallel sorting:
[Figure: the unsorted input is split into Unsorted-1 to Unsorted-4; the four parts are sorted in parallel (parallel part), giving Sorted-1 to Sorted-4; a final merge (sequential part) produces the sorted output.]
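
The parallel sorting scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `parallel_sort` and the `parts` parameter are hypothetical, and a thread pool is used to show the structure (split, sort the parts concurrently, merge sequentially). In standard CPython, threads do not speed up CPU-bound sorting; swapping in `ProcessPoolExecutor` would be the usual way to get real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def parallel_sort(data, parts=4):
    """Sort `data` by sorting up to `parts` chunks concurrently, then merging."""
    if not data:
        return []
    # Split the unsorted input into roughly equal chunks (Unsorted-1 .. Unsorted-k).
    size = -(-len(data) // parts)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Parallel part: each chunk is sorted independently of the others.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Sequential part: one k-way merge combines the sorted chunks.
    return list(merge(*sorted_chunks))


if __name__ == "__main__":
    print(parallel_sort([7, 3, 9, 1, 8, 2, 6, 4, 5, 0]))
```

The final merge is the part that cannot be spread over the workers in this sketch, which is why the slide labels it as the sequential part.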

Flynn's Classification of Architectures
The classification is based on the nature of the instruction flow executed by a computer and the data flow on which the instructions operate.
The multiplicity of the instruction stream and the data stream gives four different classes:
- Single instruction, single data stream (SISD)
- Single instruction, multiple data stream (SIMD)
- Multiple instruction, single data stream (MISD)
- Multiple instruction, multiple data stream (MIMD)

Single Instruction, Single Data (SISD)
The regular computers we have discussed up to now:
- a single processor;
- a single instruction stream; and
- data stored in a single memory.
[Figure: a processor (control unit and processing unit) connected to a memory system.]

Single Instruction, Multiple Data (SIMD)
- A single machine instruction stream.
- Simultaneous execution on different sets of data.
- A large number of processing elements is usually implemented.
- Lockstep synchronization among the processing elements.
- The processing elements can:
  - have their respective private data memories; or
  - share a common memory via an interconnection network.
- Array and vector processors are the most common examples of SIMD machines.

SIMD with Shared Memory
[Figure: one control unit sends a single instruction stream (IS) to Processing Unit_1 to Processing Unit_n; each processing unit has its own data stream (DS 1 to DS n) and reaches the shared memory through an interconnection network.]

Multiple Instruction, Single Data (MISD)
- A single sequence of data is processed by a set of processors.
- The processors execute different instruction sequences.
- Fault-tolerant computers execute redundant operations on the same data; the space shuttle flight control computers are one example.
- It has not been commercially implemented on a large scale!
[Figure: data processed by PE1, PE2, ..., PEn.]

Multiple Instruction, Multiple Data (MIMD)
- It consists of a set of processors.
- The processors simultaneously execute different instruction sequences.
- Different sets of data are operated on.
- The MIMD class can be further divided:
  - Shared memory (tightly coupled):
    - Symmetric multiprocessor (SMP)
    - Non-uniform memory access (NUMA)
  - Distributed memory (loosely coupled) = clusters

MIMD with Shared Memory
[Figure: CPU_1 to CPU_n, each consisting of a control unit, a processing unit and a local memory (LM), with its own instruction stream (IS) and data stream (DS), connected through an interconnection network to a shared memory.]

Performance Measurement Units
(MIPS = millions of instructions per second.)
FLOPS = floating-point operations per second:
- kiloflops (kflops) = $10^{3}$
- megaflops (MFLOPS) = $10^{6}$
- gigaflops (GFLOPS) = $10^{9}$
- teraflops (TFLOPS) = $10^{12}$
- petaflops (PFLOPS) = $10^{15}$
- exaflops (EFLOPS) = $10^{18}$
- zettaflops (ZFLOPS) = $10^{21}$
- yottaflops (YFLOPS) = $10^{24}$

Performance Metrics (1)
How fast does a parallel computer run at its maximal potential?
- Peak rate: the maximal computation rate that can theoretically be achieved when all processors are fully utilized.
  - Example: the fastest supercomputer in the world had a peak rate of about 100 PFLOPS at the time of the lecture.
- The peak rate is of no practical significance for an individual user.
- It is mostly used by vendors for marketing their computers.

Performance Metrics (2)
How fast an execution can we expect from a parallel computer for a given application or a given set of applications?
- Note the increase in multi-tasking and multi-threaded computing.
- Speedup: measures the gain we get by using a parallel computer, rather than a sequential one, to run a given application.
$S = \frac{T_s}{T_p}$
- $T_s$: execution time needed with the sequential computer;
- $T_p$: execution time needed with the parallel computer.

Performance Metrics (3)
Efficiency: relates the speedup to the number of processors used; it therefore measures how efficiently the processors are used.
$E = \frac{S}{P}$
- $S$: speedup;
- $P$: number of processors.
In the ideal situation, in theory, $S = P$, which means $E = 1$.
In practice, the ideal efficiency of 1 cannot be achieved!

Speedup Limitation
Let $f$ be the ratio of computations that, according to the application, have to be executed sequentially ($0 \le f \le 1$), and $P$ the number of processors.
$T_p = f \cdot T_s + \frac{(1 - f) \cdot T_s}{P}$
$S = \frac{T_s}{f \cdot T_s + \frac{(1 - f) \cdot T_s}{P}} = \frac{1}{f + \frac{1 - f}{P}}$
[Figure: values over $f$ for a parallel computer with 10 processing elements.]
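
The speedup and efficiency formulas above are easy to evaluate numerically. The following is a minimal sketch; the function names `speedup` and `efficiency` and the sample values of f are chosen here for illustration and are not taken from the slides.

```python
def speedup(f, p):
    """Speedup S = 1 / (f + (1 - f) / P) for sequential fraction f and P processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("P must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


def efficiency(f, p):
    """Efficiency E = S / P."""
    return speedup(f, p) / p


if __name__ == "__main__":
    # Illustrative sequential fractions for a machine with 10 processing elements.
    for f in (0.0, 0.05, 0.1, 0.2, 0.5, 1.0):
        print(f"f = {f:4.2f}  S = {speedup(f, 10):5.2f}  E = {efficiency(f, 10):4.2f}")
```

With f = 0 the sketch returns the ideal S = P, and with f = 1 it returns S = 1, regardless of the number of processors.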

Speedup vs. % of Parallel Part (1 - f)
[Figure: speedup versus number of cores for different parallel fractions (1 - f).]

Amdahl's Law
Even a small ratio of sequential computation imposes a limit on the speedup.
A speedup higher than $1/f$ cannot be achieved, regardless of the number of processors, since
$S = \frac{1}{f + \frac{1 - f}{P}} \le \frac{1}{f}$
If there is 20% sequential computation, the speedup will be at most 5, even if you have 1 million processors.
To efficiently exploit a large number of processors, $f$ must be small (the application has to be highly parallel), since
$E = \frac{S}{P} = \frac{1}{f \cdot (P - 1) + 1}$
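
As a worked check of the two Amdahl's-law statements above, the limit and the efficiency expression can be written out step by step. The numbers f = 0.2, P = 10^6 and P = 10 are used only as an illustration.

```latex
% Upper bound on the speedup as the number of processors grows
\lim_{P \to \infty} S
  = \lim_{P \to \infty} \frac{1}{f + \frac{1 - f}{P}}
  = \frac{1}{f}

% 20% sequential computation on one million processors
f = 0.2,\; P = 10^{6}: \quad
S = \frac{1}{0.2 + \frac{0.8}{10^{6}}} \approx 4.99998 < 5 = \frac{1}{f}

% Efficiency, obtained by dividing the speedup by P
E = \frac{S}{P}
  = \frac{1}{P \left( f + \frac{1 - f}{P} \right)}
  = \frac{1}{f P + 1 - f}
  = \frac{1}{f (P - 1) + 1}

% Already with 10 processors, 20% sequential work costs most of the efficiency
f = 0.2,\; P = 10: \quad E = \frac{1}{0.2 \cdot 9 + 1} = \frac{1}{2.8} \approx 0.36
```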

Other Factors that Limit Speedup
Besides the intrinsic sequentiality of parts of an application, other factors also limit the achievable speedup:
- communication cost;
- load balancing of the processors;
- costs of creating and scheduling processes; and
- I/O operations (mostly sequential in nature).
Many applications have a high degree of parallelism: their value of $f$ is very small and can be ignored, so they are suited for massively parallel systems.
For such applications, the other limiting factors, such as the cost of communication, become critical.

Impact of Communication
Consider a highly parallel computation, where $f$ is small and can be neglected.
Let $f_c$ be the fractional communication overhead of a processor:
- $T_{calc}$: the time that a processor spends executing computations;
- $T_{comm}$: the time that a processor is idle because of communication.
$f_c = \frac{T_{comm}}{T_{calc}}$
$T_p = \frac{T_s}{P} (1 + f_c)$
$S = \frac{T_s}{T_p} = \frac{P}{1 + f_c}$
$E = \frac{1}{1 + f_c} \approx 1 - f_c$ (if $f_c$ is small)
For applications with a high degree of parallelism, massively parallel computers, consisting of a large number of processors, can be used efficiently only if $f_c$ is small.
- The time spent by a processor on communication has to be small compared to its time for computation.
- Communication time is strongly affected by the interconnection network.
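
The effect of the communication overhead can be explored with a small sketch of the formulas above. The function names and the sample timings are hypothetical; the sequential fraction f is neglected, as in the slide.

```python
def comm_speedup(p, t_calc, t_comm):
    """Speedup S = P / (1 + fc), with fc = Tcomm / Tcalc and f neglected."""
    if t_calc <= 0:
        raise ValueError("t_calc must be positive")
    fc = t_comm / t_calc
    return p / (1.0 + fc)


def comm_efficiency(t_calc, t_comm):
    """Efficiency E = 1 / (1 + fc); close to 1 - fc when fc is small."""
    if t_calc <= 0:
        raise ValueError("t_calc must be positive")
    return 1.0 / (1.0 + t_comm / t_calc)


if __name__ == "__main__":
    # Hypothetical timings: 1.0 time units of computation per processor,
    # with increasing idle time caused by communication.
    for t_comm in (0.01, 0.1, 0.25, 1.0):
        print(t_comm, comm_speedup(100, 1.0, t_comm), comm_efficiency(1.0, t_comm))
```

When a processor waits for communication as long as it computes (fc = 1), half of the machine is effectively wasted, however many processors are available.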

Interconnection Network
- The interconnection network (IN) is a key component of a parallel architecture. It has a decisive influence on:
  - the overall performance; and
  - the total cost of the architecture.
- The traffic in an IN consists of:
  - data transfer; and
  - transfer of commands and requests (control information).
- The key parameters of an IN are:
  - total bandwidth: transferred bits/second; and
  - implementation cost.

Single Bus
[Figure: Node 1, Node 2, ..., Node n attached to a single shared bus.]
- It is simple, cheap and relatively flexible, and it is a broadcast mechanism.
- Only one communication is allowed at a time; the bandwidth is shared by all nodes.
- Performance is relatively poor.
- To keep performance good, the number of nodes is limited (to around 16-20).
  - Multiple buses can be used instead, if needed.

Completely Connected Network
- Each node is connected to every other node: $N (N - 1) / 2$ wires.
- Communications can be performed in parallel between any pair of nodes.
- Both performance and cost are high.
- Cost increases rapidly with the number of nodes.

Crossbar Network
[Figure: Node 1, Node 2, ..., Node n connected through a grid of switches.]
- A dynamic network: the interconnection topology can be modified by configuring the switches.
- It is completely connected: any node can be directly connected to any other.
- Fewer interconnections are needed than for the static completely connected network; however, a large number of switches is needed.
- A large number of communications can be performed in parallel (even though one node can receive or send only one data item at a time).

Mesh Network
- Cheaper than completely connected networks, while giving relatively good performance.
- To transmit data between two nodes, routing through intermediate nodes is needed, with a path of at most $2 (n - 1)$ links for an $n \times n$ mesh.
- It is possible to provide wrap-around connections: torus.
- Three-dimensional meshes have also been implemented.
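
The path lengths in a mesh and in a torus can be compared with a short sketch. It assumes that nodes are addressed by (row, column) pairs starting at 0 and that messages follow a shortest path; the function names are hypothetical.

```python
def mesh_hops(src, dst):
    """Links traversed between (row, col) nodes in a mesh without wrap-around."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def torus_hops(src, dst, n):
    """Links traversed between nodes in an n x n torus (mesh with wrap-around)."""
    dr = abs(src[0] - dst[0])
    dc = abs(src[1] - dst[1])
    # In each dimension, go the shorter way around the ring.
    return min(dr, n - dr) + min(dc, n - dc)


if __name__ == "__main__":
    n = 4
    nodes = [(r, c) for r in range(n) for c in range(n)]
    print(max(mesh_hops(a, b) for a in nodes for b in nodes))      # 2 * (n - 1)
    print(max(torus_hops(a, b, n) for a in nodes for b in nodes))  # 2 * (n // 2)
```

For the 4 x 4 case, the longest mesh path is corner to corner, while the wrap-around links of the torus shorten the longest path to the distance across half of each ring.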

Hypercube Network
[Figure: 2-D, 3-D, 4-D and 5-D hypercubes.]
- $2^n$ nodes are arranged in an n-dimensional cube. Each node is connected to n neighbors.
- To transmit data between two nodes, routing through intermediate nodes is needed, but with a path of at most n links.

Summary
- The growing need for high computing performance cannot always be satisfied by traditional computers.
- In parallel computers, multiple CPUs run concurrently in order to solve a given problem.
- Parallel programs have to be developed in order to make efficient use of a parallel computer.
- Computers can be classified based on the nature of the instruction flow and the data flow on which the instructions operate.
- Another key component of a parallel architecture is the interconnection network.
- The performance of a parallel computer depends not only on the number of processors and the interconnection network, but also on the characteristics of the executed programs.
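
As a supplement to the hypercube network described above, the following sketch shows one common way to route in it. It assumes that the $2^n$ nodes are labelled with n-bit addresses such that neighbors differ in exactly one bit; this labelling and the function names are assumptions made here, not something stated in the slides.

```python
def neighbors(node, n):
    """Neighbors of `node` in an n-dimensional hypercube: flip one address bit each."""
    return [node ^ (1 << bit) for bit in range(n)]


def route(src, dst, n):
    """Path from src to dst, correcting differing address bits from lowest to highest."""
    path = [src]
    current = src
    for bit in range(n):
        mask = 1 << bit
        if (current ^ dst) & mask:
            current ^= mask  # move along the dimension where the addresses differ
            path.append(current)
    return path


if __name__ == "__main__":
    print(neighbors(0b000, 3))
    print(route(0b000, 0b111, 3))
```

Each step fixes one differing address bit, so the number of links on the path equals the number of bits in which the two addresses differ, which is never more than n.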
Lecture 7: Parallel Processing
Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction
More informationLecture 8: RISC & Parallel Computers. Parallel computers
Lecture 8: RISC & Parallel Computers RISC vs CISC computers Parallel computers Final remarks Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation in computer
More informationARCHITECTURES FOR PARALLEL COMPUTATION
Datorarkitektur Fö 11/12-1 Datorarkitektur Fö 11/12-2 Why Parallel Computation? ARCHITECTURES FOR PARALLEL COMTATION 1. Why Parallel Computation 2. Parallel Programs 3. A Classification of Computer Architectures
More informationLecture 9: MIMD Architecture
Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected
More informationParallel computer architecture classification
Parallel computer architecture classification Hardware Parallelism Computing: execute instructions that operate on data. Computer Instructions Data Flynn s taxonomy (Michael Flynn, 1967) classifies computer
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.
More informationComputer Architecture
Computer Architecture Chapter 7 Parallel Processing 1 Parallelism Instruction-level parallelism (Ch.6) pipeline superscalar latency issues hazards Processor-level parallelism (Ch.7) array/vector of processors
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationChapter 18 Parallel Processing
Chapter 18 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data stream - SIMD Multiple instruction, single data stream - MISD
More informationAdvanced Parallel Architecture. Annalisa Massini /2017
Advanced Parallel Architecture Annalisa Massini - 2016/2017 References Advanced Computer Architecture and Parallel Processing H. El-Rewini, M. Abd-El-Barr, John Wiley and Sons, 2005 Parallel computing
More informationUnit 9 : Fundamentals of Parallel Processing
Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing
More informationLecture 2 Parallel Programming Platforms
Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple
More informationComputer parallelism Flynn s categories
04 Multi-processors 04.01-04.02 Taxonomy and communication Parallelism Taxonomy Communication alessandro bogliolo isti information science and technology institute 1/9 Computer parallelism Flynn s categories
More information4. Networks. in parallel computers. Advances in Computer Architecture
4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors
More informationChapter 11. Introduction to Multiprocessors
Chapter 11 Introduction to Multiprocessors 11.1 Introduction A multiple processor system consists of two or more processors that are connected in a manner that allows them to share the simultaneous (parallel)
More informationBİL 542 Parallel Computing
BİL 542 Parallel Computing 1 Chapter 1 Parallel Programming 2 Why Use Parallel Computing? Main Reasons: Save time and/or money: In theory, throwing more resources at a task will shorten its time to completion,
More informationSMD149 - Operating Systems - Multiprocessing
SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Overview Introduction Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55 Introduction
More informationOverview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy
Overview SMD149 - Operating Systems - Multiprocessing Roland Parviainen Multiprocessor systems Multiprocessor, operating system and memory organizations December 1, 2005 1/55 2/55 Multiprocessor system
More informationMultiprocessors - Flynn s Taxonomy (1966)
Multiprocessors - Flynn s Taxonomy (1966) Single Instruction stream, Single Data stream (SISD) Conventional uniprocessor Although ILP is exploited Single Program Counter -> Single Instruction stream The
More informationParallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam
Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of
More informationARCHITECTURES FOR PARALLEL COMPUTATION
Datorarkitektur Fö 11/12-1 Datorarkitektur Fö 11/12-2 ARCHITECTURES FOR PARALLEL COMTATION 1. Why Parallel Computation 2. Parallel Programs 3. A Classification of Computer Architectures 4. Performance
More informationBlueGene/L (No. 4 in the Latest Top500 List)
BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming David Lifka lifka@cac.cornell.edu May 23, 2011 5/23/2011 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor or computer to complete
More information3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes:
BIT 325 PARALLEL PROCESSING ASSESSMENT CA 40% TESTS 30% PRESENTATIONS 10% EXAM 60% CLASS TIME TABLE SYLLUBUS & RECOMMENDED BOOKS Parallel processing Overview Clarification of parallel machines Some General
More informationParallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor
Multiprocessing Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast. Almasi and Gottlieb, Highly Parallel
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationDr. Joe Zhang PDC-3: Parallel Platforms
CSC630/CSC730: arallel & Distributed Computing arallel Computing latforms Chapter 2 (2.3) 1 Content Communication models of Logical organization (a programmer s view) Control structure Communication model
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently
More informationModule 5 Introduction to Parallel Processing Systems
Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this
More informationArchitecture of parallel processing in computer organization
American Journal of Computer Science and Engineering 2014; 1(2): 12-17 Published online August 20, 2014 (http://www.openscienceonline.com/journal/ajcse) Architecture of parallel processing in computer
More informationARCHITECTURAL CLASSIFICATION. Mariam A. Salih
ARCHITECTURAL CLASSIFICATION Mariam A. Salih Basic types of architectural classification FLYNN S TAXONOMY OF COMPUTER ARCHITECTURE FENG S CLASSIFICATION Handler Classification Other types of architectural
More informationComputer Organization. Chapter 16
William Stallings Computer Organization and Architecture t Chapter 16 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data
More informationParallel Architectures
Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology
More informationMultiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering
Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to
More information06-Dec-17. Credits:4. Notes by Pritee Parwekar,ANITS 06-Dec-17 1
Credits:4 1 Understand the Distributed Systems and the challenges involved in Design of the Distributed Systems. Understand how communication is created and synchronized in Distributed systems Design and
More informationTop500 Supercomputer list
Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming ATHENS Course on Parallel Numerical Simulation Munich, March 19 23, 2007 Dr. Ralf-Peter Mundani Scientific Computing in Computer Science Technische Universität München
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More informationOrganisasi Sistem Komputer
LOGO Organisasi Sistem Komputer OSK 14 Parallel Processing Pendidikan Teknik Elektronika FT UNY Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology
More informationChap. 4 Multiprocessors and Thread-Level Parallelism
Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently
More informationComputer Architecture and Organization
10-1 Chapter 10 - Advanced Computer Architecture Computer Architecture and Organization Miles Murdocca and Vincent Heuring Chapter 10 Advanced Computer Architecture 10-2 Chapter 10 - Advanced Computer
More informationAdvanced Computer Architecture. The Architecture of Parallel Computers
Advanced Computer Architecture The Architecture of Parallel Computers Computer Systems No Component Can be Treated In Isolation From the Others Application Software Operating System Hardware Architecture
More informationParallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization
Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor
More informationTypes of Parallel Computers
slides1-22 Two principal types: Types of Parallel Computers Shared memory multiprocessor Distributed memory multicomputer slides1-23 Shared Memory Multiprocessor Conventional Computer slides1-24 Consists
More informationCOSC 6385 Computer Architecture - Multi Processor Systems
COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:
More informationCSCI 4717 Computer Architecture
CSCI 4717/5717 Computer Architecture Topic: Symmetric Multiprocessors & Clusters Reading: Stallings, Sections 18.1 through 18.4 Classifications of Parallel Processing M. Flynn classified types of parallel
More informationMultiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism
Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,
More informationFLYNN S TAXONOMY OF COMPUTER ARCHITECTURE
FLYNN S TAXONOMY OF COMPUTER ARCHITECTURE The most popular taxonomy of computer architecture was defined by Flynn in 1966. Flynn s classification scheme is based on the notion of a stream of information.
More informationPipeline and Vector Processing 1. Parallel Processing SISD SIMD MISD & MIMD
Pipeline and Vector Processing 1. Parallel Processing Parallel processing is a term used to denote a large class of techniques that are used to provide simultaneous data-processing tasks for the purpose
More informationCOSC 6385 Computer Architecture - Thread Level Parallelism (I)
COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More information10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems
1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase
More informationLecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter
Lecture Topics Today: Advanced Scheduling (Stallings, chapter 10.1-10.4) Next: Deadlock (Stallings, chapter 6.1-6.6) 1 Announcements Exam #2 returned today Self-Study Exercise #10 Project #8 (due 11/16)
More informationCSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing
Dr Izadi CSE-4533 Introduction to Parallel Processing Chapter 4 Models of Parallel Processing Elaborate on the taxonomy of parallel processing from chapter Introduce abstract models of shared and distributed
More informationProcessor Architecture and Interconnect
Processor Architecture and Interconnect What is Parallelism? Parallel processing is a term used to denote simultaneous computation in CPU for the purpose of measuring its computation speeds. Parallel Processing
More informationLect. 2: Types of Parallelism
Lect. 2: Types of Parallelism Parallelism in Hardware (Uniprocessor) Parallelism in a Uniprocessor Pipelining Superscalar, VLIW etc. SIMD instructions, Vector processors, GPUs Multiprocessor Symmetric