Lecture 9: MIMD Architecture
Zebo Peng, IDA, LiTH
- Introduction and classification
- Symmetric multiprocessors
- NUMA architecture
- Cluster machines

Introduction
MIMD: a set of general-purpose processors is connected together. In contrast to a SIMD computer, a MIMD computer can execute different programs on different processors. Flexibility!
[Figure: a large irregular graph of processors (P 1 ... P 38) and memories (m 5 ... m 42), illustrating irregular task parallelism (intelligent cruise controller).]
Introduction (Cont'd)
MIMD processors work asynchronously, and don't have to synchronize with each other. Local clocks can run at higher speeds. At any time, different processors may be executing different instructions on different streams of data.
They can be built from commodity (off-the-shelf) microprocessors with relatively little effort, and they are highly scalable, provided that an appropriate memory organization is used.
Most current parallel computers are built on the MIMD architecture. Supercomputers have a global MIMD organization, even though locally they use SIMD for vector processing.

SIMD vs MIMD
A SIMD computer requires less hardware than a MIMD computer (a single control unit). However, SIMD processors are specially designed, and tend to be expensive and have long design cycles. Not all applications are naturally suited to SIMD processors.
Conceptually, MIMD computers also cover the SIMD case, by letting all processors execute the same program (single program, multiple data streams: SPMD).
On the other hand, SIMD architectures are more power efficient, since one instruction controls many computations. For regular vector computation, it is still more appropriate to use a SIMD machine (e.g., as implemented in GPUs).
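To make the SPMD idea concrete, here is a minimal sketch, assuming an MPI installation such as Open MPI or MPICH (the problem size and file name are illustrative, not from the lecture): every processor runs the same program, but each one selects its own slice of the data based on its rank.

/* spmd.c: single program, multiple data streams.
 * Compile: mpicc spmd.c -o spmd   Run: mpirun -np 4 ./spmd */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I?       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many of us? */

    /* Same program, different data: each rank sums its own chunk. */
    long lo = (long)rank * N / size;
    long hi = (long)(rank + 1) * N / size;
    double local = 0.0;
    for (long i = lo; i < hi; i++)
        local += 1.0 / (i + 1);

    /* Combine the partial results on rank 0. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f (computed by %d processes)\n", total, size);

    MPI_Finalize();
    return 0;
}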
MIMD Processor Classification
- Centralized memory: shared memory located at a centralized location, usually consisting of several interleaved modules, all the same distance from any processor. Symmetric Multiprocessor (SMP), Uniform Memory Access (UMA).
- Distributed memory: memory is distributed to each processor, improving scalability.
  - Message-passing architectures: no processor can directly access another processor's memory.
  - Hardware distributed shared memory (DSM): memory is distributed, but the address space is shared. Non-Uniform Memory Access (NUMA).
  - Software DSM: a layer of the OS built on top of a message-passing machine to give the programmer a shared-memory view.

MIMD with Shared Memory
[Figure: n control units (CU) issue instruction streams (IS) to processing elements (PE), whose data streams (DS) all go to a single shared memory; the shared memory is the bottleneck.]
Tightly coupled, not very scalable. Typically called multi-processor systems.
MIMD with Distributed Memory
[Figure: each control unit (CU) drives a processing element (PE) with its own local memory (LM); the nodes communicate over an interconnection network.]
Loosely coupled with message passing, scalable. Typically called multi-computer systems.

Shared-Address-Space Platforms
Part (or all) of the memory is accessible to all processors. Processors interact by modifying data objects stored in this shared address space.
[Figure: processors (P) connected through an interconnection network to memory modules (M).]
UMA (uniform memory access): the time taken by a processor to access any memory word is identical. NUMA, otherwise.
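A minimal sketch of this interaction style, assuming POSIX threads (the counter and step count are illustrative): the "processors" are threads that cooperate by modifying a data object in one shared address space, with a mutex coordinating the accesses.

/* shared.c: interaction through a shared address space.
 * Compile: cc shared.c -o shared -lpthread */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define STEPS    100000

static long counter = 0;                       /* shared data object */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < STEPS; i++) {
        pthread_mutex_lock(&lock);             /* coordinate access */
        counter++;                             /* plain load/store  */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    /* All threads saw the same memory, so the total is exact. */
    printf("counter = %ld\n", counter);
    return 0;
}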
Multi-Computer Systems
A multi-computer system is a collection of computers interconnected by a message-passing network, e.g., clusters. Each processor is an autonomous computer:
- with its own local memory;
- there is no longer a shared address space;
- they communicate with each other through a message-passing network (a minimal sketch follows the design-issues list below);
- they can be easily built with off-the-shelf microprocessors.

MIMD Design Issues
Design decisions related to an MIMD machine are very complex, since they involve both architecture and software issues. The most important issues include:
- Processor design (functionality and capability)
- Physical organization
- Interconnection structure
- Inter-processor communication protocols
- Memory hierarchy
- Cache organization and coherency
- Operating system design
- Parallel programming languages
- Application software techniques
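As promised above, a minimal message-passing sketch, again assuming MPI (the buffer contents are made up): one node explicitly sends a buffer, another explicitly receives it, and neither can touch the other's local memory with an ordinary load or store.

/* msg.c: explicit message passing between autonomous nodes.
 * Compile: mpicc msg.c -o msg   Run: mpirun -np 2 ./msg */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double buf[4] = {1.0, 2.0, 3.0, 4.0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Explicit send: copies data out of node 0's local memory. */
        MPI_Send(buf, 4, MPI_DOUBLE, 1, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Explicit receive: copies data into node 1's local memory. */
        MPI_Recv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("node 1 received %g %g %g %g\n",
               buf[0], buf[1], buf[2], buf[3]);
    }

    MPI_Finalize();
    return 0;
}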
Lecture 9: MIMD Architecture
- Introduction and classification
- Symmetric multiprocessors
- NUMA architecture
- Cluster machines

Symmetric Multiprocessors (SMP)
- A set of similar processors of comparable capacity: each processor can perform the same functions (hence symmetric).
- The processors are connected by a bus or other interconnection, and share the same memory: a single memory or a set of memory modules, with the same memory access time for each processor.
- All processors share access to I/O, either through the same channels or through different channels giving paths to the same devices.
- All processors are controlled by an integrated operating system.
An SMP Example
[Figure: an example SMP organization.]

SMP Advantages
- High performance: if similar work can be done in parallel (e.g., scientific computing).
- Good availability: since all processors can perform the same functions, failure of a single processor does not stop the system.
- Incremental growth: users can enhance performance by adding additional processors.
- Good scalability: vendors can offer a range of products based on different numbers of processors.
SMP Based on a Shared Bus
[Figure: processors with local caches attached to a single shared bus together with memory and I/O; the bus is critical for performance.]

Shared-Bus SMP Pros and Cons
Advantages:
- Simple and cheap.
- Flexibility: easy to expand the system by attaching more processors.
Disadvantages:
- Performance is limited by the bus bandwidth.
- Each processor should have a local cache to reduce the number of bus accesses, which leads to problems with cache coherence. These should be solved in hardware (to be discussed in Lecture 10).
Multi-Port Memory SMP
Direct access to the memory modules by several processors. Better performance than a bus-based SMP:
- Each processor has a dedicated path to each memory module.
- All the ports can be active simultaneously, with the restriction that only one write can be performed into a memory location at a time.

Multi-Port Memory SMP (Cont'd)
- Hardware logic is required to resolve conflicts (parallel writes), e.g., permanently designated priorities for each memory port.
- Portions of memory can be configured as private to one or more processors (which also improves security).
- A write-through cache policy is required to alert other processors of memory updates.
Operating System Issues
The OS should manage the processor resources so that the user perceives them as a single system: it should appear as a single-processor multiprogramming system.
- In both SMP and uniprocessor systems, several processes may be active at one time; the OS is responsible for scheduling their execution and allocating resources.
- A user may build applications with multiple processes without regard to whether a single processor or multiple processors are available.
- The OS addresses the issues of reliability and fault tolerance (graceful degradation).

IBM S/390 Mainframe SMP
Processor unit (PU):
- CISC microprocessor
- Frequently used instructions hard-wired
- 64k L1 unified cache with 1-cycle access time
IBM S/390 Mainframe SMP (Cont'd)
- L2 cache: 384k
- Bus switching network adapter (BSN): includes 2M of L3 cache
IBM S/390 Mainframe SMP (Cont'd)
- Memory card: 8G per card

Lecture 9: MIMD Architecture
- Introduction and classification
- Symmetric multiprocessors
- NUMA architecture
- Cluster machines
Memory Access Approaches
Uniform memory access (UMA), as in SMP:
- All processors have access to all parts of memory.
- Access time to all regions of memory is the same.
- Access time for different processors is the same.
Non-uniform memory access (NUMA):
- All processors have access to all parts of memory.
- Access time of a processor differs depending on the region of memory: different processors access different regions of memory at different speeds.

Motivation for NUMA
SMP has a stringent limit on the number of processors:
- Bus traffic limits to about processors.
- A multi-port memory has a limited number of inputs (ca. 10).
In cluster computers each node has its own memory:
- Applications don't see a large global memory.
- Coherence is maintained by software, not hardware (i.e., slow).
NUMA retains the SMP flavour while making large-scale multiprocessing possible:
- e.g., the Silicon Graphics Origin NUMA machine has 1024 processors (~1 teraflops peak performance).
NUMA provides a transparent system-wide memory while allowing nodes to have their own bus or internal interconnections.
A Typical NUMA Organization (1)
- Each node is, in general, an SMP.
- Each node has its own main memory.
- Each processor has its own L1 and L2 caches.

A Typical NUMA Organization (2)
- Nodes are connected by some networking facility.
- Each processor sees a single addressable memory space: each memory location has a unique system-wide address.
A Typical NUMA Organization (3)
For a Load X, the memory request order is:
- L1 cache (local to the processor)
- L2 cache (local to the processor)
- Main memory (local to the node)
- Remote memory (via the interconnection network)
All of this is done automatically and transparently to the processor, but with very different access times!

NUMA Pros and Cons
Better performance at higher levels of parallelism than SMP. However, performance can break down if there is too much access to remote memory, which can be avoided by:
- Designing better L1 and L2 caches to reduce memory accesses; this needs good temporal and spatial locality in the software.
- If the software has good spatial locality, the data needed by an application will reside on a small number of frequently used pages, which can be initially moved into the local memory.
- Enhancing virtual memory by including in the OS a page-migration mechanism that moves a VM page to a node that frequently uses it.
The design of the OS becomes an important issue for NUMA machines.
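At the programming level, locality can also be requested explicitly. A minimal sketch, assuming a Linux system with libnuma installed (link with -lnuma; the array size is illustrative): the data is allocated on the node the program runs on, so the initialization loop stays on the fast local path of the request order above instead of crossing the interconnect.

/* local.c: NUMA-aware placement with libnuma (Linux assumed).
 * Compile: cc local.c -o local -lnuma */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    size_t bytes = 64 * 1024 * 1024;

    /* Place the array on the node we are currently running on,
     * so every access below is a local-memory access. */
    double *a = numa_alloc_local(bytes);
    if (!a) { perror("numa_alloc_local"); return 1; }

    size_t n = bytes / sizeof(double);
    for (size_t i = 0; i < n; i++)   /* touch the pages: all local */
        a[i] = (double)i;

    printf("allocated %zu MB on node %d\n",
           bytes >> 20, numa_node_of_cpu(sched_getcpu()));

    numa_free(a, bytes);
    return 0;
}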
Lecture 9: MIMD Architecture
- Introduction and classification
- Symmetric multiprocessors
- NUMA architecture
- Cluster machines

Loosely Coupled MIMD: Clusters
Cluster: a set of computers connected, typically over a high-bandwidth local area network, and used as a multi-computer system.
- A group of interconnected stand-alone computers working together as a unified resource.
- Each computer is called a node; a node can also be a multiprocessor itself, such as an SMP.
- Message passing is used for communication between nodes; there is no longer a shared memory space.
Related forms:
- NOW (Network of Workstations): usually a homogeneous cluster.
- Grid: heterogeneous computers over a wide-area network.
- Cloud computing: a large number of computers connected via the Internet.
Cluster Benefits
- Absolute scalability: clusters with a huge number of nodes are possible.
- Incremental scalability: a user can start with a small system and expand it as needed, without having to go through a major upgrade.
- High availability: fault tolerant by nature.
- Superior price/performance: can be built with cheap off-the-shelf nodes, using supercomputing-class off-the-shelf components.
Amazon, Google, Facebook and Hotmail rely on clusters of PCs to provide services used by millions of users every day.

IBM's Blue Gene Supercomputer
- A supercomputer with a massive number of PowerPC processors and a huge memory space.
- 65,536 processors give 360 teraflops performance (BG/L).
- It has several times led the TOP500 and Green500 (energy efficiency) lists of supercomputers.
- It makes use of message passing for communication between nodes.
Blue Gene/L Architecture
[Figure: the Blue Gene/L system architecture.]

Blue Gene/Q
- 64-bit PowerPC A2 processor, 4-way simultaneously multithreaded, running at 1.6 GHz.
- The processor cores are linked by a crossbar switch to a 32 MB L2 cache.
- 16 processor cores are used for computing, and a 17th core for the OS. An 18th core is used as a redundant spare in case some core is permanently damaged.
- The 18-core chip has a peak performance of GFLOPS.
- Blue Gene/Q has a performance of 20 PFLOPS (the world's fastest supercomputer in June 2012), and draws ca. 7.9 megawatts of power.
- One of the greenest supercomputers at over 2 GFLOPS/watt at that time (the fastest supercomputer now reaches 6.2 GFLOPS/watt).
Parallelizing Computation
How to make a single application execute in parallel on a cluster:
- Parallelizing compiler: the compiler determines which parts can be executed in parallel.
- Parallelized application: applications written from scratch to be parallel, with message passing used to move data between nodes. Hard to program, but gives the best end results.
- Parametric computing: an application consists of repeated executions of the same algorithm over different sets of data, e.g., simulation of a system under different scenarios. Needs tools to organize, run, and manage the parallel tasks (see the sketch after the next slide).

Google Applications
Search engines require a large amount of computation per request:
- A single query on Google (on average) reads hundreds of megabytes of data and takes tens of billions of CPU cycles.
- A peak request stream on Google means thousands of queries per second, which requires supercomputer capability to handle.
- The queries can be easily parallelized: different queries can run on different processors, and a single query can also use multiple processors.
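As flagged above, a minimal sketch of the parametric style, again assuming MPI (the simulate kernel and scenario table are hypothetical stand-ins, not Google's or the lecture's code): every node runs the same algorithm, but each one takes different scenarios from the parameter table.

/* param.c: parametric computing, one scenario set per rank.
 * Compile: mpicc param.c -o param   Run: mpirun -np 4 ./param */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical simulation: decay of a quantity at a given rate. */
static double simulate(double rate, int steps)
{
    double x = 1.0;
    for (int i = 0; i < steps; i++)
        x *= (1.0 - rate);
    return x;
}

int main(int argc, char **argv)
{
    static const double scenarios[] = {0.001, 0.002, 0.005, 0.010};
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Same algorithm, different data sets: scenarios dealt out by rank. */
    int nsc = sizeof scenarios / sizeof scenarios[0];
    for (int s = rank; s < nsc; s += size) {
        double result = simulate(scenarios[s], 10000);
        printf("rank %d: scenario rate=%.3f -> result %.6f\n",
               rank, scenarios[s], result);
    }

    MPI_Finalize();
    return 0;
}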
Google Cluster of PCs
A cluster of off-the-shelf PCs is a cheap solution: more than 15,000 nodes form a cluster, rather than a smaller number of high-end servers or a supercomputer.
Important design considerations:
- Price/performance ratio.
- Energy efficiency (a huge amount of energy is needed).
Several clusters are available worldwide, to handle catastrophic failures (earthquakes, power failures, etc.).
Parametric computing:
- Same algorithm (PageRank, to determine the relative importance of a web page).
- Different sets of data.

Summary
- The two most common MIMD architectures are symmetric multiprocessors and clusters: an SMP is a tightly coupled system with shared memory, while a cluster is a loosely coupled system with distributed memory.
- More recently, non-uniform memory access (NUMA) systems are also becoming popular; they are more complex in terms of design and operation.
- Several organization styles can also be integrated into a single system, e.g., an SMP as a node inside a NUMA machine.
- The MIMD architecture is highly scalable, so such machines can be built with a huge number of microprocessors.