Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
2 Elements of a Parallel Computer
- Hardware
  - Multiple processors
  - Multiple memories
  - Interconnection network
- System software
  - Parallel operating system
  - Programming constructs to express/orchestrate concurrency
- Application software
  - Parallel algorithms
- Goal: utilize the hardware, system and application software to
  - Achieve speedup: T_p = T_s / p
  - Solve problems requiring a large amount of memory
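As a worked illustration of the speedup goal T_p = T_s / p, the following C sketch computes the ideal parallel time together with the speedup and efficiency actually observed. The function names are hypothetical and do not come from the slides; measured times usually fall short of the ideal because of communication and synchronization overheads.

```c
#include <assert.h>

/* Ideal parallel time on p processors: T_p = T_s / p (hypothetical helper). */
double ideal_parallel_time(double t_serial, int p)
{
    assert(p > 0);
    return t_serial / p;
}

/* Observed speedup: S = T_s / T_p, using a measured parallel time. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Efficiency: E = S / p. A value of 1.0 means the ideal T_p = T_s / p was reached. */
double efficiency(double t_serial, double t_parallel, int p)
{
    assert(p > 0);
    return speedup(t_serial, t_parallel) / p;
}
```

For instance, a job with T_s = 100 s that takes 50 s on 4 processors has a speedup of 2 and an efficiency of 0.5, whereas the ideal time would have been 25 s.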
3 Parallel Computing Platform
- Logical organization
  - The user's view of the machine as it is presented via its system software
- Physical organization
  - The actual hardware architecture
- Physical architecture is to a large extent independent of the logical architecture
  - Ex) message passing on a shared memory architecture, distributed shared memory system
4 Logical Organization Elements
- Control mechanism: Flynn's taxonomy
  - SISD: Single Instruction stream, Single Data stream (single-core processor)
  - MISD: Multiple Instruction stream, Single Data stream (not covered)
  - SIMD: Single Instruction stream, Multiple Data stream
  - MIMD: Multiple Instruction stream, Multiple Data stream (multi-core processor)
5 SIMD vs. MIMD
[Figure: SIMD architecture; MIMD architecture]
6 SIMD
- Exploit data parallelism
  - The same instruction on multiple data items
- Example loop: for (i=0; i<n; i++) a[i] = b[i] + c[i];
[Figure: with the arrays laid out on 16-byte boundaries, the SIMD unit loads b0..b3 into vector register vr1 and c0..c3 into vr2, adds them element-wise into vr3 (b0+c0 .. b3+c3), and stores the result to a0..a3]
7 SIMD
- Exploit data parallelism
  - The same instruction on multiple data items
- SIMD units in processors
  - Supercomputers: BlueGene/Q
  - PC: MMX/SSE/AVX (x86), AltiVec/VMX (PowerPC)
  - Embedded systems: Neon (ARM), VLIW+SIMD DSPs
  - Co-processors: NVIDIA GPGPU
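The element-wise loop a[i] = b[i] + c[i] from the SIMD slides can be sketched with x86 SSE intrinsics, four floats per instruction. This is a minimal sketch, not the slides' own code: the function name vec_add is hypothetical, the register names vr1, vr2, vr3 mirror the figure labels, and unaligned loads are used so the arrays are not assumed to sit on 16-byte boundaries. On targets without SSE the code falls back to the plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* a[i] = b[i] + c[i] for 0 <= i < n (hypothetical helper name). */
void vec_add(float *a, const float *b, const float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t i = 0;
#if defined(__SSE__)
    /* One SIMD add handles four float elements at a time. */
    for (; i + 4 <= n; i += 4) {
        __m128 vr1 = _mm_loadu_ps(&b[i]);   /* b[i..i+3] */
        __m128 vr2 = _mm_loadu_ps(&c[i]);   /* c[i..i+3] */
        __m128 vr3 = _mm_add_ps(vr1, vr2);  /* four additions in one instruction */
        _mm_storeu_ps(&a[i], vr3);          /* a[i..i+3] */
    }
#endif
    /* Scalar loop for the leftover elements (or for all of them without SSE). */
    for (; i < n; i++)
        a[i] = b[i] + c[i];
}
```

If the arrays are known to be 16-byte aligned, the aligned variants _mm_load_ps and _mm_store_ps could be used instead of the unaligned ones.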
8 MIMD
- Multiple instructions on multiple data items
  - A collection of independent processing elements (or cores)
  - Usually exploits thread-level parallelism
- Modern parallel computing platforms
  - SIMD can also work on this system
9 Programming Model
- What the programmer uses in coding applications
- Specifies communication and synchronization
  - Instructions, APIs, defined data structures
- Programming model examples
  - Shared address space: load/store instructions to access the data for communication
  - Message passing: special system library, APIs for data transmission
  - Data parallel: well-structured data, same operation applied to multiple data in parallel
    - Implemented with shared address space or message passing
10 Shared Address Space Architecture
- Shared address space
  - Any processor can directly reference any memory location
  - Communication occurs implicitly as a result of loads and stores
  - Location transparency (flat address space)
  - Similar programming model to time-sharing on uniprocessors
    - Except processes run on different processors
    - Good throughput on multi-programmed workloads
- Popularly known as the shared memory machine/model
  - Memory may be physically distributed among processors
11 Shared Address Space Architecture
- Multi-processing
  - One or more threads on a virtual address space
  - Portions of the address spaces of processes are shared
  - Writes to a shared address are visible to other threads/processes
- Natural extension of the uniprocessor model
  - Conventional memory operations for communication
  - Special atomic operations for synchronization
[Figure: virtual address spaces for a collection of processes communicating via shared addresses, mapped onto the machine physical address space]
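The idea that processes share a portion of their address spaces and synchronize with atomic operations can be sketched on a POSIX system as follows. This is a minimal sketch under stated assumptions: it uses fork and an anonymous shared mmap region (one possible mechanism, not one named in the slides), the helper name shared_counter_demo is hypothetical, and C11 atomics stand in for the "special atomic operations".

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict -std modes */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: nproc child processes each increment one shared counter
 * iters times. Returns the final count, or -1 if mmap or fork fails. */
long shared_counter_demo(int nproc, long iters)
{
    assert(nproc > 0 && iters >= 0);

    /* This region is the shared portion of the address spaces: after fork,
     * parent and children all see the same physical memory here. */
    atomic_long *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED)
        return -1;
    atomic_init(counter, 0);

    int failed = 0;
    for (int p = 0; p < nproc; p++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0) {
            /* Child: communicate through the shared location; the atomic
             * read-modify-write keeps concurrent increments from being lost. */
            for (long i = 0; i < iters; i++)
                atomic_fetch_add(counter, 1);
            _exit(0);
        }
    }

    /* Parent: wait until every child that was started has exited. */
    while (wait(NULL) > 0)
        ;

    long result = failed ? -1 : atomic_load(counter);
    munmap(counter, sizeof *counter);
    return result;
}
```

Calling shared_counter_demo(4, 1000) should return 4000. Replacing the atomic increment with a plain load, add, and store would allow updates to be lost, which is why the slide separates ordinary memory operations (communication) from atomic ones (synchronization).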
12 x86 Examples: Shared Address Space
- Quad-core processors
- Highly integrated, commodity systems
- Multiple cores on a chip
  - Low-latency, high-bandwidth communication via shared cache
[Figure: Intel i7 (Nehalem) and AMD Phenom II (Barcelona), each with four cores; cache labels: Shared L2 Cache, Shared L3 Cache]
13 Earlier x86 Example
- Intel Pentium Pro Quad
  - All coherence and multiprocessing glue in the processor module
  - High latency and low bandwidth
[Figure: four P-Pro modules (each with CPU, interrupt controller, 256-KB L2 $, and bus interface) on the P-Pro bus (64-bit data, 36-bit address, 66 MHz); two PCI bridges to PCI buses with PCI I/O cards; memory controller and MIU to 1-, 2-, or 4-way interleaved DRAM]
14 Shared Address Space Architecture
- Physical organization
  - Shared memory system
    - Uniform memory access (UMA)
    - Non-uniform memory access (NUMA)
  - Distributed memory system
    - Cluster of shared memory systems
    - Hardware- or software-based distributed shared memory (DSM)
[Figure: UMA system, NUMA system, distributed memory system]
15 Scaling Up
[Figure: "Dance Hall" (UMA): processors (P) with caches ($) on one side of the network, memories (M) on the other. Distributed Memory (NUMA): each processor has a cache and a local memory, and the nodes are connected by the network]
- Problem is the interconnect: cost (crossbar) or bandwidth (bus)
- Shared memory (uniform memory access, UMA)
  - Latencies to memory are uniform, but uniformly large
- Distributed memory (non-uniform memory access, NUMA)
  - Construct a shared address space out of simple message transactions across a general-purpose network
- Cache: keeps shared data (local data, and non-local data in NUMA)
16 Example: SGI Altix UV 1000
- Scale up to 262,144 cores
- 16 TB shared memory
- 15 GB/sec links
- Multistage interconnection network
- Hardware cache coherence
- ccNUMA
17 Parallel Programming Models
- Shared Address Space
- Message Passing
- Data Parallel
18 Message Passing Architectures
- Message passing architectures
  - Complete computer as building block
  - Communication via explicit I/O operations
- Programming model
  - Directly access only the private address space (local memory)
  - Communicate via explicit messages (send/receive)
- High-level block diagram similar to a distributed-memory shared address space system
  - But communication is integrated at the I/O level, not the memory level
  - Easier to build than scalable SAS
19 Message Passing Abstraction
[Figure: process P executes "Send X, Q, t" from address X in its local process address space; process Q executes "Receive Y, P, t" into address Y in its local process address space; the send and the receive are matched]
- Message passing
  - Send specifies the buffer to be transmitted and the receiving process
  - Recv specifies the sending process and the buffer to receive into
  - Can be a memory-to-memory copy, but need to name processes
  - Optional tag on send and matching rule on receive
  - Many overheads: copying, buffer management, protection
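The send/receive matching rule can be sketched with a small in-memory mailbox. This is only a single-process simulation of the abstraction under stated assumptions: the names msg_send, msg_recv, MAX_MSGS, and MAX_LEN are hypothetical, processes are plain integer IDs, and a real system would move the data between separate address spaces through the network interface. The extra copies into and out of the mailbox correspond to the copying overhead the slide mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MSGS 16   /* hypothetical mailbox capacity */
#define MAX_LEN  64   /* hypothetical maximum payload size in bytes */

typedef struct {
    int    used;            /* slot currently holds an undelivered message */
    int    src, dst, tag;   /* sender, receiver, and matching tag */
    size_t len;
    char   data[MAX_LEN];
} msg_t;

static msg_t mailbox[MAX_MSGS];

/* Send: names the buffer, the receiving process, and a tag.
 * Returns 0 on success, -1 if the payload is too large or the mailbox is full. */
int msg_send(int self, int dst, int tag, const void *buf, size_t len)
{
    assert(buf != NULL);
    if (len > MAX_LEN)
        return -1;
    for (int i = 0; i < MAX_MSGS; i++) {
        if (!mailbox[i].used) {
            mailbox[i] = (msg_t){ .used = 1, .src = self, .dst = dst,
                                  .tag = tag, .len = len };
            memcpy(mailbox[i].data, buf, len);   /* copy out of the sender's buffer */
            return 0;
        }
    }
    return -1;
}

/* Receive: names the sending process, the tag, and the destination buffer.
 * Returns the number of bytes received, or -1 if no pending message matches
 * (same destination, source, and tag, and small enough for the buffer). */
int msg_recv(int self, int src, int tag, void *buf, size_t cap)
{
    assert(buf != NULL);
    for (int i = 0; i < MAX_MSGS; i++) {
        msg_t *m = &mailbox[i];
        if (m->used && m->dst == self && m->src == src &&
            m->tag == tag && m->len <= cap) {
            memcpy(buf, m->data, m->len);        /* copy into the receiver's buffer */
            m->used = 0;
            return (int)m->len;
        }
    }
    return -1;
}
```

With P = 0 and Q = 1, the figure's "Send X, Q, t" becomes msg_send(0, 1, t, &X, sizeof X) and "Receive Y, P, t" becomes msg_recv(1, 0, t, &Y, sizeof Y). A receive with a different tag does not match and leaves the message pending.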
20 Message Passing Architectures
- Physical organization
  - Shared memory system
    - Uniform memory access (UMA)
    - Non-uniform memory access (NUMA)
  - Distributed memory system
    - Cluster of shared memory systems
[Figure: UMA system, NUMA system, distributed memory system]
21 Example: IBM Blue Gene/L
- Nodes: 2 PowerPC 440s
- Everything (except DRAM) on one chip
22 Example: IBM SP-2
- Made out of essentially complete RS6000 workstations
- Network interface integrated in the I/O bus
  - Bandwidth limited by the I/O bus
[Figure: IBM SP-2 node: Power 2 CPU with L2 $ on the memory bus; memory controller to 4-way interleaved DRAM; MicroChannel bus with I/O, DMA, and the NIC (i860, NI, DRAM); nodes connected by a general interconnection network formed from 8-port switches]
23 Taxonomy of Common Systems
- Large-scale shared address space and message passing systems
- Large multiprocessors
  - Shared address space
    - Symmetric shared memory (SMP): Ex) IBM eServer, SUN Sunfire
    - Distributed shared memory (DSM)
      - Cache coherent (ccNUMA): Ex) SGI Origin/Altix
      - Non-cache coherent: Ex) Cray T3E, X1
  - Distributed address space (aka message passing)
    - Commodity clusters: Ex) Beowulf
    - Custom clusters
      - Uniform cluster: Ex) IBM Blue Gene
      - Constellation (cluster of DSMs or SMPs): Ex) SGI Altix, ASC Purple
24 Parallel Programming Models
- Shared Address Space
- Message Passing
- Data Parallel
25 Data Parallel Systems
- Programming model
  - Operations performed in parallel on each element of a data structure
  - Logically a single thread of control
  - Alternate sequential steps and parallel steps
- Architectural model
  - Array of many simple, cheap processors with little memory each
  - Attached to a control processor that issues instructions
  - Cheap global synchronization
  - Centralize the high cost of instruction fetch & sequencing
- Perfect fit for differential equation solvers
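The alternation of sequential and parallel steps can be sketched with a Jacobi relaxation for a one-dimensional Laplace equation, chosen here as one example of a differential equation solver (the slides do not name a specific method). The function names are hypothetical. The inner loop is written sequentially, but every iteration applies the same operation to a different element and none depends on another, so on a data parallel machine each element could be handled by its own processor.

```c
#include <assert.h>
#include <stddef.h>

/* Parallel step: the same operation on every interior element.
 * Iterations are independent; boundary values are copied unchanged. */
void jacobi_step(const double *u, double *next, size_t n)
{
    assert(u != NULL && next != NULL && n >= 2);
    next[0] = u[0];
    next[n - 1] = u[n - 1];
    for (size_t i = 1; i + 1 < n; i++)
        next[i] = 0.5 * (u[i - 1] + u[i + 1]);
}

/* Single logical thread of control: run `steps` parallel steps, swapping the
 * two buffers between them. The swap plays the role of the global
 * synchronization point. Returns the buffer that holds the final values. */
double *jacobi_solve(double *u, double *scratch, size_t n, int steps)
{
    for (int s = 0; s < steps; s++) {
        jacobi_step(u, scratch, n);
        double *tmp = u;
        u = scratch;
        scratch = tmp;
    }
    return u;
}
```

Starting from {0, 0, 4, 0, 0}, one step produces {0, 2, 0, 2, 0}: each interior value becomes the average of its two neighbours from the previous step.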
26 Evolution and Convergence
- Architectures converge to SAS/DAS architecture
  - Rigid control structure is a minus for general-purpose use
    - Simple, regular apps have good locality and can do well anyway
  - Loss of applicability due to hardwired data parallelism
- Programming model converges with SPMD
  - Single Program Multiple Data (SPMD)
  - Contributes the need for fast global synchronization
  - Can be implemented on either shared address space or message passing systems
  - Same program on different PEs, behavior conditional on thread ID
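The SPMD idea of one program whose behavior depends on the thread ID can be sketched as a block-partitioned sum. This is a minimal sketch with a hypothetical function name: every processing element (PE) would call the same function with its own id, and combining the partial results would be done with shared memory or messages, which is left out here.

```c
#include <assert.h>

/* Every PE runs this same function; only `id` differs.
 * PE `id` sums its own contiguous block of `data` (block partitioning). */
long spmd_partial_sum(int id, int nprocs, const int *data, int n)
{
    assert(data != 0 && nprocs > 0 && id >= 0 && id < nprocs && n >= 0);

    int chunk = (n + nprocs - 1) / nprocs;   /* ceil(n / nprocs) elements per PE */
    int lo = id * chunk;
    int hi = lo + chunk;
    if (lo > n) lo = n;                      /* trailing PEs may get no elements */
    if (hi > n) hi = n;

    long sum = 0;
    for (int i = lo; i < hi; i++)
        sum += data[i];
    return sum;
}
```

With 10 elements and 4 PEs, the chunk size is 3, so PE 0 handles indices 0 to 2 and PE 3 handles only index 9. Adding the four partial sums gives the same result as a sequential sum.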
Parallel Computing Platforms
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationConvergence of Parallel Architecture
Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty
More informationParallel Programming Models and Architecture
Parallel Programming Models and Architecture CS 740 September 18, 2013 Seth Goldstein Carnegie Mellon University History Historically, parallel architectures tied to programming models Divergent architectures,
More informationNumber of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization
Parallel Computer Architecture A parallel computer is a collection of processing elements that cooperate to solve large problems fast Broad issues involved: Resource Allocation: Number of processing elements
More informationLearning Curve for Parallel Applications. 500 Fastest Computers
Learning Curve for arallel Applications ABER molecular dynamics simulation program Starting point was vector code for Cray-1 145 FLO on Cray90, 406 for final version on 128-processor aragon, 891 on 128-processor
More informationMultiprocessors - Flynn s Taxonomy (1966)
Multiprocessors - Flynn s Taxonomy (1966) Single Instruction stream, Single Data stream (SISD) Conventional uniprocessor Although ILP is exploited Single Program Counter -> Single Instruction stream The
More informationParallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam
Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of
More informationComputer parallelism Flynn s categories
04 Multi-processors 04.01-04.02 Taxonomy and communication Parallelism Taxonomy Communication alessandro bogliolo isti information science and technology institute 1/9 Computer parallelism Flynn s categories
More informationProcessor Architecture and Interconnect
Processor Architecture and Interconnect What is Parallelism? Parallel processing is a term used to denote simultaneous computation in CPU for the purpose of measuring its computation speeds. Parallel Processing
More informationEvolution and Convergence of Parallel Architectures
History Evolution and Convergence of arallel Architectures Historically, parallel architectures tied to programming models Divergent architectures, with no predictable pattern of growth. Todd C. owry CS
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationIssues in Multiprocessors
Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing explicit sends & receives Which execution model control parallel
More informationECE 669 Parallel Computer Architecture
ECE 669 arallel Computer Architecture Lecture 2 Architectural erspective Overview Increasingly attractive Economics, technology, architecture, application demand Increasingly central and mainstream arallelism
More informationIssues in Multiprocessors
Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1&2, today s CMPs message
More informationParallel Architectures
Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently
More informationUniprocessor Computer Architecture Example: Cray T3E
Chapter 2: Computer-System Structures MP Example: Intel Pentium Pro Quad Lab 1 is available online Last lecture: why study operating systems? Purpose of this lecture: general knowledge of the structure
More informationChapter 2: Computer-System Structures. Hmm this looks like a Computer System?
Chapter 2: Computer-System Structures Lab 1 is available online Last lecture: why study operating systems? Purpose of this lecture: general knowledge of the structure of a computer system and understanding
More informationDr. Joe Zhang PDC-3: Parallel Platforms
CSC630/CSC730: arallel & Distributed Computing arallel Computing latforms Chapter 2 (2.3) 1 Content Communication models of Logical organization (a programmer s view) Control structure Communication model
More informationParallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor
Multiprocessing Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast. Almasi and Gottlieb, Highly Parallel
More informationMultiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University
A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor
More informationParallel Architecture. Hwansoo Han
Parallel Architecture Hwansoo Han Performance Curve 2 Unicore Limitations Performance scaling stopped due to: Power Wire delay DRAM latency Limitation in ILP 3 Power Consumption (watts) 4 Wire Delay Range
More information10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems
1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase
More informationThree basic multiprocessing issues
Three basic multiprocessing issues 1. artitioning. The sequential program must be partitioned into subprogram units or tasks. This is done either by the programmer or by the compiler. 2. Scheduling. Associated
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More informationLecture 9: MIMD Architecture
Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is
More informationParallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?
Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationConvergence of Parallel Architectures
History Historically, parallel architectures tied to programming models Divergent architectures, with no predictable pattern of growth. Systolic Arrays Dataflow Application Software System Software Architecture
More informationComputer Architecture
Computer Architecture Chapter 7 Parallel Processing 1 Parallelism Instruction-level parallelism (Ch.6) pipeline superscalar latency issues hazards Processor-level parallelism (Ch.7) array/vector of processors
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationLect. 2: Types of Parallelism
Lect. 2: Types of Parallelism Parallelism in Hardware (Uniprocessor) Parallelism in a Uniprocessor Pipelining Superscalar, VLIW etc. SIMD instructions, Vector processors, GPUs Multiprocessor Symmetric
More informationCOSC 6385 Computer Architecture - Multi Processor Systems
COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:
More informationChapter 1: Perspectives
Chapter 1: Perspectives Copyright @ 2005-2008 Yan Solihin Copyright notice: No part of this publication may be reproduced, stored in a retrieval system, or transmitted by any means (electronic, mechanical,
More informationMultiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.
Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network
More informationChap. 4 Multiprocessors and Thread-Level Parallelism
Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,
More informationParallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence
Parallel Computer Architecture Spring 2018 Shared Memory Multiprocessors Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture
More informationMultiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering
Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to
More informationInterconnection Network
Interconnection Network Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) Topics
More informationCache Coherence in Bus-Based Shared Memory Multiprocessors
Cache Coherence in Bus-Based Shared Memory Multiprocessors Shared Memory Multiprocessors Variations Cache Coherence in Shared Memory Multiprocessors A Coherent Memory System: Intuition Formal Definition
More informationBlueGene/L (No. 4 in the Latest Top500 List)
BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside
More informationInterconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Interconnection Network Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Topics Taxonomy Metric Topologies Characteristics Cost Performance 2 Interconnection
More informationOnline Course Evaluation. What we will do in the last week?
Online Course Evaluation Please fill in the online form The link will expire on April 30 (next Monday) So far 10 students have filled in the online form Thank you if you completed it. 1 What we will do
More informationModule 5 Introduction to Parallel Processing Systems
Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this
More informationMULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationParallel Computing. Hwansoo Han (SKKU)
Parallel Computing Hwansoo Han (SKKU) Unicore Limitations Performance scaling stopped due to Power consumption Wire delay DRAM latency Limitation in ILP 10000 SPEC CINT2000 2 cores/chip Xeon 3.0GHz Core2duo
More informationECE5610/CSC6220 Models of Parallel Computers. Recap: What is Parallel Computer?
ECE5610/CSC6220 Models of Parallel Computers Professor Cheng-Zhong Xu Department of Electrical/Computer Engineering Wayne State University Recap: What is Parallel Computer? A parallel computer is a collection
More informationCMSC 611: Advanced. Parallel Systems
CMSC 611: Advanced Computer Architecture Parallel Systems Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems
More informationWHY PARALLEL PROCESSING? (CE-401)
PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:
More informationIntroduction to Parallel Computing
Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen
More informationThe Cache-Coherence Problem
The -Coherence Problem Lecture 12 (Chapter 6) 1 Outline Bus-based multiprocessors The cache-coherence problem Peterson s algorithm Coherence vs. consistency Shared vs. Distributed Memory What is the difference
More informationCS4230 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/28/12. Homework 1: Parallel Programming Basics
CS4230 Parallel Programming Lecture 3: Introduction to Parallel Architectures Mary Hall August 28, 2012 Homework 1: Parallel Programming Basics Due before class, Thursday, August 30 Turn in electronically
More informationIntroduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed
More informationCOSC4201 Multiprocessors
COSC4201 Multiprocessors Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Multiprocessing We are dedicating all of our future product development to multicore
More informationWhat is a parallel computer?
7.5 credit points Power 2 CPU L 2 $ IBM SP-2 node Instructor: Sally A. McKee General interconnection network formed from 8-port switches Memory bus Memory 4-way interleaved controller DRAM MicroChannel
More informationMulti-core Programming - Introduction
Multi-core Programming - Introduction Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.
More informationLecture 2 Parallel Programming Platforms
Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple
More informationOrganisasi Sistem Komputer
LOGO Organisasi Sistem Komputer OSK 14 Parallel Processing Pendidikan Teknik Elektronika FT UNY Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple
More informationMultiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism
Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,
More informationCS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Introduction (Chapter 1)
CS/ECE 757: Advanced Computer Architecture II (arallel Computer Architecture) Introduction (Chapter 1) Copyright 2001 Mark D. Hill University of Wisconsin-Madison Slides are derived from work by Sarita
More informationCOSC 6374 Parallel Computation. Parallel Computer Architectures
OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Spring 2010 Flynn s Taxonomy SISD:
More informationMultiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed
Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448 1 The Greed for Speed Two general approaches to making computers faster Faster uniprocessor All the techniques we ve been looking
More informationComputer Organization. Chapter 16
William Stallings Computer Organization and Architecture t Chapter 16 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11
More informationAleksandar Milenkovich 1
Parallel Computers Lecture 8: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationCOSC 6374 Parallel Computation. Parallel Computer Architectures
OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Edgar Gabriel Fall 2015 Flynn s Taxonomy
More informationCS Parallel Algorithms in Scientific Computing
CS 775 - arallel Algorithms in Scientific Computing arallel Architectures January 2, 2004 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan
More informationLecture 24: Memory, VM, Multiproc
Lecture 24: Memory, VM, Multiproc Today s topics: Security wrap-up Off-chip Memory Virtual memory Multiprocessors, cache coherence 1 Spectre: Variant 1 x is controlled by attacker Thanks to bpred, x can
More informationParallel Computers. c R. Leduc
Parallel Computers Material based on B. Wilkinson et al., PARALLEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers c 2002-2004 R. Leduc Why Parallel Computing?
More informationCDA3101 Recitation Section 13
CDA3101 Recitation Section 13 Storage + Bus + Multicore and some exam tips Hard Disks Traditional disk performance is limited by the moving parts. Some disk terms Disk Performance Platters - the surfaces
More informationOverview. Processor organizations Types of parallel machines. Real machines
Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments
More informationSMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems
Reference Papers on SMP/NUMA Systems: EE 657, Lecture 5 September 14, 2007 SMP and ccnuma Multiprocessor Systems Professor Kai Hwang USC Internet and Grid Computing Laboratory Email: kaihwang@usc.edu [1]
More informationSMD149 - Operating Systems - Multiprocessing
SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Overview Introduction Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55 Introduction
More informationOverview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy
Overview SMD149 - Operating Systems - Multiprocessing Roland Parviainen Multiprocessor systems Multiprocessor, operating system and memory organizations December 1, 2005 1/55 2/55 Multiprocessor system
More informationIntroduction II. Overview