COMP4300/8300: Overview of Parallel Hardware
Alistair Rendell
COMP4300/8300 Lecture 2. Copyright © 2015 The Australian National University



2 2.1 Lecture Outline
Review of Single Processor Design
- So we talk the same language
- Many things happen in parallel even on a single processor
- Identify potential issues for parallel hardware: why use 2 CPUs if you can double the speed of one!
Multiple Processor Design
- Hardware models
- Shared/distributed memory
- Hierarchical/flat memory
- Dynamic/static processor connectivity
- Evaluating static networks
- Routing mechanisms

3 2.2 The Processor
Performs:
- Floating point operations (FLOPS): add, mult, division (maybe sqrt!)
- Integer operations (MIPS): adds etc, also logical ops and instruction processing
- MIPS (Millions of Instructions Per Second) was historically measured relative to the very old VAX 11/780. Anyway, what counts as a machine instruction differs between CPUs!
- Our primary focus will be on floating point operations
Clock:
- All ops take a fixed number of clock ticks to complete
- Clock speed is measured in GHz (10^9 cycles/second), i.e. clock periods of order nanoseconds (10^-9 seconds)
- Apple iPhone 6 ARM A8 1.4 GHz (0.71 ns), NCI Raijin Intel Xeon Sandy Bridge 2.6 GHz (0.38 ns), IBM zEC12 processor 5.5 GHz (0.18 ns)
- Clock speed is limited by etching (feature size) and the speed of light, which motivates going parallel (e.g. dual-core systems)
- (To my knowledge) the IBM zEC12 is the fastest commodity processor at 5.5 GHz
- Light travels about 10 cm in 0.32 ns, and a chip is a few cm across!
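To make the speed-of-light point concrete, here is a minimal C sketch (mine, not from the slides) that converts each quoted clock rate into a period and the distance light covers in one tick:

```c
#include <stdio.h>

int main(void) {
    const double c = 3.0e8;               /* speed of light, m/s */
    const double ghz[] = {1.4, 2.6, 5.5}; /* clock rates from the slide */
    for (int i = 0; i < 3; i++) {
        double period_ns = 1.0 / ghz[i];  /* 1/GHz gives nanoseconds */
        double dist_cm = c * period_ns * 1e-9 * 100.0;
        printf("%.1f GHz: period %.2f ns, light travels %.1f cm\n",
               ghz[i], period_ns, dist_cm);
    }
    return 0;
}
```

At 5.5 GHz a signal can cross only about 5 cm of chip per cycle even at light speed, which is the physical limit the slide alludes to.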

4 2.3 Processor Performance

FLOP/s   Prefix   Occurrence
10^3     kilo     very badly written code
10^6     mega     badly written code
10^9     giga     single core
10^12    tera     multiple chips (NCI)
10^15    peta     23 machines in the Top500 (Nov 2012, measured)
10^18    exa      around 2020!

Peak performance examples:
- PC 2.5 GHz Core2 Quad: 4 (cores) * 4 (ops) * 2.5 GHz = 40 GFLOP/s
- Bunyip Pentium III: 96 (nodes) * 2 (sockets) * 1 (op) * 550 MHz = 105 GFLOP/s
- NCI Raijin: 3592 (nodes) * 2 (sockets) * 8 (cores) * 8 (ops) * 2.6 GHz = 1.19 PFLOP/s
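These peak figures are just products of machine parameters. A small C sketch (parameters copied from the Raijin line above) reproduces the arithmetic:

```c
#include <stdio.h>

/* Peak FLOP/s = nodes * sockets * cores * (FLOPs per cycle) * clock.
   The values below are the Raijin figures quoted on the slide. */
int main(void) {
    double nodes = 3592, sockets = 2, cores = 8, ops = 8, ghz = 2.6;
    double peak = nodes * sockets * cores * ops * ghz * 1e9;
    printf("Raijin peak: %.3f PFLOP/s\n", peak / 1e15);  /* ~1.195 */
    return 0;
}
```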

5 2.4 Adding Numbers
Consider adding two double precision (8 byte) numbers, stored as sign (+/-), exponent and mantissa.
Possible steps:
- Determine the largest exponent
- Normalize the smaller exponent to the larger
- Add the mantissas
- Renormalize the mantissa and exponent of the result
Multiple steps, each taking 1 tick, implies 4 ticks per addition (FLOP)
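The same four steps can be mimicked in software; a minimal C sketch (mine, using the standard frexp/ldexp mantissa/exponent helpers) purely to illustrate the sequence:

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    double a = 6.5, b = 0.375;
    int ea, eb;
    double ma = frexp(a, &ea);    /* a = ma * 2^ea, ma in [0.5, 1) */
    double mb = frexp(b, &eb);    /* step 1: exponents now known   */
    if (ea > eb) {                /* step 2: shift the smaller     */
        mb = ldexp(mb, eb - ea);  /*   mantissa to the larger exp  */
        eb = ea;
    } else {
        ma = ldexp(ma, ea - eb);
        ea = eb;
    }
    double m = ma + mb;           /* step 3: add mantissas          */
    double r = ldexp(m, ea);      /* step 4: renormalize the result */
    printf("%g + %g = %g\n", a, b, r);   /* 6.5 + 0.375 = 6.875 */
    return 0;
}
```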

6 2.5 Pipeline Operations #1
[Figure: operands X(6)...X(1) flowing through the waiting, in-pipeline and done stages of a 4-stage pipeline]
- X(1) takes 4 ticks to appear (startup latency)
- X(2) appears 1 tick after X(1)
- Asymptotically we achieve 1 result per clock tick
- The operation is said to be pipelined
- The steps in the pipeline are running in parallel
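The timing follows a simple model: with s stages at 1 tick each and n independent operations, the total is s + (n - 1) ticks. A sketch (my model, assuming no stalls):

```c
#include <stdio.h>

/* Time for n independent ops through an s-stage pipeline:
   s ticks for the first result, then one result per tick. */
long pipeline_ticks(long s, long n) { return s + (n - 1); }

int main(void) {
    long s = 4;  /* stages, as in the addition example above */
    for (long n = 1; n <= 1000000; n *= 100)
        printf("n=%7ld  ticks=%8ld  results/tick=%.3f\n",
               n, pipeline_ticks(s, n),
               (double)n / pipeline_ticks(s, n));
    return 0;
}
```

The throughput column climbs toward 1 result per tick as n grows, which is the asymptote stated above.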

7 2.6 Pipeline Operations #2
- Requires the same op consecutively on different (independent) data items
  - good for vector operations
  - note the limitations on chaining output data to input
- Tendency to increase the number of stages in the pipeline if each stage can then run faster
- The more stages in a pipeline, the greater the startup latency
- UltraSPARC II has a 9-stage pipeline, UltraSPARC III a 14-stage pipeline
- The Prescott Pentium 4 processor had a 31-stage pipeline
- Not all operations are pipelined, e.g. integer multiplication, division, sqrt

Clock cycles for different operations on the Alpha EV6:
Operation   Latency   Repeat
+, -, *     4         1
/, sqrt     (much higher; not fully pipelined)

8 2.7 Instruction Parallelism
- The processor issues multiple instructions per clock cycle that are executed in parallel on different parts of the chip hardware
- Grouping rules restrict what can be issued in parallel, e.g. UltraSPARC: 4 instructions drawn from 2 floating point, 2 integer, 1 load/store and 1 branch
[Figure: separate multiply and addition units consuming inputs and producing results in parallel]
- Pentium III: a single FLOP per cycle
- Opteron, UltraSPARC and Alpha: 2 (different) FLOPs per cycle
- Core2, Itanium2 and IBM Power5: 4 (DP) FLOPs per cycle
- Xeon Sandy Bridge: 8 (DP) FLOPs per cycle
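As an illustration (mine, not from the slides), a loop whose iterations contain an independent multiply and an independent add, the pattern a superscalar core with separate multiply and add units can dual-issue:

```c
#include <stddef.h>

/* Each iteration issues an independent multiply (a[i]*b[i]) and an
   independent add (c[i]+d[i]); with separate multiply and addition
   units these can execute in the same cycle. */
void mul_add_streams(size_t n, const double *a, const double *b,
                     const double *c, const double *d,
                     double *p, double *s) {
    for (size_t i = 0; i < n; i++) {
        p[i] = a[i] * b[i];   /* multiply unit */
        s[i] = c[i] + d[i];   /* addition unit */
    }
}
```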

9 2.8 Memory Structure
Consider DAXPY: Y(i) = a*X(i) + Y(i)
- If, theoretically, the CPU can perform the 2 FLOPs in 1 cycle, memory must deliver (load) two doubles (X(i) and Y(i), 16 bytes) and store one (Y(i), 8 bytes) every clock cycle
- On a 1 GHz system this implies 16 GB/sec of load traffic and 8 GB/sec of store traffic
- Typically a processor core can only issue one load OR store instruction in a clock cycle
- DDR3 SDRAM is available clocked at 1066 MHz, with access times to match
- Memory latency and bandwidth are critical performance issues
- Caches reduce latency and provide improved cache-to-CPU bandwidth
- Memory banks improve bandwidth
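The kernel itself is one line; a C version (standard DAXPY, with the per-iteration traffic from the slide noted in comments):

```c
#include <stddef.h>

/* DAXPY: y <- a*x + y. Per iteration: 2 FLOPs, two 8-byte loads
   (x[i], y[i]) and one 8-byte store (y[i]), i.e. 24 bytes of memory
   traffic for every 2 FLOPs if run at peak. */
void daxpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```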

10 2.9 Cache
Memory hierarchy:
- Main memory: large, cheap; high latency, low bandwidth
- Cache: small, fast, expensive; lower latency, higher bandwidth
- CPU registers
This memory hierarchy gives Non-Uniform Memory Access (NUMA):
- Cache hit: the data is in cache and received in a few cycles
- Cache miss: the data is fetched from main memory (or a higher level cache)
- Try to ensure data is in cache (or as close to the CPU as possible)
- Can we block the algorithm to minimize memory traffic?
- Cache is effective because algorithms often use data that are close together in memory
- (Note: duplication of data in caches will have implications for parallel systems!)
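Blocking restructures loops so a small tile of data is reused while it is still cache resident. A hedged sketch (sizes N and B are illustrative choices of mine) using matrix transpose:

```c
#define N 64
#define B 8   /* block edge, chosen so a BxB tile fits in cache */

/* Blocked transpose: each BxB tile of 'a' is read once and reused
   while cache-resident, instead of striding through whole rows. */
void transpose_blocked(const double a[N][N], double t[N][N]) {
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int i = ii; i < ii + B; i++)
                for (int j = jj; j < jj + B; j++)
                    t[j][i] = a[i][j];
}
```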

11 2.10 Cache Mapping
- Blocks of main memory are mapped to a cache line
- A cache line is typically 32-128 bytes wide
- The mapping may be direct, or n-way associative
- The entire cache line is fetched from memory, not just one element
- Structure code to try and use an entire cache line of data
  - best to have unit stride
  - pointer chasing is very bad
[Figure: main memory blocks mapping onto cache lines 1-4, each block able to occupy one of two lines (1 or 3, 2 or 4)]
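A hedged C sketch (mine) contrasting unit-stride traversal, which uses every element of each fetched line, with strided traversal, which touches a new line on almost every access:

```c
#define ROWS 1024
#define COLS 1024

/* Row-major C array: a[i][j] and a[i][j+1] are adjacent in memory,
   so summing row by row is unit stride and uses whole cache lines. */
double sum_unit_stride(const double a[ROWS][COLS]) {
    double s = 0.0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += a[i][j];
    return s;
}

/* Column by column jumps COLS*8 bytes between accesses, touching a
   new cache line almost every time: same result, far more traffic. */
double sum_strided(const double a[ROWS][COLS]) {
    double s = 0.0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            s += a[i][j];
    return s;
}
```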

12 2.11 Memory Banks
Memory bandwidth can be improved by having multiple parallel paths to/from memory.
[Figure: CPU connected in parallel to memory banks 1-4]
- The traditional solution used by vector processors
- High initial latency
- Good performance for unit stride
- Very bad performance on bank conflicts

13 2.12 Going Parallel
- Inevitably the performance of a single processor is limited by the clock speed
- Improved manufacturing increases the clock rate, but it is ultimately limited by the speed of light
- Superscalar execution allows multiple ops at once, but is not always applicable
- It's time to go parallel!
Hardware issues:
- Flynn's taxonomy of parallel processors
- SIMD/MIMD
- Shared/distributed memory
- Hierarchical/flat memory
- Dynamic/static processor connectivity
- Characteristics of static networks

14 2.13 Architecture Classification: Flynn's Taxonomy
Why classify?
- What kind of parallelism is employed?
- Which architecture has the best prospects for the future?
- What has already been achieved by current architecture types?
- Reveal configurations that have not yet been considered by system architects
- Enable the building of performance models
Flynn's taxonomy is based on the degree of parallelism, with 4 categories determined by the number of instruction and data streams:

                                 Data stream
                        Single              Multiple
Instruction   Single    SISD                SIMD
stream                  (1 CPU)             (array/vector processor)
              Multiple  MISD                MIMD
                        (pipelined?)        (multiple processors)

15 2.14 SIMD and MIMD
SIMD: Single Instruction Multiple Data
- Also known as data parallel processors or array processors
- Vector processors (to some extent)
- Current examples include SSE instructions, the SPEs on the CellBE, and GPUs
- NVIDIA's SIMT (T = Threads) is a slight variation
MIMD: Multiple Instruction Multiple Data
- Examples include a quad-core PC and the octa-core Xeons on Raijin
[Figure: SIMD with one global control unit driving all CPUs over an interconnect, versus MIMD where each CPU has its own control unit]

16 2.15 MIMD
- The most successful parallel model
- More general purpose than SIMD (e.g. the CM5 could emulate the CM2)
- Harder to program, as processors are not synchronized at the instruction level
Design issues for MIMD machines:
- Scheduling: efficient allocation of processors to tasks in a dynamic fashion
- Synchronization: prevent processors accessing the same data simultaneously
- Interconnection design: processor-to-memory and processor-to-processor interconnects; also the I/O network, where often processors are dedicated to I/O devices
- Overhead: inevitably there is some overhead associated with coordinating activities between processors, e.g. resolving contention for resources
- Partitioning: identifying parallelism in algorithms that can exploit concurrent processing streams is non-trivial
(Aside: SPMD, Single Program Multiple Data, is more restrictive than MIMD, implying that all processors run the same executable. It simplifies the use of a shared address space.)

17 2.16 Address Space Organization: Message Passing
- Each processor has local (private) memory
- Processors interact solely by message passing
- Commonly known as distributed memory machines
- Memory bandwidth scales with the number of processors
- Example: between nodes of the NCI Raijin system
[Figure: processors, each with its own memory, connected by an interconnect]
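As a sketch of the model (standard MPI point-to-point calls; the program itself is mine, not from the slides), one process sends a value from its private memory to another, which cannot see it any other way:

```c
#include <mpi.h>
#include <stdio.h>

/* Rank 0 sends a value from its private memory to rank 1; the only
   way rank 1 can observe it is via an explicit message.
   Run with: mpirun -np 2 ./a.out */
int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double x = 0.0;
    if (rank == 0) {
        x = 3.14;
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %f\n", x);
    }
    MPI_Finalize();
    return 0;
}
```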

18 2.17 Address Space Organization: Shared Address Space
- Processors interact by modifying data objects stored in a shared address space
- Flat, uniform memory access (UMA)
- Scalability of memory bandwidth and processor-processor communication is a problem
- Example: a dual/quad core PC (ignoring cache)
[Figure: memories and processors all attached to a common interconnect]
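A hedged OpenMP sketch (mine) in which all threads update one shared array directly through the common address space, with no messages:

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

/* Every thread updates part of the same shared array; no explicit
   communication is needed because all threads see one address space. */
int main(void) {
    static double y[N];
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = 2.0 * i;          /* y is shared, visible to all threads */
    printf("y[N-1] = %f\n", y[N - 1]);
    return 0;
}
```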

19 2.18 Non-Uniform Memory Access (NUMA)
- The machine includes some hierarchy in its memory structure
- All memory is local to the programmer (a single address space), but some memory takes longer to access than other memory
- Cache introduces one level of NUMA
- Example: between the sockets of an NCI Raijin node, or in a multi-socket Opteron system
[Figure: two groups of processors with caches, each group with its own memories, joined by interconnects]

20 2.19 Shared Address Space Access
Parallel Random Access Machine (PRAM): an idealized model covering any shared memory machine.
What happens when multiple processors try to read/write the same memory location at the same time?
PRAM models:
- Exclusive-read, exclusive-write (EREW) PRAM
- Concurrent-read, exclusive-write (CREW) PRAM
- Exclusive-read, concurrent-write (ERCW) PRAM
- Concurrent-read, concurrent-write (CRCW) PRAM
Concurrent read is OK, but concurrent write requires arbitration:
- Common: the write is allowed if all values being written are identical
- Arbitrary: an arbitrary processor is allowed to proceed and the rest fail
- Priority: processors are organized into a predefined priority list; the processor with the highest priority succeeds and the rest fail
- Sum: the sum of all the quantities being written is stored
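The Sum rule can be imitated on real hardware with atomic addition. A hedged C11 sketch (mine; assumes the optional <threads.h> is available):

```c
#include <stdatomic.h>
#include <threads.h>
#include <stdio.h>

atomic_int cell;   /* the single, concurrently written location */

/* Each thread "writes" its value; atomic addition plays the role
   of the CRCW Sum arbitration rule. */
int writer(void *arg) {
    atomic_fetch_add(&cell, *(int *)arg);
    return 0;
}

int main(void) {
    thrd_t t[4];
    int vals[4] = {1, 2, 3, 4};
    for (int i = 0; i < 4; i++) thrd_create(&t[i], writer, &vals[i]);
    for (int i = 0; i < 4; i++) thrd_join(t[i], NULL);
    printf("cell = %d\n", atomic_load(&cell));   /* 10 = 1+2+3+4 */
    return 0;
}
```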

21 2.20 Dynamic Processor Connectivity: Crossbar
- A non-blocking network: connecting two processors does not block connections between other processors
- Complexity grows as O(p^2)
- May be used to connect processors each with their own local memory
[Figure: eight processor-and-memory units connected through a crossbar]

22 2.21 Dynamic Processor Connectivity: Multistaged Networks
[Figure: processors connected to memories through a switching network; the example shown is an omega network]
- Consists of log2(p) stages, where p is the number of processors
- Let s and t be the binary representations of a message's source and destination
- At stage 1, route straight through if the most significant bits of s and t are the same, and cross over if they differ
- The process is repeated at each following stage using the next most significant bit, etc
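A hedged C sketch of that routing rule (the helper is hypothetical, mine): at each stage one bit of s and t, most significant first, picks pass-through or crossover:

```c
#include <stdio.h>

/* Omega-network routing for p = 2^stages endpoints: compare one bit
   of source s and destination t per stage, most significant first;
   equal bits route straight through, unequal bits cross over. */
void omega_route(unsigned s, unsigned t, int stages) {
    for (int k = stages - 1; k >= 0; k--) {
        unsigned sb = (s >> k) & 1u, tb = (t >> k) & 1u;
        printf("stage %d: %s\n", stages - k,
               sb == tb ? "pass through" : "crossover");
    }
}

int main(void) {
    omega_route(2u, 5u, 3);   /* 010 -> 101 in an 8-way network */
    return 0;
}
```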

23 2.22 Dynamic Processor Connectivity: Bus
- A processor gains exclusive access to the bus for some period
- The performance of the bus limits scalability
[Figure: processors with caches sharing a single bus to the memories]
Performance: Crossbar > Multistage > Bus
Cost:        Crossbar > Multistage > Bus

24 2.23 Static Processor Connectivity: Complete, Mesh, Tree
- Completely connected (becomes very complex!)
- Linear array/ring, mesh/2-D torus
- Tree (static if the nodes are processors)
[Figure: example topologies; legend distinguishes switches from processors]

25 2.24 Static Processor Connectivity: Hypercube
- A multidimensional mesh with exactly two processors in each dimension
- p = 2^d, where d is the dimension of the hypercube
- Disadvantage: the number of connections per processor increases rapidly with the dimension
- Examples: Intel iPSC Hypercube, nCUBE and SGI Origin

26 2.25 Static Processor Connectivity: Hypercube Characteristics
- Two processors are connected directly ONLY IF their binary labels differ by one bit
- In a d-dimensional hypercube each processor connects directly to d others
- A d-dimensional hypercube can be partitioned into two (d-1)-dimensional subcubes, etc
- The number of links in the shortest path between two processors is the Hamming distance between their labels
- The Hamming distance between two processors labeled s and t is the number of bits that are set in the binary representation of s XOR t, where XOR is the bitwise exclusive-or operation
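A minimal C sketch (mine; the example labels are chosen here, not taken from the slide) computing the hop count as the population count of s XOR t:

```c
#include <stdio.h>

/* Hops between hypercube nodes s and t = number of set bits in s^t. */
int hamming(unsigned s, unsigned t) {
    unsigned x = s ^ t;
    int bits = 0;
    while (x) { bits += x & 1u; x >>= 1; }
    return bits;
}

int main(void) {
    /* In a 3-cube: nodes 010 and 101 are 3 hops apart,
       nodes 011 and 110 are 2 hops apart. */
    printf("%d\n", hamming(2u, 5u));   /* prints 3 */
    printf("%d\n", hamming(3u, 6u));   /* prints 2 */
    return 0;
}
```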

27 2.26 Evaluating Static Interconnection Networks #1
Diameter
- The maximum distance between any two processors in the network
- The diameter directly affects worst-case communication time
Connectivity
- The multiplicity of paths between any two processors
- High connectivity is desirable as it minimizes contention
- Arc connectivity of the network: the minimum number of arcs that must be removed to break the network into two disconnected networks
  - 1 for linear arrays and binary trees
  - 2 for rings and 2-D meshes
  - 4 for a 2-D torus
  - d for d-dimensional hypercubes

28 2.27 Evaluating Static Interconnection Networks #2
Channel width
- The number of bits that can be communicated simultaneously over a link connecting two processors
Bisection width and bandwidth
- Bisection width is the minimum number of communication links that must be removed to partition the network into two equal halves
- Bisection bandwidth is the minimum volume of communication allowed between two halves of the network with equal numbers of processors
Cost
- Many criteria can be used; we will use the number of communication links or wires required by the network

29 2.28 Summary: Static Interconnection Characteristics

Network                Diameter             Bisection   Arc            Cost
                                            Width       Connectivity   (No. of Links)
Completely connected   1                    p^2/4       p-1            p(p-1)/2
Binary tree            2*log2((p+1)/2)      1           1              p-1
Linear array           p-1                  1           1              p-1
Ring                   floor(p/2)           2           2              p
2-D mesh               2(sqrt(p)-1)         sqrt(p)     2              2(p-sqrt(p))
2-D torus              2*floor(sqrt(p)/2)   2*sqrt(p)   4              2p
Hypercube              log2(p)              p/2         log2(p)        (p*log2(p))/2
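A small C sketch (mine; the formulas are transcribed from the table above) comparing a 2-D mesh with a hypercube at p = 64:

```c
#include <stdio.h>
#include <math.h>

/* Evaluate the summary-table formulas for a 2-D mesh and a hypercube. */
int main(void) {
    double p = 64.0;
    double rp = sqrt(p), lg = log2(p);
    printf("p = %.0f\n", p);
    printf("2-D mesh:  diameter %.0f, bisection %.0f, links %.0f\n",
           2.0 * (rp - 1.0), rp, 2.0 * (p - rp));
    printf("hypercube: diameter %.0f, bisection %.0f, links %.0f\n",
           lg, p / 2.0, p * lg / 2.0);
    return 0;
}
```

For p = 64 the hypercube more than halves the diameter (6 vs 14) and quadruples the bisection width (32 vs 8), at the price of more links (192 vs 112), which is the cost/performance trade-off the table summarizes.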
