Advanced Computer Architecture. The Architecture of Parallel Computers


Computer Systems
No component can be treated in isolation from the others:
- Application Software
- Operating System
- Hardware Architecture

Hardware Issues
- Number and Type of Processors
- Processor Control
- Memory Hierarchy
- I/O Devices and Peripherals
- Operating System Support
- Applications Software Compatibility

Operating System Issues
- Allocating and Managing Resources
- Access to Hardware Features
- Multi-Processing
- Multi-Threading
- I/O Management
- Access to Peripherals
- Efficiency

Applications Issues
- Compiler/Linker Support
- Programmability
- OS/Hardware Feature Availability
- Compatibility
- Parallel Compilers: Preprocessor, Precompiler, Parallelizing Compiler

Architecture Evolution
- Scalar Architecture
- Prefetch (Fetch/Execute Overlap)
- Multiple Functional Units
- Pipelining
- Vector Processors
- Lock-Step Processors
- Multi-Processor

Flynn's Classification
Consider instruction streams and data streams separately:
- SISD - Single Instruction, Single Data stream
- SIMD - Single Instruction, Multiple Data streams
- MIMD - Multiple Instruction, Multiple Data streams
- MISD - Multiple Instruction, Single Data stream (rare)

SISD
Conventional computers:
- Pipelined Systems
- Multiple-Functional-Unit Systems
- Pipelined Vector Processors
Includes most computers encountered in everyday life.

SIMD
- Multiple processors execute a single program
- Each processor operates on its own data
- Vector Processors
- Array Processors
- PRAM Theoretical Model

MIMD
- Multiple processors cooperate on a single task
- Each processor runs a different program
- Each processor operates on different data
- Many commercial examples exist

MISD
- A single data stream passes through multiple processors
- Different operations are triggered on different processors
- Examples: Systolic Arrays, Wave-Front Arrays
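
The MISD idea can be sketched in a few lines: one data stream flows through a chain of "processors", each triggering a different operation, in the style of a systolic pipeline. The stage functions below are hypothetical, chosen only to make the flow visible.

```python
# Toy sketch of an MISD-style systolic pipeline (illustrative only; the
# stage operations are made up, not from the lecture).
def systolic_pipeline(stream, stages):
    """Push each item of the single data stream through every stage in order."""
    results = []
    for item in stream:
        for stage in stages:
            item = stage(item)   # each "processor" applies its own operation
        results.append(item)
    return results

# Three hypothetical processors, each performing a different operation
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

print(systolic_pipeline([1, 2, 3], stages))  # [1, 3, 5]
```

In a real systolic array the stages would operate concurrently on successive items; this sequential sketch only shows the data-flow structure.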

Programming Issues
- Parallel computers are difficult to program
- Automatic parallelization techniques are only partially successful
- Programming languages are few, not well supported, and difficult to use
- Parallel algorithms are difficult to design

Performance Issues
- Cycle time: τ (the reciprocal of the clock rate)
- Average cycles per instruction: CPI
- Instruction count: I_c
- Execution time: T = I_c × CPI × τ
- With p = processor cycles per instruction, m = memory references per instruction, and k = memory-cycle/processor-cycle ratio: T = I_c × (p + m × k) × τ
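
A worked instance of the two time formulas above may help; the numbers below are made up purely for illustration.

```python
# Worked example of T = I_c * CPI * tau, with CPI decomposed as p + m*k.
# All values are illustrative, not measurements from any real machine.
Ic  = 2_000_000   # instruction count
p   = 1.5         # processor cycles per instruction
m   = 0.4         # memory references per instruction
k   = 4.0         # memory cycle / processor cycle ratio
tau = 1e-9        # cycle time in seconds (a 1 GHz clock)

CPI = p + m * k        # average cycles per instruction
T   = Ic * CPI * tau   # execution time in seconds

print(CPI)  # 3.1
print(T)    # ~0.0062 s (6.2 ms)
```

Note that both forms agree by construction: the decomposed formula is just the first one with CPI expanded.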

Performance Issues II
- I_c and p are affected by processor design and compiler technology
- m is affected mainly by compiler technology
- τ is affected by processor design
- k is affected by memory hierarchy structure and design

Other Measures
- MIPS rate: millions of instructions per second; scales with clock rate for similar processors
- MFLOPS rate: millions of floating-point operations per second
These measures are not necessarily directly comparable between different types of processors.
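
The two rate measures fall straight out of the instruction count, operation count, and execution time; again the numbers are invented for illustration.

```python
# Computing MIPS and MFLOPS rates from illustrative (made-up) figures.
Ic         = 50_000_000   # instructions executed
T          = 0.25         # execution time in seconds
flops_done = 10_000_000   # floating-point operations performed

mips   = Ic / (T * 1e6)          # millions of instructions per second
mflops = flops_done / (T * 1e6)  # millions of FP operations per second

print(mips, mflops)  # 200.0 40.0
```

As the slide warns, 200 MIPS on one instruction set is not directly comparable to 200 MIPS on another, since the work done per instruction differs.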

Parallelizing Code
Implicitly:
- Write sequential algorithms
- Use a parallelizing compiler
- Rely on the compiler to find the parallelism
Explicitly:
- Design parallel algorithms
- Write in a parallel language
- Rely on the human to find the parallelism
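
The explicit route can be sketched briefly: the programmer partitions the work and states the parallelism directly. The sketch below uses Python threads only to show the structure (partition, parallel map, combine); it is not a performance claim, since CPython threads do not run Python bytecode truly in parallel.

```python
# Explicit parallelization sketch: the human partitions the work and
# expresses the parallelism directly (structure only, not a speed demo).
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    return sum(chunk)

data   = list(range(1, 101))                       # 1..100
chunks = [data[i:i + 25] for i in range(0, 100, 25)]  # programmer partitions

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))     # parallel map, then combine

print(total)  # 5050
```

An implicit parallelizer would instead be handed the plain sequential loop and be expected to discover this decomposition itself.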

Multi-Processors
Multi-processors generally share memory, while multi-computers do not.
- Uniform Memory Access model
- Non-Uniform Memory Access model
- Cache-Only Memory Architecture
These are all MIMD machines.

Multi-Computers
- Independent computers that don't share memory
- Connected by a high-speed communication network
- More tightly coupled than a collection of independent computers
- Cooperate on a single problem
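
The defining property, cooperation without shared memory, can be sketched as nodes that compute on private data and interact only by messages. Here the "network" is modelled, purely as an assumption for illustration, by a thread-safe queue between threads.

```python
# Multi-computer sketch: each node has private memory and cooperates on
# one problem (a global sum) only by sending messages, never by sharing
# variables. The queue stands in for the communication network.
import threading
import queue

def node(network, local_data):
    local_result = sum(local_data)   # compute on private memory only
    network.put(local_result)        # communicate the result as a message

network    = queue.Queue()
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # each node's private data

threads = [threading.Thread(target=node, args=(network, part))
           for part in partitions]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(network.get() for _ in partitions)   # combine received messages
print(total)  # 45
```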

Vector Computers
- Independent vector hardware; may be an attached processor
- Has both scalar and vector instructions
- Vector instructions operate in a highly pipelined mode
- Can be memory-to-memory or register-to-register

SIMD Computers
- One control processor and several processing elements (PEs)
- All processing elements execute the same instruction at the same time
- The interconnection network between PEs determines memory access and PE interaction
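
The lockstep behaviour can be shown with a toy simulation: a control processor fetches each instruction once and broadcasts it, and every PE applies it to its own local datum in the same step. The two-instruction "instruction set" here is hypothetical.

```python
# Lockstep SIMD sketch: one control processor broadcasts each instruction;
# all PEs execute it simultaneously on their own data. The tiny
# instruction set (ADD1, MUL2) is invented for illustration.
def simd_run(program, pe_local_data):
    """Broadcast each instruction to all PEs; each PE updates its local datum."""
    ops = {
        "ADD1": lambda x: x + 1,
        "MUL2": lambda x: x * 2,
    }
    for instr in program:            # control processor fetches/decodes once
        op = ops[instr]
        pe_local_data = [op(x) for x in pe_local_data]   # every PE, same step
    return pe_local_data

print(simd_run(["ADD1", "MUL2"], [0, 1, 2, 3]))  # [2, 4, 6, 8]
```

A real machine would also route data between PEs over the interconnection network; this sketch omits that and shows only the broadcast-and-execute cycle.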

The PRAM Model
- SIMD-style programming
- Uniform global memory
- Local memory in each PE
- Memory conflict resolution:
  - EREW - Exclusive Read, Exclusive Write
  - CREW - Concurrent Read, Exclusive Write
  - CRCW - Concurrent Read, Concurrent Write
  - ERCW - Exclusive Read, Concurrent Write (rare)
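
The write-conflict rules can be made concrete with a small checker: given the cell each PE wants to write in one step, exclusive-write models (EREW, CREW) forbid two writers on the same cell, while the "common" variant of CRCW permits concurrent writes that agree on the value. This checker is a sketch under that common-CRCW assumption; other CRCW variants (arbitrary, priority) resolve conflicts differently.

```python
# Toy validity check for one PRAM write step under different models.
# Assumes the "common" CRCW variant: concurrent writers must agree.
from collections import defaultdict

def write_step_ok(writes, model):
    """writes: list of (cell, value) pairs, one per writing PE."""
    writers = defaultdict(list)
    for cell, value in writes:
        writers[cell].append(value)
    if model in ("EREW", "CREW"):
        # Exclusive write: at most one PE may write each cell per step
        return all(len(vals) == 1 for vals in writers.values())
    if model == "CRCW":
        # Common CRCW: concurrent writes allowed if all agree on the value
        return all(len(set(vals)) == 1 for vals in writers.values())
    raise ValueError(f"unknown model: {model}")

step = [(0, 7), (0, 7), (1, 3)]      # two PEs write cell 0 with the same value
print(write_step_ok(step, "CREW"))   # False (two writers on one cell)
print(write_step_ok(step, "CRCW"))   # True  (the writers agree)
```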

The VLSI Model
- Implement the algorithm as a mostly combinational circuit
- Determine the area required for the implementation
- Determine the depth of the circuit
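
As a toy instance of the two measures: a combinational tree that sums n inputs with two-input adders has depth ceil(log2 n) levels and uses n − 1 adders (its size in gate count, a stand-in for area). The functions below just compute those two quantities.

```python
# Depth and size of a binary reduction (adder) tree over n inputs -- a
# toy stand-in for the VLSI model's depth and area measures.
import math

def reduction_tree_depth(n):
    """Levels of two-input adders needed to combine n inputs."""
    return math.ceil(math.log2(n)) if n > 1 else 0

def reduction_tree_size(n):
    """Number of two-input adders needed to combine n inputs."""
    return n - 1

print(reduction_tree_depth(8), reduction_tree_size(8))      # 3 7
print(reduction_tree_depth(100), reduction_tree_size(100))  # 7 99
```

Depth bounds the circuit's latency while size (area) bounds its cost, which is the trade-off the VLSI model is built to study.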
