Parallel computer architecture classification


Hardware Parallelism. Computing means executing instructions that operate on data. Flynn's taxonomy (Michael Flynn, 1966) classifies computer architectures by the number of concurrent instruction streams and data streams.

Flynn's taxonomy:
- Single Instruction, Single Data (SISD): traditional sequential computing systems
- Single Instruction, Multiple Data (SIMD)
- Multiple Instruction, Multiple Data (MIMD)
- Multiple Instruction, Single Data (MISD)

SISD: at any one time, a single instruction operates on a single data item. The traditional sequential architecture.

SIMD: at any one time, one instruction operates on many data items. This is the data-parallel architecture, typified by array processors. Vector architectures have similar characteristics but achieve the parallelism with pipelining.
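The SIMD execution style (one instruction, many data elements) can be sketched in a few lines of Python. This is a toy model of the programming idea, not of any real hardware; the function name and data are made up for illustration:

```python
import operator

# Toy SIMD model: one decoded instruction (op) is broadcast to N "lanes"
# (ALUs), each applying it to its own data element in lock step.
def simd_execute(op, lanes_a, lanes_b):
    return [op(a, b) for a, b in zip(lanes_a, lanes_b)]

A = [1, 2, 3, 4]
B = [10, 20, 30, 40]
# One instruction, many data -> [11, 22, 33, 44]
print(simd_execute(operator.add, A, B))
```

The SISD equivalent would fetch and decode one add instruction per element; here a single decoded operation covers all lanes, which is exactly the control-unit saving SIMD machines exploit.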

[Figure: array processor (SIMD). A single instruction pointer, memory, and decoder drive N ALUs in lock step; each ALU operates on its own operands (A1, B1, C1 through AN, BN, CN).]

MIMD: multiple instruction streams operating on multiple data streams. The classical distributed-memory and SMP architectures.

MISD machines are not commonly seen. The systolic array is one example of an MISD architecture.

Flynn's taxonomy summary:
- SISD: traditional sequential architecture
- SIMD: processor arrays, vector processors; parallel computing on a budget (reduced control-unit cost); many early supercomputers
- MIMD: the most general-purpose parallel computers today; clusters, MPPs, data centers
- MISD: not a general-purpose architecture

Feng's classification characterizes machines by word length and bit-slice length (the degree of bit-level parallelism). [Figure: machines plotted with word length (1 to 64) on one axis and bit-slice length (1 to 16K) on the other: PDP-11, IBM 370, C.mmP, Illiac IV, STARAN, PEPE, and MPP (at bit-slice length 16K).]

Erlangen classification: a machine is described as <K x K', D x D', W x W'>, where K is the number of control units, D the number of ALUs per control unit, W the word length, and each primed term the degree of pipelining at that level. Examples:
- TI-ASC: <1, 4, 64 x 8>
- CDC 6600: <1, 1 x 10, 60> x <10, 1, 12> (I/O)
- C.mmP: <16, 1, 16> + <1 x 16, 1, 16> + <1, 16, 16>
- PEPE: <1 x 3, 288, 32>
- Cray-1: <1, 12 x 8, 64 x (1 ~ 14)>

Modern classification (Sima, Fountain, Kacsuk): classify based on how parallelism is achieved, either by operating on multiple data (data parallelism) or by performing many functions in parallel (function parallelism). Function parallelism is called control parallelism or task parallelism depending on the level at which it occurs. Parallel architectures thus divide into data-parallel architectures and function-parallel architectures.

Data-parallel architectures subdivide into vector architectures, associative and neural architectures, SIMDs, and systolic architectures.

Data-parallel architectures: vector processors, SIMD (array processors), systolic arrays. [Figure: vector processor (pipelining). One instruction pointer, memory, and decoder feed a single pipelined ALU streaming through operands A, B, C.]

Data-parallel architecture: array processor. [Figure: one decoder drives N ALUs in lock step, each operating on its own operands (A1, B1, C1 through AN, BN, CN).]

Control-parallel (function-parallel) architectures subdivide by the level of parallelism:
- Instruction-level parallel architectures (ILPs): pipelined processors, VLIWs, superscalar processors
- Thread-level parallel architectures
- Process-level parallel architectures (MIMDs): distributed-memory MIMD, shared-memory MIMD

Performance of parallel architectures. Common metrics:
- MIPS (million instructions per second): MIPS = instruction count / (execution time x 10^6)
- MFLOPS (million floating-point operations per second): MFLOPS = FP operations in program / (execution time x 10^6)
Which is the better metric? MFLOPS relates more directly to the work done in numerical code: the number of FP operations in a program is determined by the problem (e.g. the matrix size), not by the instruction set.
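The two formulas above are direct to compute once the counts and the run time are known. A minimal sketch, with the function names and the sample run entirely hypothetical:

```python
def mips(instruction_count, exec_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)

def mflops(fp_op_count, exec_time_s):
    """MFLOPS = FP operations in program / (execution time x 10^6)."""
    return fp_op_count / (exec_time_s * 1e6)

# Hypothetical run: 2e9 instructions, of which 5e8 are FP ops, in 2 seconds.
print(mips(2e9, 2.0))    # 1000.0 MIPS
print(mflops(5e8, 2.0))  # 250.0 MFLOPS
```

Note how the same run yields very different numbers under the two metrics; the MFLOPS figure is the one comparable across machines for the same numerical problem.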

FLOPS units:
- kiloflops (KFLOPS): 10^3
- megaflops (MFLOPS): 10^6
- gigaflops (GFLOPS): 10^9 (single-CPU performance)
- teraflops (TFLOPS): 10^12
- petaflops (PFLOPS): 10^15 (where we are right now: 10-petaflops supercomputers exist)
- exaflops (EFLOPS): 10^18 (the next milestone)

Peak and sustained performance. Peak performance, measured in MFLOPS, is the highest possible rate achievable when the system does nothing but numerical computation. It is a rough hardware measure that gives little indication of how the system will perform in practice.

Sustained performance is the MFLOPS rate that a program achieves over its entire run; it is measured using benchmarks. Peak MFLOPS is usually much larger than sustained MFLOPS. Efficiency rate = sustained MFLOPS / peak MFLOPS.
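The efficiency rate is a simple ratio; as a sketch (the machine figures below are made up for illustration):

```python
def efficiency_rate(sustained_mflops, peak_mflops):
    """Efficiency = sustained MFLOPS / peak MFLOPS."""
    return sustained_mflops / peak_mflops

# Hypothetical machine: 1000 MFLOPS peak, but a real program sustains 120.
print(efficiency_rate(120.0, 1000.0))  # 0.12, i.e. 12% of peak
```

Single-digit to low-double-digit percentages of peak are common for real applications, which is why sustained (benchmark) figures matter more than peak ones.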

Measuring the performance of parallel computers. Benchmarks are programs used to measure performance. The LINPACK benchmark measures a system's floating-point computing power by solving a dense N x N system of linear equations Ax = b; it is used to rank supercomputers in the Top500 list.
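To make the LINPACK idea concrete, here is a pure-Python sketch (not the real HPL code; the solver, problem size, and names are ours): it solves a random dense Ax = b by Gaussian elimination with partial pivoting and reports a sustained MFLOPS rate using the standard 2/3 n^3 + 2 n^2 flop count.

```python
import random, time

def lu_solve(A, b):
    """Gaussian elimination with partial pivoting - the kernel LINPACK times."""
    n = len(A)
    A = [row[:] for row in A]          # work on copies
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))   # pivot row
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):      # eliminate below the pivot
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):     # back substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

n = 100
A = [[random.random() for _ in range(n)] for _ in range(n)]
b = [random.random() for _ in range(n)]
t0 = time.perf_counter()
x = lu_solve(A, b)
elapsed = time.perf_counter() - t0
flops = (2 / 3) * n**3 + 2 * n**2      # standard LINPACK flop count
print(f"{flops / (elapsed * 1e6):.2f} sustained MFLOPS on this toy solve")
```

The real benchmark uses the same flop-count formula but a highly tuned solver; the gap between this interpreted-Python rate and the machine's peak illustrates how far sustained performance can fall below peak.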

Other common benchmarks:
- Micro-benchmark suites: numerical computing (LAPACK, ScaLAPACK), memory bandwidth (STREAM)
- Kernel benchmarks: NPB (NAS Parallel Benchmarks), PARKBENCH, SPEC, SPLASH

Summary:
- Flynn's classification: SISD, SIMD, MIMD, MISD
- Modern classification: data parallelism vs. function parallelism (at the instruction, thread, and process levels)
- Performance: MIPS, MFLOPS; peak performance and sustained performance

References:
- K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 1993.
- D. Sima, T. Fountain, P. Kacsuk, Advanced Computer Architectures: A Design Space Approach, Addison-Wesley, 1997.