Processor Architecture and Interconnect


What is Parallelism? Parallel processing denotes simultaneous computation across multiple processing units for the purpose of increasing computation speed. Parallel processing was introduced because executing instructions sequentially took too much time.

Classification of Parallel Processor Architectures

Flynn's Taxonomy
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD): vector architectures, multimedia extensions, graphics processing units
- Multiple instruction streams, single data stream (MISD): no commercial implementation
- Multiple instruction streams, multiple data streams (MIMD): tightly coupled MIMD, loosely coupled MIMD

Introduction: SIMD architectures can exploit significant data-level parallelism for matrix-oriented scientific computing and for media-oriented image and sound processing. SIMD is also more energy-efficient than MIMD: it needs to fetch only one instruction per multiple data operations, rather than one instruction per data operation, which makes SIMD attractive for personal mobile devices. In addition, SIMD allows the programmer to continue to think sequentially.
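
To make that data-level parallelism concrete, here is a minimal C sketch (not from the slides; the function name saxpy and its arguments are illustrative): a single operation applied uniformly across many elements, exactly the loop shape a vectorizing compiler can map onto SIMD instructions that process several floats per instruction.

    #include <stddef.h>

    /* One instruction stream, many data elements: a vectorizing
     * compiler can turn this loop into SIMD instructions that add
     * 4 or 8 floats per instruction instead of one at a time. */
    void saxpy(size_t n, float a, const float *x, float *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];  /* same operation on every element */
    }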

SIMD
- Also called an array processor
- A single stream of instructions is fetched from shared memory
- The instruction is broadcast to multiple processing elements (PEs)

MIMD
- Also called a multiprocessor
- Multiple streams of instructions are fetched, each by its own control unit
- The instruction streams are decoded into multiple decoded instruction streams (IS)
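
As a thread-level illustration of the MIMD side of this contrast (a sketch assuming POSIX threads are available; the function names are hypothetical), the two threads below execute two different instruction streams at once, whereas SIMD would broadcast one stream to all PEs.

    #include <pthread.h>
    #include <stdio.h>

    /* MIMD: each thread executes its own instruction stream on its
     * own data, unlike SIMD, where one broadcast stream drives all PEs. */
    static void *count_up(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 3; i++)
            printf("up %d\n", i);
        return NULL;
    }

    static void *count_down(void *arg)
    {
        (void)arg;
        for (int i = 3; i > 0; i--)
            printf("down %d\n", i);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, count_up, NULL);    /* instruction stream 1 */
        pthread_create(&b, NULL, count_down, NULL);  /* instruction stream 2 */
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }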

Block Diagram of Tightly Coupled Multiprocessor: the processors share memory and communicate via that shared memory.

Block Diagram of Loosely Coupled Multiprocessor

Cluster Computer Architecture

Loosely Coupled Multiprocessor
- Communication takes place through a Message Transfer System (MTS)
- The processors do not share memory; each processor has its own local memory
- Referred to as distributed systems
- Interaction between the processors and I/O modules is very low
- Each processor is directly connected to its I/O devices

Tightly Coupled Multiprocessor
- Communication takes place through the processor-memory interconnection network (PMIN)
- The processors share memory; a processor does not have its own private local memory
- Referred to as shared-memory systems
- Interaction between the processors and I/O modules is very high
- Processors are connected to I/O devices through an I/O-processor interconnection network (IOPIN)

Communication Models
Shared memory: processors communicate through a shared address space; this is easy on small-scale machines. Advantages: it is the model of choice for uniprocessors and small-scale multiprocessors; ease of programming; lower latency; easier use of hardware-controlled caching.
Message passing: each processor has private local memories or buffers and communicates via messages. Advantages: less hardware and easier design; it focuses attention on costly non-local operations.

Shared Address/Memory Model: processors communicate via loads and stores. It is the oldest and most popular model, rooted in timesharing: processes run on multiple processors instead of sharing a single processor. There is a single virtual and physical address space; multiple processes can overlap (share), and ALL threads of a process share that process's address space. Writes to the shared address space by one thread are visible to reads by other threads. The usual model: shared code, private stacks, some shared heap, some private heap.
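
A minimal pthreads sketch of load/store communication in a shared address space (the variable and function names are illustrative, not from the slides): one thread stores to a shared variable, and another thread simply loads it; the mutex orders the accesses.

    #include <pthread.h>
    #include <stdio.h>

    /* Shared address space: both threads name the same variable and
     * communicate purely through ordinary loads and stores. */
    static int shared_value;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *producer(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        shared_value = 42;               /* communication is a store... */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);
        pthread_join(t, NULL);

        pthread_mutex_lock(&lock);
        printf("read %d\n", shared_value);  /* ...and a load */
        pthread_mutex_unlock(&lock);
        return 0;
    }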

Advantages of the shared-memory communication model:
- Compatibility with SMP hardware
- Ease of programming when communication patterns are complex or vary dynamically during execution
- Ability to develop applications using the familiar SMP model, paying attention only to performance-critical accesses
- Lower communication overhead and better use of bandwidth for small items, thanks to implicit communication and to memory mapping that implements protection in hardware rather than through the I/O system
- Hardware-controlled caching that reduces remote communication by caching all data, both shared and private

Shared Address Model Summary: each process can name all the data it shares with other processes. Data transfer happens via loads and stores, at sizes of a byte, a word, ..., or cache blocks. Virtual memory maps virtual addresses to local or remote physical addresses. The memory hierarchy model still applies: communication now moves data to the local processor's cache, just as a load moves data from memory to cache, with the usual latency and bandwidth concerns.

Message Passing Model: whole computers (CPU, memory, I/O devices) communicate via explicit I/O operations. A send specifies a local buffer plus the receiving process on the remote computer; a receive specifies the sending process on the remote computer plus the local buffer in which to place the data. Usually the send includes a process tag and the receive has a matching rule on the tag: match one, or match any. Synchronization points: when the send completes, when the buffer is free, when the request is accepted, when the receive waits for the send. A send plus a receive amounts to a memory-to-memory copy, where each side supplies a local address, AND performs pairwise synchronization.
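
In MPI terms, the send/receive pairing described above looks roughly like the following sketch (the rank numbers, tag, and payload are illustrative): the send names a local buffer plus the receiving process, the receive names the sending process plus a local buffer, and passing MPI_ANY_TAG to the receive would implement the "match any" rule.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Send: local buffer + receiving process (rank 1) + tag. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Receive: sending process (rank 0) + local buffer to fill. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }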

Message Passing Model (continued): send + receive amounts to a memory-to-memory copy with synchronization through the OS, even on a single processor. History of message passing: network topology mattered because a node could only send to its immediate neighbours; sends and receives were typically synchronous and blocking. Later came DMA with non-blocking sends, and DMA on the receive side into a buffer until the processor executes the receive, at which point the data is transferred to local memory. Later still, software libraries allowed arbitrary communication. Example: the IBM SP-2, RS/6000 workstations in racks.
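
The non-blocking sends mentioned above can be sketched with MPI's MPI_Isend (the helper name overlap_send and its parameters are hypothetical): the call returns immediately so the processor can overlap local work with the transfer, and MPI_Wait later provides the "when the send completes" synchronization point.

    #include <mpi.h>

    /* Non-blocking send: start the transfer, overlap computation,
     * then synchronize. Assumes buf, count, and peer are valid. */
    void overlap_send(int *buf, int count, int peer)
    {
        MPI_Request req;
        MPI_Isend(buf, count, MPI_INT, peer, /*tag=*/0,
                  MPI_COMM_WORLD, &req);

        /* ...do useful local work here while the message is in
         * flight, e.g. moved by DMA as described above... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* send complete; buf reusable */
    }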

Advantages of the message-passing communication model:
- The hardware can be simpler
- Communication is explicit and therefore simpler to understand; in shared memory it can be hard to know when communication is happening, and how costly it is
- Explicit communication focuses attention on the costly aspect of parallel computation, sometimes leading to better-structured multiprocessor programs
- Synchronization is naturally associated with sending messages, reducing the possibility of errors introduced by incorrect synchronization
- It is easier to use sender-initiated communication, which may have some performance advantages

Applications of Parallel Processing