Dr. Joe Zhang PDC-3: Parallel Platforms


CSC630/CSC730: Parallel & Distributed Computing
Parallel Computing Platforms (Chapter 2, Section 2.3)

Content
- Communication models
- Logical organization (a programmer's view): control structure, communication model
- Physical organization (actual hardware): PRAM, interconnection networks, network topologies and their characteristics

Example of Parallelism
Parallelism from a single instruction executed on multiple processors:

for (i = 0; i < 1000; i++)
    c[i] = a[i] + b[i];

The iterations of this loop are independent, so all processors can execute the same instruction (the add) at once: a single instruction stream applied to multiple data streams (SIMD).

Flynn Classification
Flynn's classification is based on the instruction stream and the data stream (the control structure):
- SISD: a single instruction stream operates on a single data stream
- SIMD: a single instruction stream operates on multiple data streams
- MISD: multiple instruction streams operate on a single data stream (no such machine exists)
- MIMD: multiple instruction streams operate on multiple data streams

Diagram Comparing Classifications

SISD
The simplest architecture is single-instruction single-data (SISD). The first extension to CPUs for speedup was pipelining: the circuits of the CPU are split into functional units arranged in a pipeline, and each functional unit operates on the result of the previous one during a clock cycle.

SIMD
Vector processors perform the same operation on several inputs simultaneously: the basic instruction is issued only once for several operands. Pure SIMD systems have a single CPU devoted to control and a large collection of subordinate processors, each with its own registers. The control CPU broadcasts an instruction to all of the subordinates, and each subordinate either executes the instruction or sits idle.

SIMD (Fortran Example)
Compare the Fortran 77 code (sequential):

      do 100 i = 1, 100
         z(i) = x(i) + y(i)
100   continue

with the equivalent Fortran 90 code (vector):

z(1:100) = x(1:100) + y(1:100)

MIMD
All the autonomous processors in a MIMD machine operate on their own data, and each processor proceeds at its own pace: there is often no global clock and no explicit synchronization. MIMD machines come in shared-memory and distributed-memory variants.

SIMD and MIMD (diagram)

Comparison of SIMD and MIMD
- SIMD computers require less hardware: there is only one global control unit
- SIMD computers require less memory: only one copy of the program needs to be stored
- SIMD is nevertheless not very popular: it needs specialized hardware and extensive design effort, and resource utilization is poor under conditional execution

Conditional Execution on SIMD (diagram)

SPMD
Single program, multiple data (SPMD) is a simple variant of the MIMD model: multiple instances of the same program execute on different data. SPMD is widely used on many parallel platforms, including Sun Ultra servers, multiprocessor PCs, workstation clusters, and the IBM SP, and it requires minimal architectural support.

Communication Model
There are two primary forms of data exchange between processors:
- accessing a shared memory space
- message passing

Shared vs. Distributed Memory
Shared memory: a single address space; all processors have access to a common pool of memory (e.g., SGI Origin, Sun E10000).
Distributed memory: each processor has its own local memory, and data must be exchanged between processors by message passing (e.g., CRAY T3E, IBM SP, clusters).

Shared-Address-Space Platforms
- A common data space is accessible to all processors
- Processors interact by modifying data objects stored in the shared address space
- Memory can be local or global (common to all processors): accessing local memory is cheaper, while a global memory makes programming easier
- Multiprocessors are shared-address-space platforms supporting SPMD programming

Shared Memory: UMA vs. NUMA
Uniform memory access (UMA): every processor has uniform access time to memory. UMA machines are also known as symmetric multiprocessors, or SMPs (e.g., Sun E10000).
Non-uniform memory access (NUMA): the time for a memory access depends on the location of the data; local access is faster than non-local access. NUMA machines are easier to scale than SMPs (e.g., SGI Origin).

UMA and NUMA, with and without caches (diagram)

Message-Passing Platforms
A message-passing platform consists of p processing nodes, each with its own exclusive address space. Each node can be a single processor or a shared-address-space multiprocessor (e.g., clustered workstations). All interactions must be accomplished using messages: send and receive.

Platforms and Programming
Platforms that support the message-passing paradigm include the IBM SP, the SGI Origin 2000, and workstation clusters. The standard programming interface is the Message Passing Interface (MPI), covered in Chapter 6.

Summary
- Flynn classification: SISD, SIMD, MISD, MIMD; the SPMD programming model
- Communication models: accessing a shared memory space (UMA and NUMA) and message passing

Questions?