Parallel Architectures

Part 1: The rise of parallel machines

Intel Core i7: 4 CPU cores, 2 hardware threads per core (8 logical "cores")

Lab Cluster

Intel Xeon: 4/10/16/18 CPU cores, 2 hardware threads per core (8/20/32/36 logical "cores"), plus multi-socket boards

Sun UltraSPARC T3: 16 CPU cores, 8 hardware threads per core (128 logical "cores")

IBM Power 8

GPUs: 2,000+ cores on one chip

NVIDIA TITAN Z

Top500.org

Part 2: Taxonomies for Parallel Architectures

Taxonomies for Parallel Architectures
- Flynn's taxonomy (program control and memory access)
- Taxonomy based on memory organization
- Taxonomy based on processor granularity
- Taxonomy based on processor synchronization
- Taxonomy based on interconnection architecture

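Flynn's Taxonomy
Classifies computer architectures by their method of program control and memory access, that is, by the number of instruction streams and data streams:
- SISD: single instruction stream, single data stream
- MISD: multiple instruction streams, single data stream
- SIMD: single instruction stream, multiple data streams
- MIMD: multiple instruction streams, multiple data streams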

SISD Computers
The standard sequential computer. A single processing unit receives a single stream of instructions that operate on a single stream of data.

MISD Computers
p processors, each with its own control unit, share a common memory. Each processor applies its own instruction stream to the same stream of data.

SIMD Computers
All p identical processors operate under the control of a single instruction stream issued by a central control unit. There are p data streams, one per processor, so each processor can work on different data.

MIMD Computers
p processors, p streams of instructions, p streams of data.
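The four categories above follow mechanically from two counts: how many instruction streams and how many data streams a machine has. A minimal sketch of that rule follows; the function name `flynn_category` is hypothetical and exists only for illustration.

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given numbers of streams.

    Hypothetical helper: "S" means exactly one stream, "M" means more than one.
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


if __name__ == "__main__":
    # (instruction streams, data streams) for a sequential machine,
    # a p=8 SIMD array, and a p=8 MIMD machine.
    for streams in [(1, 1), (1, 8), (8, 8)]:
        print(streams, flynn_category(*streams))
```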

Taxonomy Based on Memory Organization
- Distributed memory
- Shared memory: UMA (uniform memory access) or NUMA (non-uniform memory access)

Distributed Memory
- Each processor has its own memory.
- Communication is usually performed by message passing.
- Each processor can access its own memory directly, and the memory of another processor via message passing over the interconnect.
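The access rule for distributed memory can be imitated on a single machine. The sketch below is only a simulation: it uses Python threads and `queue.Queue` mailboxes to stand in for processors and the interconnect, and the names `worker` and `remote_read` are made up for this example. On a real distributed-memory machine the messages would travel through a message-passing library rather than in-process queues.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A simulated processor whose memory is private to it."""
    local_memory = {"x": 21}      # never touched directly by anyone else
    request = inbox.get()         # block until a message arrives
    if request == "read x":
        outbox.put(local_memory["x"])   # reply with a message


def remote_read() -> int:
    """Read the worker's 'x' the only way allowed: by message passing."""
    to_worker: queue.Queue = queue.Queue()
    from_worker: queue.Queue = queue.Queue()
    t = threading.Thread(target=worker, args=(to_worker, from_worker))
    t.start()
    to_worker.put("read x")       # send the request
    value = from_worker.get()     # wait for the response
    t.join()
    return value


if __name__ == "__main__":
    print(remote_read())
```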

Shared Memory
- Provides hardware support for reads and writes to a shared memory space.
- Has a single address space shared by all processors.
(Figure: processors, memory modules, and I/O controllers attached to a common interconnect.)
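In the shared-memory model every processor reads and writes the same address space, so coordination happens through shared variables rather than messages. A minimal sketch using Python threads, which do share one address space; the function name `shared_sum` and its parameters are assumptions made for this example.

```python
import threading


def shared_sum(num_threads: int = 4, increments: int = 1000) -> int:
    """Have several threads update one shared counter."""
    shared = {"total": 0}         # visible to every thread
    lock = threading.Lock()       # protects the read-modify-write below

    def work() -> None:
        for _ in range(increments):
            with lock:
                shared["total"] += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]


if __name__ == "__main__":
    print(shared_sum())
```

The lock is the price of sharing: without it, two threads could interleave their read-modify-write steps and lose updates.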

Scaling Up
- The problem is the interconnect: cost (crossbar) or bandwidth (bus).
- Dance-hall: bandwidth is still scalable, but at lower cost than a crossbar; latencies to memory are uniform, but uniformly large.
- Distributed memory or non-uniform memory access (NUMA): construct a shared address space out of simple message transactions across a general-purpose network (e.g. read-request, read-response).
- Caching shared (particularly nonlocal) data?
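The NUMA idea of building a shared address space from read-request and read-response messages can be sketched as a toy model. Everything here is an assumption made for illustration: the class names `Node` and `Machine`, the block distribution of addresses, and `WORDS_PER_NODE`. Real hardware does this in the memory controller, not in software.

```python
from dataclasses import dataclass, field

WORDS_PER_NODE = 4  # assumed size of each node's local memory


@dataclass
class Node:
    node_id: int
    memory: list = field(default_factory=lambda: [0] * WORDS_PER_NODE)

    def handle(self, message: tuple) -> tuple:
        """Serve a message that arrived over the network."""
        kind, offset = message
        if kind == "read-request":
            return ("read-response", self.memory[offset])
        raise ValueError(f"unknown message kind: {kind}")


class Machine:
    def __init__(self, num_nodes: int) -> None:
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.remote_reads = 0     # counts the slower, non-local accesses

    def read(self, requester: int, address: int) -> int:
        """Read a global address on behalf of node `requester`."""
        home, offset = divmod(address, WORDS_PER_NODE)
        if home == requester:
            return self.nodes[home].memory[offset]   # local access
        self.remote_reads += 1                        # remote access
        _kind, value = self.nodes[home].handle(("read-request", offset))
        return value


if __name__ == "__main__":
    m = Machine(num_nodes=2)
    m.nodes[1].memory[2] = 99     # global address 1 * 4 + 2 = 6
    print(m.read(requester=0, address=6), m.remote_reads)
```

The same address returns the same value from every node, but the access is non-uniform: only the home node avoids the message round trip.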

Taxonomy Based on Processor Granularity
- Coarse grained: few powerful processors
- Fine grained: many small processors (massively parallel)
- Medium grained: between the two

Taxonomy Based on Processor Synchronization
- Asynchronous: processors run on independent clocks. The user has to synchronize via message passing or shared variables.
- Fully synchronous: processors run in sync on one global clock.
- Bulk-synchronous: a hybrid. Processors have independent clocks, and support is provided for global synchronization, to be called by the user's application program.
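The bulk-synchronous style can be illustrated with a barrier: workers compute independently, then all wait at a global synchronization point before using each other's results. This is a minimal sketch with Python's `threading.Barrier`; the function name `bsp_supersteps` and the toy computation (squaring, then summing) are assumptions chosen for the example.

```python
import threading


def bsp_supersteps(num_workers: int = 4) -> list:
    """Run two supersteps separated by one global barrier."""
    values = list(range(1, num_workers + 1))
    results = [0] * num_workers
    barrier = threading.Barrier(num_workers)

    def work(rank: int) -> None:
        # Superstep 1: purely local computation, no coordination.
        values[rank] = values[rank] ** 2
        # Global synchronization requested by the application program.
        barrier.wait()
        # Superstep 2: every worker can now safely read all values.
        results[rank] = sum(values)

    threads = [threading.Thread(target=work, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(bsp_supersteps())
```

Without the barrier, a fast worker could sum the values before a slow worker has finished squaring its own entry.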

Taxonomy Based on Interconnection Architectures
- Static: point-to-point connections
- Dynamic: networks with switches, crossbars, buses

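Static Interconnection Topologies: Linear Array and Ring
Metrics for comparing topologies:
- Diameter: the maximum distance between any two processors
- Bisection width: the minimum number of links that must be cut to split the network into two equal halves
- Cost: the number of links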
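The three metrics can be checked by brute force on small networks. The sketch below builds a linear array and a ring as adjacency maps and measures them directly; all function names are hypothetical, and the bisection search enumerates every balanced split, so it is only practical for a handful of processors.

```python
from collections import deque
from itertools import combinations


def linear_array(p: int) -> dict:
    """Processors 0..p-1 connected in a line."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < p} for i in range(p)}


def ring(p: int) -> dict:
    """A linear array with the two ends joined."""
    return {i: {(i - 1) % p, (i + 1) % p} for i in range(p)}


def diameter(adj: dict) -> int:
    """Longest shortest path, found by breadth-first search from every node."""
    def farthest(src: int) -> int:
        dist = {src: 0}
        todo = deque([src])
        while todo:
            u = todo.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    todo.append(v)
        return max(dist.values())
    return max(farthest(s) for s in adj)


def cost(adj: dict) -> int:
    """Number of links; each link appears in two adjacency sets."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2


def bisection_width(adj: dict) -> int:
    """Fewest links crossing any split into two equal halves (exhaustive)."""
    nodes = sorted(adj)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        inside = set(half)
        cut = sum(1 for u in inside for v in adj[u] if v not in inside)
        best = cut if best is None else min(best, cut)
    return best


if __name__ == "__main__":
    for name, net in [("linear array", linear_array(8)), ("ring", ring(8))]:
        print(name, diameter(net), bisection_width(net), cost(net))
```

For 8 processors the extra wrap-around link of the ring roughly halves the diameter and doubles the bisection width at the cost of one more link.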

Static Interconnection Topologies: Mesh and Torus
Diameter? Bisection width? Cost?

Static Interconnection Topologies: Tree
Diameter? Bisection width? Cost?

Static Interconnection Topologies: Complete Network
Diameter? Bisection width? Cost?

Static Interconnection Topologies: d-dimensional Hypercube
2^d processors (figure: hypercubes for d = 0 through d = 5)
Diameter? Bisection width? Cost?
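A d-dimensional hypercube is easy to construct in code: label the 2^d processors with d-bit numbers and connect two processors when their labels differ in exactly one bit. The sketch below counts the metrics from that construction; the function names are hypothetical. It relies on two standard facts rather than a search: the distance between two nodes equals the number of differing bits, and the cut that separates the nodes by their top bit is a minimum bisection.

```python
def hypercube_neighbors(node: int, d: int) -> list:
    """Neighbors of `node`: flip each of its d label bits in turn."""
    return [node ^ (1 << bit) for bit in range(d)]


def hypercube_metrics(d: int) -> dict:
    p = 2 ** d
    # Cost: every node has d links, and each link is shared by two nodes.
    links = sum(len(hypercube_neighbors(n, d)) for n in range(p)) // 2
    # Diameter: largest Hamming distance from node 0 (all nodes are equivalent).
    diameter = max(bin(n).count("1") for n in range(p))
    # Bisection: count links between the lower and upper halves of the labels.
    # Assumption stated above: this top-bit cut is a minimum bisection.
    crossing = sum(
        1
        for n in range(p)
        for m in hypercube_neighbors(n, d)
        if n < p // 2 <= m
    )
    return {
        "processors": p,
        "diameter": diameter,
        "bisection_width": crossing,
        "cost": links,
    }


if __name__ == "__main__":
    for d in range(6):
        print(d, hypercube_metrics(d))
```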

Static Interconnection Topologies: Fat Tree
Diameter? Bisection width? Cost?

Switch-based interconnection network

Summary

Taxonomy of parallel machines
- Massively parallel cluster: MIMD, distributed memory, fine grained
- Coarse-grained cluster: MIMD, distributed memory, coarse grained
- Multi-core processor: MIMD, shared memory, coarse grained
- GPU: SIMD, shared memory, fine grained