Parallel Processors. The dream of computer architects since the 1950s: replicate processors to add performance, rather than design a faster processor.

Multiprocessing

Parallel Computers. Definition: "A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast." (Almasi and Gottlieb, Highly Parallel Computing, 1989.) Questions about parallel computers: How large a collection? How powerful are the processing elements? How do they cooperate and communicate? How are data transmitted? What type of interconnection? Does it translate into performance?

Parallel Processors. The dream of computer architects since the 1950s: replicate processors to add performance, rather than design a faster processor. This led to innovative organizations tied to particular programming models, since uniprocessors can't keep scaling. The argument now is the pull of the opportunity of scalable performance, not just the push of the uniprocessor performance plateau.

What Level of Parallelism? Bit-level parallelism: 1970 to ~1985 (4-bit, 8-bit, 16-bit, 32-bit microprocessors). Instruction-level parallelism (ILP): ~1985 through today (pipelining, superscalar, VLIW, out-of-order execution); limits to the benefits of ILP? Process-level or thread-level parallelism (e.g., Pentium 4 HT, Hyper-Threading). Program-level parallelism (multicore processors).
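The thread-level idea above can be sketched in Python (an illustrative sketch only, not from the slides; the helper sum_chunk and the chunk sizes are invented here): independent tasks are handed to a thread pool, the way work is spread across hardware thread contexts on an HT or multicore processor.

```python
# Sketch: thread-level parallelism with a thread pool.
# Each worker handles an independent chunk of the data, so the
# chunks can in principle run on separate hardware threads.
# (On CPython the GIL serializes Python bytecode, so this shows
# the programming model rather than a real compute speedup.)
from concurrent.futures import ThreadPoolExecutor

def sum_chunk(chunk):
    # One independent task: sum its own slice of the data.
    return sum(chunk)

data = list(range(1000))
chunks = [data[i:i + 250] for i in range(0, 1000, 250)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum_chunk, chunks))

total = sum(partial_sums)  # same result as a sequential sum
```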

Why Multiprocessors? Complexity of current microprocessors: do we have enough ideas to sustain 1.5x/year? Can we deliver such complexity on schedule? Slow (but steady) improvement in parallel software (scientific applications, databases, OS). Emergence of embedded and server markets driving microprocessors in addition to desktops: embedded exploits functional parallelism (producer/consumer model); the server figure of merit is tasks per hour rather than latency.

Multiprocessor Classifications. Proposed by Michael J. Flynn, a professor at Stanford, in 1966. Based on the number of concurrent instruction (or control) streams and data streams available in the architecture. Known as Flynn's taxonomy.

SISD (Single Instruction, Single Data): a sequential computer which exploits no parallelism in either the instruction or the data stream. Traditional uniprocessor machines are examples of SISD architecture.

SIMD (Single Instruction, Multiple Data): a computer which applies a single instruction stream to multiple data streams, for operations that can be parallelized. Examples: an array processor or a GPU.
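The SIMD idea can be modeled in a few lines of plain Python (an illustrative sketch only; the name simd_apply is invented here): one instruction is applied in lockstep to every element of the data stream, the way array-processor or GPU lanes execute.

```python
# Sketch: SIMD-style execution modeled in plain Python.
# A single instruction (here, "multiply by 2") is applied to
# every element of the data stream, as if each element sat in
# its own hardware lane executing the same operation.
def simd_apply(instruction, data):
    # One instruction stream, multiple data elements.
    return [instruction(x) for x in data]

lanes = [1.0, 2.0, 3.0, 4.0]          # four data lanes
result = simd_apply(lambda x: x * 2.0, lanes)
```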

MISD (Multiple Instruction, Single Data): multiple processors operate on a single data stream. Unusual, because multiple instruction streams generally require multiple data streams to be effective; no commercial product exists. Redundant parallel processing can be classified in this category, e.g., aircraft/spacecraft flight-control computers.

MIMD (Multiple Instruction, Multiple Data): most multiple-CPU schemes today. Multiple processors simultaneously execute different instructions on different data.
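A minimal MIMD sketch (illustrative only; stream_a/stream_b are invented names): two threads run different instruction streams on different data at the same time, which is exactly what distinguishes MIMD from the other Flynn categories.

```python
# Sketch: MIMD as two concurrent workers running *different*
# instruction streams on *different* data.
import threading

results = {}

def stream_a(data):              # instruction stream A: summation
    results["a"] = sum(data)

def stream_b(data):              # instruction stream B: maximum
    results["b"] = max(data)

t1 = threading.Thread(target=stream_a, args=([1, 2, 3],))
t2 = threading.Thread(target=stream_b, args=([7, 5, 9],))
t1.start(); t2.start()
t1.join(); t2.join()
```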

Memory Organization. Centralized shared-memory multiprocessor, or symmetric shared-memory multiprocessor (SMP): multiple processors connected to a single centralized memory. Since all processors see the same memory organization, this is uniform memory access (UMA). It is shared-memory because all processors can access the entire memory address space. (Diagram: processors, each with caches, connected to a shared main memory and I/O system.)
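The shared-address-space model can be sketched with threads (an illustrative sketch only; the counter and worker names are invented here): every thread reads and writes the same variable, and a lock stands in for the synchronization a real SMP needs when several processors update shared memory.

```python
# Sketch: shared-memory (SMP/UMA) programming model.
# All workers see one address space, so they communicate by
# updating a single shared variable; the lock serializes the
# read-modify-write so no updates are lost.
import threading

counter = 0                      # lives once, in shared memory
lock = threading.Lock()

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:               # protect the shared update
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is now 4000: every thread updated the same memory
```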

Commercial SMP motherboard

Memory Organization. For higher scalability, memory is distributed among processors: distributed-memory multiprocessors. If one processor can directly address memory local to another processor, the address space is shared: a distributed shared-memory (DSM) multiprocessor. If memories are strictly local, messages are needed to communicate data: a cluster of computers, or multicomputer. This is non-uniform memory access (NUMA), since local memory has lower latency than remote memory. (Diagram: nodes, each with a processor and caches, memory, and I/O, connected by an interconnection network.)
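The message-passing style of a multicomputer can be sketched as follows (an illustrative sketch only: in a real cluster each node is a separate machine or process and messages cross an interconnection network, e.g. via MPI send/recv; here two queues between threads stand in for the network links, and the name remote_node is invented):

```python
# Sketch: distributed-memory style communication. Each "node"
# works only on data it receives as an explicit message, never
# by reading another node's variables directly.
import queue
import threading

to_node = queue.Queue()          # link: host -> remote node
from_node = queue.Queue()        # link: remote node -> host

def remote_node():
    data = to_node.get()         # message arrives over the "network"
    from_node.put(sum(data))     # reply with the locally computed result

node = threading.Thread(target=remote_node)
node.start()
to_node.put([1, 2, 3, 4])        # explicit data transmission
result = from_node.get()         # result comes back as a message
node.join()
```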

Intel Xeon Phi Architecture. PCIe co-processor card adding 61 cores (244 hardware threads) and 8 GB of memory. Cores are based on the Pentium CPU. Supports MPI, OpenMP, OpenCL, and the Intel math libraries.
