Beyond Latency and Throughput

Beyond Latency and Throughput: Performance for Heterogeneous Multi-Core Architectures
JoAnn M. Paul
Virginia Tech, ECE, National Capital Region

Slide 2: Common basis for two themes
- Flynn's Taxonomy: computers viewed from the inside out
- Develop a new taxonomy: from the outside in

                     Single Instruction   Multi-Instruction
  Single Datastream  SISD                 MISD
  Multi-Datastream   SIMD                 MIMD

- Later: fill in MISD

Slide 3: From the outside in
- Single User (SU): a computer designed for use by a single individual at a time
- Multiple User (MU): a computer designed for use by multiple individuals at a time
- Single Application (SA): a computer designed to execute a single application at a time
- Multiple Application (MA): a computer designed to execute multiple applications at a time

Slide 4: The U-A Taxonomy
Are supercomputers and emerging single chip heterogeneous multiprocessors really the same?

                      Single User   Multi-User
  Single Application  SUSA          MUSA
  Multi-Application   SUMA          MUMA
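As a minimal sketch, the four U-A classes can be derived from the two binary design choices (users and applications). The Python names `Users`, `Apps`, and `ua_class` are illustrative assumptions and do not appear in the original slides.

```python
from enum import Enum


class Users(Enum):
    SINGLE = "SU"    # designed for a single individual at a time
    MULTIPLE = "MU"  # designed for multiple individuals at a time


class Apps(Enum):
    SINGLE = "SA"    # designed to execute a single application at a time
    MULTIPLE = "MA"  # designed to execute multiple applications at a time


def ua_class(users: Users, apps: Apps) -> str:
    """Return the U-A class label by joining the user and application codes."""
    return users.value + apps.value


if __name__ == "__main__":
    # Enumerate all four cells of the taxonomy.
    for u in Users:
        for a in Apps:
            print(ua_class(u, a))
```

Keeping the two axes as separate enumerations mirrors the taxonomy's claim that the user dimension and the application dimension are independent design choices.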

Slide 5: Filling in the U-A Taxonomy

        SU                                                       MU
  SA    SISD: early computers, PCs running DOS                   Database mainframes (IBM)
        MIMD: supercomputers                                     Website servers
  MA    SISD: Unix-style O/Ses, Windows                          Early timeshare
        MIMD: Emerging! Personal, wireless computers (iPhone++)  General servers (Google)

Slide 6: Implications of SUMA-MIMD
- Performance is judged solely by the user
- Can even be highly personalized
- Diminishing returns for graphics, sound quality
- Users can only juggle so many things at once
- Will never see some performance improvements
- Will trade more apps for isolated optimality

Slide 7: Usefulness and Timeliness
Addition of speech to a navigation system. Is slower better?

  Scenario  Description
  1         no software
  2         basic function
  3         best without speech
  4         RISC load too high
  5         inadequate speech
  6         inadequate display
  7         speech and display equal
  8         optimal balance
  9         SIMD load too high
  10        all load too high

[Figure: quality (usefulness, timeliness) plotted for scenarios 1 to 10; series: speech recognition, route optimization, web search, display]

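Slide 8: Amdahl's Law
- Speedup is limited to the removal of the sequential fraction, $f$
- Ideal: if $f$ can be considered to take zero time to execute ($f \to 0$), you can't do better than $L_A = L_0 - f$
[Figure: latency $L_0$ spans regions $R_1$, $f$, $R_2$; latency $L_A$ spans $R_1$ and $R_2$ only]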
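The bound on this slide can be worked through numerically. The sketch below assumes the baseline latency is the plain sum of the three regions, $L_0 = R_1 + f + R_2$, which the diagram suggests but the slide does not state outright. The function names and the sample values are illustrative only.

```python
def baseline_latency(r1: float, f: float, r2: float) -> float:
    """L_0, assuming the three regions execute back to back."""
    return r1 + f + r2


def accelerated_latency(r1: float, f: float, r2: float, k: float) -> float:
    """Latency when only region f is sped up by a factor k (k >= 1)."""
    return r1 + f / k + r2


def ideal_latency(r1: float, f: float, r2: float) -> float:
    """L_A = L_0 - f: the limit in which f takes zero time."""
    return baseline_latency(r1, f, r2) - f


if __name__ == "__main__":
    r1, f, r2 = 3.0, 4.0, 1.0  # hypothetical region latencies
    l0 = baseline_latency(r1, f, r2)
    la = ideal_latency(r1, f, r2)
    print("L_0 =", l0, "L_A =", la, "speedup bound =", l0 / la)
    for k in (2, 10, 1000):
        # Latency approaches L_A from above as k grows, but never drops below it.
        print("k =", k, "latency =", accelerated_latency(r1, f, r2, k))
```

However large `k` becomes, the latency stays above $L_A$, so under these assumptions the best achievable speedup is $L_0 / L_A$.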

Slide 9: Amdahl's Law
- Implies slower $f$ is worse
- Assumes
  - Other regions are unaffected
  - Independence of design concerns
[Figure: latency $L_0$ spans regions $R_1$, $f$, $R_2$; with $f \to 0$, $L_A = L_0 - f$ spans $R_1$ and $R_2$ only]

Slide 10: Global Effects
- Anything with a common element (and what does not have a common element?)
- Easiest to see is a boundary, e.g. a chip
[Figure: Region A and Region B inside one chip boundary]

Slide 11: When Bounded, Heterogeneous
- The system can be faster when the isolated portion (sequential fraction) is slowed down
- Microarchitecture has this effect also: floating point vs. larger register file
- Faster $f$ leads to slower system!!
[Figure: baseline latency $L_0$ spans $R_1$, $f$, $R_2$. With $f \to 0$, either $L_A = L_0 - f$ ($R_1$ and $R_2$ unchanged) or $L_B > L_0$ ($R_1$ and $R_2$ lengthened)]
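One way to see how a faster $f$ can slow the whole system is a toy model of a shared, bounded resource. Everything in the sketch below is an assumption made for illustration: the chip has a fixed area budget of 1.0, each region's latency is its work divided by the area share it receives, and the work values are invented. None of this comes from the slides.

```python
WORK_F = 2.0     # hypothetical work in region f
WORK_REST = 6.0  # hypothetical combined work in regions R_1 and R_2


def f_latency(f_share: float) -> float:
    """Latency of f when it receives f_share of the chip area (0 < f_share < 1)."""
    return WORK_F / f_share


def rest_latency(f_share: float) -> float:
    """Latency of R_1 + R_2, which receive whatever area f does not use."""
    return WORK_REST / (1.0 - f_share)


def system_latency(f_share: float) -> float:
    """Total latency under the shared area budget."""
    return rest_latency(f_share) + f_latency(f_share)


if __name__ == "__main__":
    for share in (0.25, 0.40, 0.75):
        print(
            "f share:", share,
            "f latency:", round(f_latency(share), 2),
            "system latency:", round(system_latency(share), 2),
        )
```

In this model, moving from a 0.25 share to a 0.75 share makes $f$ itself faster, yet total latency rises because $R_1$ and $R_2$ are starved of the shared area. That is the coupling Amdahl's Law leaves out when it assumes other regions are unaffected.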

Slide 12: Faster is no longer always better
[Figure: design flow diagram. Labels: DA, CA; I/O (values, times); Application/Architecture Specification; Benchmark 1, Benchmark 2, ..., Benchmark n (executed one at a time); Tools & Design; Algorithms; Datasets (values); Processor Features; Design Instance; Simulation; Performance]

Slide 13: Challenges and Architectural Responses
MISD

Slide 14: Information vs. transistor density growth

Slide 15: Effort required to find, file and share data

Slide 16: Computing has been Quantitative
- Transformative: mapping from one mathematical space, $A$, to another, $B$, functionally and completely: for all $a \in A$ and $b \in B$, $b = F(a)$
- Functional transformations
  - Tend to be reductive, typically transforming points from spaces with more dimensions to those with fewer
  - Are objective
- Mathematical: the correct result is presumed to exist
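A small example may make the "reductive, objective" description concrete. The sketch below uses the Euclidean norm as a stand-in for $F$; the choice of function is an assumption for illustration, not something named on the slide.

```python
import math
from typing import Sequence


def F(a: Sequence[float]) -> float:
    """Map a point in an n-dimensional space A to a single number in B."""
    return math.sqrt(sum(x * x for x in a))


if __name__ == "__main__":
    point = (3.0, 4.0, 0.0)  # a point in a 3-dimensional space
    print(F(point))          # one number: three dimensions reduced to one
    # Objective: the same input yields the same result for every user.
    print(F(point) == F(point))
```

The mapping is reductive (three dimensions in, one out) and objective (one correct answer exists, independent of who asks).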

Slide 17: Qualitative Computing
- Identify the extent to which some input has a set of qualities {A, B, C, D, ...}, where those qualities are represented in imperfect mathematical spaces
- Subjective: the real answer lies only in the interpretation of individual users
- Exact interpretations do not exist
- When is something red, large, good?
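The question "when is something red?" can be sketched as a degree of membership plus a per-user interpretation. The redness measure and the user thresholds below are hypothetical and only illustrate why one input can have different "correct" answers for different users.

```python
from typing import Tuple


def redness(rgb: Tuple[int, int, int]) -> float:
    """An imperfect, assumed measure of 'red': the red share of total intensity."""
    r, g, b = rgb
    total = r + g + b
    return r / total if total else 0.0


def is_red(rgb: Tuple[int, int, int], user_threshold: float) -> bool:
    """A user's interpretation: red if the measure clears their own threshold."""
    return redness(rgb) >= user_threshold


if __name__ == "__main__":
    colour = (200, 30, 30)
    # Hypothetical users with different personal thresholds.
    for user, threshold in (("user_a", 0.6), ("user_b", 0.8)):
        print(user, is_red(colour, threshold))
```

The measure returns an extent rather than a yes/no answer, and the final judgment depends on who is asking, which is the contrast with the functional mapping $b = F(a)$.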

Slide 18: Solution is Architectural, MISD
- Personalization: tuning to individual preferences, enabled by data management on personalized (and mobile) computers
- Multi-interpretation: simultaneously applying different techniques, enabled by heterogeneity of the underlying architecture
- Integration: effective coupling of global data and processing, enabled by new data-processor relationships on novel single chip heterogeneous multiprocessor architectures

Slide 19: Spectroprocessing is MISD
- For qualitative computation
- Customized to personal preference history
- Grows over time
- May start with common seed
- Facilitates judgment
[Figure: chip]

Slide 20: Conclusions
- Since the dawn of computing
  - Performance has been driven by reducing latency and increasing throughput
  - All computation has been quantitative, fundamentally transformational
- We are at the frontier of transcending each