ECE 588/688 Advanced Computer Architecture II

Similar documents
ECE 588/688 Advanced Computer Architecture II

ECE 587 Advanced Computer Architecture I

Introduction to Parallel Computing

When and Where? Course Information. Expected Background ECE 486/586. Computer Architecture. Lecture # 1. Spring Portland State University

Parallel Computer Architecture

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University

Portland State University ECE 588/688. IBM Power4 System Microarchitecture

Portland State University ECE 588/688. Cray-1 and Cray T3E

Wisconsin Computer Architecture. Nam Sung Kim

Portland State University ECE 588/688. Cray-1 and Cray T3E

CS/EE 6810: Computer Architecture

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Computer Architecture!

Computer Architecture!

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 587/687. Caches and Memory-Level Parallelism

EECS4201 Computer Architecture

ECSE 425 Lecture 1: Course Introduc5on Bre9 H. Meyer

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6

Computer Architecture!

COMP 322: Fundamentals of Parallel Programming

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Lecture 1: Introduction

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor

Lecture 1: Gentle Introduction to GPUs

EECS 452 Lecture 9 TLP Thread-Level Parallelism

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Computer Systems Architecture

EE382 Processor Design. Class Objectives

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

Computer Architecture. Fall Dongkun Shin, SKKU

Performance, Power, Die Yield. CS301 Prof Szajda

Portland State University ECE 588/688. Dataflow Architectures

Spring 2016 :: CSE 502 Computer Architecture. Introduction. Nima Honarmand

Chapter 2. OS Overview

Online Course Evaluation. What we will do in the last week?

COMP 322 / ELEC 323: Fundamentals of Parallel Programming

CSE502: Computer Architecture CSE 502: Computer Architecture

LECTURE 1. Introduction

Lecture 18: Multithreading and Multicores

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ECE 475/CS 416 Computer Architecture - Introduction. Today s Agenda. Edward Suh Computer Systems Laboratory

Introduction to Computer Architecture II

Portland State University ECE 588/688. Cache-Only Memory Architectures

CS3350B Computer Architecture. Introduction

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

Moore s Law. CS 6534: Tech Trends / Intro. Good Ol Days: Frequency Scaling. The Power Wall. Charles Reiss. 24 August 2016

Fundamentals of Computers Design

Computer Architecture

Portland State University ECE 587/687. Caches and Prefetching

COMP 322 / ELEC 323: Fundamentals of Parallel Programming

CS 6534: Tech Trends / Intro

Computer Systems Architecture

LIMITS OF ILP. B649 Parallel Architectures and Programming

EECS 470. Lecture 18. Simultaneous Multithreading. Fall 2018 Jon Beaumont

Advanced Computer Architecture (CS620)

Computer Architecture

CS377P Programming for Performance Multicore Performance Multithreading

Advanced Processor Architecture

15-740/ Computer Architecture Lecture 12: Issues in OoO Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/7/2011

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

Understanding The Behavior of Simultaneous Multithreaded and Multiprocessor Architectures

Lecture 1: Course Introduction and Overview Prof. Randy H. Katz Computer Science 252 Spring 1996

Lecture 21: Parallelism ILP to Multicores. Parallel Processing 101

WHY PARALLEL PROCESSING? (CE-401)

Computer Architecture Crash course

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago

Computer Architecture

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Portland State University ECE 588/688. Memory Consistency Models

Computer Systems Architecture I. CSE 560M Lecture 19 Prof. Patrick Crowley

Fundamentals of Computer Design

First, the need for parallel processing and the limitations of uniprocessors are introduced.

Multicore and Parallel Processing

Microelectronics. Moore s Law. Initially, only a few gates or memory cells could be reliably manufactured and packaged together.

IT 252 Computer Organization and Architecture. Introduction. Chia-Chi Teng

Portland State University ECE 588/688. Directory-Based Cache Coherence Protocols

THREAD LEVEL PARALLELISM

Architecture-Conscious Database Systems

The Processor: Instruction-Level Parallelism

Microprocessor Trends and Implications for the Future

Computer Architecture Spring 2016

CS/ECE 752: Advanced Computer Architecture 1. Lecture 1: What is Computer Architecture?

Hyperthreading Technology

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

EE282H: Computer Architecture and Organization. EE282H: Computer Architecture and Organization -- Course Overview

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

COS 318: Operating Systems. Introduction

COMPUTER ARCHITECTURE

Introduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University

Administration. Coursework. Prerequisites. CS 378: Programming for Performance. 4 or 5 programming projects

CS61C Machine Structures. Lecture 1 Introduction. 8/27/2006 John Wawrzynek (Warzneck)

E&CE 429 Computer Structures

CMPSCI 201: Architecture and Assembly Language

6 February Parallel Computing: A View From Berkeley. E. M. Hielscher. Introduction. Applications and Dwarfs. Hardware. Programming Models

Transcription:

ECE 588/688 Advanced Computer Architecture II Instructor: Alaa Alameldeen alaa@ece.pdx.edu Winter 2018 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2018 1

When and Where? When: Monday & Wednesday 5:00-6:50 PM Where: PCC Willow Creek 312 Office hours: After class, or by appointment TA: Aarti Rane Office hours; TBD (starting next week) Webpage: http://www.cecs.pdx.edu/~alaa/ece588/ Go to webpage for: Class slides and papers Homework and project assignments Changes in class schedule Portland State University ECE 588/688 Winter 2018 2

Course Description Parallel computing and multiprocessors Symmetric Multiprocessors (SMPs) Chip Multiprocessors (CMPs), aka multi-core processors Multithreading and parallel programming models Multiprocessor memory systems Emphasis on papers readings NOT on a textbook Tutorial papers Original sources and ideas papers Papers covering most recent trends Portland State University ECE 588/688 Winter 2018 3

Expected Background ECE 587/687 or equivalent Superscalar processor microarchitecture Branch prediction Cache organization Memory ordering Speculative execution Multithreading Programming experience in C Portland State University ECE 588/688 Winter 2018 4

Grading Policy Class Participation 5% Paper reviews 10% Homeworks 20% Project 25% Midterm Exam 20% Final Exam 20% Grading Scale: A: 92-100% A-: 86-91.5% B+: 80-85.5% B: 76-79.5% B-: 72-75.5% C+: 68-71.5% C: 64-67.5% C-: 60-63.5% D+: 57-59.5% D: 54-56.5% D-: 50-53.5% F: Below 50% Portland State University ECE 588/688 Winter 2018 5

Important Dates Midterm Exam: Monday, February 12 th, in class Final Exam: Wednesday March 14 th, in class Project Presentations: Friday March 16 th, 7-9 PM, location TBD Project Reports Due: Tuesday March 20 th (email) No class next Monday (holiday). Other classes may get combined/cancelled/moved. Portland State University ECE 588/688 Winter 2018 6

Why Study Computer Architecture Technology advancements require continuous optimization of cost, performance, and power Moore s law Original version: Transistor scaling exponential Popular version: Processor performance exponentially increasing Innovation needed to satisfy market trends User and software requirements keep on changing Software developers expecting improvements in computing power Portland State University ECE 588/688 Winter 2018 7

Why Study Parallel Computing Technology made multi-core processors both feasible AND necessary for performance Moore s law: too many transistors on a die than can be used (efficiently) for a single processor Traditional out-of-order processors face memory and power walls Software requirements need more computing power than a single processor Scientific computations Commercial applications Portland State University ECE 588/688 Winter 2018 8

Moore s Law (1965) #Transistors Per Chip (Intel) 10,000,000,000 1,000,000,000 100,000,000 10,000,000 1,000,000 100,000 10,000 1971 1974 1977 1980 1,000 1983 1986 1989 1992 1995 1998 2001 Year 2004 Almost 75% increase per year Portland State University ECE 588/688 Winter 2018 9

Memory Wall 1000 Time (ns) 100 10 1 0.1 1982 1984 1986 1988 1990 1992 1994 1998 2000 2002 2004 2006 1996 Year CPU cycle time: 500 times faster since 1982 DRAM Latency: Only ~5 times faster since 1982 CPU Cycle Time DRAM latency Portland State University ECE 588/688 Winter 2018 10

Why Not Build Larger OoO Memory Wall Processors? 1980 Memory Latency ~ 1 instruction Now ~ 1000 instructions Power Wall Dynamic power increases quadratically with frequency, static power increases exponentially ILP Limitations Serial programs don t have an infinite amount of instructions to execute in parallel Other Problems Temperature & cooling On-chip interconnect Complexity & testing Portland State University ECE 588/688 Winter 2018 11

Technology Trends: Growing #Transistors Causes Inflection Points Most famous inflection point enabled simple microprocessor 1971 Intel 4004 2300 transistors Could access 300 bytes of memory (0.0003 megabytes) Bigger and better-performing processors were enabled by more transistors 2006, Prof. Mark Hill 12 Portland State University ECE 588/688 Winter 2018

Microprocessor Design Complexity Millions of Transistors Bit-level Parallelism (BLP) 64-bit multiply in one/two cycles Instruction-Level Parallelism (ILP) Dozens of instruction in flight Branch prediction Out-of-order Dominating Use of Caches Level-one cache Level-two cache Emerging Chip Multiprocessing L1 Data Cache L2 Cache Tags PowerPC 750 (1998) L1 Instruction Cache 13 Portland State University ECE 588/688 Winter 2018

Multi-core Processors (Chip Replicate processor core & Caches Multiprocessors) CPU 0 + L1 Cache CPU 1 + L1 Cache Uses more transistors Tolerates memory wall Simpler lower-power cores Reduces complexity Shared L2 Cache 14 Portland State University ECE 588/688 Winter 2018 IBM Power 4

Transactions Per Minute (tpmc) 7000000 Software Trends 6000000 5000000 4000000 3000000 2000000 1000000 0 Dec-99 Apr-01 Sep-02 Jan-04 May-05 Oct-06 Feb-08 Jul-09 TPC-C Throughput increased by more than 75% per year over eight years What about desktop applications? What about scientific applications? 15 Portland State University ECE 588/688 Winter 2018

Multi-Core Performance Recall Popular Moore s Law: Microprocessor performance doubles every 1.5-2 years Future Multi-Core Performance Doublings Require Effective multithreaded programming Better communication Faster synchronization More cores Portland State University ECE 588/688 Winter 2018 16

Multi-core Processor Architecture Cache Cache Core0 Core0 Core1 Core4 Core5 Core8 Core9 Core12 Core13 Core2 Core6 Core10 Core14 Core1 Core3 Core7 Core11 Core15 Cache Cache Which design is better? Balance cores, caches, and pin bandwidth Portland State University ECE 588/688 Winter 2018 17

Programming Models Single-core One program runs on core Programmers write efficient programs Architects design for fast execution of instructions Multi-core Each core can run a thread/task/program Programmers can split up their programs? Multiprocessors Shared memory Message passing Threads Data Parallel Portland State University ECE 588/688 Winter 2018 18

Introduction to Parallel Computing Serial vs. Parallel Computation (tutorial pictures) Parallel computing includes Break up problem into smaller discrete parts that can be solved concurrently Break each part further into a sequence of (serial) instructions All parts are run simultaneously on different CPUs Why use parallel computing? Save time Solve larger problems Provide concurrency (do more than one thing at the same time) Save money (use multiple cheaper resources vs. a single expensive supercomputer) Use more memory than available on a single computer Use non-local resources Who uses parallel computing and why? (tutorial graphs) Portland State University ECE 588/688 Winter 2018 19

Reading Assignment Introduction to Parallel Computing, Lawrence Livermore National Laboratory tutorial Read before Wed class Wed 1/17: Olukotun et al., The Case for a Single-Chip Multiprocessor, ASPLOS 1996 Submit review before the beginning of class: Paper summary Strong points (2-4 points) Weak points (2-4 points) Review due before 5PM on day of class Send by email to aarane AT pdx DOT edu Plain text, no attachments Subject line has to be: ECE588_REVIEW_01_17 Portland State University ECE 588/688 Winter 2018 20