CSE 30321, Lecture 24: Parallel Processing on Multi-Core Chips


Suggested readings
- H&P: Chapter 7 (over the next 2 weeks)

Topics (from the course overview diagram): processor components; processor comparison; HLL code translation; writing more efficient code; the right HW for the right application; multicore processors and programming.

Goal: Explain and articulate why modern microprocessors now have more than one core, and how software must adapt to accommodate the now-prevalent multicore approach to computing.

Technology Drive to Multi-core

Transistors used to manipulate/store 1s and 0s
[Figure: NMOS transistor; switch-level representation for 0 (0 V) and 1 (5 V); cross-sectional view (the device can act as a capacitor storing charge).]
Using the diagrams as context, note that if we (i) apply a suitable voltage to the gate and (ii) then apply a suitable voltage between source and drain, current will flow.

Moore's Law
- "Cramming more components onto integrated circuits" (G.E. Moore, Electronics, 1965)
- Observation: DRAM transistor density doubles annually
  - Became known as "Moore's Law"
  - Actually, a bit off: density doubles every 18 months (now more like 24)
  - (In 1965 they only had 4 data points!)
- Corollaries (remember these!):
  - Cost per transistor halves annually (18 months)
  - Power per transistor decreases with scaling
  - Speed increases with scaling
  - Of course, it depends on how small you try to make things (i.e., no exponential lasts forever)

Previous Industry Projections

A funny thing happened on the way to 45 nm
- "Speed increases with scaling..."
- The 2005 projection was for 5.2 GHz, and we didn't make it in production. Further, we're still stuck at 3+ GHz in production.
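To make the gap between the doubling periods quoted above concrete, here is a minimal Python sketch. The function name `density_growth` and the 10-year horizon are illustrative assumptions, not part of the lecture; the only inputs taken from the slides are the 12-, 18-, and 24-month doubling periods.

```python
def density_growth(years: float, doubling_months: float) -> float:
    """Multiplicative growth in transistor density after `years`,
    assuming density doubles once every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Compare the three doubling periods mentioned on the slide over one decade.
for months in (12, 18, 24):
    growth = density_growth(10, months)
    print(f"doubling every {months} months: x{growth:,.0f} after 10 years")
```

A small change in the doubling period compounds into a large difference: annual doubling gives a factor of 1024 in ten years, while doubling every 24 months gives a factor of 32.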

A funny thing happened on the way to 45 nm
- "Power decreases with scaling..."

A bit on device performance...
- One way to think about switching time:
  - Charge is carried by electrons
  - Time for charge to cross the channel = length / speed = $L^2 / (\mu V_{ds})$
  - Thus, to make a device faster, we want to either increase $V_{ds}$ or decrease feature sizes (i.e., $L$)
- What about power (i.e., heat)?
  - Dynamic power is: $P_{dyn} = C_L V_{dd}^2 f_{0 \to 1}$
  - $C_L = (\epsilon_{ox} W L) / d$
  - $\epsilon_{ox}$ = dielectric, $WL$ = parallel plate area, $d$ = distance between gate and substrate

Summary of relationships
- (+) If V increases, speed (performance) increases
- (-) If V increases, power (heat) increases
- (+) If L decreases, speed (performance) increases
- (?) If L decreases, power (heat) does what?
  - P could improve because of lower C
  - P could increase because many more devices switch
  - P could increase because many more devices switch faster

A funny thing happened on the way to 45 nm
- "Speed increases with scaling..."
- "Power decreases with scaling..."
- Why the clock flattening? POWER!
- Need to carefully consider tradeoffs between speed and heat
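The relationships above can be checked numerically. The following is a rough Python sketch using normalized (unitless) values; the function names are hypothetical, and reading the symbol in the switching-time formula as carrier mobility is an assumption about the slide's notation.

```python
def switching_time(length, mobility, v_ds):
    """Channel transit time: t = L^2 / (mu * V_ds)."""
    return length ** 2 / (mobility * v_ds)


def gate_capacitance(eps_ox, width, length, d):
    """Parallel-plate load capacitance: C_L = eps_ox * W * L / d."""
    return eps_ox * width * length / d


def dynamic_power(c_load, v_dd, f_switch):
    """Dynamic power: P_dyn = C_L * V_dd^2 * f (0 -> 1 switching rate)."""
    return c_load * v_dd ** 2 * f_switch


# Normalized baseline: every quantity set to 1.0 (illustrative, not real data).
base_t = switching_time(1.0, 1.0, 1.0)
base_p = dynamic_power(gate_capacitance(1.0, 1.0, 1.0, 1.0), 1.0, 1.0)

# (+)/(-): doubling the voltage halves transit time but quadruples power.
print("2x V:  time x", switching_time(1.0, 1.0, 2.0) / base_t,
      " power x", dynamic_power(1.0, 2.0, 1.0) / base_p)

# (+)/(?): halving L quarters transit time and halves C_L per device,
# but says nothing about how many devices switch or how fast.
half_c = gate_capacitance(1.0, 1.0, 0.5, 1.0)
print("L/2:   time x", switching_time(0.5, 1.0, 1.0) / base_t,
      " per-device power x", dynamic_power(half_c, 1.0, 1.0) / base_p)
```

The per-device power in the second case only drops if the switching frequency and device count stay fixed, which is exactly the open "(?)" in the summary: more devices switching faster can outweigh the lower capacitance.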

(Short term?) Solution
- Processor complexity is good enough
- Transistor sizes can still scale
- Slow processors down to manage power
- Get performance from... parallelism
  - (i.e., 1 processor with a 1 ns clock cycle vs. 2 processors with a 2 ns clock cycle)

Are there design problems and issues unique to parallel processing on multi-core chips?

Issues
- Not that much different than those listed earlier:
  - Cache coherency
  - Contention
  - Latency
  - Reliability
  - Languages
  - Algorithms
- In order of priority:
  - Algorithms / Languages
  - Contention / Latency
  - Cache coherency

Are there parallel processing models more suitable to chip-level systems?
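The "1 processor, 1 ns clock vs. 2 processors, 2 ns clock" comparison can be sketched as follows. This is a simplified model, not the lecture's own: it assumes one instruction per cycle per core, and the `parallel_fraction` parameter (the share of work that can be split across cores) is introduced here for illustration.

```python
def run_time_ns(n_instr, cores, cycle_ns, parallel_fraction):
    """Execution time assuming 1 instruction per cycle per core.

    The serial part runs on one core; the parallel part is divided
    evenly across all cores with no communication overhead.
    """
    serial = n_instr * (1 - parallel_fraction)
    parallel = n_instr * parallel_fraction / cores
    return (serial + parallel) * cycle_ns


n = 1000
print("1 core  @ 1 ns:", run_time_ns(n, 1, 1.0, 1.0), "ns")
for p in (1.0, 0.9, 0.5):
    print(f"2 cores @ 2 ns, {p:.0%} parallel:", run_time_ns(n, 2, 2.0, p), "ns")
```

Under this model the two slower cores only match the single fast core when the work is fully parallel; any serial portion runs at the slower clock and makes the dual-core chip slower. That is one way to see why algorithms and languages sit at the top of the priority list.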

Multithreading
- Idea:
  - Performing multiple threads of execution in parallel
  - Replicate registers, PC, etc.
  - Fast switching between threads
- Flavors:
  - Fine-grain multithreading
    - Switch threads after each cycle
    - Interleave instruction execution
    - If one thread stalls, others are executed
  - Coarse-grain multithreading
    - Only switch on a long stall (e.g., L2-cache miss)
    - Simplifies hardware, but doesn't hide short stalls (e.g., data hazards)
  - SMT (Simultaneous Multi-Threading)
    - Especially relevant for superscalar
    - Refer to this picture: [Figure]

Board examples: Parts D, E

Impact of modern processing principles (Lots of "state")
- User: state used for application execution
- Supervisor: state used to manage user state
- Machine: state that configures the system
- Transient: state used during instruction execution
- Access-Enhancing: state used to simplify translation of other state names
- Latency-Enhancing: state used to reduce latency to other state values

Impact of modern processing principles (Total State vs. Time)
[Figure: estimated state (k bits, log scale from 0.01 to 100,000) vs. year (1970 to 2000); series: Total State, Machine, Supervisor, User, Transient, Latency Enhancing, Access Enhancing.]
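Returning to the fine-grain and coarse-grain multithreading flavors described above, their difference can be illustrated with a toy single-issue simulator. Everything here is an assumption made for illustration: each thread is a list of stall lengths (cycles the thread must wait after issuing that instruction), and `long_stall` is a hypothetical threshold separating short stalls from long ones.

```python
def next_ready(threads, pcs, ready_at, cycle, start):
    """Round-robin search, beginning at `start`, for a thread that
    still has instructions left and is not stalled this cycle."""
    n = len(threads)
    for k in range(n):
        t = (start + k) % n
        if pcs[t] < len(threads[t]) and ready_at[t] <= cycle:
            return t
    return None


def simulate(threads, policy, long_stall=10):
    """Return the number of cycles needed to issue every instruction."""
    pcs = [0] * len(threads)       # replicated per-thread program counters
    ready_at = [0] * len(threads)  # cycle at which each thread may issue again
    current, cycle = 0, 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        if policy == "fine":
            # Fine-grain: try a different thread every cycle.
            chosen = next_ready(threads, pcs, ready_at, cycle, current + 1)
        else:
            # Coarse-grain: stay on the current thread, idling through
            # short stalls, and switch only on a long stall or completion.
            unfinished = pcs[current] < len(threads[current])
            if unfinished and ready_at[current] - cycle < long_stall:
                chosen = current if ready_at[current] <= cycle else None
            else:
                chosen = next_ready(threads, pcs, ready_at, cycle, current + 1)
        if chosen is not None:
            stall = threads[chosen][pcs[chosen]]
            pcs[chosen] += 1
            ready_at[chosen] = cycle + 1 + stall
            current = chosen
        cycle += 1
    return cycle


# Two threads, each instruction followed by a 1-cycle stall (e.g., a data hazard).
short_stalls = [[1, 1, 1], [1, 1, 1]]
print("fine-grain  :", simulate(short_stalls, "fine"), "cycles")
print("coarse-grain:", simulate(short_stalls, "coarse"), "cycles")
```

With these inputs the fine-grain policy issues all six instructions in 6 cycles because each thread's stall is filled by the other thread, while the coarse-grain policy takes 10 cycles because it idles through every short stall. SMT is not modeled, since it needs a multi-issue pipeline.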

Or can do both

Real-life examples