Computer Organization and Components
|
|
- Alfred Atkinson
- 5 years ago
- Views:
Transcription
1 Computer Organization and Components Course Structure IS500, fall 05 Lecture :, Concurrency,, and ILP Module : C and Assembly Programming L L L L Module : Processor Design X LAB S LAB L9 L5 Assistant Research ngineer, University of California, Berkeley L6 X LAB L L8 X X LAB L Proj. xpo Slides version.0 L X5 I Abstractions in Computer Systems Agenda Networked Systems and Systems of Systems Application Software Software Operating System Instruction Set Architecture Hardware/Software Interface I Microarchitecture Logic and Building Blocks Digital Hardware Design Digital Circuits Analog Circuits Devices and Physics Analog Design and Physics I S L Computer System LAB6 Module 6: Parallel Processors and Programs Module : Logic Design L7 LAB5 Module 5: Memory Hierarchy Module : I/O Systems Associate Professor, KTH Royal Institute of Technology S L0 I
2 5 How is this computer revolution possible? (Revisited) 6 Moore s law: Integrated circuit resources (transistors) double every 8- months. By Gordon. Moore, Intel s co-founder, 960s. Possible because refined manufacturing process..g., th generation Intel Core i7 processors uses nm manufacturing. Sometimes considered a self-fulfilling prophecy. Served as a goal for the semiconductor industry. Acknowledgement: The structure and several of the good examples are derived from the book Computer Organization and Design (0) by David A. Patterson and John L. Hennessy I Have we reached the limit? (Revisited) Why? The Power Wall I 7 During the last decade, the clock rate has increased dramatically. 989: 8086, 5MHz 99: Pentium, 66Mhz 997: Pentium Pro, 00MHz 00: Pentium,.0 GHz 00: Pentium,.6 GHz 8 What is a multiprocessor? A multiprocessor is a computer system with two or more processors. By contrast, a computer with one processor is called a uniprocessor. Multicore microprocessors are multiprocessors where all processors (cores) are located on a single integrated circuit. by ric Gaba, CC BY-SA.0. No modifications made. 0: Core i7,. GHz - GHz image=8&picture=tegelvagg Increased clock rate implies increased power We cannot cool the system enough to increase the clock rate anymore New trend since 006: Multicore Moore s law still holds More processors on a chip: multicore New challenge: parallel programming I A cluster is a set of computers that are connected over a local area network (LAN). May be viewed as one large multiprocessor. Photo by Robert Harker I
3 Different Kinds of Computer Systems (Revisited) 9 0 Why multiprocessors? Possible to execute many computation tasks in parallel. Performance Multiprocessor Photo by Robert Harker Photo by Kyro mbedded Real-Time Systems Personal Computers and Personal Mobile Devices Dependability nergy Dependability Performance I I How much can we improve the performance using parallelization? = Software Serial Sequential Concurrent xample: A Linux OS running on a unicore processor. xample: matrix multiplication on a multicore processor. xecution time after improvement Still increased speedup, but less efficient I Tafter xecution time of one program before improvement Linear speedup (or ideal speedup) xample: A Linux OS running on a multicore processor. Tbefore Superlinear speedup. ither wrong, or due to e.g. cache effects. Parallel Hardware Note: As always, everybody does not agree on the definitions of concurrency and parallelism. The matrix is from H&P 0 and the informal definitions above are similar to what was said in a talk by Rob Pike. If one out of N processors fails, still N- processors are functioning. Concurrency is about handling many things at the same time. Concurrency may be viewed from the software viewpoint. xample: matrix multiplication on a unicore processor. Replace energy inefficient processors in data centers with many efficient smaller processors. Warehouse Scale Computers and Concurrency what is the difference? is about doing (executing) many things at the same time. may be viewed from the hardware viewpoint. nergy Number of processors Danger: Relative speedup measures only the same program True speedup compares also with the best known sequential program, I
4 Amdahl s Law (/) Amdahl s Law (/) Can we achieve linear speedup? Divide execution time before improvement into two parts. xecution time after improvement = T after T = T affected + T unaffected T after = = Time affected by the improvement of parallelization T affected N T affected N + T unaffected + T unaffected Time unaffected of improvement (sequential part) Amount of improvement (N times improvement) This is sometimes referred to as Amdahl s law = T after = T affected xercise: Assume a program consists of an image analysis task, sequentially followed by a statistical computation task. Only the image analysis task can be parallelized. How much do we need to improve the image analysis task to be able to achieve times speedup? Assume that the program takes 80ms in total and that the image analysis task takes 60ms out of this time. N + T unaffected Solution: = 80 / (60 / N ) 60/N + 0 = 0 60/N = 0 It is impossible to achieve this speedup! I I Amdahl s Law (/) 5 Amdahl s Law (/) 6 = T after = T affected N Assume that we perform 0 scalar integer additions, followed by one matrix addition, where matrices are 0x0. Assume additions take the same amount of time and that we can only parallelize the matrix addition. xercise A: What is the speedup with 0 processors? xercise B: What is the speedup with 0 processors? xercise C: What is the maximal speedup (the limit when N à infinity) + T unaffected Solution A: (0+0*0) / (0*0/0 + 0) = 5.5 Solution B: (0+0*0) / (0*0/0 + 0) = 8.8 Solution C: (0+0*0) / ((0*0)/N + 0) = when N à infinity I xample continued. What if we change the size of the problem (make the matrices larger)? Size of matrices 0x0 0x0 Number of processors But was not the maximal speedup when N à infinity? Strong scaling = measuring speedup while keeping the problem size fixed. Weak scaling = measuring speedup when the problem size grows proportionally to the increased number of processors. I
5 7 8 Main Classes of s Task-Level (TLP) Different tasks of work that can work in independently and in parallel xample Many tasks at the farm Assume that there are many different things that can be done on the farm (fix the barn, sheep shearing, feed the pigs etc.) Task-level parallelism would be to let the farm hands do the different tasks in parallel. I An old (from the 960s) but still very useful classification of processors uses the notion of instruction and data streams. Data-level parallelism. xamples are multimedia extensions (e.g., SS, streaming SIMD extension), vector processors. Data Stream Single Single Multiple TLP Data-Level (DLP) Many data items can be processed at the same time. xample Sheep shearing Assume that sheep are data items and the task for the farmer is to do sheep shearing (remove the wool). Data-level parallelism would be the same as using several farm hands to do the shearing. Instruction Stream DLP SISD, SIMD, and MIMD Multiple SISD SIMD.g. Intel Pentium.g. SS Instruction in x86 MISD MIMD No examples today.g. Intel Core i7 Graphical Unit Processors (GPUs) are both SIMD and MIMD Task-level parallelism. xamples are multicore and cluster computers Physical Q/A What is a modern Intel CPU, such as Core i7? Stand for MIMD, on the table for SIMD I 9 0 What is? I (ILP) may increase performance without involvement of the programmer. It may be implemented in a SISD, SIMD, and MIMD computer. Two main approaches:. Deep pipelines with more pipeline stages If the length of all pipeline stages are balanced, we may increase performance by increasing the clock speed.. Multiple issue A technique where multiple instructions are issued in each in cycle. ILP may decrease the CPI to lower than, or using the inverse metric instructions per clock cycle (IPC) increase it above. Acknowledgement: The structure and several of the good examples are derived from the book Computer Organization and Design (0) by David A. Patterson and John L. Hennessy I I
6 Multiple Issue Approaches Static Multiple Issue (/) VLIW Problems with Multiple Issue Determining how many and which instructions to issue in each cycle. Problematic since instructions typically depend on each other. How to deal with data and control hazards. The two main approaches of multiple issue are. Static Multiple Issue Decisions on when and which instructions to issue at each clock cycle is determined by the compiler.. Dynamic Multiple Issue Many of the decisions of issuing instructions are made by the processor, dynamically, during execution. Very Long Instruction Word (VLIW) processors issue several instructions in each cycle using issue packages. add $s0, $s, $s add $t0, $t0, $0 and $t, $t, $s0 lw $t0, 0($s0) F D M W F D M W F D M W F D M W The compiler may insert no-ops to avoid hazards. An issue package may be seen as one large instruction with multiple operations. ach issue package has two issue slots. How is VLIW affecting the hardware implementation? I I PC next A Static Multiple Issue (/) Changing the hardware To issue two instructions in each cycle, we need the following (not shown in picture): 0 RD Instruction Memory + Fetch and decode 6-bit (two instructions) register file W A W RD A A RD RD 0 A WD 0:6 5: Sign xtend 0 << ALU + Double the number of ports for the Data Memory WD 0 Add another ALU Static Multiple Issue (/) VLIW xercise: Assume we have a 5-stage VLIW addi $s, $0, 0 MIPS processor that can issue two instructions L: lw $t0, 0($s) in the same clock cycle. For the following code, ori $t, $t0, 7 schedule the instructions into issue slots and sw $t, 0($s) compute IPC. addi $s, $s, addi $s, $s, - Solution. IPC = ( + 6*0) / ( + *0) =.87 bne $s, $0, L (max ) Slot Slot Cycle Reschedule to addi $s, $0, 0 avoid stalls because of lw L lw $t0, 0($s) addi $s, $s, addi $s, $s, - ori $t, $t0, 7 bne $s, $0, L sw $t, -($s) 5 mpty cells means no-ops I I
7 Dynamic Multiple Issue Processors (/) Superscalar Processors 5 Dynamic Multiple Issue Processors (/) Out-of-Order xecution, RAW, WAR 6 In many modern processors (e.g., Intel Core i7), instruction issuing is performed dynamically by the processor while executing the program. Such processor is called superscalar. Instruction Fetch and Decode unit RS RS RS RS FU FU FU FU Results are sent back to RS (if a unit waits on the operand) and to the commit unit. Commit Unit I The first unit fetches and decodes several instruction in-order Reservation Stations (RS) buffer operands to the FU before execution. Data dependencies are handled dynamically. Functional Units (FU) execute the instruction. xamples are integer units, floating point units, and load/store units. The commit unit commits instructions (stores in registers) conservatively, respecting the observable order of instructions. Called in-order commit. If the superscalar processor can reorder the instruction execution order, it is an out-of-order execution processor. lw $t0, 0($s) addi $t, $t0, 7 sub $t0, $t, $s0 addi $t, $s0, 0 addi $r, $s0, 0 sub $t0, $t, $s0 xample of a Read After Write (RAW) dependency (dependency on $t0). The superscalar pipeline must make sure that the data is available before read. xample of a Write After Read (WAR) dependency (dependency on $t). If the order is flipped due to out-of-order execution, we have a hazard. WAR dependencies can be resolved using register renaming, where the processor writes to a nonarchitectural renaming register (here in the pseudo asm code called $r, not accessible to the programmer) Note that out-of-order processor is not rewriting the code, but keeps track of the renamed registers during execution. I Some observations on Multi-Issue Processors 7 Intel Microprocessors, some examples 8 Multi Issue Processors VLIW processors tend to be more energy efficient than superscalar out-of-order processors (less hardware, the compiler does the job) Superscalar processors with dynamic scheduling can hide some latencies that are not statically predictable (e.g., cache misses, dynamic branch predictions). Processor Intel 86 Intel Pentium Intel Pentium Pro Intel Pentium Willamette Intel Pentium Prescott Intel Core Intel Core i5 Nehalem Intel Core i5 Ivy Bridge Year Clock Rate 5 MHz 66 MHz 00 MHz 000 MHz 600 MHz 90 MHz 00 MHz 00 MHz Pipeline Stages Issue Cores Power Width 8 5 W 0W 9 W 75W 0W 75W 87W 77W Although modern processors issues to 6 instructions per clock cycle, few applications results in IPC over. The reason is dependencies. Clock rate increase stopped (the power wall) around 006 Pipeline stages first increased The power peaked and then decreased, but the with Pentium number of cores increased after 006. Source: Patterson and Hennessey, 0, page. I I
8 9 0 Time-Aware Systems - xamples Cyber-Physical Systems (CPS) Bonus Part Time-Aware Systems Design Research Challenges KTH Process Industry and Industrial Automation Aircraft (not part of the xamination) Time-Aware Simulation Systems Time-Aware Distributed Systems Time-stamped distributed systems (.g. Google Spanner) Physical simulations (Simulink, Modelica, etc.) I I Programming Model and Time What is our goal? verything should be made as simple as possible, but not simpler attributed to Albert instein Timing is not part of the software semantics Correct execution of programs (e.g., in C, C++, C#, Java, Scala, Haskell, OCaml) has nothing to do with how long time things takes to execute. Traditional Approach xecution time should be as short as possible, but not shorter Programming Model Objective: Minimize area, memory, energy. Minimize the slack No point in making the execution time shorter, as long as the deadline is met. Deadline Task Challenge: Still guarantee to meet all timing constraints. Slack I Timing Dependent on the Hardware Platform Our Objective Programming Model Make time an abstraction within the programming model Timing is independent of the hardware platform (within certain constraints) I
9 Are you interested to be challenged? If you are interested in You know the drill Programming language design, or Compilers, or keep calm Computer Architecture You are ambitious and interested in learning new things You want to do a real research project as part of you Bachelor s or Master s thesis project Please send an to, so that we can discuss some ideas. I I 5 6 Reading Guidelines Summary Module 6: Parallel Processors and Programs Some key take away points: Lecture :, Concurrency,, and ILP H&H Chapter.8,.6, P&H5 Chapters.7-.8,.0,.0, or P&H Chapters.5-.6,.8,.0, Amdahl s law can be used to estimate maximal speedup when introducing parallelism in parts of a program. Lecture : SIMD, MIMD, and Parallel Programming H&H Chapter P&H5 Chapters.,.6, 5.0, or P&H Chapters.,,6, 5.8, (ILP) has been very important for performance improvements over the years, but improvements have not been as significant lately. Reading Guidelines See the course webpage for more information. Thanks for listening! I I
Lec 25: Parallel Processors. Announcements
Lec 25: Parallel Processors Kavita Bala CS 340, Fall 2008 Computer Science Cornell University PA 3 out Hack n Seek Announcements The goal is to have fun with it Recitations today will talk about it Pizza
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More informationProcessor (IV) - advanced ILP. Hwansoo Han
Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationReal Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationAdvanced Instruction-Level Parallelism
Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu
More information45-year CPU Evolution: 1 Law -2 Equations
4004 8086 PowerPC 601 Pentium 4 Prescott 1971 1978 1992 45-year CPU Evolution: 1 Law -2 Equations Daniel Etiemble LRI Université Paris Sud 2004 Xeon X7560 Power9 Nvidia Pascal 2010 2017 2016 Are there
More informationProf. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Why do I need four computing cores on my phone?! Why do I need eight computing
More informationMulticore and Parallel Processing
Multicore and Parallel Processing Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University P & H Chapter 4.10 11, 7.1 6 xkcd/619 2 Pitfall: Amdahl s Law Execution time after improvement
More informationFundamentals of Computer Design
Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University
More informationFundamentals of Computers Design
Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2
More information4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16
4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt
More informationEN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)
EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering
More informationCOMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: C Multiple Issue Based on P&H
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 4 The Processor: C Multiple Issue Based on P&H Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in
More informationAdvanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
More informationComputer Hardware Engineering
Computer Hardware ngineering IS2, spring 25 Lecture 6: Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides version.
More informationEECS4201 Computer Architecture
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis These slides are based on the slides provided by the publisher. The slides will be
More informationCS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction
CS 61C: Great Ideas in Computer Architecture Multiple Instruction Issue, Virtual Memory Introduction Instructor: Justin Hsia 7/26/2012 Summer 2012 Lecture #23 1 Parallel Requests Assigned to computer e.g.
More informationMultiple Instruction Issue. Superscalars
Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths
More informationENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design
ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More informationWritten Exam / Tentamen
Written Exam / Tentamen Computer Organization and Components / Datorteknik och komponenter (IS1500), 9 hp Computer Hardware Engineering / Datorteknik, grundkurs (IS1200), 7.5 hp KTH Royal Institute of
More informationEN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design
EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown
More informationParallelism, Multicore, and Synchronization
Parallelism, Multicore, and Synchronization Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon, Bala, Bracy, McKee, and Sirer, Roth, Martin] xkcd/619 3 Big Picture: Multicore
More informationLecture 1: Introduction
Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline
More informationCourse II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan
Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology
More informationChapter 7. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 7 <1>
Chapter 7 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 7 Chapter 7 :: Topics Introduction (done) Performance Analysis (done) Single-Cycle Processor
More informationComputer Organization and Components
2 Course Structure Computer Organization and Components Module 4: Memory Hierarchy Module 1: Logic Design IS1500, fall 2014 Lecture 4: and F1 DC Ö1 F2 DC Ö2 F7b Lab: dicom F8 Module 2: C and Associate
More informationTHREAD LEVEL PARALLELISM
THREAD LEVEL PARALLELISM Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 4 is due on Dec. 11 th This lecture
More informationCS 152, Spring 2011 Section 8
CS 152, Spring 2011 Section 8 Christopher Celio University of California, Berkeley Agenda Grades Upcoming Quiz 3 What it covers OOO processors VLIW Branch Prediction Intel Core 2 Duo (Penryn) Vs. NVidia
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology
More informationOnline Course Evaluation. What we will do in the last week?
Online Course Evaluation Please fill in the online form The link will expire on April 30 (next Monday) So far 10 students have filled in the online form Thank you if you completed it. 1 What we will do
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently
More informationLecture 26: Parallel Processing. Spring 2018 Jason Tang
Lecture 26: Parallel Processing Spring 2018 Jason Tang 1 Topics Static multiple issue pipelines Dynamic multiple issue pipelines Hardware multithreading 2 Taxonomy of Parallel Architectures Flynn categories:
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationChapter 4 The Processor (Part 4)
Department of Electr rical Eng ineering, Chapter 4 The Processor (Part 4) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering, Feng-Chia Unive ersity Outline
More informationNOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline
CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism
More informationAdapted from instructor s. Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK]
Review and Advanced d Concepts Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] Pipelining Review PC IF/ID ID/EX EX/M
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More informationCPU Architecture Overview. Varun Sampath CIS 565 Spring 2012
CPU Architecture Overview Varun Sampath CIS 565 Spring 2012 Objectives Performance tricks of a modern CPU Pipelining Branch Prediction Superscalar Out-of-Order (OoO) Execution Memory Hierarchy Vector Operations
More informationExercises 6 Parallel Processors and Programs
Exercises 6 Parallel Processors and Programs Computer Organization and Components / Datorteknik och komponenter (IS1500), 9 hp Computer Hardware Engineering / Datorteknik, grundkurs (IS1200), 7.5 hp KTH
More informationChapter 4 The Processor 1. Chapter 4D. The Processor
Chapter 4 The Processor 1 Chapter 4D The Processor Chapter 4 The Processor 2 Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline
More informationUC Berkeley CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 39 Intra-machine Parallelism 2010-04-30!!!Head TA Scott Beamer!!!www.cs.berkeley.edu/~sbeamer Old-Fashioned Mud-Slinging with
More informationCPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor
Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction
More informationADVANCED COMPUTER ARCHITECTURES: Prof. C. SILVANO Written exam 11 July 2011
ADVANCED COMPUTER ARCHITECTURES: 088949 Prof. C. SILVANO Written exam 11 July 2011 SURNAME NAME ID EMAIL SIGNATURE EX1 (3) EX2 (3) EX3 (3) EX4 (5) EX5 (5) EX6 (4) EX7 (5) EX8 (3+2) TOTAL (33) EXERCISE
More information5008: Computer Architecture
5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage
More informationIssues in Parallel Processing. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Issues in Parallel Processing Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Introduction Goal: connecting multiple computers to get higher performance
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationThomas Polzer Institut für Technische Informatik
Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =
More informationExploitation of instruction level parallelism
Exploitation of instruction level parallelism Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering
More informationIF1/IF2. Dout2[31:0] Data Memory. Addr[31:0] Din[31:0] Zero. Res ALU << 2. CPU Registers. extension. sign. W_add[4:0] Din[31:0] Dout[31:0] PC+4
12 1 CMPE110 Fall 2006 A. Di Blas 110 Fall 2006 CMPE pipeline concepts Advanced ffl ILP ffl Deep pipeline ffl Static multiple issue ffl Loop unrolling ffl VLIW ffl Dynamic multiple issue Textbook Edition:
More informationUG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects
Announcements UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Inf3 Computer Architecture - 2017-2018 1 Last time: Tomasulo s Algorithm Inf3 Computer
More informationMultithreading: Exploiting Thread-Level Parallelism within a Processor
Multithreading: Exploiting Thread-Level Parallelism within a Processor Instruction-Level Parallelism (ILP): What we ve seen so far Wrap-up on multiple issue machines Beyond ILP Multithreading Advanced
More informationCPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor
1 CPI < 1? How? From Single-Issue to: AKS Scalar Processors Multiple issue processors: VLIW (Very Long Instruction Word) Superscalar processors No ISA Support Needed ISA Support Needed 2 What if dynamic
More informationILP: Instruction Level Parallelism
ILP: Instruction Level Parallelism Tassadaq Hussain Riphah International University Barcelona Supercomputing Center Universitat Politècnica de Catalunya Introduction Introduction Pipelining become universal
More informationCourse on Advanced Computer Architectures
Surname (Cognome) Name (Nome) POLIMI ID Number Signature (Firma) SOLUTION Politecnico di Milano, July 9, 2018 Course on Advanced Computer Architectures Prof. D. Sciuto, Prof. C. Silvano EX1 EX2 EX3 Q1
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationParallel Programming
Parallel Programming Introduction Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Acknowledgements Prof. Felix Wolf, TU Darmstadt Prof. Matthias
More informationEECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)
Evolution of Processor Performance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static
More informationEN164: Design of Computing Systems Lecture 24: Processor / ILP 5
EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 Instructor: Dan Garcia inst.eecs.berkeley.edu/~cs61c! Compu@ng in the News At a laboratory in São Paulo,
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Introduction Goal: connecting multiple computers to get higher performance
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
More informationParallel Systems I The GPU architecture. Jan Lemeire
Parallel Systems I The GPU architecture Jan Lemeire 2012-2013 Sequential program CPU pipeline Sequential pipelined execution Instruction-level parallelism (ILP): superscalar pipeline out-of-order execution
More informationHomework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures
Homework 5 Start date: March 24 Due date: 11:59PM on April 10, Monday night 4.1.1, 4.1.2 4.3 4.8.1, 4.8.2 4.9.1-4.9.4 4.13.1 4.16.1, 4.16.2 1 CSCI 402: Computer Architectures The Processor (4) Fengguang
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More informationPipelining and Exploiting Instruction-Level Parallelism (ILP)
Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &
More informationKeywords and Review Questions
Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain
More informationComputer Organization and Components
Computer Organization and Components IS1500, fall 2016 Lecture 2: Assembly Languages Associate Professor, KTH Royal Institute of Technology Slides version 1.0 2 Course Structure Module 1: C and Assembly
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationEIE/ENE 334 Microprocessors
EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/
More informationComputer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationFundamentals of Quantitative Design and Analysis
Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2017 Thread Level Parallelism (TLP) CS425 - Vassilis Papaefstathiou 1 Multiple Issue CPI = CPI IDEAL + Stalls STRUC + Stalls RAW + Stalls WAR + Stalls WAW + Stalls
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture The Computer Revolution Progress in computer technology Underpinned by Moore s Law Makes novel applications
More informationLimitations of Scalar Pipelines
Limitations of Scalar Pipelines Superscalar Organization Modern Processor Design: Fundamentals of Superscalar Processors Scalar upper bound on throughput IPC = 1 Inefficient unified pipeline
More informationDetermined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version
MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationUC Berkeley CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c Review! UC Berkeley CS61C : Machine Structures Lecture 28 Intra-machine Parallelism Parallelism is necessary for performance! It looks like itʼs It is the future of computing!
More informationAdvanced processor designs
Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The
More informationINSTRUCTION LEVEL PARALLELISM
INSTRUCTION LEVEL PARALLELISM Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix H, John L. Hennessy and David A. Patterson,
More informationGetting CPI under 1: Outline
CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more
More informationCourse web site: teaching/courses/car. Piazza discussion forum:
Announcements Course web site: http://www.inf.ed.ac.uk/ teaching/courses/car Lecture slides Tutorial problems Courseworks Piazza discussion forum: http://piazza.com/ed.ac.uk/spring2018/car Tutorials start
More informationComputer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)
18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures
More informationComputer Architecture
Informatics 3 Computer Architecture Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh (thanks to Prof. Nigel Topham) General Information Instructor
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,
More informationThe University of Texas at Austin
EE382 (20): Computer Architecture - Parallelism and Locality Lecture 4 Parallelism in Hardware Mattan Erez The University of Texas at Austin EE38(20) (c) Mattan Erez 1 Outline 2 Principles of parallel
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationLecture 21: Parallelism ILP to Multicores. Parallel Processing 101
18 447 Lecture 21: Parallelism ILP to Multicores S 10 L21 1 James C. Hoe Dept of ECE, CMU April 7, 2010 Announcements: Handouts: Lab 4 due this week Optional reading assignments below. The Microarchitecture
More informationIn-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution
In-order vs. Out-of-order Execution In-order instruction execution instructions are fetched, executed & committed in compilergenerated order if one instruction stalls, all instructions behind it stall
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationDYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING
DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,
More informationomputer Design Concept adao Nakamura
omputer Design Concept adao Nakamura akamura@archi.is.tohoku.ac.jp akamura@umunhum.stanford.edu 1 1 Pascal s Calculator Leibniz s Calculator Babbage s Calculator Von Neumann Computer Flynn s Classification
More informationMulticore Hardware and Parallelism
Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3
More informationAdvanced Computer Architecture
Advanced Computer Architecture 1 L E C T U R E 4: D A T A S T R E A M S I N S T R U C T I O N E X E C U T I O N I N S T R U C T I O N C O M P L E T I O N & R E T I R E M E N T D A T A F L O W & R E G I
More informationChapter 4. The Processor
Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations Determined by ISA
More information