History Importance Comparison to Tomasulo Demo Dataflow Overview: Static Dynamic Hybrids Problems: CAM I-Structures Potential Applications Conclusion

Size: px
Start display at page:

Download "History Importance Comparison to Tomasulo Demo Dataflow Overview: Static Dynamic Hybrids Problems: CAM I-Structures Potential Applications Conclusion"

Transcription

1 By Matthew Johnson

2 History Importance Comparison to Tomasulo Demo Dataflow Overview: Static Dynamic Hybrids Problems: CAM I-Structures Potential Applications Conclusion

3 st Scoreboard Computer: CDC Tomasulo MIT Static Dataflow Processors: Tokens trigger next instruction Kahn Dataflow Processors: Sequential Processes communicate with each via messages through FIFO s (Hybrid) MIT Dynamic Dataflow Processors 1985 Exception Solutions in Out of Order Execution : Smith & Pleszkun devise precise exception handling Restricted Out of Order Execution > RISC: Yale Pratt publishes paper on in which he simulates a restricted OO architecture and it outperforms RISC benchmarks 1988 MIT Monsoon Processors 1990 IBM POWER1: 1 st Out of Order Microprocessor (floating point only) 1995 Out of Order goes Mainstream KEY Data Flow Hybrid Out of Order

4 Naturally latency tolerant because of its ability to dynamically switch between threads Eliminate processor switching problems Eliminates Data dependencies Naturally takes advantage of Inherent parallelism in code

5

6 Token- Output of an operation and its destination Operation Packet- contains instructions, destination and 2 operands Instruction Cell- The location in main memory that holds the operation packet

7 Y=4 Y = X = 4 * Y Z = 4 * X Memory Distribution Network Y=4 Multiple X 4 Y ADD Y 1 3 Arbitration Network Y=4 Multiple Z 4 Y

8 Y = X = 4 * Y Z = 4 * X Memory Distribution Network Multiple X 4 4 Multiple Z 4 4 Arbitration Network

9 X execution X SWITCH T F T execution SWITCH T F X execution X SWITCH T F F execution SWITCH T F MERGE node SWITCH node

10 x n i SWITCH T F b P n j n k f g X

11 Describes dataflow graph Data tokens propagate along the arc Single Assignment Rule: variables can only be assigned once Examples: VAL, LUCID, Mean = (x + y + z)/3; StDev = SQRT( (x^2 + y^2 + z^2 )/3 Mean^2); z y x Mean sqrt StDev

12 Static Dynamic

13 Allows at most 1 data token on the arc of data flow To Ensure 1 token per arc, must use hand shaking Can only run 1 iteration of a loop at a time Static Pipeline: o Check instructions for operands o Select subset of available instruction o Update instructions that are complete Example: MIT Static Dataflow Machines x 2 * sqrt n i n j y 3 z data token acknowledge signal data arc acknowledgement arc

14 Pros: Simple Model Cons: Loops can not be run in parallel Token Traffic Double Hand shaking wastes time No support for procedure calls

15 Allows multiple data tokens on the arc of data flow Allows multiple loop iterations at one time An extra tag is added to tokens and contains address context and loop number Instruction Cells are changed to accept multiple tokens of different tags Overall effect is the creation of reentrant subgraphs Examples: MIT Tagged-Token Data Flow, Manchester Dataflow

16 Pros: Greater Performance( No handshaking, multiple tokens/arc, loops can run in parallel) Supports procedure calls Cons: Tokens need to be buffered because matching tokens takes LOTS of time(large Buffer Needed) No instruction locality (can t use registers)

17 Threaded Dataflow- subgraphs with low degrees of parallelisms are converted into sequential thread Monsoon- Replaces associative memory with ram Uses eight stage pipeline to inject instructions and execute enabled instructions

18 Lack of locality make current storage hierarchy ineffective Matching tokens takes a large amount of time and requires buffering Building CAMs large enough to hold dependencies of a real program Data Structures are difficult to implement

19 Content Addressable Memory Read- returns the address of the data Size is small, RAM is 8X (DRAM: 64 MB CAM: 8MB single chip ) Expensive Large footprint Excessive power

20 Main problem: every write to a structure requires the entire structure to be rewritten Solution: Define size of data structure Each element has a series of status bits (states) Can only write an element once

21 Digital Signal Processing Network routing Graphics processing Telemetry Data warehousing Everywhere Parallel computing is used

22 Potential to unlock the inherent parallelism of the algorithm Naturally eliminates data hazards Current technologies in CAM are not advanced enough to machines based in RAM Serious Problems still remain such as implementing data structures

23

24 SILC, JURIJ, BORUT ROBIC, and THEO UNGERER, eds. "DATAFLOW ARCHITECTURES."DATAFLOW ARCHITECTURES. Jožef Stefan Institute. Web. 7 May Wikipedia contributors. "DATAFLOW ARCHITECTURES." Wikipedia, The Free Encyclopedia. < Wikipedia contributors. "Out of Order Execution."Wikipedia, The Free Encyclopedia. < K. Pagiamtzis, A. Sheikholeslami, Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey, IEEE J. of Solid-state circuits. March 2006 Rajaraman, V. Elements of Parallel Computing. EE. New Delhi: PHI Learning, Web. < 02&dq=data flow "acknowledgement tokens&source=bl&ots=1akpb6pd8t&sig=mpfuclfj1ogbpjo3sts1byzbs8y &hl=en&sa=x&ei=ksyjuc23mijm0wh_iyhoda&ved=0ceqq6aewba Dennis, Jack, and David Misunas. "A preliminary architecture for a basic data-flow processor." ACM SIGARCH Computer Architecture News. 3.4 (1974): Web. 7 May < Najjar, Walid, Edward Lee, and Guang Gao. "Advances in the Dataflow Computational Model." Parallel Computing. 25. (1999): Web. 7 May

Computer Architecture: Dataflow/Systolic Arrays

Computer Architecture: Dataflow/Systolic Arrays Data Flow Computer Architecture: Dataflow/Systolic Arrays he models we have examined all assumed Instructions are fetched and retired in sequential, control flow order his is part of the Von-Neumann model

More information

Portland State University ECE 588/688. Dataflow Architectures

Portland State University ECE 588/688. Dataflow Architectures Portland State University ECE 588/688 Dataflow Architectures Copyright by Alaa Alameldeen and Haitham Akkary 2018 Hazards in von Neumann Architectures Pipeline hazards limit performance Structural hazards

More information

Dataflow Architectures. Karin Strauss

Dataflow Architectures. Karin Strauss Dataflow Architectures Karin Strauss Introduction Dataflow machines: programmable computers with hardware optimized for fine grain data-driven parallel computation fine grain: at the instruction granularity

More information

CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation

CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation Stéphane Zuckerman Computer Architecture & Parallel Systems Laboratory Electrical & Computer Engineering Dept. University

More information

Dynamic Dataflow. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Dynamic Dataflow. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology Dynamic Dataflow Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology L- Dataflow Machines Static - mostly for signal processing NEC - NEDIP and IPP Hughes, Hitachi,

More information

Appendix C. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Appendix C. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Appendix C Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure C.2 The pipeline can be thought of as a series of data paths shifted in time. This shows

More information

Dennis [1] developed the model of dataflowschemas, building on work by Karp and Miller [2]. These dataflow graphs, as they were later called, evolved

Dennis [1] developed the model of dataflowschemas, building on work by Karp and Miller [2]. These dataflow graphs, as they were later called, evolved Predictive Block Dataflow Model for Parallel Computation EE382C: Embedded Software Systems Literature Survey Vishal Mishra and Kursad Oney Dept. of Electrical and Computer Engineering The University of

More information

Dataflow Processing. A.R. Hurson Computer Science Department Missouri Science & Technology

Dataflow Processing. A.R. Hurson Computer Science Department Missouri Science & Technology A.R. Hurson Computer Science Department Missouri Science & Technology hurson@mst.edu 1 Control Flow Computation Operands are accessed by their addresses. Shared memory cells are the means by which data

More information

EE414 Embedded Systems Ch 5. Memory Part 2/2

EE414 Embedded Systems Ch 5. Memory Part 2/2 EE414 Embedded Systems Ch 5. Memory Part 2/2 Byung Kook Kim School of Electrical Engineering Korea Advanced Institute of Science and Technology Overview 6.1 introduction 6.2 Memory Write Ability and Storage

More information

Dynamic Issue & HW Speculation. Raising the IPC Ceiling

Dynamic Issue & HW Speculation. Raising the IPC Ceiling Dynamic Issue & HW Speculation Today s topics: Superscalar pipelines Dynamic Issue Scoreboarding: control centric approach Tomasulo: data centric approach 1 CS6810 Raising the IPC Ceiling w/ single-issue

More information

Page 1. Raising the IPC Ceiling. Dynamic Issue & HW Speculation. Fix OOO Completion Problem First. Reorder Buffer In Action

Page 1. Raising the IPC Ceiling. Dynamic Issue & HW Speculation. Fix OOO Completion Problem First. Reorder Buffer In Action Raising the IPC Ceiling Dynamic Issue & HW Speculation Today s topics: Superscalar pipelines Dynamic Issue Scoreboarding: control centric approach Tomasulo: data centric approach w/ single-issue IPC max

More information

Ben Lee* and A. R. Hurson** *Oregon State University Department of Electrical and Computer Engineering Corvallis, Oregon

Ben Lee* and A. R. Hurson** *Oregon State University Department of Electrical and Computer Engineering Corvallis, Oregon ISSUES IN DATAFLOW COMPUTING Ben Lee* and A. R. Hurson** *Oregon State University Department of Electrical and Computer Engineering Corvallis, Oregon 97331-3211 **The Pennsylvania State University Department

More information

Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company

Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor David Johnson Systems Technology Division Hewlett-Packard Company Presentation Overview PA-8500 Overview uction Fetch Capabilities

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism

More information

Computer Architecture Lecture 15: Load/Store Handling and Data Flow. Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014

Computer Architecture Lecture 15: Load/Store Handling and Data Flow. Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014 18-447 Computer Architecture Lecture 15: Load/Store Handling and Data Flow Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014 Lab 4 Heads Up Lab 4a out Branch handling and branch predictors

More information

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise

More information

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Chapter 03 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 3.3 Comparison of 2-bit predictors. A noncorrelating predictor for 4096 bits is first, followed

More information

Characteristics. Microprocessor Design & Organisation HCA2102. Unit of Transfer. Location. Memory Hierarchy Diagram

Characteristics. Microprocessor Design & Organisation HCA2102. Unit of Transfer. Location. Memory Hierarchy Diagram Microprocessor Design & Organisation HCA2102 Cache Memory Characteristics Location Unit of transfer Access method Performance Physical type Physical Characteristics UTM-RHH Slide Set 5 2 Location Internal

More information

Haskell to Hardware and Other Dreams

Haskell to Hardware and Other Dreams Haskell to Hardware and Other Dreams Stephen A. Edwards Columbia University June 12, 2018 Popular Science, November 1969 Where Is My Jetpack? Popular Science, November 1969 Where The Heck Is My 10 GHz

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

Advanced cache optimizations. ECE 154B Dmitri Strukov

Advanced cache optimizations. ECE 154B Dmitri Strukov Advanced cache optimizations ECE 154B Dmitri Strukov Advanced Cache Optimization 1) Way prediction 2) Victim cache 3) Critical word first and early restart 4) Merging write buffer 5) Nonblocking cache

More information

Precise Exceptions and Out-of-Order Execution. Samira Khan

Precise Exceptions and Out-of-Order Execution. Samira Khan Precise Exceptions and Out-of-Order Execution Samira Khan Multi-Cycle Execution Not all instructions take the same amount of time for execution Idea: Have multiple different functional units that take

More information

Computer & Microprocessor Architecture HCA103

Computer & Microprocessor Architecture HCA103 Computer & Microprocessor Architecture HCA103 Cache Memory UTM-RHH Slide Set 4 1 Characteristics Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics Organisation

More information

ECEN 449 Microprocessor System Design. Memories

ECEN 449 Microprocessor System Design. Memories ECEN 449 Microprocessor System Design Memories 1 Objectives of this Lecture Unit Learn about different types of memories SRAM/DRAM/CAM /C Flash 2 1 SRAM Static Random Access Memory 3 SRAM Static Random

More information

MaanavaN.Com CS1202 COMPUTER ARCHITECHTURE

MaanavaN.Com CS1202 COMPUTER ARCHITECHTURE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK SUB CODE / SUBJECT: CS1202/COMPUTER ARCHITECHTURE YEAR / SEM: II / III UNIT I BASIC STRUCTURE OF COMPUTER 1. What is meant by the stored program

More information

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup Yan Sun and Min Sik Kim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Final Review Shuai Wang Department of Computer Science and Technology Nanjing University Computer Architecture Computer architecture, like other architecture, is the art

More information

Overview IN this chapter we will study. William Stallings Computer Organization and Architecture 6th Edition

Overview IN this chapter we will study. William Stallings Computer Organization and Architecture 6th Edition William Stallings Computer Organization and Architecture 6th Edition Chapter 4 Cache Memory Overview IN this chapter we will study 4.1 COMPUTER MEMORY SYSTEM OVERVIEW 4.2 CACHE MEMORY PRINCIPLES 4.3 ELEMENTS

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism Software View of Computer Architecture COMP2 Godfrey van der Linden 200-0-0 Introduction Definition of Instruction Level Parallelism(ILP) Pipelining Hazards & Solutions Dynamic

More information

High Performance Computing. University questions with solution

High Performance Computing. University questions with solution High Performance Computing University questions with solution Q1) Explain the basic working principle of VLIW processor. (6 marks) The following points are basic working principle of VLIW processor. The

More information

Chip Multiprocessors A Cost-effective Alternative to Simultaneous Multithreading

Chip Multiprocessors A Cost-effective Alternative to Simultaneous Multithreading Chip Multiprocessors A Cost-effective Alternative to Simultaneous Multithreading BORUT ROBIČ JURIJ ŠILC THEO UNGERER Faculty of Computer and Information Sc. Computer Systems Department Dept. of Computer

More information

ENGIN 112 Intro to Electrical and Computer Engineering

ENGIN 112 Intro to Electrical and Computer Engineering ENGIN 112 Intro to Electrical and Computer Engineering Lecture 30 Random Access Memory (RAM) Overview Memory is a collection of storage cells with associated input and output circuitry Possible to read

More information

Dynamic Scheduling. CSE471 Susan Eggers 1

Dynamic Scheduling. CSE471 Susan Eggers 1 Dynamic Scheduling Why go out of style? expensive hardware for the time (actually, still is, relatively) register files grew so less register pressure early RISCs had lower CPIs Why come back? higher chip

More information

Fall 2011 Prof. Hyesoon Kim. Thanks to Prof. Loh & Prof. Prvulovic

Fall 2011 Prof. Hyesoon Kim. Thanks to Prof. Loh & Prof. Prvulovic Fall 2011 Prof. Hyesoon Kim Thanks to Prof. Loh & Prof. Prvulovic Reading: Data prefetch mechanisms, Steven P. Vanderwiel, David J. Lilja, ACM Computing Surveys, Vol. 32, Issue 2 (June 2000) If memory

More information

Hakam Zaidan Stephen Moore

Hakam Zaidan Stephen Moore Hakam Zaidan Stephen Moore Outline Vector Architectures Properties Applications History Westinghouse Solomon ILLIAC IV CDC STAR 100 Cray 1 Other Cray Vector Machines Vector Machines Today Introduction

More information

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 18 Dynamic Instruction Scheduling with Branch Prediction

More information

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case

More information

ECE 341 Final Exam Solution

ECE 341 Final Exam Solution ECE 341 Final Exam Solution Time allowed: 110 minutes Total Points: 100 Points Scored: Name: Problem No. 1 (10 points) For each of the following statements, indicate whether the statement is TRUE or FALSE.

More information

Topics to be covered. EEC 581 Computer Architecture. Virtual Memory. Memory Hierarchy Design (II)

Topics to be covered. EEC 581 Computer Architecture. Virtual Memory. Memory Hierarchy Design (II) EEC 581 Computer Architecture Memory Hierarchy Design (II) Department of Electrical Engineering and Computer Science Cleveland State University Topics to be covered Cache Penalty Reduction Techniques Victim

More information

Cache Designs and Tricks. Kyle Eli, Chun-Lung Lim

Cache Designs and Tricks. Kyle Eli, Chun-Lung Lim Cache Designs and Tricks Kyle Eli, Chun-Lung Lim Why is cache important? CPUs already perform computations on data faster than the data can be retrieved from main memory and microprocessor execution speeds

More information

Exploiting Locality of Array Data with Parallel Object-Oriented Model for Multithreaded Computation

Exploiting Locality of Array Data with Parallel Object-Oriented Model for Multithreaded Computation Exploiting Locality of Array Data with Parallel Object-Oriented Model for Multithreaded Computation Jonghoon Song, Juno Chang, Sangyong Han Dept. of Computer Science, Seoul National Univ. Seoul, 151-742,

More information

Topic B (Cont d) Dataflow Model of Computation

Topic B (Cont d) Dataflow Model of Computation opic B (Cont d) Dataflow Model of Computation Guang R. Gao ACM ellow and IEEE ellow Endowed Distinguished Professor Electrical & Computer Engineering University of Delaware ggao@capsl.udel.edu 09/07/20

More information

Superscalar Processors

Superscalar Processors Superscalar Processors Superscalar Processor Multiple Independent Instruction Pipelines; each with multiple stages Instruction-Level Parallelism determine dependencies between nearby instructions o input

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

Exploitation of instruction level parallelism

Exploitation of instruction level parallelism Exploitation of instruction level parallelism Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

Topics. Digital Systems Architecture EECE EECE Predication, Prediction, and Speculation

Topics. Digital Systems Architecture EECE EECE Predication, Prediction, and Speculation Digital Systems Architecture EECE 343-01 EECE 292-02 Predication, Prediction, and Speculation Dr. William H. Robinson February 25, 2004 http://eecs.vanderbilt.edu/courses/eece343/ Topics Aha, now I see,

More information

THE latest generation of microprocessors uses a combination

THE latest generation of microprocessors uses a combination 1254 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 30, NO. 11, NOVEMBER 1995 A 14-Port 3.8-ns 116-Word 64-b Read-Renaming Register File Creigton Asato Abstract A 116-word by 64-b register file for a 154 MHz

More information

Processors. Young W. Lim. May 9, 2016

Processors. Young W. Lim. May 9, 2016 Processors Young W. Lim May 9, 2016 Copyright (c) 2016 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version

More information

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted

More information

Lecture 2: Memory Systems

Lecture 2: Memory Systems Lecture 2: Memory Systems Basic components Memory hierarchy Cache memory Virtual Memory Zebo Peng, IDA, LiTH Many Different Technologies Zebo Peng, IDA, LiTH 2 Internal and External Memories CPU Date transfer

More information

CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation

CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation Dynamic Dataflow Stéphane Zuckerman Computer Architecture & Parallel Systems Laboratory Electrical & Computer Engineering

More information

Concurrent Models of Computation

Concurrent Models of Computation Concurrent Models of Computation Edward A. Lee Robert S. Pepper Distinguished Professor, UC Berkeley EECS 219D: Concurrent Models of Computation Fall 2011 Copyright 2011, Edward A. Lee, All rights reserved

More information

Binghamton University. CS-220 Spring Cached Memory. Computer Systems Chapter

Binghamton University. CS-220 Spring Cached Memory. Computer Systems Chapter Cached Memory Computer Systems Chapter 6.2-6.5 Cost Speed The Memory Hierarchy Capacity The Cache Concept CPU Registers Addresses Data Memory ALU Instructions The Cache Concept Memory CPU Registers Addresses

More information

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue and Register Renaming Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://wwweecsberkeleyedu/~krste

More information

Portland State University ECE 587/687. Superscalar Issue Logic

Portland State University ECE 587/687. Superscalar Issue Logic Portland State University ECE 587/687 Superscalar Issue Logic Copyright by Alaa Alameldeen, Zeshan Chishti and Haitham Akkary 2017 Instruction Issue Logic (Sohi & Vajapeyam, 1987) After instructions are

More information

Complex Pipelining: Out-of-order Execution & Register Renaming. Multiple Function Units

Complex Pipelining: Out-of-order Execution & Register Renaming. Multiple Function Units 6823, L14--1 Complex Pipelining: Out-of-order Execution & Register Renaming Laboratory for Computer Science MIT http://wwwcsglcsmitedu/6823 Multiple Function Units 6823, L14--2 ALU Mem IF ID Issue WB Fadd

More information

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 15 Very Long Instruction Word Machines Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

ECE 551 System on Chip Design

ECE 551 System on Chip Design ECE 551 System on Chip Design Introducing Bus Communications Garrett S. Rose Fall 2018 Emerging Applications Requirements Data Flow vs. Processing µp µp Mem Bus DRAMC Core 2 Core N Main Bus µp Core 1 SoCs

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho

Why memory hierarchy? Memory hierarchy. Memory hierarchy goals. CS2410: Computer Architecture. L1 cache design. Sangyeun Cho Why memory hierarchy? L1 cache design Sangyeun Cho Computer Science Department Memory hierarchy Memory hierarchy goals Smaller Faster More expensive per byte CPU Regs L1 cache L2 cache SRAM SRAM To provide

More information

Contents. Memory System Overview Cache Memory. Internal Memory. Virtual Memory. Memory Hierarchy. Registers In CPU Internal or Main memory

Contents. Memory System Overview Cache Memory. Internal Memory. Virtual Memory. Memory Hierarchy. Registers In CPU Internal or Main memory Memory Hierarchy Contents Memory System Overview Cache Memory Internal Memory External Memory Virtual Memory Memory Hierarchy Registers In CPU Internal or Main memory Cache RAM External memory Backing

More information

ECEN 449 Microprocessor System Design. Memories. Texas A&M University

ECEN 449 Microprocessor System Design. Memories. Texas A&M University ECEN 449 Microprocessor System Design Memories 1 Objectives of this Lecture Unit Learn about different types of memories SRAM/DRAM/CAM Flash 2 SRAM Static Random Access Memory 3 SRAM Static Random Access

More information

Using a Victim Buffer in an Application-Specific Memory Hierarchy

Using a Victim Buffer in an Application-Specific Memory Hierarchy Using a Victim Buffer in an Application-Specific Memory Hierarchy Chuanjun Zhang Depment of lectrical ngineering University of California, Riverside czhang@ee.ucr.edu Frank Vahid Depment of Computer Science

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

Characteristics of Memory Location wrt Motherboard. CSCI 4717 Computer Architecture. Characteristics of Memory Capacity Addressable Units

Characteristics of Memory Location wrt Motherboard. CSCI 4717 Computer Architecture. Characteristics of Memory Capacity Addressable Units CSCI 4717/5717 Computer Architecture Topic: Cache Memory Reading: Stallings, Chapter 4 Characteristics of Memory Location wrt Motherboard Inside CPU temporary memory or registers Motherboard main memory

More information

Grassroots ASPLOS. can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands

Grassroots ASPLOS. can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands Grassroots ASPLOS can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands ASPLOS-17 Doctoral Workshop London, March 4th, 2012 1 Current

More information

William Stallings Computer Organization and Architecture 8th Edition. Cache Memory

William Stallings Computer Organization and Architecture 8th Edition. Cache Memory William Stallings Computer Organization and Architecture 8th Edition Chapter 4 Cache Memory Characteristics Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics

More information

ISSN: [Bilani* et al.,7(2): February, 2018] Impact Factor: 5.164

ISSN: [Bilani* et al.,7(2): February, 2018] Impact Factor: 5.164 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A REVIEWARTICLE OF SDRAM DESIGN WITH NECESSARY CRITERIA OF DDR CONTROLLER Sushmita Bilani *1 & Mr. Sujeet Mishra 2 *1 M.Tech Student

More information

Memory hierarchy review. ECE 154B Dmitri Strukov

Memory hierarchy review. ECE 154B Dmitri Strukov Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal

More information

Lecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter

Lecture 17 Introduction to Memory Hierarchies Why it s important  Fundamental lesson(s) Suggested reading: (HP Chapter Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "

More information

Dynamic Control Hazard Avoidance

Dynamic Control Hazard Avoidance Dynamic Control Hazard Avoidance Consider Effects of Increasing the ILP Control dependencies rapidly become the limiting factor they tend to not get optimized by the compiler more instructions/sec ==>

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture 1 L E C T U R E 4: D A T A S T R E A M S I N S T R U C T I O N E X E C U T I O N I N S T R U C T I O N C O M P L E T I O N & R E T I R E M E N T D A T A F L O W & R E G I

More information

Computer Architecture and Organization

Computer Architecture and Organization 7-1 Chapter 7 - Memory Computer Architecture and Organization Miles Murdocca and Vincent Heuring Chapter 7 Memory 7-2 Chapter 7 - Memory Chapter Contents 7.1 The Memory Hierarchy 7.2 Random-Access Memory

More information

Memory Hierarchies &

Memory Hierarchies & Memory Hierarchies & Cache Memory CSE 410, Spring 2009 Computer Systems http://www.cs.washington.edu/410 4/26/2009 cse410-13-cache 2006-09 Perkins, DW Johnson and University of Washington 1 Reading and

More information

Computer Architecture: Out-of-Order Execution II. Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Out-of-Order Execution II. Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Out-of-Order Execution II Prof. Onur Mutlu Carnegie Mellon University A Note on This Lecture These slides are partly from 18-447 Spring 2013, Computer Architecture, Lecture 15 Video

More information

A Mechanism for Verifying Data Speculation

A Mechanism for Verifying Data Speculation A Mechanism for Verifying Data Speculation Enric Morancho, José María Llabería, and Àngel Olivé Computer Architecture Department, Universitat Politècnica de Catalunya (Spain), {enricm, llaberia, angel}@ac.upc.es

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example CS252 Graduate Computer Architecture Lecture 6 Tomasulo, Implicit Register Renaming, Loop-Level Parallelism Extraction Explicit Register Renaming John Kubiatowicz Electrical Engineering and Computer Sciences

More information

Unit 2. Chapter 4 Cache Memory

Unit 2. Chapter 4 Cache Memory Unit 2 Chapter 4 Cache Memory Characteristics Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics Organisation Location CPU Internal External Capacity Word

More information

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded Reference Case Study of a Multi-core, Multithreaded Processor The Sun T ( Niagara ) Computer Architecture, A Quantitative Approach, Fourth Edition, by John Hennessy and David Patterson, chapter. :/C:8

More information

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2. Instruction-Level Parallelism and its Exploitation: PART 2 Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.8)

More information

Chapter 18 - Multicore Computers

Chapter 18 - Multicore Computers Chapter 18 - Multicore Computers Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ Luis Tarrataca Chapter 18 - Multicore Computers 1 / 28 Table of Contents I 1 2 Where to focus your study Luis Tarrataca

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!

More information

Cache and Memory. CS230 Tutorial 07

Cache and Memory. CS230 Tutorial 07 Cache and Memory CS230 Tutorial 07 Cache Overview Memory Hierarchy Blocks rom fastest and most expensive to slowest and least expensive astest memory is smallest and closest to CPU Slowest memory is largest

More information

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) 18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures

More information

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA The Alpha 21264 Microprocessor: Out-of-Order ution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in

More information

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work? EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology

More information

DATAFLOW COMPUTERS: THEIR HISTORY AND FUTURE

DATAFLOW COMPUTERS: THEIR HISTORY AND FUTURE D DATAFLOW COMPUTERS: THEIR HISTORY AND FUTURE INTRODUCTION AND MOTIVATION As we approach the technological limitations, concurrency will become the major path to increase the computational speed of computers.

More information

Supporting Multithreading in Configurable Soft Processor Cores

Supporting Multithreading in Configurable Soft Processor Cores Supporting Multithreading in Configurable Soft Processor Cores Roger Moussali, Nabil Ghanem, and Mazen A. R. Saghir Department of Electrical and Computer Engineering American University of Beirut P.O.

More information

Keywords and Review Questions

Keywords and Review Questions Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

Memory. From Chapter 3 of High Performance Computing. c R. Leduc

Memory. From Chapter 3 of High Performance Computing. c R. Leduc Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor

More information

Instruction-Level Parallelism and Its Exploitation

Instruction-Level Parallelism and Its Exploitation Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568/668 Part Hierarchy - I Israel Koren ECE568/Koren Part.. 3 4 5 6 7 8 9 A B C D E F 6 blocks 3 4 block

More information

Multi MicroBlaze System for Parallel Computing

Multi MicroBlaze System for Parallel Computing Multi MicroBlaze System for Parallel Computing P.HUERTA, J.CASTILLO, J.I.MÁRTINEZ, V.LÓPEZ HW/SW Codesign Group Universidad Rey Juan Carlos 28933 Móstoles, Madrid SPAIN Abstract: - Embedded systems need

More information

ECEN Final Exam Fall Instructor: Srinivas Shakkottai

ECEN Final Exam Fall Instructor: Srinivas Shakkottai ECEN 424 - Final Exam Fall 2013 Instructor: Srinivas Shakkottai NAME: Problem maximum points your points Problem 1 10 Problem 2 10 Problem 3 20 Problem 4 20 Problem 5 20 Problem 6 20 total 100 1 2 Midterm

More information