Dynamic Dataflow. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Size: px
Start display at page:

Download "Dynamic Dataflow. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology"

Transcription

1 Dynamic Dataflow Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology L- Dataflow Machines Static - mostly for signal processing NEC - NEDIP and IPP Hughes, Hitachi, AT&T, Loral, TI, Sanyo... Dynamic EM4: single-chip dataflow micro, 80PE Sigma-: multiprocessor, The largest ETL, Japan dataflow machine, ETL, Japan M.I.T. Engineering model K. Hiraki Manchester ( 8) Greg Papadopoulos S. Sakai M.I.T. - TTDA, Monsoon ( 88) M.I.T./Motorola - Monsoon ( 9) (8 PEs, 8 IS) ETL - SIGMA- ( 88) (8 PEs,8 John IS) Gurd T. Shimada ETL - EM4 ( 90) (80 PEs), EM-X ( 96) (80 PEs) Sandia - EPS88, EPS- IBM - Empire Andy... Y. Kodama Chris Jack Boughton Joerg Costanza Shown at Supercomputing 96 Shown at Supercomputing 9 Related machines: Burton Smith s Denelcor HEP, Monsoon Horizon, Tera L-

2 Outline Static Dataflow Machines Not general-purpose enough Dynamic Dataflow Machines As easy to build as a simple pipelined processor The software view The memory model: I-structures Monsoon and its performance L- Dataflow Graphs {x = a + b; y = b * 7 in (x-y) * (x+y)} Values in dataflow graphs are represented as tokens token < ip, p, v > instruction ptr port data ip = p = L a b + *7 x y An operator executes when all its input tokens are present; * copies of the result token are distributed to the destination operators no separate control flow L-4

3 Static Dataflow Machine: Instruction Templates 4 Opcode Destination + L 4L R 4R * - L + R * out Destination Operand Operand Presence bits Each arc in the graph has a operand slot in the program a + *7 x * b y L- Static Dataflow Machine Jack Dennis, 97 Receive Instruction Templates Op dest dest p src p src... FU FU FU FU FU Send <s, p, v >, <s, p, v > Many such processors can be connected together Programs can be statically divided among the processor L-6

4 Static Dataflow: Problems/Limitations Mismatch between the model and the implementation The model requires unbounded FIFO token queues per arc but the architecture provides storage for one token per arc The architecture does not ensure FIFO order in the reuse of an operand slot The merge operator has a unique firing rule The static model does not support Function calls - No easy solution in Data Structures the static framework - Dynamic dataflow provided a framework for solutions L-7 Outline Static Dataflow Machines Not general-purpose enough Dynamic Dataflow Machines As easy to build as a simple pipelined processor The software view The memory model: I-structures Monsoon and its performance L-8 4

5 Dynamic Dataflow Architectures Allocate instruction templates, i.e., a frame, dynamically to support each loop iteration and procedure call termination detection needed to deallocate frames The code can be shared if we separate the code and the operand storage a token <fp, ip, port, data> frame pointer instruction pointer L-9 A Frame in Dynamic Dataflow 4 + * - + * 4 L, 4L R, 4R L R out Program a b + *7 x <fp, ip, p, v> y * 4 Frame Need to provide storage for only one operand/operator L-0

6 Monsoon Processor Greg Papadopoulos op r d,d Code ip Instruction Fetch Frames fp+r Operand Fetch Token Queue ALU Form Token Network Network L- Temporary Registers & Threads Robert Iannucci n sets of registers (n = pipeline depth) op r S,S Code Frames Registers Instruction Fetch Operand Fetch Token Queue Registers evaporate when an instruction thread is broken Registers are also used for exceptions & interrupts Robert Iannucci Form Token Network ALU Network L- 6

7 Actual Monsoon Pipeline: Eight Stages Instruction Memory Instruction Fetch Effective Address Presence bits Frame Memory 7 Presence Bit Operation Frame Operation User Queue System Queue R, W ALU 44 Network Registers Form Token L- Instructions directly control the pipeline The opcode specifies an operation for each pipeline stage: opcode r dest [dest] EA WM RegOp ALU FormToken Easy to implement; EA - effective address no hazard detection FP + r: frame relative r: absolute IP + r: code relative (not supported) WM - waiting matching Unary; Normal; Sticky; Exchange; Imperative PBs X port PBs X Frame op X ALU inhibit Register ops: ALU: V L X V R V L X V R, CC Form token: V L X V R X Tag X Tag X CC Token X Token L-4 7

8 Procedure Linkage Operators f a... an get frame extract tag change Tag 0 change Tag change Tag n Like standard call/return but caller & callee can be active simultaneously Fork : n: Graph for f token in frame 0 token in frame change Tag 0 change Tag L- Outline Static Dataflow Machines Not general-purpose enough Dynamic Dataflow Machines As easy to build as a simple pipelined processor The software view The memory model: I-structures Monsoon and its performance L-6 8

9 Parallel Language Model Tree of Activation Frames f: Global Heap of Shared Objects g: h: active threads loop asynchronous and parallel at all levels L-7 Id World implicit parallelism Id Dataflow Graphs + I-Structures +... TTDA Monsoon *T *T-Voyager L-8 9

10 Id World people Rishiyur Nikhil, Keshav Pingali, Vinod Kathail, David Culler Ken Traub Steve Heller, Richard Soley, Dinart Mores Jamey Hicks, Alex Caro, Andy Shaw, Boon Ang Shail Anditya R Paul Johnson Paul Barth Jan Maessen Christine Flood Jonathan Young Derek Chiou Arun Iyangar Zena Ariola Mike Bekerle R.S. Nikhil Keshav Pingali David Culler Boon S. Ang Jamey Hicks Derek Chiou Ken Traub Steve Heller K. Eknadham (IBM), Wim Bohm (Colorado), Joe Stoy (Oxford),... L-9 Data Structures in Dataflow Data structures reside in a structure store tokens carry pointers P Memory.... P a I-structures: Write-once, Read multiple times or allocate, write, read,..., read, deallocate No problem if a reader arrives before the writer at the memory location I-fetch a v I-store L-0 0

11 I-Structure Storage: Split-phase operations & Presence bits s t I-Fetch <s, fp, a > address to be read forwarding address a a a 4 L- a I-structure Memory v v fp.ip s I-Fetch <a, Read, (t,fp)> split phase t Need to deal with multiple deferred reads other operations: fetch/store, take/put, clear Next time Compiling Id/pH into dataflow graphs Monsoon Performance L-

Computer Architecture: Dataflow/Systolic Arrays

Computer Architecture: Dataflow/Systolic Arrays Data Flow Computer Architecture: Dataflow/Systolic Arrays he models we have examined all assumed Instructions are fetched and retired in sequential, control flow order his is part of the Von-Neumann model

More information

Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. Oregon Programming Language Summer School (OPLSS) Eugene, OR July 14, 2018

Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. Oregon Programming Language Summer School (OPLSS) Eugene, OR July 14, 2018 Lecture 1 Dataflow Model of Parallelism Arvind Computer Science and Artificial Intelligence Laboratory M.I.. Oregon Programming Language Summer School (OPLSS) Eugene, OR July 14, 2018 1 Dataflow Jack Dennis

More information

Ben Lee* and A. R. Hurson** *Oregon State University Department of Electrical and Computer Engineering Corvallis, Oregon

Ben Lee* and A. R. Hurson** *Oregon State University Department of Electrical and Computer Engineering Corvallis, Oregon ISSUES IN DATAFLOW COMPUTING Ben Lee* and A. R. Hurson** *Oregon State University Department of Electrical and Computer Engineering Corvallis, Oregon 97331-3211 **The Pennsylvania State University Department

More information

Portland State University ECE 588/688. Dataflow Architectures

Portland State University ECE 588/688. Dataflow Architectures Portland State University ECE 588/688 Dataflow Architectures Copyright by Alaa Alameldeen and Haitham Akkary 2018 Hazards in von Neumann Architectures Pipeline hazards limit performance Structural hazards

More information

Good old days ended in Nov. 2002

Good old days ended in Nov. 2002 WaveScalar! Good old days 2 Good old days ended in Nov. 2002 Complexity Clock scaling Area scaling 3 Chip MultiProcessors Low complexity Scalable Fast 4 CMP Problems Hard to program Not practical to scale

More information

Dennis [1] developed the model of dataflowschemas, building on work by Karp and Miller [2]. These dataflow graphs, as they were later called, evolved

Dennis [1] developed the model of dataflowschemas, building on work by Karp and Miller [2]. These dataflow graphs, as they were later called, evolved Predictive Block Dataflow Model for Parallel Computation EE382C: Embedded Software Systems Literature Survey Vishal Mishra and Kursad Oney Dept. of Electrical and Computer Engineering The University of

More information

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue and Register Renaming Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://wwweecsberkeleyedu/~krste

More information

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at

More information

Toward A Codelet Based Execution Model and Its Memory Semantics

Toward A Codelet Based Execution Model and Its Memory Semantics Toward A Codelet Based Execution Model and Its Memory Semantics -- For Future Extreme-Scale Computing Systems HPC Worshop Centraro, Italy June 20, 2012 Guang R. Gao ACM Fellow and IEEE Fellow Distinguished

More information

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 9 Instruction-Level Parallelism Part 2

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 9 Instruction-Level Parallelism Part 2 ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 9 Instruction-Level Parallelism Part 2 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html

More information

Start Voyager: Message Passing and DSM on SMP Clusters

Start Voyager: Message Passing and DSM on SMP Clusters HPCS 97 --1 Start Voyager: Message Passing and DSM on SMP Clusters Laboratory for Computer Science Massachusetts Institute of Technology Massively Parallel Processors Shared Memory Cache-coherent KSR HP-Convex

More information

and Multithreading Ben Lee, Oregon State University A.R. Hurson, Pennsylvania State University

and Multithreading Ben Lee, Oregon State University A.R. Hurson, Pennsylvania State University and Multithreading Ben Lee, Oregon State University A.R. Hurson, Pennsylvania State University - Contrary to initial expectations, implementing dataflow computers has presented a monumental challenge.

More information

Dataflow Architectures. Karin Strauss

Dataflow Architectures. Karin Strauss Dataflow Architectures Karin Strauss Introduction Dataflow machines: programmable computers with hardware optimized for fine grain data-driven parallel computation fine grain: at the instruction granularity

More information

Computer Architecture: Out-of-Order Execution II. Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Out-of-Order Execution II. Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Out-of-Order Execution II Prof. Onur Mutlu Carnegie Mellon University A Note on This Lecture These slides are partly from 18-447 Spring 2013, Computer Architecture, Lecture 15 Video

More information

Complex Pipelining: Out-of-order Execution & Register Renaming. Multiple Function Units

Complex Pipelining: Out-of-order Execution & Register Renaming. Multiple Function Units 6823, L14--1 Complex Pipelining: Out-of-order Execution & Register Renaming Laboratory for Computer Science MIT http://wwwcsglcsmitedu/6823 Multiple Function Units 6823, L14--2 ALU Mem IF ID Issue WB Fadd

More information

History Importance Comparison to Tomasulo Demo Dataflow Overview: Static Dynamic Hybrids Problems: CAM I-Structures Potential Applications Conclusion

History Importance Comparison to Tomasulo Demo Dataflow Overview: Static Dynamic Hybrids Problems: CAM I-Structures Potential Applications Conclusion By Matthew Johnson History Importance Comparison to Tomasulo Demo Dataflow Overview: Static Dynamic Hybrids Problems: CAM I-Structures Potential Applications Conclusion 1964-1 st Scoreboard Computer: CDC

More information

Computer Architecture Lecture 15: Load/Store Handling and Data Flow. Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014

Computer Architecture Lecture 15: Load/Store Handling and Data Flow. Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014 18-447 Computer Architecture Lecture 15: Load/Store Handling and Data Flow Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014 Lab 4 Heads Up Lab 4a out Branch handling and branch predictors

More information

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste

More information

Rui Wang, Assistant professor Dept. of Information and Communication Tongji University.

Rui Wang, Assistant professor Dept. of Information and Communication Tongji University. Instructions: ti Language of the Computer Rui Wang, Assistant professor Dept. of Information and Communication Tongji University it Email: ruiwang@tongji.edu.cn Computer Hierarchy Levels Language understood

More information

Lecture 4: Instruction Set Design/Pipelining

Lecture 4: Instruction Set Design/Pipelining Lecture 4: Instruction Set Design/Pipelining Instruction set design (Sections 2.9-2.12) control instructions instruction encoding Basic pipelining implementation (Section A.1) 1 Control Transfer Instructions

More information

C 1. Last time. CSE 490/590 Computer Architecture. Complex Pipelining I. Complex Pipelining: Motivation. Floating-Point Unit (FPU) Floating-Point ISA

C 1. Last time. CSE 490/590 Computer Architecture. Complex Pipelining I. Complex Pipelining: Motivation. Floating-Point Unit (FPU) Floating-Point ISA CSE 490/590 Computer Architecture Complex Pipelining I Steve Ko Computer Sciences and Engineering University at Buffalo Last time Virtual address caches Virtually-indexed, physically-tagged cache design

More information

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory ECE4750/CS4420 Computer Architecture L11: Speculative Execution I Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcements Lab3 due today 2 1 Overview Branch penalties limit performance

More information

1 Introduction. Monsoon: an Explicit Token-Store Architecture. Abstract

1 Introduction. Monsoon: an Explicit Token-Store Architecture. Abstract Monsoon: an Explicit Token-Store Architecture Gregory hl. Papadopoulos Laboratory for Computer Science Masachusetts nstitute of Technology David E. Culler Computer Science Division University of California,

More information

Topic 4b Program Execution Models Dataflow Model of Computation: A Fine-Grain Kind of PXM. CPEG421/621: Compiler Design

Topic 4b Program Execution Models Dataflow Model of Computation: A Fine-Grain Kind of PXM. CPEG421/621: Compiler Design opic 4b Program Execution Models Dataflow Model of Computation: A ine-grain Kind of PXM CPEG42/62: Compiler Design Most slides taken from Prof. Gao s and J.Manzano s previous courses, with additional material

More information

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation TDT4255 Lecture 9: ILP and speculation Donn Morrison Department of Computer Science 2 Outline Textbook: Computer Architecture: A Quantitative Approach, 4th ed Section 2.6: Speculation Section 2.7: Multiple

More information

Math 230 Assembly Programming (AKA Computer Organization) Spring MIPS Intro

Math 230 Assembly Programming (AKA Computer Organization) Spring MIPS Intro Math 230 Assembly Programming (AKA Computer Organization) Spring 2008 MIPS Intro Adapted from slides developed for: Mary J. Irwin PSU CSE331 Dave Patterson s UCB CS152 M230 L09.1 Smith Spring 2008 MIPS

More information

CSAIL. Computer Science and Artificial Intelligence Laboratory. Massachusetts Institute of Technology

CSAIL. Computer Science and Artificial Intelligence Laboratory. Massachusetts Institute of Technology CSAIL Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Performance Tuning Scientific Codes for Dataflow Execution Andrew Shaw, Arvind, R. Paul Johnson PACT

More information

CS252 Lecture Notes Multithreaded Architectures

CS252 Lecture Notes Multithreaded Architectures CS252 Lecture Notes Multithreaded Architectures Concept Tolerate or mask long and often unpredictable latency operations by switching to another context, which is able to do useful work. Situation Today

More information

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory

More information

Topics in computer architecture

Topics in computer architecture Topics in computer architecture Sun Microsystems SPARC P.J. Drongowski SandSoftwareSound.net Copyright 1990-2013 Paul J. Drongowski Sun Microsystems SPARC Scalable Processor Architecture Computer family

More information

*T: A Multithreaded Massively Parallel Architecture

*T: A Multithreaded Massively Parallel Architecture *T: A Multithreaded Massively Parallel Architecture R. S. Nikhil 00 Digital Equipment Corp. G. M. Papadopoulos 01 MIT Arvind 11 MIT Abstract What should the architecture of each node in a general purpose,

More information

15-740/ Computer Architecture Lecture 10: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/3/2011

15-740/ Computer Architecture Lecture 10: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/3/2011 5-740/8-740 Computer Architecture Lecture 0: Out-of-Order Execution Prof. Onur Mutlu Carnegie Mellon University Fall 20, 0/3/20 Review: Solutions to Enable Precise Exceptions Reorder buffer History buffer

More information

CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation

CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation Dynamic Dataflow Stéphane Zuckerman Computer Architecture & Parallel Systems Laboratory Electrical & Computer Engineering

More information

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) 18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures

More information

Chapter 2. Instruction Set Principles and Examples. In-Cheol Park Dept. of EE, KAIST

Chapter 2. Instruction Set Principles and Examples. In-Cheol Park Dept. of EE, KAIST Chapter 2. Instruction Set Principles and Examples In-Cheol Park Dept. of EE, KAIST Stack architecture( 0-address ) operands are on the top of the stack Accumulator architecture( 1-address ) one operand

More information

CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation

CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation Stéphane Zuckerman Computer Architecture & Parallel Systems Laboratory Electrical & Computer Engineering Dept. University

More information

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise

More information

EKT 303 WEEK Pearson Education, Inc., Hoboken, NJ. All rights reserved.

EKT 303 WEEK Pearson Education, Inc., Hoboken, NJ. All rights reserved. + EKT 303 WEEK 13 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. + Chapter 15 Reduced Instruction Set Computers (RISC) Table 15.1 Characteristics of Some CISCs, RISCs, and Superscalar

More information

Topic B (Cont d) Dataflow Model of Computation

Topic B (Cont d) Dataflow Model of Computation opic B (Cont d) Dataflow Model of Computation Guang R. Gao ACM ellow and IEEE ellow Endowed Distinguished Professor Electrical & Computer Engineering University of Delaware ggao@capsl.udel.edu 09/07/20

More information

SYSTEMS MEMO #12. A Synchronization Library for ASIM. Beng-Hong Lim Laboratory for Computer Science.

SYSTEMS MEMO #12. A Synchronization Library for ASIM. Beng-Hong Lim Laboratory for Computer Science. ALEWIFE SYSTEMS MEMO #12 A Synchronization Library for ASIM Beng-Hong Lim (bhlim@masala.lcs.mit.edu) Laboratory for Computer Science Room NE43-633 January 9, 1992 Abstract This memo describes the functions

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture 1 L E C T U R E 4: D A T A S T R E A M S I N S T R U C T I O N E X E C U T I O N I N S T R U C T I O N C O M P L E T I O N & R E T I R E M E N T D A T A F L O W & R E G I

More information

A MULTIPLE PROCESSOR DATA FLOW MACHINE THAT SUPPORTS GENERALIZED PROCEDURES

A MULTIPLE PROCESSOR DATA FLOW MACHINE THAT SUPPORTS GENERALIZED PROCEDURES A MULTPLE PROCESSOR DATA FLOW MACHNE THAT SUPPORTS GENERALZED PROCEDURES ARVND, and VNOD KATHAL Laboratory of Computer Science Massachusetts nstitute of Technology Cambridge, Massachusetts ABSTRACT Programs

More information

Department of Electrical and Computer Systems Engineering

Department of Electrical and Computer Systems Engineering Department of Electrical and Computer Systems Engineering Technical Report MECSE-9-2007 An FPGA Based Implementation of the CSIRAC II Dataflow Computer A. Sloan and G. Egan An FPGA Based Implementation

More information

Lecture 14: Multithreading

Lecture 14: Multithreading CS 152 Computer Architecture and Engineering Lecture 14: Multithreading John Wawrzynek Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~johnw

More information

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted

More information

Computation Structures Group Progress Report Computation Structures Group Memo 337. June 6, G.A. Boughton (ed.)

Computation Structures Group Progress Report Computation Structures Group Memo 337. June 6, G.A. Boughton (ed.) Computation Structures Group Progress Report 1990-91 Computation Structures Group Memo 337 June 6, 1991 G.A. Boughton (ed.) This report describes research done at the Laboratory for Computer Science of

More information

Static Dataflow Graphs

Static Dataflow Graphs Static Dataflow Graphs Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of echnology L20-1 Motivation: Dataflow Graphs A common Base Language - to serve as target representation

More information

Multithreading: A Revisionist View of Dataflow Architectures

Multithreading: A Revisionist View of Dataflow Architectures Multithreading: A Revisionist View of Dataflow Architectures Gregory M. Papadopoulos Kenneth R. Traub Massachusetts Institute of Technology Motorola Cambridge Research Center Abstract Although they are

More information

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor

More information

DATAFLOW COMPUTERS: THEIR HISTORY AND FUTURE

DATAFLOW COMPUTERS: THEIR HISTORY AND FUTURE D DATAFLOW COMPUTERS: THEIR HISTORY AND FUTURE INTRODUCTION AND MOTIVATION As we approach the technological limitations, concurrency will become the major path to increase the computational speed of computers.

More information

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Chapter 03 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 3.3 Comparison of 2-bit predictors. A noncorrelating predictor for 4096 bits is first, followed

More information

CS 152 Computer Architecture and Engineering. Lecture 18: Multithreading

CS 152 Computer Architecture and Engineering. Lecture 18: Multithreading CS 152 Computer Architecture and Engineering Lecture 18: Multithreading Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

Lecture 4: MIPS Instruction Set

Lecture 4: MIPS Instruction Set Lecture 4: MIPS Instruction Set No class on Tuesday Today s topic: MIPS instructions Code examples 1 Instruction Set Understanding the language of the hardware is key to understanding the hardware/software

More information

CS152 Computer Architecture and Engineering. Complex Pipelines

CS152 Computer Architecture and Engineering. Complex Pipelines CS152 Computer Architecture and Engineering Complex Pipelines Assigned March 6 Problem Set #3 Due March 20 http://inst.eecs.berkeley.edu/~cs152/sp12 The problem sets are intended to help you learn the

More information

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013 18-447 Computer Architecture Lecture 14: Out-of-Order Execution Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013 Reminder: Homework 3 Homework 3 Due Feb 25 REP MOVS in Microprogrammed

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

Part II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1

Part II Instruction-Set Architecture. Jan Computer Architecture, Instruction-Set Architecture Slide 1 Part II Instruction-Set Architecture Jan. 211 Computer Architecture, Instruction-Set Architecture Slide 1 Short review of the previous lecture Performance = 1/(Execution time) = Clock rate / (Average CPI

More information

CS 152, Spring 2011 Section 8

CS 152, Spring 2011 Section 8 CS 152, Spring 2011 Section 8 Christopher Celio University of California, Berkeley Agenda Grades Upcoming Quiz 3 What it covers OOO processors VLIW Branch Prediction Intel Core 2 Duo (Penryn) Vs. NVidia

More information

Exploiting Locality of Array Data with Parallel Object-Oriented Model for Multithreaded Computation

Exploiting Locality of Array Data with Parallel Object-Oriented Model for Multithreaded Computation Exploiting Locality of Array Data with Parallel Object-Oriented Model for Multithreaded Computation Jonghoon Song, Juno Chang, Sangyong Han Dept. of Computer Science, Seoul National Univ. Seoul, 151-742,

More information

Handout 2 ILP: Part B

Handout 2 ILP: Part B Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP

More information

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

More information

CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions

CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis6627 Powerpoint Lecture Notes from John Hennessy

More information

COSC4201 Instruction Level Parallelism Dynamic Scheduling

COSC4201 Instruction Level Parallelism Dynamic Scheduling COSC4201 Instruction Level Parallelism Dynamic Scheduling Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Outline Data dependence and hazards Exposing parallelism

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 18 Advanced Processors II 2006-10-31 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Thanks to Krste Asanovic... TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/

More information

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2. Instruction-Level Parallelism and its Exploitation: PART 2 Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.8)

More information

CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng.

CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng. CS 265 Computer Architecture Wei Lu, Ph.D., P.Eng. Part 3: von Neumann Architecture von Neumann Architecture Our goal: understand the basics of von Neumann architecture, including memory, control unit

More information

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 http://inst.eecs.berkeley.edu/~cs152/sp08 The problem

More information

ECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 8 Instruction-Level Parallelism Part 1

ECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 8 Instruction-Level Parallelism Part 1 ECE 252 / CPS 220 Advanced Computer Architecture I Lecture 8 Instruction-Level Parallelism Part 1 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html

More information

Outline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar, VLIW. CPE 631 Session 19 Exploiting ILP with SW Approaches

Outline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar, VLIW. CPE 631 Session 19 Exploiting ILP with SW Approaches Session xploiting ILP with SW Approaches lectrical and Computer ngineering University of Alabama in Huntsville Outline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar,

More information

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism

More information

CS252 Graduate Computer Architecture Lecture 8. Review: Scoreboard (CDC 6600) Explicit Renaming Precise Interrupts February 13 th, 2010

CS252 Graduate Computer Architecture Lecture 8. Review: Scoreboard (CDC 6600) Explicit Renaming Precise Interrupts February 13 th, 2010 CS252 Graduate Computer Architecture Lecture 8 Explicit Renaming Precise Interrupts February 13 th, 2010 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

More information

Multiple Instruction Issue and Hardware Based Speculation

Multiple Instruction Issue and Hardware Based Speculation Multiple Instruction Issue and Hardware Based Speculation Soner Önder Michigan Technological University, Houghton MI www.cs.mtu.edu/~soner Hardware Based Speculation Exploiting more ILP requires that we

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Multiple Choice Type Questions

Multiple Choice Type Questions Techno India Batanagar Computer Science and Engineering Model Questions Subject Name: Computer Architecture Subject Code: CS 403 Multiple Choice Type Questions 1. SIMD represents an organization that.

More information

Storage in Programs. largest. address. address

Storage in Programs. largest. address. address Storage in rograms Almost all operand storage used by programs is provided by memory. Even though registers are more efficiently accessed by instructions, there are too few registers to hold the stored

More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

Precise Exceptions and Out-of-Order Execution. Samira Khan

Precise Exceptions and Out-of-Order Execution. Samira Khan Precise Exceptions and Out-of-Order Execution Samira Khan Multi-Cycle Execution Not all instructions take the same amount of time for execution Idea: Have multiple different functional units that take

More information

Pipelined Processor Design. EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon

Pipelined Processor Design. EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon Pipelined Processor Design EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon Concept Identification of Pipeline Segments Add Pipeline Registers Pipeline Stage Control

More information

ELEC / Computer Architecture and Design Fall 2013 Instruction Set Architecture (Chapter 2)

ELEC / Computer Architecture and Design Fall 2013 Instruction Set Architecture (Chapter 2) ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2013 Instruction Set Architecture (Chapter 2) Victor P. Nelson, Professor & Asst. Chair Vishwani D. Agrawal, James J. Danaher Professor Department

More information

Instruction Set Principles. (Appendix B)

Instruction Set Principles. (Appendix B) Instruction Set Principles (Appendix B) Outline Introduction Classification of Instruction Set Architectures Addressing Modes Instruction Set Operations Type & Size of Operands Instruction Set Encoding

More information

Computer Architecture Lecture 13: State Maintenance and Recovery. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013

Computer Architecture Lecture 13: State Maintenance and Recovery. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013 18-447 Computer Architecture Lecture 13: State Maintenance and Recovery Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013 Reminder: Homework 3 Homework 3 Due Feb 25 REP MOVS in Microprogrammed

More information

Chapter 13 Reduced Instruction Set Computers

Chapter 13 Reduced Instruction Set Computers Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining

More information

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example CS252 Graduate Computer Architecture Lecture 6 Tomasulo, Implicit Register Renaming, Loop-Level Parallelism Extraction Explicit Register Renaming John Kubiatowicz Electrical Engineering and Computer Sciences

More information

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Tomasulo

More information

CS222: MIPS Instruction Set

CS222: MIPS Instruction Set CS222: MIPS Instruction Set Dr. A. Sahu Dept of Comp. Sc. & Engg. Indian Institute of Technology Guwahati 1 Outline Previous Introduction to MIPS Instruction Set MIPS Arithmetic's Register Vs Memory, Registers

More information

Program Partitioning for Multithreaded Dataflow Computers

Program Partitioning for Multithreaded Dataflow Computers Program Partitioning for Multithreaded Dataflow Computers Abstract Ben Lee Department of Electrical and Computer Engineering Oregon State University n recent years, there have been numerous proposals that

More information

Processors. Young W. Lim. May 12, 2016

Processors. Young W. Lim. May 12, 2016 Processors Young W. Lim May 12, 2016 Copyright (c) 2016 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version

More information

Math 230 Assembly Programming (AKA Computer Organization) Spring 2008

Math 230 Assembly Programming (AKA Computer Organization) Spring 2008 Math 230 Assembly Programming (AKA Computer Organization) Spring 2008 MIPS Intro II Lect 10 Feb 15, 2008 Adapted from slides developed for: Mary J. Irwin PSU CSE331 Dave Patterson s UCB CS152 M230 L10.1

More information

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1 Instruction Issue Execute Write result L.D F6, 34(R2) L.D F2, 45(R3) MUL.D F0, F2, F4 SUB.D F8, F2, F6 DIV.D F10, F0, F6 ADD.D F6, F8, F2 Name Busy Op Vj Vk Qj Qk A Load1 no Load2 no Add1 Y Sub Reg[F2]

More information

DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING

DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,

More information

Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company

Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor David Johnson Systems Technology Division Hewlett-Packard Company Presentation Overview PA-8500 Overview uction Fetch Capabilities

More information

Scheduled Dataflow Architecture

Scheduled Dataflow Architecture Scheduled Dataflow Architecture Krishna Kavi Department of Computer Science The University of North Texas kavi@cs.unt.edu http://crash.eb.uah.edu/~kavi University of Pisa, Sept 28, 2004 1 Overview Of Different

More information

Well-behaved Dataflow Graphs

Well-behaved Dataflow Graphs Well-behaved Dataflow Graphs Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of echnology L21-1 Outline Kahnian networks and dataflow Streams with holes & agged interpretation

More information

Pipeline issues. Pipeline hazard: RaW. Pipeline hazard: RaW. Calcolatori Elettronici e Sistemi Operativi. Hazards. Data hazard.

Pipeline issues. Pipeline hazard: RaW. Pipeline hazard: RaW. Calcolatori Elettronici e Sistemi Operativi. Hazards. Data hazard. Calcolatori Elettronici e Sistemi Operativi Pipeline issues Hazards Pipeline issues Data hazard Control hazard Structural hazard Pipeline hazard: RaW Pipeline hazard: RaW 5 6 7 8 9 5 6 7 8 9 : add R,R,R

More information

Do-While Example. In C++ In assembly language. do { z--; while (a == b); z = b; loop: addi $s2, $s2, -1 beq $s0, $s1, loop or $s2, $s1, $zero

Do-While Example. In C++ In assembly language. do { z--; while (a == b); z = b; loop: addi $s2, $s2, -1 beq $s0, $s1, loop or $s2, $s1, $zero Do-While Example In C++ do { z--; while (a == b); z = b; In assembly language loop: addi $s2, $s2, -1 beq $s0, $s1, loop or $s2, $s1, $zero 25 Comparisons Set on less than (slt) compares its source registers

More information

Multithreading and the Tera MTA. Multithreading for Latency Tolerance

Multithreading and the Tera MTA. Multithreading for Latency Tolerance Multithreading and the Tera MTA Krste Asanovic krste@lcs.mit.edu http://www.cag.lcs.mit.edu/6.893-f2000/ 6.893: Advanced VLSI Computer Architecture, October 31, 2000, Lecture 6, Slide 1. Krste Asanovic

More information

Instructions: Assembly Language

Instructions: Assembly Language Chapter 2 Instructions: Assembly Language Reading: The corresponding chapter in the 2nd edition is Chapter 3, in the 3rd edition it is Chapter 2 and Appendix A and in the 4th edition it is Chapter 2 and

More information

Lecture 5: Procedure Calls

Lecture 5: Procedure Calls Lecture 5: Procedure Calls Today s topics: Memory layout, numbers, control instructions Procedure calls 1 Memory Organization The space allocated on stack by a procedure is termed the activation record

More information