Pipelining combinational circuits

Similar documents
Pipelining combinational circuits

Simple Inelastic and

Contributors to the course material

Simple Inelastic and

Module Interfaces and Concurrency

Elastic Pipelines and Basics of Multi-rule Systems. Elastic pipeline Use FIFOs instead of pipeline registers

Interrupts/Exceptions/Faults

Combinational Circuits and Simple Synchronous Pipelines

Contributors to the course material

Combinational Circuits in

Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology

Combinational Circuits in

Elastic Pipelines: Concurrency Issues

Elastic Pipelines: Concurrency Issues

Need for a scheduler. Concurrent rule execution. L17/L18 Review- Rule Scheduling

Contributors to the course material

Well formed BSV programs

Well formed BSV programs

Bluespec-5: Modeling Processors. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Bluespec-5: Modeling Processors. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Bluespec-4: Rule Scheduling and Synthesis. Synthesis: From State & Rules into Synchronous FSMs

Sequential Circuits as Modules with Interfaces. Arvind Computer Science & Artificial Intelligence Lab M.I.T.

Lab 4: N-Element FIFOs 6.175: Constructive Computer Architecture Fall 2014

IP Lookup block in a router

Sequential Circuits. Combinational circuits

Lecture 06: Rule Scheduling

Bluespec SystemVerilog TM Training. Lecture 05: Rules. Copyright Bluespec, Inc., Lecture 05: Rules

Muralidaran Vijayaraghavan

Modeling Processors. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Modular Refinement. Successive refinement & Modular Structure

Hardware Synthesis from Bluespec

Non-Pipelined Processors - 2

ECE/CS 252: INTRODUCTION TO COMPUTER ENGINEERING UNIVERSITY OF WISCONSIN MADISON

IP Lookup block in a router

Lecture 14. Synthesizing Parallel Programs IAP 2007 MIT

Modeling Processors. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Modular Refinement - 2. Successive refinement & Modular Structure

Implementing caches. Example. Client. N. America. Client System + Caches. Asia. Client. Africa. Client. Client. Client. Client. Client.

Programming Languages

Lecture 4: Synchronous Data Flow Graphs - HJ94 goal: Skiing down a mountain

Architecture exploration in Bluespec

Arvind (with the help of Nirav Dave) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Performance Specifications. Simple processor pipeline

Multiple Clock Domains

Recap. Recap. If-then-else expressions. If-then-else expressions. If-then-else expressions. If-then-else expressions

6.375 Tutorial 1 BSV. Ming Liu. Overview. Parts of a BSV module Lab 2 topics More types in BSV

Tail Recursion: Factorial. Begin at the beginning. How does it execute? Tail recursion. Tail recursive factorial. Tail recursive factorial

6.175: Constructive Computer Architecture. Tutorial 1 Bluespec SystemVerilog (BSV) Sep 30, 2016

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Side note: Tail Recursion. Begin at the beginning. Side note: Tail Recursion. Base Types. Base Type: int. Base Type: int

News. CSE 130: Programming Languages. Environments & Closures. Functions are first-class values. Recap: Functions as first-class values

Variables and Bindings

Successive refinement & Modular Structure. Bluespec-8: Modules and Interfaces. Designing a 2-Stage Processor with GAA. A 2-Stage Processor in RTL

Sequential Logic Design

Overview. BSV reviews/notes Guard lifting EHRs Scheduling Lab Tutorial 2 Guards and Scheduling. Ming Liu

BeiHang Short Course, Part 2: Description and Synthesis

IP Lookup: Some subtle concurrency issues. IP Lookup block in a router

Multiple Clock Domains

Transmitter Overview

The Wait-Free Hierarchy

EECS 3201: Digital Logic Design Lecture 4. Ihab Amer, PhD, SMIEEE, P.Eng.

Kermin Fleming. 543 Brigham Street Marlborough, MA (cell)

EE213A - EE298-2 Lecture 8

Programming Languages

Scheduling as Rule Composition

Well-behaved Dataflow Graphs

Programming in the Brave New World of Systems-on-a-chip

Typical workflow. CSE341: Programming Languages. Lecture 17 Implementing Languages Including Closures. Reality more complicated

VHDL vs. BSV: A case study on a Java-optimized processor

Bypassing and EHRs. Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. L23-1

Constructive Computer Architecture. Tutorial 1 BSV Types. Andy Wright TA. September12, 2014

Scheduling as Rule Composition

Spiral 1 / Unit 6. Flip-flops and Registers

Constructive Computer Architecture. Combinational ALU. Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology

Modeling Processors. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Non-pipelined Multicycle processors

Processor speed. Concurrency Structure and Interpretation of Computer Programs. Multiple processors. Processor speed. Mike Phillips <mpp>

Lab 2: Multipliers 6.175: Constructive Computer Architecture Fall 2014

Bluespec. Lectures 3 & 4 with some slides from Nikhil Rishiyur at Bluespec and Simon Moore at the University of Cambridge

Concurrent Objects and Linearizability

FFT: An example of complex combinational circuits

ECE 274 Digital Logic Verilog

IBM Power Multithreaded Parallelism: Languages and Compilers. Fall Nirav Dave

Super Advanced BSV* Andy Wright. October 24, *Edited for Tutorial 6. Andy Wright Super Advanced BSV* October 24, / 25

6.375: Complex Digital Systems. February 3, L01-1. Something new and exciting as well as useful

Dataflow Languages. Languages for Embedded Systems. Prof. Stephen A. Edwards. March Columbia University

Review of last lecture. Peer Quiz. DPHPC Overview. Goals of this lecture. Lock-based queue

Unit 6: Indeterminate Computation

Overview of Lecture 4. Memory Models, Atomicity & Performance. Ben-Ari Concurrency Model. Dekker s Algorithm 4

System Verilog Tagged Unions and Pattern Matching

Compilers and Interpreters

4/6/2011. Model Checking. Encoding test specifications. Model Checking. Encoding test specifications. Model Checking CS 4271

Bluespec for a Pipelined SMIPSv2 Processor

Ch 9: Control flow. Sequencers. Jumps. Jumps

CS477 Formal Software Development Methods / 39

A Deterministic Concurrent Language for Embedded Systems

CS 152 Computer Architecture and Engineering. Lecture 9 - Address Translation

Last Time. Introduction to Design Languages. Syntax, Semantics, and Model. This Time. Syntax. Computational Model. Introduction to the class

Virtual Memory and Interrupts

A Deterministic Concurrent Language for Embedded Systems

Transcription:

Constructive Computer Architecture: Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-1 Contributors to the course material Arvind, Rishiyur S. Nikhil, Joel Emer, Muralidaran Vijayaraghavan Staff and students in 6.375 (Spring 2013), 6.S195 (Fall 2012), 6.S078 (Spring 2012) Asif Khan, Richard Ruhler, Sang Woo Jun, Abhinav Agarwal, Myron King, Kermin Fleming, Ming Liu, Li- Shiuan Peh Eternal Prof Amey Karkare & students at IIT Kanpur Prof Jihong Kim & students at Seoul Nation University Prof Derek Chiou, University of Teas at Austin Prof Yoav Etsion & students at Technion September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-2 1

Contents Inelastic versus Elastic pipelines The role of FIFOs Concurrency issues BSV Concepts The Maybe Type Concurrency analysis September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-3 Comple Combinational Functions i+1 i i-1 f0 f1 f2 3 different datasets in the pipeline Lot of area and long combinational delay Folded or multi-cycle version can save area and reduce the combinational delay but throughput per clock cycle gets worse Pipelining: a method to increase the circuit throughput by evaluating multiple inputs September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-4 2

Inelastic vs Elastic pipeline f0 f1 f2 sreg1 sreg2 outq Inelastic: all pipeline stages move synchronously f1 f2 f3 fifo1 fifo2 outq Elastic: A pipeline stage can process data if its input FIFO is not empty and output FIFO is not Full Most comple processor pipelines are a combination of the two styles September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-5 Inelastic vs Elastic Pipelines Inelastic pipeline: typically only one rule or mutually eclusive rules; the designer controls precisely which activities go on in parallel downside: The designer must program the starting and draining of the pipeline. The rule can get complicated -- easy to make mistakes; difficult to make changes Elastic pipeline: several smaller rules, each easy to write, easier to make changes downside: sometimes rules do not fire concurrently when they should September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-6 3

first deq enq Inelastic pipeline f0 f1 f2 sreg1 sreg2 outq rule sync-pipeline (True);.deq(); sreg1 <= f0(.first()); sreg2 <= f1(sreg1); outq.enq(f2(sreg2)); This rule can fire only if - has an element - outq has space Atomicity: Either all or none of the state elements, outq, sreg1 and sreg2 will be updated September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-7 FIFO Module: methods with guarded interfaces n not full not empty not empty enab rdy enab rdy n rdy FIFO fifo.enq(); fifo.deq(); y=fifo.first() // action method // action method // value method September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-8 4

Inelastic pipeline Making implicit guard conditions eplicit f0 f1 f2 sreg1 sreg2 outq rule sync-pipeline (!.empty() &&!outq.full);.deq(); sreg1 <= f0(.first()); sreg2 <= f1(sreg1); outq.enq(f2(sreg2)); Suppose sreg1 and sreg2 have data, outq is not full but is empty. What behavior do you epect? Leave green and red data in the pipeline? September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-9 Pipeline bubbles f0 f1 f2 sreg1 sreg2 outq rule sync-pipeline (True);.deq(); sreg1 <= f0(.first()); sreg2 <= f1(sreg1); outq.enq(f2(sreg2)); Modify the rule to deal with these conditions Red and Green tokens must move even if there is nothing in! Also if there is no token in sreg2 then nothing should be enqueued in the outq Valid bits or the Maybe type September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-10 5

Eplicit encoding of Valid/Invalid data Valid/Invalid f0 f1 f2 sreg1 sreg2 outq typedef union tagged {void Valid; void Invalid; } Validbit deriving (Eq, Bits); rule sync-pipeline (True); if (.notempty()) begin sreg1 <= f0(.first());.deq(); sreg1f <= Valid end else sreg1f <= Invalid; sreg2 <= f1(sreg1); sreg2f <= sreg1f; if (sreg2f == Valid) outq.enq(f2(sreg2)); September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-11 When is this rule enabled? rule sync-pipeline (True); if (.notempty()) begin sreg1 <= f0(.first());.deq(); sreg1f <= Valid end else sreg1f <= Invalid; sreg2 <= f1(sreg1); sreg2f <= sreg1f; if (sreg2f == Valid) outq.enq(f2(sreg2)); sreg1f sreg2f outq NE V V NF NE V V F NE V I NF NE V I F NE I V NF NE I V F NE I I NF NE I I F yes No Yes Yes Yes No Yes yes NE = Not Empty; NF = Not Full f0 f1 f2 sreg1 sreg1f sreg2f outq E V V NF E V V F E V I NF E V I F E I V NF E I V F E I I NF E I I F Yes1 = yes but no change sreg2 outq yes No Yes Yes Yes No Yes1 yes September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-12 6

The Maybe type A useful type to capture valid/invalid data typedef union tagged { void Invalid; data_t Valid; } Maybe#(type data_t); data valid/invalid Registers contain Maybe type values Some useful functions on Maybe type: isvalid() returns true if is Valid frommaybe(d,) returns the data value in if is Valid the default value d if is Invalid September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-13 Using the Maybe type typedef union tagged { void Invalid; data_t Valid; } Maybe#(type data_t); data valid/invalid Registers contain Maybe type values rule sync-pipeline if (True); if (.notempty()) begin sreg1 <= Valid f0(.first());.deq(); end else sreg1 <= Invalid; sreg2 <= isvalid(sreg1)? Valid f1(frommaybe(d, sreg1)) : Invalid; if isvalid(sreg2) outq.enq(f2(frommaybe(d, sreg2))); September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-14 7

Pattern-matching: An alternative synta to etract datastructure components typedef union tagged { void Invalid; data_t Valid; } Maybe#(type data_t); case (m) matches tagged Invalid : return 0; tagged Valid. : return ; endcase will get bound to the appropriate part of m if (m matches (Valid.) &&& ( > 10)) The &&& is a conjunction, and allows pattern-variables to come into scope from left to right September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-15 The Maybe type data using the pattern matching synta typedef union tagged { void Invalid; data_t Valid; } Maybe#(type data_t); data valid/invalid Registers contain Maybe type values rule sync-pipeline if (True); if (.notempty()) begin sreg1 <= Valid (f0(.first()));.deq(); end else sreg1 <= Invalid; case (sreg1) matches tagged Valid.s1: sreg2 <= Valid f1(s1); tagged Invalid: case (sreg2) matches tagged Valid.s2: outq.enq(f2(s2)); endcase sreg2 <= Invalid; endcase s1 will get bound to the appropriate part of sreg1 September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-16 8

Generalization: n-stage pipeline f(0) f(1) f(2)... f(n-1) sreg[0] sreg[1] sreg[n-2] outq rule sync-pipeline (True); if (.notempty()) begin sreg[0]<= Valid f(1,.first());.deq();end else sreg[0]<= Invalid; for(integer i = 1; i < n-1; i=i+1) begin case (sreg[i-1]) matches tagged Valid.s: sreg[i] <= Valid f(i,s); tagged Invalid: sreg[i] <= Invalid; endcase end case (sreg[n-2]) matches tagged Valid.s: outq.enq(f(n-1,s)); endcase September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-17 Elastic pipeline Use FIFOs instead of pipeline registers f1 f2 f3 fifo1 fifo2 rule stage1 if (True); fifo1.enq(f1(.first());.deq(); rule stage2 if (True); fifo2.enq(f2(fifo1.first()); fifo1.deq(); rule stage3 if (True); outq.enq(f3(fifo2.first()); fifo2.deq(); outq What is the firing condition for each rule? Can tokens be left inside the pipeline? No need for Maybe types September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-18 9

Firing conditions for reach rule f1 f2 f3 fifo1 fifo1 fifo2 outq NE NE,NF NE,NF NF NE NE,NF NE,NF F NE NE,NF NE,F NF NE NE,NF NE,F F. fifo2 outq rule1 rule2 rule3 Yes Yes Yes Yes Yes No Yes No Yes Yes No No. This is the first eample we have seen where multiple rules may be ready to eecute concurrently Can we eecute multiple rules together? September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-19 Informal analysis f1 f2 f3 fifo1 fifo2 outq fifo1 fifo2 outq NE NE,NF NE,NF NF NE NE,NF NE,NF F NE NE,NF NE,F NF NE NE,NF NE,F F. rule1 rule2 rule3 Yes Yes Yes Yes Yes No Yes No Yes Yes No No. FIFOs must permit concurrent enq and deq for all three rules to fire concurrently September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-20 10

Concurrency when the FIFOs do not permit concurrent enq and deq not empty f1 f2 f3 fifo1 not empty & not full fifo2 not empty & not full outq not full At best alternate stages in the pipeline will be able to fire concurrently September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-21 Pipelined designs epressed using Multiple rules If rules for different pipeline stages never fire in the same cycle then the design can hardly be called a pipelined design If all the enabled rules fire in parallel every cycle then, in general, wrong results can be produced September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-22 11

BSV Eecution Model Repeatedly: Select a rule to eecute Compute the state updates Make the state updates Highly nondeterministic; User annotations can be used in rule selection A legal behavior of a BSV program can be eplained by observing the state updates obtained by applying only one rule at a time One-rule-at-time semantics September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-23 Concurrent scheduling of rules The one-rule-at-a-time semantics plays the central role in defining functional correctness and verification but for meaningful hardware design it is necessary to eecute multiple rules concurrently without violating the onerule-at-a-time semantics What do we mean by concurrent scheduling? First - some hardware intuition Later the semantics of concurrent scheduling September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-24 12

Hardware intuition for concurrent scheduling September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-25 Rule Eecution Application of a rule modifies some state elements of the system in a deterministic manner f f current state net state computation reg en s netstate net state values September 18, 2013 http://csg.csail.mit.edu/6.s195 L06-26 13

current state current state Eecuting multiple rules in one clock cycle f f Sequential composition rule 1 Parallel composition Forwards old values unless updated mu rule 1 rule2 rule2 Net State net state values Sequential composition preserves the one-rule-at-a-time semantics but generally increases the critical combinational path Parallel composition does not create longer combinational paths but may not preserve the one-rule-at-a-time semantics September 18, 2013 http://csgcsail.mit.edu/6.s195 L06-27 Reg en ensures that no double updates are made on any register merge NetState Reg en f f net state values Violation of sequential semantics rule ra <= y; rule rb y <= ; Suppose initially is 0 and y is y0 {0,y0} ra {y0,y0} rb {y0,y0} {0,y0} rb {0,0} ra {0,0} {0,y0} rb ra {y0,0} Parallel eecution does not behave like either ra<rb or rb<ra We do not want to allow concurrent eecution of ra and rb net time compiler analysis to determine which rules can be eecuted concurrently September 23, 2013 http://csg.csail.mit.edu/6.s195 L07-28 14