Bluespec-4: Rule Scheduling and Synthesis. Synthesis: From State & Rules into Synchronous FSMs


Bluespec-4: Rule Scheduling and Synthesis
Arvind, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology
Based on material prepared by Bluespec Inc, January 2005
March 2, 2005 (L10)

Synthesis: From State & Rules into Synchronous FSMs
[Figure: a module behind its interface. Inputs I and the current state S feed the Transition Logic, which produces outputs O and the next state S^Next for the Collection of State Elements.]

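Hardware Elements
Combinational circuits: Mux, Demux, ALU, ...
[Figure: Mux (inputs I0, I1, ..., In; select Sel; output O). Demux (input I; select Sel; outputs O0, O1, ..., On). ALU (inputs A, B; OpSelect; outputs Result and NCVZ).]
ALU OpSelect groups:
- Add, Sub, AddU, ...
- And, Or, Not, ...
- GT, LT, EQ, ...
- SL, SR, SRA, ...
Synchronous state elements: Flipflop, Register, Register file, SRAM, DRAM
[Figure: register with inputs D, En, Clk and output Q.]

Flip-flops with Write Enables
[Figure: flip-flop with inputs D, EN, C and output Q, shown with a timing diagram and two implementations: one built around a 0/1 mux, and one labelled "dangerous!".]
Edge-triggered: Data is sampled at the rising edge.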

Semantics and synthesis
- Rules. Semantics: "untimed" (one rule at a time). Verification activities: using Rule Semantics, establish functional correctness.
- Scheduling and synthesis by the BSV compiler.
- Verilog RTL. Semantics: clocked synchronous HW (multiple rules per clock). Using Schedules, establish performance correctness.

Rule semantics
Given a set of rules and an initial state:
    while (some rules are applicable* in the current state)
        choose one applicable rule
        apply that rule to the current state to produce the next state of the system**
(*) "applicable" = a rule's condition is true in the current state.
(**) These rule semantics are "untimed": the action to change the state can take as long as necessary, provided the state change is seen as "atomic", i.e., not divisible.
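The while-loop above can be sketched as a small executable model. This is a minimal illustration, not anything the BSV compiler produces: the function name `run_rules`, the (guard, action) pair representation, and the two counter rules are all assumptions made for the example.

```python
import random

def run_rules(rules, state, max_steps=1000, seed=0):
    """One-rule-at-a-time semantics: repeatedly pick one applicable rule and apply it."""
    rng = random.Random(seed)              # the choice among applicable rules is arbitrary
    state = dict(state)
    for _ in range(max_steps):
        applicable = [(g, a) for (g, a) in rules if g(state)]
        if not applicable:                 # no rule's condition holds: stop
            break
        _guard, action = rng.choice(applicable)
        state = action(state)              # the whole state change is one atomic step
    return state

# Hypothetical rules: two counters that stop at a bound.
rules = [
    (lambda s: s["x"] < 3, lambda s: {**s, "x": s["x"] + 1}),
    (lambda s: s["y"] < 4, lambda s: {**s, "y": s["y"] + 2}),
]
final = run_rules(rules, {"x": 0, "y": 0})
print(final)
```

Because these two rules touch different parts of the state, every choice of rule order reaches the same final state; in general the rule semantics only promise that each step is atomic.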

Why are these rule semantics useful?
- It is much easier to reason about the correctness of a system when you consider just one rule at a time.
- No problems with concurrency (e.g., race conditions, mis-timing, inconsistent states). We also say that rules are "interlocked".
- Major impact on design entry time and on verification time.

Extensive supporting theory
- Term Rewriting Systems, Terese, Cambridge Univ. Press, 2003, 884 pp.
- Parallel Program Design: A Foundation, K. Mani Chandy and Jayadev Misra, Addison Wesley, 1988
- Using Term Rewriting Systems to Design and Verify Processors, Arvind and Xiaowei Shen, IEEE Micro 19:3, 1998, p36-46
- Proofs of Correctness of Cache-Coherence Protocols, Stoy et al, in Formal Methods for Increasing Software Productivity, Berlin, Germany, 2001, Springer-Verlag LNCS 2021
- Superscalar Processors via Automatic Microarchitecture Transformation, Mieszko Lis, Masters thesis, Dept. of Electrical Eng. and Computer Science, MIT, 2000
- and more
The intuitions underlying this theory are easy to use in practice.

Bluespec's synthesis introduces concurrency
Synthesis is all about executing multiple rules "simultaneously" (in the same clock cycle).
A. When executing a set of rules in a clock cycle in hardware, each rule reads state from the leading clock edge and sets state at the trailing clock edge. None of the rules in the set can see the effects of any of the other rules in the set.
B. However, in one-rule-at-a-time semantics, each rule sees the effects of all previous rule executions.
Thus, a set of rules can be safely executed together in a clock cycle only if A and B produce the same net state change.

Pictorially
[Figure: two timelines for the rules Ri, Rj, Rk. "Rules": one rule per rule step. "HW": the same rules grouped into clocks.]
- There are more intermediate states in the rule semantics (a state after each rule step).
- In the HW, states change only at clock edges.
- In each clock, a different number of rules may fire.

Parallel execution reorders reads and writes
[Figure: "Rules" timeline with reads and writes alternating at every rule step; "HW" timeline with all reads at the start of a clock and all writes at its end.]
- In the rule semantics, each rule sees (reads) the effects (writes) of previous rules.
- In the HW, rules only see the effects from previous clocks, and only affect subsequent clocks.

Correctness
[Figure: the Ri, Rj, Rk rule-step and clock timelines from the "Pictorially" slide.]
- Rules are allowed to fire in parallel only if the net state change is equivalent to sequential rule execution.
- Consequence: the HW can never reach a state unexpected in the rule semantics.
- Therefore, correctness is preserved.

Scheduling
- The tool schedules as many rules in a clock cycle as it can prove are safe.
- It generates interlock hardware to prevent execution of unsafe combinations.
- Scheduling is the tool's best attempt at the fastest possible correct and safe hardware.
- Delayed rule execution is a consequence of safety checking: if a rule conflicts with another rule in the same clock, the tool may delay its execution to a later clock.

Obviously safe to execute simultaneously
    rule r1;
        x <= x + 1;
    endrule
    rule r2;
        y <= y + 2;
    endrule
Corresponding Verilog, as two always blocks:
    always @(posedge CLK) x <= x + 1;
    always @(posedge CLK) y <= y + 2;
or as a single always block:
    always @(posedge CLK) begin
        x <= x + 1;
        y <= y + 2;
    end
Simultaneous execution is equivalent to r1 followed by r2, and also to r2 followed by r1.

Safe to execute simultaneously
    rule r1;
        x <= y + 1;
    endrule
    rule r2;
        y <= y + 2;
    endrule
Corresponding Verilog:
    always @(posedge CLK) x <= y + 1;
    always @(posedge CLK) y <= y + 2;
Simultaneous execution is equivalent to r1 followed by r2. It is not equivalent to r2 followed by r1, but that's ok; we just need equivalence to some rule sequence.

Actions within a single rule
    rule r1;
        x <= y + 1;
        y <= x + 2;
    endrule
Corresponding Verilog:
    always @(posedge CLK) x <= y + 1;
    always @(posedge CLK) y <= x + 2;
Actions within a single rule are simultaneous. (The above translation is ok assuming no interlocks are needed with any other rules involving x and y.)

Not safe to execute simultaneously
    rule r1;
        x <= y + 1;
    endrule
    rule r2;
        y <= x + 2;
    endrule
A naive Verilog translation would be:
    always @(posedge CLK) x <= y + 1;
    always @(posedge CLK) y <= x + 2;
Simultaneous execution is not equivalent to r1 followed by r2, nor to r2 followed by r1. A rule is not a Verilog always block! Interlocks will prevent these rules from firing together (by delaying one of them).

Rule: As a State Transformer
A rule may be decomposed into two parts π(s) and δ(s) such that
    s_next = if π(s) then δ(s) else s
- π(s) is the condition (predicate) of the rule, a.k.a. the "CAN_FIRE" signal of the rule (the conjunction of explicit and implicit conditions).
- δ(s) is the "state transformation" function, i.e., it computes the next-state values in terms of the current state values.
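The three x/y examples above can be checked mechanically. The sketch below is an illustrative model only: each rule is represented as a function from the current state to the registers it writes (an assumption of this example, with always-true guards), "parallel" applies all writes computed from the same old state, and the helper names are hypothetical. The model assumes the rules write disjoint registers.

```python
from itertools import permutations

def parallel(rules, s):
    """Hardware view: every rule reads the same old state; all writes land together."""
    writes = {}
    for r in rules:
        writes.update(r(s))            # assumes disjoint write sets
    return {**s, **writes}

def sequential(rules, s):
    """Rule-semantics view: each rule sees the writes of the rules before it."""
    for r in rules:
        s = {**s, **r(s)}
    return s

def matching_orders(rules, names, states):
    """Rule orders whose sequential result equals the parallel result on every state."""
    found = []
    for perm in permutations(range(len(rules))):
        ordered = [rules[i] for i in perm]
        if all(sequential(ordered, s) == parallel(rules, s) for s in states):
            found.append(tuple(names[i] for i in perm))
    return found

states = [{"x": x, "y": y} for x in range(4) for y in range(4)]
names = ("r1", "r2")

obviously_safe = [lambda s: {"x": s["x"] + 1}, lambda s: {"y": s["y"] + 2}]
safe           = [lambda s: {"x": s["y"] + 1}, lambda s: {"y": s["y"] + 2}]
unsafe         = [lambda s: {"x": s["y"] + 1}, lambda s: {"y": s["x"] + 2}]

for label, rules in [("obviously safe", obviously_safe), ("safe", safe), ("unsafe", unsafe)]:
    print(label, matching_orders(rules, names, states))
```

An empty list for the third pair corresponds to the case where interlocks must keep the two rules from firing in the same clock.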

Compiling a Rule
    rule r (f.first() > 0);
        x <= x + 1;
        f.deq();
    endrule
[Figure: the current state (f and x) supplies rdy signals and read-method values to two blocks, π and δ. π produces the enable signals; δ produces the action parameters. Together they drive the next state values of f and x.]
- π = enabling condition
- δ = action signals & values

Combining State Updates: strawman
[Figure: the π's (π1 ... πn) from the rules that update R are ORed to form the latch enable of R; the δ's (δ1,R ... δn,R) from the rules that update R are ORed to form the next state value of R.]
What if more than one rule is enabled?

Combining State Updates
[Figure: the π's (π1 ... πn) from all the rules enter the Scheduler (a priority encoder), which outputs φ1 ... φn. The φ's are ORed to form the latch enable of R; the δ's (δ1,R ... δn,R) from the rules that update R, selected by the φ's, are ORed to form the next state value of R.]
The scheduler ensures that at most one φi is true.

One-rule-at-a-time Scheduler
[Figure: π1, π2, ..., πn enter the Scheduler (a priority encoder), which outputs φ1, φ2, ..., φn.]
1. φi ⇒ πi
2. π1 ∨ π2 ∨ ... ∨ πn ⇒ φ1 ∨ φ2 ∨ ... ∨ φn
3. One rewrite at a time, i.e., at most one φi is true
This is a very conservative way of guaranteeing correctness.
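A priority encoder satisfying the three scheduler properties can be sketched in a few lines. The function name and the choice of "lowest index wins" are assumptions of this example; the slides only require that at most one φi is true. The loop at the end checks properties 1 to 3 exhaustively for four rules.

```python
from itertools import product

def priority_encoder(can_fire):
    """Map CAN_FIRE signals (pi) to WILL_FIRE signals (phi): the first enabled rule wins."""
    will_fire = []
    granted = False
    for pi in can_fire:
        phi = pi and not granted       # fire only if no earlier rule has been granted
        will_fire.append(phi)
        granted = granted or phi
    return will_fire

# Exhaustively check the three scheduler properties for n = 4 rules.
for can_fire in product([False, True], repeat=4):
    will_fire = priority_encoder(can_fire)
    assert all(pi or not phi for pi, phi in zip(can_fire, will_fire))  # 1. phi_i => pi_i
    assert any(will_fire) or not any(can_fire)                         # 2. some pi => some phi
    assert sum(will_fire) <= 1                                         # 3. at most one phi

print(priority_encoder([False, True, True, False]))
```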

Executing Multiple Rules Per Cycle
    rule ra (z > 10);
        x <= x + 1;
    endrule
    rule rb (z > 20);
        y <= y + 2;
    endrule
Can these rules be executed simultaneously? These rules are "conflict free" because they manipulate different parts of the state.
Rule_a and Rule_b are conflict-free if
    ∀s. π_a(s) ∧ π_b(s) ⇒
        1. π_a(δ_b(s)) ∧ π_b(δ_a(s))
        2. δ_a(δ_b(s)) == δ_b(δ_a(s))

Executing Multiple Rules Per Cycle
    rule ra (z > 10);
        x <= y + 1;
    endrule
    rule rb (z > 20);
        y <= y + 2;
    endrule
Can these rules be executed simultaneously? These rules are "sequentially composable": parallel execution behaves like ra < rb.
Rule_a and Rule_b are sequentially composable if
    ∀s. π_a(s) ∧ π_b(s) ⇒ π_b(δ_a(s))
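The two definitions above can be evaluated by brute force over a small state space. This is a sketch under stated assumptions: each rule is a (π, δ) pair of Python functions over a state dict, the quantifier ∀s is approximated by a finite set of sample states, and the function names are hypothetical. The sequential-composability check implements only the condition given on the slide.

```python
from itertools import product

def conflict_free(a, b, states):
    """For all s with pi_a(s) and pi_b(s): guards survive each other and the deltas commute."""
    (pa, da), (pb, db) = a, b
    return all(
        pa(db(s)) and pb(da(s)) and da(db(s)) == db(da(s))
        for s in states if pa(s) and pb(s)
    )

def seq_composable(a, b, states):
    """For all s with pi_a(s) and pi_b(s): pi_b still holds after delta_a (order a < b)."""
    (pa, da), (pb, db) = a, b
    return all(pb(da(s)) for s in states if pa(s) and pb(s))

# Finite sample of states; z = 5, 15, 25 covers all guard combinations.
states = [dict(x=x, y=y, z=z) for x, y, z in product(range(3), range(3), (5, 15, 25))]

ra_first  = (lambda s: s["z"] > 10, lambda s: {**s, "x": s["x"] + 1})  # x <= x + 1
ra_second = (lambda s: s["z"] > 10, lambda s: {**s, "x": s["y"] + 1})  # x <= y + 1
rb        = (lambda s: s["z"] > 20, lambda s: {**s, "y": s["y"] + 2})  # y <= y + 2

print(conflict_free(ra_first, rb, states))    # first example
print(conflict_free(ra_second, rb, states))   # second example
print(seq_composable(ra_second, rb, states))  # second example, order ra < rb
```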

Multiple-Rules-per-Cycle Scheduler
[Figure: π1, π2, ..., πn enter the Scheduler, which outputs φ1, φ2, ..., φn.]
1. φi ⇒ πi
2. π1 ∨ π2 ∨ ... ∨ πn ⇒ φ1 ∨ φ2 ∨ ... ∨ φn
3. Multiple operations such that φi ∧ φj ⇒ Ri and Rj are conflict-free or sequentially composable

Scheduling and control logic
[Figure: Modules (current state) feed the Rules, each with a cond part and an action part. The cond parts produce the CAN_FIRE signals π1 ... πn, which enter the Scheduler; the Scheduler produces the WILL_FIRE signals φ1 ... φn. The action parts produce δ1 ... δn, which pass through Muxing, controlled by the φ's, into Modules (next state).]

Synthesis Summary
- Bluespec generates a combinational hardware scheduler allowing multiple enabled rules to execute in the same clock cycle.
- The hardware makes a rule-execution decision on every clock (i.e., it is not a static schedule).
- Among those rules that CAN_FIRE, only a subset WILL_FIRE that is consistent with a rule order.
- Since multiple rules can write to a common piece of state, the compiler introduces suitable muxing and mux control logic.
- This is very simple logic: the compiler will not introduce long paths on its own (details later).

Conditionals and rule-splitting
In Rule Semantics this rule:
    rule r1 (p1);
        if (q1) f.enq(x);
        else g.enq(y);
    endrule
is equivalent to the following two rules:
    rule r1a (p1 && q1);
        f.enq(x);
    endrule
    rule r1b (p1 && !q1);
        g.enq(y);
    endrule
but not quite, because the compiler treats implicit conditions conservatively: rule r1 won't fire unless both the f and g queues are not full!
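The difference between r1 and the split rules r1a/r1b comes down to their firing conditions. The sketch below models only those conditions as Boolean functions; `f_not_full` and `g_not_full` are hypothetical names standing for the implicit conditions of f.enq and g.enq, and the conservative treatment is modelled as requiring both. It then lists the input combinations in which a split rule can fire while the unsplit rule cannot.

```python
from itertools import product

def can_fire_r1(p1, q1, f_not_full, g_not_full):
    # Conservative treatment: the implicit conditions of both branches are required.
    return p1 and f_not_full and g_not_full

def can_fire_r1a(p1, q1, f_not_full, g_not_full):
    # Only the queue actually used by this rule matters.
    return p1 and q1 and f_not_full

def can_fire_r1b(p1, q1, f_not_full, g_not_full):
    return p1 and (not q1) and g_not_full

# Inputs are (p1, q1, f_not_full, g_not_full).
gained = [
    args for args in product([False, True], repeat=4)
    if (can_fire_r1a(*args) or can_fire_r1b(*args)) and not can_fire_r1(*args)
]
print(gained)
```

In this model, whenever r1 can fire, one of r1a or r1b can fire too, so splitting only adds firing opportunities.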

Rules for Example 3
    (* descending_urgency = "r1, r2" *)
    // Moving packets from input FIFO i1
    rule r1;
        Tin x = i1.first();
        if (dest(x) == 1) o1.enq(x);
        else o2.enq(x);
        i1.deq();
        if (interesting(x)) c <= c + 1;
    endrule
    // Moving packets from input FIFO i2
    rule r2;
        Tin x = i2.first();
        if (dest(x) == 1) o1.enq(x);
        else o2.enq(x);
        i2.deq();
        if (interesting(x)) c <= c + 1;
    endrule
[Figure: two input FIFOs each feed a "Determine Queue" block that routes packets to the output queues; a "Count certain packets" block increments a counter (+1).]
This example won't work properly without rule splitting.

Conditionals & Concurrency
    rule r1 (p1);
        if (q1) f.enq(x);
        else g.enq(y);
    endrule
    rule r1a (p1 && q1);
        f.enq(x);
    endrule
    rule r1b (p1 && !q1);
        g.enq(y);
    endrule
    rule r2 (p2);
        f.enq(z);
    endrule
Suppose there is another rule r2.
- Rule r2 cannot be executed simultaneously with r1 (conflict on f).
- But rule r2 may be executed simultaneously with r1b (provided p1, !q1 and p2 so permit).
Thus, splitting a rule can allow more concurrency because of fewer resource conflicts.

Scheduling conflicting rules
- When two rules conflict on a shared resource, they cannot both execute in the same clock.
- The compiler produces logic that ensures that, when both rules are enabled, only one will fire.
- Which one? The compiler chooses (and informs you, during compilation).
- The "descending_urgency" attribute allows the designer to control the choice.

Example 2: Concurrent Updates
Process 0 increments register x; Process 1 transfers a unit from register x to register y; Process 2 decrements register y.
[Figure: processes 0, 1 and 2 around registers x and y. Process 0 applies +1 to x; process 1 applies -1 to x and +1 to y; process 2 applies -1 to y.]
    (* descending_urgency = "proc2, proc1, proc0" *)
    rule proc0 (cond0);
        x <= x + 1;
    endrule
    rule proc1 (cond1);
        y <= y + 1;
        x <= x - 1;
    endrule
    rule proc2 (cond2);
        y <= y - 1;
    endrule
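The effect of an urgency order on Example 2 can be illustrated with a simplified scheduler model. This is not the compiler's algorithm: the `schedule` function, the greedy "most urgent first" policy, and the conflict table (proc0 and proc1 both update x; proc1 and proc2 both update y) are assumptions made for this sketch.

```python
def schedule(can_fire, urgency, conflicts):
    """Greedy model: visit rules from most to least urgent; a rule fires if it is
    enabled and no conflicting rule that was visited earlier is already firing."""
    will_fire = {}
    for rule in urgency:
        blocked = any(will_fire.get(other, False) for other in conflicts.get(rule, ()))
        will_fire[rule] = can_fire[rule] and not blocked
    return will_fire

urgency = ["proc2", "proc1", "proc0"]          # from the descending_urgency attribute
conflicts = {                                   # assumed pairwise conflicts
    "proc0": {"proc1"},                         # both update x
    "proc1": {"proc0", "proc2"},                # shares x with proc0, y with proc2
    "proc2": {"proc1"},                         # both update y
}

all_enabled = schedule({"proc0": True, "proc1": True, "proc2": True}, urgency, conflicts)
no_proc2 = schedule({"proc0": True, "proc1": True, "proc2": False}, urgency, conflicts)
print(all_enabled)
print(no_proc2)
```

In this model, when all three conditions hold, proc2 and proc0 fire together and proc1 waits; when cond2 is false, proc1 fires and proc0 waits.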

Functionality and performance
It is often possible to separate the concerns of functionality and performance:
- First, use Rules to achieve correct functionality.
- If necessary, adjust scheduling to achieve application performance goals (latency and bandwidth)*.
(*) I.e., # clocks per datum and # data per clock. Technology performance (clock speed) is a separate issue.

Improving performance via scheduling
- Latency and bandwidth can be improved by performing more operations in each clock cycle, that is, by firing more rules per cycle.
- Bluespec schedules all applicable rules in a cycle to execute, except when there are resource conflicts.
- Therefore: improving performance is often about resolving conflicts found by the scheduler.

Viewing the schedule
The command-line flag -show-schedule can be used to dump the schedule. It reports three groups of information:
- method scheduling information
- rule scheduling information
- the static execution order of rules and methods
More on this and other rule and method attributes will be discussed in the Friday tutorial.