or Arvind goes commercial Joe Stoy, May 18 2007 Bluespec, Inc., 2006
Network evolution: IP Sprawl Managed Switch/Router Professional Park ADM Metropolitan Area or Regional Area Service Provider ADM ADM Downtown Financial District ADM High tech Center Servers Managed Switch/Router Managed Switch/Router Workstations LAN Storage Arrays SAN Remote back-office Storage Storage Arrays LAN SAN Servers Internet data center Servers LAN Enterprise
Network evolution: IP Sprawl Managed Switch/Router Professional Park ADM Metropolitan Area or Regional Area Service Provider ADM ADM Downtown Financial District ADM High tech Center Servers Managed Switch/Router Managed Switch/Router Workstations LAN Storage Arrays SAN Remote back-office Storage Storage Arrays LAN SAN Servers Internet data center Servers LAN Enterprise
HIBEAM: Packet-Switch Architecture Line Card Fabric Card Route table Packet Buffer Forwarding Engine (FE) IPv4, MPLS & Ethernet Packet Forwarding 500K IP entries 1M MPLS labels 1000 Layer 2/3/4 rules Packet Processing Queuing Engine (QE) Bandwidth Manager (BM) Traffic Management DiffServ model Policing/Scheduling/Shaping Packet Switching single store & forward 1 to 64 10G ports Global bandwidth control functions Traffic Mgmt & Switching 4
Delivering Efficient switch fabrics and building blocks for 10G IP differentiated services lower cost higher port density less power 5
..by rewriting the rules for semiconductor development Development time (months) Sandburst s revolutionary BlueSpec TM technology reduces ASIC development time by up to 50%! Design Complexity (gates) 6
What is BlueSpec A language for hardware description Based on TRS (term rewriting systems) Syntax and type system based on Haskell Run-time execution model quite unlike Haskell s Compiled to structural Verilog... or C... or FPGAs. 7
Bluespec at Sandburst Lennart Augustsson designed the Bluespec language Notation, type system, and static abstraction mechanisms borrow heavily from Haskell Designed and implemented the compiler v1 circa 7/2000, v2 circa 9/2000, v3 11/2001 Mieszko Lis implemented the scheduler Initially produced just Verilog Process/execute with std. Verilog tools Later (10/2000) also produced C C code is "cycle accurate" to the Verilog Can mix Bluespec-generated code with other code (hand-written Verilog, legacy Verilog, imported IP,...) 8
The Bluespec team Rishiyur Nikhil, Director (DEC/Compaq Research, MIT) Lennart Augustsson (CRT, Chalmers) Stephen Bailey (DEC, startups) Joe Stoy (Oxford) Mieszko Lis (MIT) Jacob Schwartz (MIT) Dan Rosenband (MIT) plus Arvind Consultants Professor James Hoe (CMU) Professor Krste Asanovic (MIT) Professor Srini Devadas (MIT) Niklas Rojemo (Sweden) 9
BlueSpec TM Advantage: HIBEAM Development Cycle accurate C and Verilog models for all four chips and the system Q2'01 Q3'01 Q1 02 Q2 02 Model generation enables early customer engagements 10
Bluespec use at Sandburst HIBEAM chip set system model for performance analysis close to 15K lines of Bluespec April 2001 through the present Mesa "pathfinding" project Goal: flesh out the Bluespec "design flow" Understand how Bluespec design fits into the larger picture of the full ASIC design process 6/2001 through 11/2001 11
Development Roadmap HIBEAM Xbar-16 Xbar-32 (ADI) Xbar-64 (ADI) BM-16 QE FE1 BM-32 BM-64 Mesa BlueSpec Trailblazer QE2 For FE2 SAN? More FE2 rules? IP Duplex? Storage? Verilog coding BlueSpec coding Jan 02 Apr 02 Jul 02 Oct 02 Jan 03 Apr 03 Jul 03 Oct 03 Jan 04 12
Bluespec, Inc. background Research@MIT on high-level synthesis & verification Technology Sandburst Corp: 10Gb/s core router ASICs (Bluespec: further technology development) VC funding Technology VC funding Bluespec, Inc.: highlevel design and synthesis tool (SystemVerilog-based) ~1996 2000 2003 13
14
The Bluespec team Rishiyur Nikhil, Director (DEC/Compaq Research, MIT) Lennart Augustsson (CRT, Chalmers) Stephen Bailey (DEC, startups) Joe Stoy (Oxford) Mieszko Lis (MIT) Jacob Schwartz (MIT) Dan Rosenband (MIT) plus Arvind 15
Evolution of HDLs (Hardware Description Languages) Hand-drawn circuit diagrams (schematics) Schematic Capture (automated) ~1985 Text-based RTL langs: Verilog & VHDL 2004 SystemVerilog (Accellera) IEEE Verilog standards (also VHDL standards) 2005 IEEE 1995 2001 2005? time (RTL = Register-Transfer Level) 16
Bluespec: Better Design Accelerates Everything! Architectural exploration More architectural flexibility during design Early executable models Architecture Design Verification and Test Physical Design 50% reduction from design to verified netlist Faster fixes, to achieve closure 50% reduction in errors, faster correction Better reuse Fully synthesizable without compromise! 17
Bluespec SystemVerilog A one slide overview Bluespec SystemVerilog Rules and Rule-based Interfaces For complex concurrency and control, across multiple shared resources, across module boundaries High-level abstract types Powerful static checking Powerful parameterization Powerful static elaboration Advanced clock management Behavioral Two dimensions raising the level of abstraction (fully synthesizable) Structural VHDL/Verilog/SystemVerilog/SystemC 18
Bluespec SystemVerilog A one-slide overview Bluespec SystemVerilog Rules and Rule-based Interfaces For complex concurrency and control, across multiple shared resources, across module boundaries High-level abstract types Powerful static checking Powerful parameterization Powerful static elaboration Advanced clock management Behavioral Two dimensions raising the level of abstraction (fully synthesizable) Structural VHDL/Verilog/SystemVerilog/SystemC 19
One-at-a-time intuitions Untimed rule semantics are one-rule-at-a-time steps The Bluespec compiler schedules multiple rule steps into each clock, producing a timed semantics and resolving non-determinism But before we go there, let s build some HW intuitions based on one-rule-at-a-time We ll use the following example (Can you guess what it computes? Hint: Euclid ) rule decr ( x <= y && y!= 0 ); y <= y - x; endrule : decr rule swap (x > y && y!= 0); x <= y; y <= x; endrule: swap Answer: Euclid s algorithm for computing GCD (Greatest Common Divisor) of initial values in x and y registers; result is in x 20
Suppose we want to execute just one rule The HW would be quite simple rule decr ( x <= y && y!= 0 ); y <= y - x; endrule : decr rule decr (all comb. logic) (y-x) next state D state y Q current state rule body (x<=y && y!=0) EN D x Q rule cond EN 21
Two mutually exclusive rules (Later: what to do if they re not mutually exclusive) rule decr ( x <= y && y!= 0 ); y <= y - x; endrule : decr rule swap (x > y && y!= 0); x <= y; y <= x; endrule: swap decr cond swap data select D EN D state y x Q Q cond scheduler EN 22
Practical rule composition In practice, we only do restricted rule compositions Every rule executes entirely within one cycle A rule fires at most once in a cycle (no Rule1<Rule1) Greedy: as many rules as possible per clock cycle Rule1 body does not feed into Rule2 cond Therefore, rule conds can be evaluated based on state at beginning of cycle Rule1 body does not feed into Rule2 body Therefore, Rule2 can use state from beginning of cycle 23
Practical Rule Composition Rules Ri Rj Rk rule steps HW Rj Rk Ri clocks No intra-cycle communication between rules, in the HW Correctness: HW scheduler only allows rules Ri, Rj,, Rk to execute concurrently when net state change is equivalent to Ri<Rj< <Rk Thus, never get into inconsistent state (no race conditions) Every state in the HW exists in the one-rule-at-a-time semantics 24
Multiple Rules and Conflicts If two rules are enabled in a particular cycle, what prevents them from executing concurrently? Answer: conflicts Since state read/write ordering is different in rule-at-a-time semantics compared to concurrent execution, certain concurrent executions would result in inconsistent states Certain resource sharings prevent concurrency 25
rule r1 (c1); x <= y + 1; endrule Conflict Rule untimed semantics: r1 and then r2 ( r1<r2 ), or r2 and then r1 ( r2<r1 ) x 10 21 13 y 20 23 12 rule r2 (c2); y <= x + 2; endrule HW: concurrent execution of r1 and r2 Each rule reads state (y,x) at start of cycle Each rule updates state (y+1, x+2) at end of cycle 21 12 Net HW state change any order (r1<r2 or r2<r1) not ok to execute concurrently A rule is not a Verilog always block! Bluespec HW scheduler will prevent these firing together 26
CAN_FIRE and WILL_FIRE current state Rule1 cond CAN_FIREs RuleN cond Scheduler WILL_FIREs controls data muxing and state ENables Scheduler incorporates conflict analysis by compiler CAN_FIRE is False WILL_FIRE is False CAN_FIRE is True WILL_FIRE is True if not precluded by conflicts with other rules False otherwise (the rule is blocked for this cycle) 27
Bluespec SystemVerilog A one-slide overview Bluespec SystemVerilog Rules and Rule-based Interfaces For complex concurrency and control, across multiple shared resources, across module boundaries High-level abstract types Powerful static checking Powerful parameterization Powerful static elaboration Advanced clock management Behavioral Two dimensions raising the level of abstraction (fully synthesizable) Structural VHDL/Verilog/SystemVerilog/SystemC 28
Modules, rules, interfaces, methods module interface state The big picture: modules contain rules which use methods that are provided by sub-modules in their interfaces. Methods, too, can use other methods. rule 29
Multiplier Example module mktest (); Reg#(int) state <- mkreg(0); Mult_ifc m <- mkmult1(); interface Mult_ifc; method Action start (Tin x, Tin y); method Tout result (); endinterface: Mult_ifc rule go (state == 0); m.start (9, 5); state <= 1; endrule rule finish (state == 1); $display ( Product = %d,m.result()); state <= 2; endrule endmodule: mktest module mkmult1 (Mult_ifc); Reg#(Tout) product <- mkreg(0); Reg#(Tout) d <- mkreg(0); Reg#(Tin) r <- mkreg(0); rule cycle (r!= 0); if (r[0] == 1) product <= product + d; d <= d << 1; r <= r >> 1; endrule method Action start (x,y) if (r == 0); d <= x; r <= y; product <= 0; endmethod method result () if (r == 0); return product; endmethod endmodule: mkmult1 30
Multiplier Example mktest :: Module Empty mktest = module state :: Reg int state <- mkreg 0 m :: Mult_ifc m <- mkmult1 rules go : when state == 0 ==> m.start (9, 5) state := 1 finish : when state == 1 ==> $display ( Product = %d,m.result()) state := 2 interface Mult_ifc = start :: Tin ->Tin -> Action result :: Tout mkmult1 :: Module Mult_ifc mkmult1 = module product :: Reg Tout product <- mkreg 0 d :: Reg Tout d <- mkreg 0 r :: Reg Tin r <- mkreg 0 rules cycle : when r /= 0 ==> if r[0] == 1 product := product + d d := d << 1 r := r >> 1 interface start x y = action { d:=x; r:=y; } when r == 0 result = product when r == 0 31
Multiplier Example ESL_MODULE (mktest, ESL_EMPTY) { esl_reg<int> state; Mult_ifc * m; ESL_RULE (go, state == 0) { m->start (9, 5); state = 1; } ESL_RULE (finish, state == 1) { cout << Product = << m->result() << endl; state = 2; } ESL_INTERFACE ( Mult_ifc ) { ESL_ACTION_METHOD_INTERFACE (start, Tin x,tin y); ESL_VALUE_METHOD_INTERFACE (result, Tout); } } ESL_CTOR (mktest) { ESL_NEW_REG(state, int, 0); m = new mkmult1(); ESL_END_CTOR; } 32
Concern: the learning curve Hardware designers: Atomic means (if anything) in the same clock cycle Bluespec: Many atomic actions in same cycle, as if they happened one at a time 33
Concern: the learning curve Hardware designers: Begin by designing the overall finite state machine to control the design Bluespec: Design little rules locally: the compiler will generate the overall logic 34
Simple example with concurrency and shared resources cond0 cond1 cond2 0 +1-1 1 +1-1 2 x y Process priority: 2 > 1 > 0 Process 0: increments register x when cond0 Process 1: transfers a unit from register x to register y when cond1 Process 2: decrements register y when cond2 Each register can only be updated by one process on each clock. Priority: 2 > 1 > 0 Just like real applications, e.g.: Packet arrives, is processed, departs 35
cond0 cond1 cond2 0 +1-1 1 +1-1 2 Process priority: 2 > 1 > 0 x y Which one is correct? always @(posedge CLK) begin if (!cond2 && cond1) x <= x 1; else if (cond0) x <= x + 1; if (cond2) y <= y 1; else if (cond1) y <= y + 1; end always @(posedge CLK) begin if (!cond2 cond1) x <= x 1; else if (cond0) x <= x + 1; if (cond2) y <= y 1; else if (cond1) y <= y + 1; end What s required to verify that they re correct? What if the priorities changed: cond1 > cond2 > cond0? What if the processes are in different modules? 36
With Bluespec, the design is direct cond0 cond1 cond2 0 +1-1 1 +1-1 2 x y Process priority: 2 > 1 > 0 (* descending_urgency = proc2, proc1, proc0 *) rule proc0 (cond0); x <= x + 1; endrule rule proc1 (cond1); y <= y + 1; x <= x 1; endrule rule proc2 (cond2); y <= y 1; endrule Hand-written RTL: Complexity due to: State-centric (for synthesizability) Scheduling clutter BSV: Functional correctness follows directly from rule semantics Executable spec (operation-centric) Automatic handling of shared resource mux logic Same hardware as the RTL 37
Now, let s make a small change: add a new process and insert its priority cond0 cond1 cond2 0 +1 x 1-1 +1 +2 3-2 cond3 Process priority: 2 > 3 > 1 > 0 y -1 2 38
Changing the Bluespec design cond0 cond1 cond2 Process priority: 2 > 3 > 1 > 0 0 +1 x Pre-Change 1-1 +1 +2 3-2 cond3 y -1 2 (* descending_urgency = "proc2, proc3, proc1, proc0" *) (* descending_urgency = proc2, proc1, proc0 *) rule proc0 (cond0); x <= x + 1; endrule rule proc1 (cond1); y <= y + 1; x <= x 1; endrule rule proc2 (cond2); y <= y 1; endrule rule proc0 (cond0); x <= x + 1; endrule rule proc1 (cond1); y <= y + 1; x <= x - 1; endrule rule proc2 (cond2); y <= y - 1; x <= x + 1; endrule rule proc3 (cond3); y <= y - 2; x <= x + 2; endrule? 39
Changing the Verilog design cond0 cond1 cond2 Process priority: 2 > 3 > 1 > 0 0 +1 x 1-1 +1 +2 3-2 y -1 2 cond3 Pre-Change always @(posedge CLK) begin if (!cond2 && cond1) x <= x 1; else if (cond0) x <= x + 1; if (cond2) y <= y 1; else if (cond1) y <= y + 1; end always @(posedge CLK) begin if ((cond2 && cond0) (cond0 &&!cond1 &&!cond3)) x <= x + 1; else if (cond3 &&?!cond2) x <= x + 2; else if (cond1 &&!cond2) x <= x - 1 if (cond2) y <= y - 1; else if (cond3) y <= y - 2; else if (cond1) y <= y + 1; end 40
Concern: the learning curve Other ways to make customers immediately productive Offer generic IP they can use off the shelf with very little training They can customize But then they ll have to learn more, to do so 41
So how does Arvind fit into this? Board member Provides longer-term view of the technical development 42
A FIFO problem module mkfifo(fifo#(t)) provisos(bits#(t,ts)); Reg#(t) dta <- mkregu; Reg#(Bool) full <- mkreg(false); method push(t x) if (!full); full <= True; dta <= x; endmethod method t pop() if (full); full <= False; return dta; endmethod endmodule 43
A FIFO problem Immediate solution (Bluespec) Invent wires Wires similar to registers, but: Registers: Read early in clock cycle, written at end Wires: Written first, read later Value goes away at end of cycle 44
A FIFO problem Immediate solution (Bluespec) New primitive: no great upheaval to language Methods set wires; an internal rule makes the appropriate state change Allows the problem to be overcome by enhancing the FIFO s design 45
A FIFO problem Immediate solution (Bluespec) But: Separating method from state change breaks atomicity needs care to get internal behavior right Leads users into bad practices, and spaghetti-like designs 46
A FIFO problem Longer-term solution (Arvind) Extend language: allow designers to specify scheduling requirements Performance Guarantees Allow pop before push in same cycle Compiler generates necessary hardware Cleaner; more robust Need more intuition about what the compiler will do 47
A Final Concern Has Arvind s fame spread far and wide? 48
c:\users\stoy>finger arvind@mit.edu [mit.edu] 1 Student data loaded as of May 17, Staff data loaded as of May 17. Notify Personnel or use WebSIS as appropriate to change your information. Our on-line help system describes How to change data, how the directory works, where to get more info. For a listing of help topics, enter finger help@mit.edu. Try finger help_about@mit.edu to read about how the directory works. Directory bluepages may be found at http://mit.edu/communications/bp. There were 3 matches to your request. Complete information will be shown only when one individual matches your query. Resubmit your query with more information. For example, use both firstname and lastname or use the alias field. name: Arvind, Amarnath department: System Design And Management year: G alias: A-arvind name: Jairam, Arvind department: Lincoln Laboratory title: LL - Associate Staff alias: A-jairam name: Arvind department: Dept of Electrical Engineering & Computer Science title: Charles W & Jennifer C Johnson Professor in CS Eng alias: arvind c:\users\stoy> 49
c:\users\stoy>finger arvind@mit.edu [mit.edu] 2 Student data loaded as of May 17, Staff data loaded as of May 17. Notify Personnel or use WebSIS as appropriate to change your information. Our on-line help system describes How to change data, how the directory works, where to get more info. For a listing of help topics, enter finger help@mit.edu. Try finger help_about@mit.edu to read about how the directory works. Directory bluepages may be found at http://mit.edu/communications/bp. There were 3 matches to your request. Complete information will be shown only when one individual matches your query. Resubmit your query with more information. For example, use both firstname and lastname or use the alias field. name: Arvind, Amarnath department: System Design And Management year: G alias: A-arvind name: Jairam, Arvind department: Lincoln Laboratory title: LL - Associate Staff alias: A-jairam name: Arvind department: Dept of Electrical Engineering & Computer Science title: Charles W & Jennifer C Johnson Professor in CS Eng alias: arvind c:\users\stoy> 50
Conclusion: Arvind is recursively impossible. 51