Computer Architecture (TT 2012) The Register Transfer Level Daniel Kroening Oxford University, Computer Science Department Version 1.0, 2011
Outline
- Reminders
- Gates
- Implementations of Gates
- Latches, Flip-flops and Clocks
- Timing Analysis
- Transition Systems
- A Brief Verilog Primer
- Four-valued Logic
- Event-driven Simulation
D. Kroening: Computer Architecture (TT 2012)

Reminders
- Gates: abstract gates and CMOS gates
- Flip-flops and clocks

Logic Gates
- Basic building block of digital circuitry
- Implement Boolean functions
- Gates have multiple inputs, usually one output
- Inputs/outputs are assigned a logical value 0 or 1
- Representation using voltage, for example: 0 is 0 V, 1 is 3.3 V

Abstract Switches
[Figure: two switch symbols, each controlled by an input x]
- One kind of switch is closed when x = 1, the other is closed when x = 0
- Idea: when a switch is closed, its upper and lower ports are connected

Example: Inverter and NOR
Inverter: logical negation

    x | not x
    0 |   1
    1 |   0

NOR: disjunction with negation

    x  y | x nor y
    0  0 |    1
    0  1 |    0
    1  0 |    0
    1  1 |    0

[Figures: inverter symbol, NOR symbol]

Building an Inverter using Switches
[Figure sequence: two switches, input x, output y]
- The 1 on the input closes the lower switch (the upper switch is open).
- The 0 on the input closes the upper switch.

Building NOR using Switches
[Figure sequence: switch network with inputs x and y and output z = x nor y, shown for different input values]
- Annotation on one step of the sequence: 2nd switch still open!

Typical Implementations of Gates
Examples:
- Vacuum tube
- BJT: bipolar junction transistor (used in TTL)
- FET: field-effect transistor (used in MOS)
- MOS = metal oxide semiconductor
- CMOS: complementary MOS

NMOS
[Figure: cross-section with source, gate and drain metal contacts, an SiO2 insulator under the gate, and two n-doped regions in a p-doped substrate]
- Implementation of a switch with doped silicon
- n/p-doping: excess of negative/positive charge carriers
- PMOS is dual to NMOS.
- http://www.digitaltechnik.org/flash/nmos_flash.html

NMOS Open
[Figure: cross-section with n-doped source and drain, gate, SiO2 insulator and p-doped substrate]
- Voltage gate/source 0 V: switch open (no current)

NMOS Closed
[Figure: the same cross-section with an n channel formed under the gate]
- Voltage gate/source > 0 V: switch closed (drain/source current possible)

Schematics
[Figure: schematic symbols for NMOS and PMOS, each with gate, drain and source terminals]

CMOS Inverter
[Figure: PMOS between VDD and OUT, NMOS between OUT and GND, both gates driven by IN]
- IN is 0: PMOS closed, NMOS open, so OUT is 1
- IN is 1: PMOS open, NMOS closed, so OUT is 0

CMOS NOR
[Figure: CMOS circuit between VDD and GND with inputs x and y and output x nor y]

CMOS NAND
[Figure: CMOS circuit between VDD and GND with inputs x and y and output x nand y]

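The switch view of the three CMOS gates can be checked with a few lines of Python. This is only a sketch with idealised switches (no voltages, no timing). The function names are hypothetical, and the series/parallel arrangement is the usual textbook CMOS topology rather than something read off the figures.

```python
from itertools import product

def nmos(gate):
    """Idealised NMOS switch: closed (conducting) when its gate input is 1."""
    return gate == 1

def pmos(gate):
    """Idealised PMOS switch: closed (conducting) when its gate input is 0."""
    return gate == 0

def cmos_output(pull_up, pull_down):
    """OUT is 1 if the path to VDD conducts, 0 if the path to GND conducts."""
    # In a well-formed CMOS gate exactly one of the two networks conducts.
    assert pull_up != pull_down, "short circuit or floating output"
    return 1 if pull_up else 0

def inv(x):
    return cmos_output(pmos(x), nmos(x))

def nor(x, y):
    # Pull-up: two PMOS in series; pull-down: two NMOS in parallel.
    return cmos_output(pmos(x) and pmos(y), nmos(x) or nmos(y))

def nand(x, y):
    # Pull-up: two PMOS in parallel; pull-down: two NMOS in series.
    return cmos_output(pmos(x) or pmos(y), nmos(x) and nmos(y))

for x, y in product((0, 1), repeat=2):
    print(x, y, "nor:", nor(x, y), "nand:", nand(x, y))
```

The assertion inside `cmos_output` documents the design rule that the pull-up and pull-down networks are complementary.
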
VLSI
- Prescott (2004): 90 nm process, 125 million transistors on 122 mm²
- Poulson (2012): 32 nm process, 3.1 billion transistors on 544 mm²

Latches and Flip-flops
- So far: combinational circuitry
- Outputs are a function of the inputs
- But we would like to store data
- Latches store data level-sensitive
- Flip-flops store data edge-triggered

D-Flip-Flop
[Figure: flip-flop symbol with inputs D (data) and C (clock) and outputs Q (state) and ¬Q (state negated)]
- Simplest sequential building block: stores the input for one clock period
- Usually comes with an (asynchronous) reset or set input signal

Positive and Negative Edges
[Figure: clock waveform C with a rising (positive) and a falling (negative) clock edge marked]

D-Flip-Flop with Positive Edge Triggering
Transition table:

    D  C | Q'  ¬Q'
    0  ↑ | 0   1
    1  ↑ | 1   0
    ×  ↓ | Q   ¬Q
    ×  0 | Q   ¬Q
    ×  1 | Q   ¬Q

Notation: the prime in Q' means the value in the next state; ↑ is a positive edge, ↓ a negative edge, × any value.

D-Flip-Flop with Positive Edge Triggering
[Figure: waveforms of D, C, Q and ¬Q]
- Changing D only has an effect on the next (positive) clock edge!

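The edge-triggered behaviour can be mimicked with a small behavioural model. This is a sketch only: the class name and the waveform are made up for illustration, and setup, hold and propagation times are ignored.

```python
class DFlipFlop:
    """Positive-edge-triggered D flip-flop (behavioural model, no timing)."""

    def __init__(self, q=0):
        self.q = q          # stored state Q
        self._last_c = 0    # previous clock level, used to detect edges

    def step(self, d, c):
        """Apply new levels of D and C; Q changes only on a 0 -> 1 edge of C."""
        if self._last_c == 0 and c == 1:
            self.q = d
        self._last_c = c
        return self.q

ff = DFlipFlop()
# (D, C) samples: D changes at arbitrary moments, Q follows only at rising edges.
waveform = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1), (1, 1), (1, 0)]
trace = [ff.step(d, c) for d, c in waveform]
print(trace)  # [0, 1, 1, 1, 0, 0, 0]
```

In the third sample D drops to 0 while C is still high, yet Q keeps the value 1 until the next rising edge.
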
A High-level, Simplified View
[Figure: combinational logic receives the inputs and the current state and produces the outputs and the next state; registers hold the state and feed the next state back as the current state]

A High-level, Simplified View
This ignores (fails to model) techniques such as
- multi-cycle paths
- asynchronous sub-circuits
- multiple clock domains
But: somewhat suitable to model clock multipliers

Timing Analysis
- How much can we crank up the clock?
- What happens if we overdo it?

Timing Requirements of a D-Flip-Flop
[Figure: clock waveform clk annotated with $t_w$, $t_s$, $t_h$, $t_{plh}$ and $t_{phl}$]
- Input stable during setup phase $t_s$ before the edge
- Input stable during hold phase $t_h$ after the edge
- Output is stable after propagation phase $t_{plh}$ or $t_{phl}$, resp., after the edge
- Minimal clock period (width) $t_w$

Timing Example
[Figure: waveforms of D, clk and Q annotated with $t_s$, $t_h$, $t_w$, $t_{plh}$ and $t_{phl}$]

Maximal Clock Frequency

      setup + hold time
    + propagation delay of the flip-flops
    + delay of the combinational circuitry (longest path!)
    = cycle time

The maximal clock frequency is the inverse of the cycle time.

Example
[Figure: four D flip-flops connected by gates. Arrival times are annotated starting from 0 at the flip-flop outputs: one path accumulates 9, 15, 21, 27 ns, another 7, 16, 25 ns.]

    Component    t_p     t_s
    AND          7 ns    -
    NAND         6 ns    -
    OR           6 ns    -
    NOR          5 ns    -
    XOR          9 ns    -
    D-Flipflop   11 ns   3 ns

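Arrival times like the ones annotated in the figure can be computed by propagating delays through the gate network. The netlist below is hypothetical: the real wiring is only in the figure, so the gate names and connections are assumptions chosen to reproduce the annotated values (9, 15, 21, 27 and 7, 16, 25). Only the delay table comes from the slide.

```python
from functools import lru_cache

# Gate delays t_p in ns, taken from the component table.
DELAY_NS = {"AND": 7, "NAND": 6, "OR": 6, "NOR": 5, "XOR": 9}

# Hypothetical netlist: gate output -> (gate type, fan-in signals).
# Signals that are not gate outputs (q0..q3) are flip-flop outputs, arrival time 0.
NETLIST = {
    "g1": ("XOR", ["q0", "q1"]),
    "g2": ("NAND", ["g1", "q2"]),
    "g3": ("OR", ["g2", "q2"]),
    "g4": ("NAND", ["g3", "q3"]),
    "h1": ("AND", ["q2", "q3"]),
    "h2": ("XOR", ["h1", "q1"]),
    "h3": ("XOR", ["h2", "q0"]),
}

@lru_cache(maxsize=None)
def arrival(signal):
    """Latest time (ns after the clock edge) at which `signal` becomes stable."""
    if signal not in NETLIST:
        return 0
    kind, fanin = NETLIST[signal]
    return max(arrival(s) for s in fanin) + DELAY_NS[kind]

longest_path_ns = max(arrival(g) for g in NETLIST)
print({g: arrival(g) for g in NETLIST})
print("longest path:", longest_path_ns, "ns")  # longest path: 27 ns
```

Taking the maximum over the fan-in is what makes this a longest-path computation; it works on any acyclic netlist.
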
Example

        3 ns   setup + hold time
    +  11 ns   propagation delay of the flip-flops
    +  27 ns   longest path
    =  41 ns   cycle time

Maximum clock frequency: $1 / 41\,\text{ns} \approx 24\,390\,244\ \text{Hz} \approx 24.4\ \text{MHz}$

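The same arithmetic as a tiny Python sketch; the function name is made up, and the numbers are the ones from the example.

```python
def max_clock_frequency_hz(t_setup_hold_ns, t_prop_ns, longest_path_ns):
    """Inverse of the cycle time; all arguments are in nanoseconds."""
    cycle_ns = t_setup_hold_ns + t_prop_ns + longest_path_ns
    return 1e9 / cycle_ns

f = max_clock_frequency_hz(3, 11, 27)
print(f"{f:.0f} Hz = {f / 1e6:.1f} MHz")  # 24390244 Hz = 24.4 MHz
```
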
Pipelining
- Q: How can we increase the clock frequency? (This will please the marketing department.)
- Reminder: the clock frequency is determined by the longest path between two D-flip-flops.
- Idea: put a D-flip-flop into that path!

Example Pipelining
[Figure sequence: the example circuit before and after inserting additional D flip-flops. Before, arrival times reach 27 ns (9, 15, 21, 27 and 7, 16, 25). After, the longest path between two flip-flops is 18 ns.]

Example Pipelining

        3 ns   setup + hold time
    +  11 ns   propagation delay of the flip-flops
    +  18 ns   longest path
    =  32 ns   cycle time

Maximum clock frequency: $1 / 32\,\text{ns} = 31\,250\,000\ \text{Hz} \approx 31.3\ \text{MHz}$

Modelling Sequential Circuits
Combinational:
- No state
- Think of functional programming (Lisp, ML): no side effects!
- Outputs are fully determined by the current inputs
Sequential:
- With state
- Think of programming with state variables and assignment
- Current inputs and past behaviour determine outputs

Formalisation of Sequential Circuitry
Definition (transition system): a transition system is a triple $\langle S, I, T \rangle$ with
- $S$: set of states
- $I \subseteq S$: initial states
- $T \subseteq S \times S$: transition relation
Definition (computation): a computation is a sequence $s_0, \ldots, s_n$ with $s_i \in S$ and
- $s_0 \in I$
- $\forall i < n : (s_i, s_{i+1}) \in T$

Example: Modulo 4 Counter
[Figure: state graph with the cycle 0, 1, 2, 3, 0]
- Set of states $S$: the numbers 0 to 3
- One initial state: $I = \{0\}$
- Transition relation: the counter is incremented modulo 4, $T = \{(0, 1), (1, 2), (2, 3), (3, 0)\}$

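The definitions translate almost literally into code. The following Python sketch encodes the counter as a triple of sets and checks whether a sequence is a computation; the helper name `is_computation` is made up for illustration.

```python
S = {0, 1, 2, 3}                          # states
I = {0}                                   # initial states
T = {(0, 1), (1, 2), (2, 3), (3, 0)}      # transition relation

def is_computation(seq):
    """True iff seq starts in I, stays in S, and each consecutive pair is in T."""
    return (
        len(seq) > 0
        and seq[0] in I
        and all(s in S for s in seq)
        and all(pair in T for pair in zip(seq, seq[1:]))
    )

print(is_computation([0, 1, 2, 3, 0, 1]))  # True
print(is_computation([1, 2, 3]))           # False: 1 is not an initial state
print(is_computation([0, 2]))              # False: (0, 2) is not in T
```
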
Determinism
A state machine may, in general, be non-deterministic:
1. Dead ends: states without a successor
2. States with more than one successor
Hardware is usually deterministic: every state has exactly one successor

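Both conditions are easy to test on a finite transition relation. A sketch in Python, where `classify` and the `broken` relation are hypothetical names and data used only for illustration:

```python
def classify(states, relation):
    """Return (dead_ends, branching): states with no successor / more than one."""
    succ = {s: {y for (x, y) in relation if x == s} for s in states}
    dead_ends = {s for s, ys in succ.items() if not ys}
    branching = {s for s, ys in succ.items() if len(ys) > 1}
    return dead_ends, branching

S = {0, 1, 2, 3}
counter = {(0, 1), (1, 2), (2, 3), (3, 0)}
broken = {(0, 1), (0, 2), (1, 2), (2, 3)}   # made up: state 0 branches, state 3 is stuck

print(classify(S, counter))  # (set(), set())
print(classify(S, broken))   # ({3}, {0})
```

A relation is deterministic in the sense of the slide exactly when both returned sets are empty.
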
Transition Functions
- What about the inputs of our device?
- We introduce an input alphabet $E$
- Transition function $\delta : (S \times E) \to S$
- Initial states: $I : (S \times E)$
- This is a special case of a transition relation: $(x, y) \in T \iff \exists e \in E.\ \delta(x, e) = y$
- For a given $e$ this is deterministic, and there are no dead ends

Example: Modulo 4 Counter with Synchronous Reset
[Figure: state graph over 0, 1, 2, 3; edges labelled $\overline{rst}$ step to the next state, edges labelled $rst$ lead back to 0]
- $S = \{0, \ldots, 3\}$, $E = \{rst, \overline{rst}\}$
- The counter is reset to 0 when rst = 1, otherwise it is incremented modulo 4:
  $\delta(s, rst) = \begin{cases} 0 & \text{if } rst \\ (s + 1) \bmod 4 & \text{otherwise} \end{cases}$
- In hardware, there would usually be bits and bytes.

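A transition function is just a function of state and input. The sketch below encodes the counter with reset in Python (rst is represented as 0 or 1; `run` is a made-up helper) and derives the induced transition relation by quantifying over the input.

```python
def delta(s, rst):
    """Counter with synchronous reset: 0 on reset, otherwise increment modulo 4."""
    return 0 if rst else (s + 1) % 4

def run(s0, inputs):
    """Sequence of states visited from s0 under the given rst values."""
    states = [s0]
    for rst in inputs:
        states.append(delta(states[-1], rst))
    return states

# Induced relation: (x, y) is in T iff some input e gives delta(x, e) = y.
T = {(x, delta(x, e)) for x in range(4) for e in (0, 1)}

print(run(2, [0, 0, 1, 0]))  # [2, 3, 0, 0, 1]
print(sorted(T))
```

Viewed as a relation, the counter is non-deterministic (most states have two successors); fixing the input makes it deterministic again.
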
Modulo 4 Counter: Binary Encoding
[Figure: state graph over 00, 01, 10, 11 with edges labelled $rst$ and $\overline{rst}$]
- States $ab$ are valuations of D-flip-flops with $a, b \in \{0, 1\}$
- Binary encoding: 0 is 00, 1 is 01, 2 is 10, 3 is 11
- A transition $ab \xrightarrow{c} a'b'$ is a triple of a current state $ab$, a next state $a'b'$ and a condition $c$

Modulo 4 Counter: Transition Function
$\delta = \{ (00, 0, 01), (00, 1, 00), (01, 0, 10), (01, 1, 00), (10, 0, 11), (10, 1, 00), (11, 0, 00), (11, 1, 00) \}$
Examples: $\delta(00, 0) = 01$, $\delta(10, 1) = 00$

Modulo 4 Counter: Transition Table

    state  a  b  rst | a'  b'
    0      0  0  0   | 0   1
    1      0  1  0   | 1   0
    2      1  0  0   | 1   1
    3      1  1  0   | 0   0
    any    -  -  1   | 0   0

This can be encoded using the following next-state functions:
- $a' \iff \neg rst \wedge (a \oplus b)$
- $b' \iff \neg rst \wedge \neg b$

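That the two bit-level next-state functions really implement the counter can be checked exhaustively, since there are only eight combinations of state and input. A Python sketch (function names are made up; bits are represented as 0 or 1):

```python
def next_state_bits(a, b, rst):
    """a' = not rst and (a xor b); b' = not rst and not b."""
    a_next = (1 - rst) & (a ^ b)
    b_next = (1 - rst) & (1 - b)
    return a_next, b_next

def delta(s, rst):
    """Word-level reference: reset to 0, otherwise increment modulo 4."""
    return 0 if rst else (s + 1) % 4

mismatches = []
for s in range(4):
    for rst in (0, 1):
        a, b = s >> 1, s & 1                      # binary encoding s = ab
        a_next, b_next = next_state_bits(a, b, rst)
        if 2 * a_next + b_next != delta(s, rst):
            mismatches.append((s, rst))

print("mismatches:", mismatches)  # mismatches: []
```
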
Some Computations of the Modulo 4 Counter

    Transition #  0  1  2  3  4  5  6  ...
    rst           1  0  0  0  0  0  0
    a             1  0  0  1  1  0  0
    b             0  0  1  0  1  0  1

    Transition #  0  1  2  3  4  5  6  ...
    rst           1  0  0  1  0  0  0
    a             1  0  0  1  0  0  1
    b             1  0  1  0  0  1  0

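The two tables can be replayed step by step. In this sketch the column for transition i holds the state before the transition and the rst value applied in it; that reading of the tables is an assumption, and `step` and `trace` are made-up names.

```python
def step(a, b, rst):
    """One clock cycle of the bit-level counter."""
    return (1 - rst) & (a ^ b), (1 - rst) & (1 - b)

def trace(a, b, rst_seq):
    """States (a, b) seen at each transition; rst_seq[i] is applied in state i."""
    states = []
    for rst in rst_seq:
        states.append((a, b))
        a, b = step(a, b, rst)
    return states

t1 = trace(1, 0, [1, 0, 0, 0, 0, 0, 0])
t2 = trace(1, 1, [1, 0, 0, 1, 0, 0, 0])

for t in (t1, t2):
    print("a:", [a for a, _ in t])
    print("b:", [b for _, b in t])
```

Both runs start in an arbitrary state with rst = 1, which is how the reset brings the counter into the known state 00.
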
Modulo 4 Counter with Synchronous Reset
[Figure: circuit with input rst, flip-flops a and b, and clock clk]
- Global clock: clk (we omit this from the model)
- Just one input: rst
- Next-state logic: as marked in blue
- No output logic in this example

Summary: Register-transfer Level
1. Set of registers $R_i$ with $i \in \{1, \ldots, n\}$ with domain $D_i = \{0, 1\}^{w_i}$ for some $w_i$
2. $S = D_1 \times D_2 \times \ldots \times D_n$
3. Input $e \in E$ (possibly partitioned)
4. Next-state function $f_i(s, e)$ for each register
$T(s, s') \iff \exists e.\ s' = (f_1(s, e), \ldots, f_n(s, e))$

Real-world Models of Hardware
- Old days: schematics
- Verilog or VHDL
- Plain C
- C++ or SystemC
- (Functional programming languages: practicals)

A Brief Verilog Primer
- Verilog was designed for circuit simulation
- Now also used for synthesis
- Procedural, event-driven semantics
- Circuits are composed of modules, which are connected via ports.

Simulation with Four-valued Logic
- Verilog uses a four-valued logic
- A wire in the high impedance state has value Z
- In addition, we have X for unknown (e.g., at the beginning of the simulation)
- There is also a weak 1 and a weak 0 (we'll ignore these)

Semantics of the Four-valued Logic

    &  | 0 1 X Z        |  | 0 1 X Z
    0  | 0 0 0 0        0  | 0 1 X X
    1  | 0 1 X X        1  | 1 1 1 1
    X  | 0 X X X        X  | X 1 X X
    Z  | 0 X X X        Z  | X 1 X X

    ^  | 0 1 X Z        ~^ | 0 1 X Z
    0  | 0 1 X X        0  | 1 0 X X
    1  | 1 0 X X        1  | 0 1 X X
    X  | X X X X        X  | X X X X
    Z  | X X X X        Z  | X X X X

    ~  | 0 1 X Z
       | 1 0 X X

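The tables follow a simple rule: a gate treats Z on an input like X, and the result is X unless the known inputs already determine it. A Python sketch of that rule, with values represented as the strings "0", "1", "X", "Z" (the function names are made up):

```python
VALUES = ("0", "1", "X", "Z")

def known(v):
    """A gate input in the Z state is treated like an unknown value."""
    return "X" if v == "Z" else v

def and4(a, b):
    a, b = known(a), known(b)
    if a == "0" or b == "0":
        return "0"              # a 0 forces the output regardless of the other input
    return "1" if (a, b) == ("1", "1") else "X"

def or4(a, b):
    a, b = known(a), known(b)
    if a == "1" or b == "1":
        return "1"              # a 1 forces the output regardless of the other input
    return "0" if (a, b) == ("0", "0") else "X"

def xor4(a, b):
    a, b = known(a), known(b)
    if "X" in (a, b):
        return "X"              # xor is never determined by one input alone
    return "1" if a != b else "0"

def not4(a):
    return {"0": "1", "1": "0"}.get(known(a), "X")

for name, fn in (("&", and4), ("|", or4), ("^", xor4)):
    print(name, *VALUES)
    for a in VALUES:
        print(a, *(fn(a, b) for b in VALUES))
```
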
Combinational Logic in Verilog

    `timescale 1ns / 1ns
    module gate_delay(input a, output b, c, z);
      assign #10 b = !a;   // each inverter has a delay of 10 time units
      assign #10 c = !b;
      assign #10 z = !c;
    endmodule

[Figure: chain of three inverters from a via b and c to z]

Delays and Time in Verilog

    module gate_delay_tb;
      wire b, c, z;
      reg a;
      gate_delay M(a, b, c, z);

      initial begin
        a = 0;
        #50; a = 1;
        #10; a = 0;
      end
    endmodule

Example 2-Bit Adder

    module add2impl(input [1:0] a, input [1:0] b,
                    output [2:0] sum);
      wire carry1;

      assign sum[0] = a[0] ^ b[0];
      assign carry1 = a[0] & b[0];
      assign sum[1] = a[1] ^ b[1] ^ carry1;
      assign sum[2] = (a[1] & b[1]) | (a[1] & carry1) | (b[1] & carry1);
    endmodule

2-Bit Adder: Testbench

    module add2_tb();
      reg  [1:0] a, b;
      wire [2:0] s;
      add2impl add(a, b, s);

      initial begin
        a=0; b=0; #1; assert(s==0);
        a=1; b=0; #1; assert(s==1);
        a=0; b=1; #1; assert(s==1);
        a=1; b=1; #1; assert(s==2);
        a=2; b=1; #1; assert(s==3);
      end
    endmodule

- The test vectors contain the expected result as assertions

Word-level Descriptions in Verilog

    module my_ALU(
      input [31:0] a, input [31:0] b,
      input op, output [31:0] out);
      assign out = op ? a * b : a + b;
    endmodule

- This actually generates an adder and a multiplier
- Verilog's expression syntax more or less matches that of Java and C

2-Bit Adder: Word-level Specification

    module main(input [1:0] a, input [1:0] b,
                input clk, output reg [2:0] x);
      wire [2:0] s;
      add2impl add(a, b, s);

      always @(posedge clk) begin
        x = s;
        assert(s == a + b);
      end
    endmodule

- Separates test-vectors and specification

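Because the adder has only 16 input combinations, the word-level property s == a + b can also be checked exhaustively outside a simulator. The following Python sketch transcribes the gate-level equations of add2impl by hand; it is a transcription, not generated from the Verilog.

```python
def add2impl(a, b):
    """Gate-level 2-bit adder, same equations as the Verilog module add2impl."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 ^ b0
    carry1 = a0 & b0
    s1 = a1 ^ b1 ^ carry1
    s2 = (a1 & b1) | (a1 & carry1) | (b1 & carry1)
    return (s2 << 2) | (s1 << 1) | s0

# Specification: the 3-bit result equals the integer sum for all 2-bit inputs.
mismatches = [(a, b) for a in range(4) for b in range(4) if add2impl(a, b) != a + b]
print("mismatches:", mismatches)  # mismatches: []
```
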
Latch in Verilog
Standard-compliant:

    reg q;
    always @(d, c)
      if (c) q <= d;

Not compliant (two alternatives, with q as a wire):

    assign q = c ? d : q;
    assign q = (d && c) || (q && !c);

- Synthesis maps compliant descriptions to a latch from the library
- Non-compliant descriptions may result in asynchronous logic, and hazards

D-Flip-flop in Verilog
Flip-flops are meant to be implemented as follows:

    reg q;
    always @(posedge c)
      q <= d;

Caveat:
- The contents of the sensitivity list are important: @(posedge clock)
- There need not be an assignment to q on all paths

Verilog: combinational vs. sequential

    module combinational(
      input a, b, sel,
      output reg out);
      always @(a or b or sel) begin
        if (sel) out = a;
        else     out = b;
      end
    endmodule

    module sequential(
      input a, b, sel, clk,
      output reg out);
      always @(posedge clk) begin
        if (sel) out = a;
        else     out = b;
      end
    endmodule

[Figures: for combinational, a multiplexer selects a (sel = 1) or b (sel = 0) and drives out directly; for sequential, the same multiplexer feeds a D flip-flop clocked by clk, whose Q drives out]
- reg may be combinational!

A Verilog Template for State Machines
1. Define state-holding elements
2. Define next-state function
3. Define output function

Verilog Module

    module statemachine(input clk, rst, ...   // inputs
                        output reg out, ...);
      ...
    endmodule

- As usual, one clock for everything
- Inputs and outputs as required

State

    localparam [2:0] OFF    = 'b000,
                     RED    = 'b001,
                     YELLOW = 'b010,
                     GREEN  = 'b011;
    reg [2:0] state;

- localparam defines module-local constants
- reg defines the state-holding variables
- state is usually split into multiple variables

Next-State Logic

    always @(posedge clk) begin
      if (rst) state = INIT;   // synchronous reset
      else case (state)
        INIT: if (input1) state = Q2;
              else        state = Q1;
        Q1: ...
        Q2: ...
        default: state = INIT;
      endcase
    end

Output Logic

    // note the dependency on the inputs!
    always @(state or rst or ...) begin
      out1 = state[1] | input1;
      out2 = state[1] & state[2];
    end

- May contain case(state) to implement a case-split

File Operations
- Verilog has commands for reading files
- This is usually used to apply externally-generated test data
- E.g., read an assembler program into a CPU model
- Output can be saved in a file as well
- Checker does not have to be Verilog code (e.g., use Perl instead)

Event-driven Simulation
- Goal: maximise simulation speed
- Exploit causal ordering of events
- Changing an input may result in a new output value
- Considers delays
- Event: signal and value and timestamp

Event Queue
[Figure: event queue along a time axis with entries $t_1$ (the current time), $t_2$, $t_3$ and $t_4$, each holding a list of events]

Single-pass Event Scheduler

    for e = each event at current time do
        UPDATE_NODE(e);
        for j = each gate on the fanout list of e do
            update input values of j;
            EVALUATE(j);
            if new_value(j) ≠ last_scheduled_value(j) then
                schedule new_value(j) at (current_time + delay(j));
                last_scheduled_value(j) := new_value(j);
            end if
        end for
    end for

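The scheduler can be sketched in Python with a priority queue as the event queue. This is a simplified illustration, not a model of any particular simulator: delays are pure transport delays, events are processed one at a time in timestamp order, and the netlist format and all names are assumptions. The example network is a chain of three inverters with a delay of 10 each.

```python
import heapq

def simulate(gates, values, stimuli, t_end):
    """gates: output -> (function, input names, delay); values: initial signal values;
    stimuli: external events (time, signal, value). Returns the applied value changes."""
    fanout = {}
    for out, (_fn, ins, _delay) in gates.items():
        for sig in ins:
            fanout.setdefault(sig, []).append(out)
    values = dict(values)
    last_scheduled = dict(values)
    queue = list(stimuli)
    heapq.heapify(queue)                      # event queue ordered by timestamp
    history = []
    while queue and queue[0][0] <= t_end:
        now, sig, val = heapq.heappop(queue)
        if values[sig] == val:
            continue                          # no change, nothing to propagate
        values[sig] = val                     # UPDATE_NODE(e)
        history.append((now, sig, val))
        for out in fanout.get(sig, []):       # each gate on the fanout list of e
            fn, ins, delay = gates[out]
            new = fn(*(values[i] for i in ins))      # EVALUATE(j)
            if new != last_scheduled[out]:
                heapq.heappush(queue, (now + delay, out, new))
                last_scheduled[out] = new
    return history

def inv(v):
    return 1 - v

chain = {"b": (inv, ["a"], 10), "c": (inv, ["b"], 10), "z": (inv, ["c"], 10)}
initial = {"a": 0, "b": 1, "c": 0, "z": 1}
history = simulate(chain, initial, [(50, "a", 1), (100, "a", 0)], t_end=200)
for event in history:
    print(event)
```

Each change of a reaches z 30 time units later, one gate delay per inverter.
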
Example (1)
[Figure: circuit with input x, internal signals a, b, c and output z]

    wire x, a, b, c, z;
    assign a = !x;
    assign b = !x;
    assign c = !b;
    assign z = a || c;

Example (2)
Suppose we just grab any event off the queue:

    time  (x, a, b, c, z)  event queue
    t     (0,1,1,0,1)      (x,1,t)            external event for x
    t     (1,1,1,0,1)      (a,0,t)(b,0,t)     yields two new events
    t     (1,1,0,0,1)      (a,0,t)(c,1,t)     pick b event
    t     (1,1,0,1,1)      (a,0,t)            pick c event
    t     (1,0,0,1,1)                         finally do a event

(Note that the value of z never changes, and we thus don't have any event for it.)

Event Ordering
- Which event do we pick? The result will heavily depend on the ordering!
- Solution: introduce (artificial) delta delays along the cause-effect chain (this does not correspond to any notion of real time)
- Goal: standardise behaviour of simulators

Using Delta Delays
We always pick the event with the smallest delta delay:

    time    (x, a, b, c, z)  event queue
    t       (0,1,1,0,1)      (x,1,t)                  external event for x
    t       (1,1,1,0,1)      (a,0,t+δ)(b,0,t+δ)       yields two events
    t + δ   (1,0,1,0,1)      (b,0,t+δ)(z,0,t+2δ)
    t + δ   (1,0,0,0,1)      (z,0,t+2δ)(c,1,t+2δ)
    t + 2δ  (1,0,0,0,0)      (c,1,t+2δ)
    t + 2δ  (1,0,0,1,0)      (z,1,t+3δ)
    t + 3δ  (1,0,0,1,1)

Result: 0-impulse on output z

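The delta-delay discipline can be reproduced with a short Python sketch. It applies all events of one delta step together and schedules their effects one delta later, so it produces the state at the end of each delta step rather than the row-by-row trace. The netlist encoding and function names are made up, and z is taken to be the OR of a and c, which is an assumption consistent with the values in the table.

```python
# Gates of the example circuit: output -> (function, input names).
GATES = {
    "a": (lambda x: 1 - x, ["x"]),
    "b": (lambda x: 1 - x, ["x"]),
    "c": (lambda b: 1 - b, ["b"]),
    "z": (lambda a, c: a | c, ["a", "c"]),   # assumption: z = a or c
}

def delta_cycles(values, stimulus):
    """Apply `stimulus` (signal -> value), then iterate delta steps until stable.
    Returns one snapshot of all signal values per step: t, t+δ, t+2δ, ..."""
    values = dict(values)
    pending = dict(stimulus)
    snapshots = []
    while pending:
        changed = {s for s, v in pending.items() if values[s] != v}
        values.update(pending)               # all events of this delta step at once
        snapshots.append(dict(values))
        pending = {}
        for out, (fn, ins) in GATES.items():
            if changed & set(ins):           # gate reads a signal that just changed
                new = fn(*(values[i] for i in ins))
                if new != values[out]:
                    pending[out] = new       # takes effect one delta later
    return snapshots

start = {"x": 0, "a": 1, "b": 1, "c": 0, "z": 1}
steps = delta_cycles(start, {"x": 1})
for k, snap in enumerate(steps):
    print(f"t + {k}δ", tuple(snap[s] for s in "xabcz"))
print("z over the delta steps:", [snap["z"] for snap in steps])  # [1, 1, 0, 1]
```

The output z goes 1, 1, 0, 1 over the four steps, which is the 0-impulse from the table.
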
Event Queue with Delta Delays
[Figure: event queue in which the current timeframe $t_1$ is followed by delta events at $t_1 + \delta$, $t_1 + 2\delta$ and $t_1 + 3\delta$, and then by the next timeframe $t_2$]