Hardware/Software Introduction Chapter 2: Custom single-purpose processors
Outline Introduction Combinational logic Sequential logic Custom single-purpose processor design RT-level custom single-purpose processor design 2
Introduction Processor Digital circuit that performs a computation tasks Controller and datapath General-purpose: variet of computation tasks Single-purpose: one particular computation task Custom single-purpose: non-standard task A custom single-purpose processor ma be Fast, small, low power But, high NRE, longer time-to-market, less fleible CCD lens Digital camera chip A2D JPEG codec DMA controller CCD preprocessor Microcontroller Piel coprocessor Displa ctrl D2A Multiplier/Accum Memor controller ISA bus interface UART LCD ctrl 3
CMOS transistor on silicon Transistor The basic electrical component in digital sstems Acts as an on/off switch Voltage at gate controls whether current flows from source to drain Don t confuse this gate with a logic gate gate source Conducts if gate= drain IC package IC source gate oide channel drain Silicon substrate 4
CMOS transistor implementations Complementar Metal Oide Semiconductor We refer to logic levels Tpicall is V, is 5V Two basic CMOS tpes nmos conducts if gate= pmos conducts if gate= Hence complementar Basic gates Inverter, NAND, NOR gate source Conducts gate source Conducts if gate= if gate= drain drain nmos pmos F = ' F = ()' F = (+)' inverter NAND gate NOR gate 5
Basic logic gates F = Driver F F F = AND F F F = + OR F F F = XOR F F F = Inverter F F F = ( ) NAND F F F = (+) NOR F F F = XNOR F F 6
Combinational logic design A) Problem description B) Truth table C) Output equations is if a is to, or b and c are. z is if b or c is to, but not both, or if all are. D) Minimized output equations a bc z a bc = a + bc Inputs Outputs a b c z a b c = a'bc + ab'c' + ab'c + abc' + abc z =a'b'c+a'bc'+ab'c+abc'+abc E) Logic Gates z z = ab + b c + bc 7
Combinational components I(m-) I I n I I(log n -) A n B n A n B n A n B S S(log m) n-bit, m Multipleor n log n n Decoder n-bit Adder n n-bit Comparator n bit, m function S ALU n S(log m) O O(n-) O O carr sum less equal greater O O = I if S=.. I if S=.. I(m-) if S=.. O = if I=.. O = if I=.. O(n-) = if I=.. sum = A+B (first n bits) carr = (n+) th bit of A+B less = if A<B equal = if A=B greater= if A>B O = A op B op determined b S. With enable input e all O s are if e= With carr-in input Ci sum = A + B + Ci Ma have status outputs carr, zero, etc. 8
Sequential components n I load clear n-bit Register n shift I n-bit Shift register Q n-bit Counter n Q Q Q = if clear=, I if load= and clock=, Q(previous) otherwise. Q = lsb - Content shifted - I stored in msb Q = if clear=, Q(prev)+ if count= and clock=. 9
Sequential logic design A) Problem Description C) Implementation Model D) State Table (Moore-tpe) You want to construct a clock divider. Slow down our preeisting clock so that ou output a for ever four clock ccles a= B) State Diagram = a= = 3 a= a Combinational logic Q Q State register I I I I Inputs Outputs Q Q a I I a= a= a= a= 2 = = a= Given this implementation model Sequential logic design quickl reduces to combinational logic design
Sequential logic design (cont.) E) Minimized Output Equations F) Combinational Logic I QQ a a I = Q Qa + Qa + QQ I QQ a I = Qa + Q a I a QQ = QQ Q Q I
Custom single-purpose processor basic model eternal control inputs controller datapath control inputs eternal data inputs datapath controller net-state and control logic datapath registers datapath control outputs state register functional units eternal control outputs eternal data outputs controller and datapath a view inside the controller and datapath 2
Eample: greatest common divisor First create algorithm Convert algorithm to comple state machine Known as FSMD: finitestate machine with datapath Can use templates to perform such conversion (a) black-bo view go_i _i GCD d_o _i (b) desired functionalit : int, ; : while () { 2: while (!go_i); 3: = _i; 4: = _i; 5: while (!= ) { 6: if ( < ) 7: = - ; else 8: = - ; } 9: d_o = ; } : 2: 2-J: 5: 6: 7: = - 8: = - -J: 3: 4: 6-J: 5-J: 9: < = _i = _i d_o =!go_i!=!!(!go_i)!(!=)!(<) (c) state diagram 3
State diagram templates Assignment statement a = b net statement Loop statement while (cond) { loop-bodstatements } net statement Branch statement if (c) c stmts else if c2 c2 stmts else other stmts net statement a = b C: cond!cond C: c!c*c2!c*!c2 net statement loop-bodstatements c stmts c2 stmts others J: J: net statement net statement 4
Creating the datapath Create a register for an declared variable Create a functional unit for each arithmetic operation Connect the ports, registers and functional units Based on reads and writes Use multipleors for multiple sources Create unique identifier for each datapath component control input and output : 2: 2-J: 3: 4: 5: 6: 7: = - 8: = - -J: 6-J: 5-J: 9: < = _i = _i d_o =!go_i!=!!(!go_i)!(!=)!(<) _i _i _sel n-bit 2 n-bit 2 _sel _ld : : _ld!= < subtractor 5:!= 6: < 8: - _neq lt_ d_ld Datapath subtractor 7: - 9: d d_o 5
Creating the controller s FSM : 2: 2-J: 3: 4: 5: 6: 7: = - 8: = - 6-J: 5-J: 9: < = _i = _i d_o =!go_i!=!!(!go_i)!(!=)!(<) Controller : 2: 2-J: 5: 6: _neq_!_neq lt_!_lt_ 7: _sel = 8: _sel = _ld = _ld = 5-J: 9: 3: 4:!go_i _sel = _ld = _sel = _ld = d_ld =!!(!go_i) 6-J: go_i Same structure as FSMD Replace comple actions/conditions with datapath configurations _sel _sel _ld _ld 5:!= 6: < _neq lt_ d_ld _i!= n-bit 2 _i : : < n-bit 2 subtractor 8: - Datapath subtractor 7: - 9: d -J: -J: d_o 6
Splitting into a controller and datapath Controller implementation model go_i Combinational logic _sel _sel _ld _ld _neq lt_ Controller : 2: 2-J: 3:!go_i _sel = _ld =!!(!go_i) go_i _sel _sel _ld _ld _i _i n-bit 2 n-bit 2 : : (b) Datapath Q3 I3 Q2 Q Q State register I2 I I d_ld 4: 5: 6: _sel = _ld = _neq_= _neq_= _lt_= _lt_= 7: _sel = 8: _sel = _ld = _ld = 6-J:!= < 5:!= 6: < _neq lt_ d_ld subtractor 8: - subtractor 7: - 9: d d_o 5-J: 9: -J: d_ld = 7
Controller state table for the GCD eample Inputs Outputs Q3 Q2 Q Q _neq _lt_ go_i I3 I2 I I _sel _sel _ld _ld d_ld _ * * * X X * * X X * * X X * * * X X * * * X * * * X * * X X * * X X * * X X * * X X * * * X * * * X * * * X X * * * X X * * * X X * * * X X * * * X X * * * X X * * * X X 8
Completing the GCD custom single-purpose processor design We finished the datapath We have a state table for the net state and control logic All that s left is combinational logic design This is not an optimized design, but we see the basic steps controller net-state and control logic state register datapath registers functional units a view inside the controller and datapath 9
RT-level custom single-purpose processor design We often start with a state machine Rather than algorithm Ccle timing often too central to functionalit Problem Specification Sende r rd_in clock data_in(4) Bridge A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rd_in pulse, into one 8-bit output on data_out along with a rd_out pulse. rd_out data_out(8) Rece iver Eample Bus bridge that converts 4-bit bus to 8-bit bus Start with FSMD Known as register-transfer (RT) level Eercise: complete the design FSMD rd_in= rd_in= WaitFirst4 rd_in= WaitSecond4 Send8Start data_out=data_hi & data_lo rd_out= Bridge RecFirst4Start data_lo=data_in rd_in= rd_in= RecSecond4Start data_hi=data_in rd_in= Send8End rd_out= rd_in= RecFirst4End rd_in= RecSecond4End Inputs rd_in: bit; data_in: bit[4]; Outputs rd_out: bit; data_out:bit[8] Variables data_lo, data_hi: bit[4]; 2
RT-level custom single-purpose processor design (cont ) (a) Controller rd_in= WaitFirst4 rd_in= WaitSecond4 Bridge rd_in= RecFirst4Start data_lo_ld= rd_in= rd_in= RecSecond4Start data_hi_ld= rd_in= RecFirst4End rd_in= RecSecond4End Send8Start data_out_ld= rd_out= Send8End rd_out= rd_in clk data_in(4) to all registers (b) Datapath data_out_ld data_hi_ld data_hi data_out data_lo data_lo_ld rd_out data_out 2
Optimizing single-purpose processors Optimization is the task of making design metric values the best possible Optimization opportunities original program FSMD datapath FSM 22
Optimizing the original program Analze program attributes and look for areas of possible improvement number of computations size of variable time and space compleit operations used multiplication and division ver epensive 23
Optimizing the original program (cont ) original program : int, ; : while () { 2: while (!go_i); 3: = _i; 4: = _i; 5: while (!= ) { 6: if ( < ) 7: = - ; else 8: = - ; } 9: d_o = ; } GCD(42, 8) - 9 iterations to complete the loop replace the subtraction operation(s) with modulo operation in order to speed up program and values evaluated as follows : (42, 8), (43, 8), (26,8), (8,8), (, 8), (2,8), (2,6), (2,4), (2,2). optimized program : int,, r; : while () { 2: while (!go_i); // must be the larger number 3: if (_i >= _i) { 4: =_i; 5: =_i; } 6: else { 7: =_i; 8: =_i; } 9: while (!= ) { : r = % ; : = ; 2: = r; } 3: d_o = ; } GCD(42,8) - 3 iterations to complete the loop and values evaluated as follows: (42, 8), (8,2), (2,) 24
Optimizing the FSMD Areas of possible improvements merge states states with constants on transitions can be eliminated, transition taken is alread known states with independent operations can be merged separate states states which require comple operations (a*b*c*d) can be broken into smaller states to reduce hardware size scheduling 25
Optimizing the FSMD (cont.) : 2: 2-J: 3: int, ;!!(!go_i)!go_i = _i original FSMD eliminate state transitions have constant values merge state 2 and state 2J no loop operation in between them optimized FSMD int, ; 2: 3: 5: go_i = _i = _i!go_i 4: 5: = _i!(!=) merge state 3 and state 4 assignment operations are independent of one another < > 7: = - 8: = - 7: 6: = - 6-J:!= <!(<) 8: = - merge state 5 and state 6 transitions from state 6 can be done in state 5 eliminate state 5J and 6J transitions from each state can be done from state 7 and state 8, respectivel 9: d_o = 5-J: 9: d_o = eliminate state -J transition from state -J can be done directl from state 9 -J: 26
Optimizing the datapath Sharing of functional units one-to-one mapping, as done previousl, is not necessar if same operation occurs in different states, the can share a single functional unit Multi-functional units ALUs support a variet of operations, it can be shared among operations occurring in different states 27
Optimizing the FSM State encoding task of assigning a unique bit pattern to each state in an FSM size of state register and combinational logic var can be treated as an ordering problem State minimization task of merging equivalent states into a single state state equivalent if for all possible input combinations the two states generate the same outputs and transitions to the net same state 28
Summar Custom single-purpose processors Straightforward design techniques Can be built to eecute algorithms Tpicall start with FSMD CAD tools can be of great assistance 29