Design of Embedded DSP Processors Unit 3: Microarchitecture, Register file, and ALU 9/11/2017 Unit 3 of TSEA26-2017 H1 1
Contents
1. Microarchitecture and its design
2. Hardware design fundamentals
3. Microarchitecture specification
4. Register file
5. Arithmetic and logic unit (ALU)
Microarchitecture concept
Architecture design: system-level HW specification, such as SoC behavior, black-box descriptions of hardware modules, the memory subsystem, and the interconnection between modules. Architecture design does not involve the detailed implementation of HW modules.
Microarchitecture design: functional implementation of a module, including function specification, partition and allocation (mapping functions to pipeline stages and functional units), connection, and integration. It is the input for RTL coding. It could be an IP design, independent of the SoC.
ASIP microarchitecture design: HW implementation of each assembly instruction, that is, design of the ASIP core. Partition each instruction into micro-operations, allocate each micro-operation to a HW module, and schedule the micro-operations into pipeline stages. Trade off performance, cost, and power.
ASIP (IP) microarchitecture: functions of each instruction; HW functions in each pipeline stage; data precision; design corners. HW components; HW sharing; pipeline. Connections: data in/out, address in/out, control in/out. Where are the critical paths, and how to speed them up? Hardware cost. Power consumption.
Microarchitecture component: register. A storage component or a function isolation device between pipeline stages. A register is built from D flip-flops only (no other storage elements).
[Figure: D flip-flop register with input signal, scan input, scan mode multiplexer (0/1), reset, clock enable, clock, and Q output]
Multiplexer and operand keeper. MUX: selects one of multiple inputs as the output according to its selection control. Operand keeper: a two-way multiplexer and a register; it can keep its value even though the bus value changes.
[Figure: (a) multiplexer with four inputs selected by control codes 00, 01, 10, 11; (b) operand keeper with a two-way (1/0) multiplexer in front of a register]
Copyright of Linköping University, all rights reserved
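To make the operand keeper concrete, the following is a minimal behavioural sketch in Python, not RTL. The class name `OperandKeeper`, the `load` control, and the choice that `load = 1` selects the bus input are assumptions made for illustration; the slide only states that the keeper is a two-way multiplexer plus a register.

```python
def mux(inputs, select):
    """Return the input chosen by the selection control."""
    return inputs[select]


class OperandKeeper:
    """Two-way multiplexer feeding a register (behavioural model)."""

    def __init__(self, reset_value=0):
        self.q = reset_value  # register output Q

    def clock(self, bus_value, load):
        # Assumed polarity: load = 1 takes the bus value,
        # load = 0 recirculates Q so the operand is kept.
        self.q = mux([self.q, bus_value], load)
        return self.q


keeper = OperandKeeper()
keeper.clock(bus_value=5, load=1)   # capture the operand
keeper.clock(bus_value=9, load=0)   # bus changes, operand is kept
```

After the second clock the register still holds the captured operand, which is the property the slide describes.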
Ripple-carry adder, a simplified view
[Figure: a chain of k full adders (FA). Stage i takes x_i, y_i, and carry c_i and produces sum s_i and carry c_(i+1); c_0 = c_in and c_k = c_out.]
Advanced adders are designs for carry acceleration.
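The carry chain can be sketched bit by bit in Python. This is a behavioural illustration only; the function names `full_adder` and `ripple_carry_add` are hypothetical and the word length `k` is a parameter.

```python
def full_adder(x, y, c_in):
    """One-bit full adder: returns (sum bit, carry out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (x & c_in) | (y & c_in)
    return s, c_out


def ripple_carry_add(x, y, k, c_in=0):
    """Add two k-bit unsigned words; the carry ripples from bit 0 to bit k-1."""
    carry = c_in
    total = 0
    for i in range(k):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry  # (k-bit sum, c_out)
```

Because each stage waits for the carry of the previous stage, the delay grows with k, which is what carry-acceleration designs try to avoid.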
An 8b x 8b unsigned multiplier. As a primitive, a multiplier could be signed or unsigned!
[Figure: unsigned 8-bit multiplier]
A single-port 4b SRAM module
[Figure: (a) a memory cell with row line, column line, and column-bar line; (b) a 128x4-bit single-port SRAM with the row decoder, the column decoder and R-W circuit, and 4 data in/out bits]
ASIP top view (using components introduced in this lecture)
[Figure: PC and PM, instruction buffer, instruction decoder, AGU, M1, M2, RF, ALU, MAC]
Microarchitecture design
Function allocation
[Figure: pre-operations, arithmetic 1, arithmetic 2, selection, common operations]
HW multiplexing
Possible functions:
1. A + C
2. A + D
3. B + C
4. B + D
5. A * C
6. A * D
7. B * C
8. B * D
9. SAT(A + C)
10. SAT(A + D)
11. SAT(B + C)
12. SAT(B + D)
13. SAT(A * C)
14. SAT(A * D)
15. SAT(B * C)
16. SAT(B * D)
[Figure: pre-processing, where multiplexers MA and MB select operands opa and opb from A, B, C, D; kernel processing with ADD and MUL; post-processing 1, where multiplexer MP1 selects result1; post-processing 2, where multiplexer MP2 selects the saturated or unsaturated result. Control signals: Control[0] to Control[3].]
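The 16 functions come from four independent two-way selections. The sketch below is a Python behavioural model under stated assumptions: Control[1] selects opa (A or B), Control[0] selects opb (C or D), Control[2] selects ADD or MUL, Control[3] selects the saturated result, and saturation is to a 16-bit signed range. The figure does not confirm this bit assignment; it was chosen so that control value n gives function n + 1 in the list.

```python
def saturate(value, bits=16):
    """Clamp to the signed range of the given word length (assumed 16 bits)."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def datapath(a, b, c, d, control):
    """Shared datapath: two operand muxes, ADD/MUL kernel, optional saturation."""
    opa = b if (control >> 1) & 1 else a                        # MA (assumed Control[1])
    opb = d if control & 1 else c                               # MB (assumed Control[0])
    result1 = opa * opb if (control >> 2) & 1 else opa + opb    # MP1 (assumed Control[2])
    return saturate(result1) if (control >> 3) & 1 else result1  # MP2 (assumed Control[3])


# With A=30000, B=2, C=10000, D=3: control 0 is A + C, control 8 is SAT(A + C).
plain = datapath(30000, 2, 10000, 3, 0)
clamped = datapath(30000, 2, 10000, 3, 8)
```

One adder, one multiplier, and one saturation block serve all 16 functions; only the multiplexers and their control bits differ per function.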
Pipeline scheduling
[Figure labels: instruction fetch from PM; instruction decoding and memory addressing; memory; OP fetch; MAC multiplication; MAC accumulation; ALU and next PC]
Often used legends in flowcharts; combinational logic flowchart
[Figure: legend symbols for start or stop, action or process, decision, input or output, document, subroutine, and database; an example combinational logic flowchart running from start through choices, actions, and a case to end]
Sequential logic flowchart
(a) Sequential logic with sync reset. A combinational flowchart specifies F2 <= ... and F3 <= ... . On the clock edge (VHDL: clk = '1' and clk'event; Verilog: @(posedge clk)), test rst_b. If rst_b is low: F2_r <= 8'b0; F3_r <= 8'b0; if rst_b is high: F2_r <= F2; F3_r <= F3;
(b) Sequential logic with async reset. A combinational flowchart specifies F2 <= ... and F3 <= ... . Test rst_b first. If rst_b is low: F2_r <= 8'b0; F3_r <= 8'b0; if rst_b is high, then on the clock edge (VHDL: clk = '1' and clk'event; Verilog: @(posedge clk)): F2_r <= F2; F3_r <= F3;
Design a PC FSM
[Figure: state diagram with reset and else transitions between the following states]
Reset: PC <= 0
Default state: PC <= PC + 1
Hold (in loop): PC <= PC
Jump taken: PC <= jump target address
Stack pop: PC <= stack
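The states above can be read as a next-PC selection. The following Python sketch is a behavioural model only; the function name `next_pc`, the argument names, the 16-bit PC width, and the priority order (reset, then hold, then jump, then stack pop) are assumptions, since the state diagram does not fix them in this text.

```python
def next_pc(pc, reset=False, in_loop=False, jump_taken=False,
            jump_target=0, stack_pop=False, stack_top=0, width=16):
    """Return the PC value for the next cycle (assumed priority order)."""
    mask = (1 << width) - 1
    if reset:
        return 0                      # PC <= 0
    if in_loop:
        return pc                     # hold: PC <= PC
    if jump_taken:
        return jump_target & mask     # PC <= jump target address
    if stack_pop:
        return stack_top & mask       # PC <= stack
    return (pc + 1) & mask            # default state: PC <= PC + 1
```

In hardware this corresponds to a multiplexer in front of the PC register, with the FSM outputs as its selection control.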
Register file
General register file. RF: a general register file is a group of registers serving as the lowest-level computing/storage buffer; multiple reads and (one) write can be executed in parallel. DM: data memory (with a single read/write port) can access one data word at a time; a read and a write cannot be executed simultaneously.
RF: register file
[Figure: store (write) circuit, where a multiplexer controlled by ctrl_reg_in selects the write data from the register file, memory 1, memory 2, ALU, ports, ..., MAC and stores it into register 1, register 2, register 3, ..., register n; read circuit, where multiplexers controlled by ctrl_o_a and ctrl_o_b select operands OPA and OPB]
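A small Python behavioural model can illustrate the two read ports and the single write port. The control names `ctrl_reg_in`, `ctrl_o_a`, and `ctrl_o_b` come from the figure; the class name, the `dest` and `write_enable` arguments, the 32 x 16-bit size, and the dictionary of write sources are assumptions for illustration.

```python
class RegisterFile:
    """Two read ports (OPA, OPB) and one write port (behavioural model)."""

    def __init__(self, n_regs=32, width=16):
        self.mask = (1 << width) - 1
        self.regs = [0] * n_regs

    def read(self, ctrl_o_a, ctrl_o_b):
        # Both operands are read in parallel in the same step.
        return self.regs[ctrl_o_a], self.regs[ctrl_o_b]

    def clock(self, write_enable, dest, ctrl_reg_in, sources):
        # ctrl_reg_in selects one write source; only one write per clock.
        if write_enable:
            self.regs[dest] = sources[ctrl_reg_in] & self.mask


rf = RegisterFile()
rf.clock(True, 3, "alu", {"alu": 0x1234, "mac": 7})
rf.clock(True, 5, "mac", {"alu": 0x1234, "mac": 7})
opa, opb = rf.read(3, 5)
```

A single-port data memory, by contrast, would need two separate accesses to deliver the same two operands.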
Physical design: fan-in/fan-out problem. Selecting one operand from 32 registers in a register file takes five multiplexer stages. Fan-out of the control signal per stage:
First stage: 16*16*2 = 512
Second stage: 16*8*2 = 256
Third stage: 16*4*2 = 128
Fourth stage: 16*2*2 = 64
Fifth stage: 16*1*2 = 32
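The per-stage numbers follow one pattern, which the short Python sketch below reproduces. The function name and parameter names are hypothetical; the word length of 16 and the factor 2 are taken from the products on the slide, and the meaning of the factor 2 is not stated there, so it is kept as a plain parameter.

```python
def select_fanout_per_stage(n_regs=32, width=16, factor=2):
    """Fan-out of the select signal in each stage of a binary multiplexer tree."""
    stages = []
    muxes = n_regs // 2          # two-way muxes per bit in the first stage
    while muxes >= 1:
        stages.append(width * muxes * factor)
        muxes //= 2              # each stage halves the number of candidates
    return stages


fanouts = select_fanout_per_stage()
```

The first-stage select signal drives 16 times more loads than the last one, which is why it is the one that needs buffering or duplication in physical design.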
Register file in an ASIP core
[Figure: PC and PM, instruction buffer, instruction decoder, AGU, M1, M2, RF, ALU, MAC]
ALU
ALU in general. ALU: arithmetic and logic unit. Operations: arithmetic, logic, shift/rotate, and others. It gets operands from the RF and immediate data, and sends the result to the RF. It has one guard bit for single-step computing, not for iterative computing.
ALU schematic
[Figure: inputs A[15:0] and B[15:0]; masker, guard, carry-in, and other preprocessing; logic unit; shift unit; saturation and flag processing; outputs Result[15:0] and flags FA/FC, FS, FZ]
ALU specification: arithmetic computing; logic computing; shift/rotate; special functions; corner cases. What HW components are needed? How to share hardware? Data in/out processes. Needed control signals. The critical paths may not be here. The HW cost/power may not be critical.
Pre- and post-processing. Select operands: from RF or ID (immediate). Operand pre-processing: guard by sign extension, guard = sign: [16] = [15]; other pre-operations: mask, carry-in. Post-operations: select the result from the AU, LU, shift unit, and others; generate carry-out or saturate. Flag operation: flag computing and prediction.
AU (arithmetic unit) in ALU
[Figure: the ALU schematic with inputs A[15:0] and B[15:0], preprocessing, logic unit, shift unit, saturation and flag processing, and outputs Result[15:0] and flags FA/FC, FS, FZ]
How to design a full adder in an IP
{A[15], A[15:0], 1} + {B[15], B[15:0], CIN} -> 18b full adder -> FAO[17:0]; Result[16:0] <= FAO[17:1]
The full-adder primitive may have no carry-in. With one guard bit, we need an 18b full adder. The LSB of the 18b result will not be used. The MSB of the 18b result will be the guard.
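The trick of folding the carry-in into an extra LSB can be checked with a few lines of Python. This is a behavioural sketch, not RTL; the function names `add_with_guard` and `signed17` are hypothetical, and operands are given as Python integers in the 16-bit two's complement range.

```python
def add_with_guard(a, b, cin):
    """Model {A[15], A[15:0], 1} + {B[15], B[15:0], CIN}; return FAO[17:1]."""
    a16 = a & 0xFFFF
    b16 = b & 0xFFFF
    a17 = ((a16 >> 15) << 16) | a16      # {A[15], A[15:0]}: sign-extended guard
    b17 = ((b16 >> 15) << 16) | b16      # {B[15], B[15:0]}
    op_a = (a17 << 1) | 1                # append constant 1 as LSB
    op_b = (b17 << 1) | (cin & 1)        # append CIN as LSB
    fao = (op_a + op_b) & 0x3FFFF        # 18-bit adder without carry-in
    return fao >> 1                      # Result[16:0] <= FAO[17:1]


def signed17(x):
    """Interpret a 17-bit pattern as a two's complement value."""
    return x - (1 << 17) if x & (1 << 16) else x


overflowed = signed17(add_with_guard(0x7FFF, 0x7FFF, 1))
```

The appended LSBs add 1 + CIN, so a carry enters bit 1 exactly when CIN is 1; the guard bit then keeps the overflowed sum (here 65535) representable before saturation.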
Example: ALU instruction list
Instruction     Function                            OP   CIN  SAT
ADD SAT         A + B with saturation               000  00   1
ADD COUT        A + B without saturation            000  00   0
ADD CIN SAT     A + B + Cin with saturation         000  1x   1
ADD CIN COUT    A + B + Cin without saturation      000  1x   0
SUB SAT, CMP    A - B with saturation, and compare  100  01   1
SUB COUT        A - B without saturation            100  01   0
ABS(A)          Absolute operation, saturation      111  00   1
NEG(A)          Negate operation, saturation        101  00   1
INC(A)          Increment and saturation            001  00   1
DEC(A)          Decrement and saturation            011  00   1
AVG (A+B)/2     Average operation, saturation       010  00   1
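The instruction list can be exercised with a behavioural Python model driven by the OP, CIN, and SAT fields. It is a sketch under assumptions: the names `INSTRUCTIONS`, `alu`, and `execute` are hypothetical, CIN "1x" is read as "use the carry flag", SUB is modelled as A + NOT(B) + 1, SAT = 0 for SUB COUT is assumed by analogy with ADD COUT, and AVG returns the shifted sum since (A + B)/2 always fits in 16 bits.

```python
def to_signed16(x):
    """Keep the low 16 bits and interpret them as two's complement."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def saturate16(x):
    return max(-32768, min(32767, x))


# mnemonic -> (OP, CIN, SAT); SUB COUT SAT=0 is an assumption (see lead-in)
INSTRUCTIONS = {
    "ADD SAT": ("000", "00", 1), "ADD COUT": ("000", "00", 0),
    "ADD CIN SAT": ("000", "1x", 1), "ADD CIN COUT": ("000", "1x", 0),
    "SUB SAT": ("100", "01", 1), "SUB COUT": ("100", "01", 0),
    "ABS": ("111", "00", 1), "NEG": ("101", "00", 1),
    "INC": ("001", "00", 1), "DEC": ("011", "00", 1),
    "AVG": ("010", "00", 1),
}


def alu(op, cin, sat, a, b=0, carry=0):
    c = carry if cin[0] == "1" else int(cin[1])   # "1x": carry flag, else constant
    if op == "000":
        wide = a + b + c
    elif op == "100":
        wide = a + ~b + c                          # A + NOT(B) + 1 = A - B
    elif op == "111":
        wide = -a if a < 0 else a
    elif op == "101":
        wide = -a
    elif op == "001":
        wide = a + 1
    elif op == "011":
        wide = a - 1
    elif op == "010":
        return (a + b) >> 1                        # arithmetic right shift of the wide sum
    else:
        raise ValueError("unknown OP " + op)
    return saturate16(wide) if sat else to_signed16(wide)


def execute(name, a, b=0, carry=0):
    op, cin, sat = INSTRUCTIONS[name]
    return alu(op, cin, sat, a, b, carry)
```

For example, `execute("ADD SAT", 30000, 10000)` clamps at the largest positive value, while `execute("ADD COUT", 30000, 10000)` wraps around, leaving the overflow to the carry-out.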
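Example: HW function of each instruction
a. SAT(A + B): adder on A and B, followed by saturation
b. A + B: adder on A and B
c. SAT(A + B + C): adder on A and B with Cin, followed by saturation
d. A + B + C: adder on A and B with Cin
e. SAT(A - B): adder with carry-in 1, followed by saturation
f. A - B: adder with carry-in 1
g. ABS(A): the MSB of A controls a 0/1 multiplexer in front of the adder
h. NEG(A): adder with B = 1
i. INC(A): adder with B = 1
j. DEC(A): adder with B = -1
k. Average(A + B): adder followed by an arithmetic right shift (ARS)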
Example: implementation without HW sharing
[Figure: one separate datapath for each function a to k, each with its own adder and, where needed, saturation (S) or >>1; a result multiplexer driven by the control signal selects the result]
Example: sharing step 1, saturation. a. SAT(A + B); b. A + B. Share the common part (the adder) and multiplex the rest (result with or without saturation).
[Figure: the two separate datapaths a and b merged into one adder, followed by saturation and a result multiplexer]
Example: sharing step 2, carry-in. Functions a and b use Cin = 0; functions c and d use Cin = C. The difference is only at the carry input. Share the common part and multiplex the rest.
[Figure: one adder with a carry-in multiplexer selecting 0 (a/b) or C (c/d), followed by saturation and a result multiplexer selecting a/c or b/d]
Example: sharing step 3, +/-
[Figure: one adder shared by functions a to f; an operand multiplexer separates abcd from ef, a carry-in multiplexer selects 0, C, or 1 for ab, cd, and ef, and the result multiplexer selects the saturated (a/c) or unsaturated (b/d) result]
Example: the final circuit looks like this
Instruction decoder in control path:
  C1: IF OP=101 C1 <= 01 ELSEIF OP=111 C1 <= 1x ELSE C1 <= 00
  C2: IF OP=111 C2 <= 1xx ELSEIF OP=100 C2 <= 001 ELSEIF OP=x01 C2 <= 010 ELSEIF OP=011 C2 <= 011 ELSE C2 <= 000
  C3: C3 <= CIN
  C4: IF OP=010 C4 <= 1x ELSEIF SAT=0 C4 <= 00 ELSE C4 <= 01
Simple arithmetic unit in datapath:
[Figure: operand multiplexer controlled by C1 (codes 00, 01, 1x) on {A[15], A[15:0]}, with A[15] as a control input; operand multiplexer controlled by C2 (codes 000, 001, 010, 011, 1xx) on {B[15], B[15:0]}, 1, -1, and {16'b0, A[15]}; carry-in multiplexer controlled by C3 (codes 00, 01, 1x) selecting 0, 1, or C; adder; result multiplexer controlled by C4 (codes 00, 01, 1x) selecting the plain result, the SAT result, or the >>1 result; FLAG output]
Questions to discuss
If there is no guard bit, what will be the result of ABS(A) when A = -1 (fractional)?
In what cases does an ALU output need carry-out, and in what cases does it need saturation?
For what operations do we need carry-in, and when do we not need carry-in?
Write RTL code for the result flags (sign and zero). Shall we use the sign bit or the guard bit as the sign flag?
Review on Unit 3
Concepts and skills: system understanding, planning the HW schematic, HW coding.
Topics: finite precision, microarchitecture, register file, ALU (arithmetic & logic), program flow control.
Key points:
ALU cannot be used for iterative computing.
How to collect instructions and extract functions from selected instructions, and finally map them onto hardware.
Hardware sharing design process, read my book!
Write conflict concept. Get the data at the same pipeline step, no wait.
Design for HW sharing. Critical path. Fanout.
Schematic design and planning hardware for IP reuse and sharing. Coding for IP reuse.
How to extract/require ALU/RF control signals according to the instruction specification and binary assembly codes.
Self reading after the lecture. Why microarchitecture design is essential. Quick-read chapters 10 and 11; read chapter 12. Think about: how to specify a microarchitecture on the Y-chart; how to design an RF with multiple write ports; how to design an IP module using any kind of RTL primitive and any synthesis tool, based on the design of the ALU.
Exciting time now! Let us discuss whatever related to HW you want to discuss. You will have the chance after each lecture (Fö), do take the chance! Prepare your questions for the next time.
You are welcome to ask any questions you want; I can answer, or we can discuss together. I want to know what you want. Dake Liu, Room 556, corridor B, Hus-B, phone 281256, dake.liu@liu.se