Design of Embedded DSP Processors


Design of Embedded DSP Processors Unit 3: Microarchitecture, Register file, and ALU 9/11/2017 Unit 3 of TSEA26-2017 H1 1

Contents
1. Microarchitecture and its design
2. Hardware design fundamentals
3. Microarchitecture specification
4. Register file
5. Arithmetic and logic unit (ALU)

Microarchitecture concept

Architecture design
System-level HW specifications, such as SoC behavior, black-box descriptions of hardware modules, the memory subsystem, and the interconnection between modules.
Architecture design does not involve the detailed implementation of HW modules.

Microarchitecture design
Functional implementation of a module, including function specification, partitioning and allocation (mapping functions to pipeline stages and functional devices), connection, and integration.
It provides the inputs for RTL coding.
It could be an IP design, independent of the SoC.

ASIP microarchitecture design
HW implementation of each assembly instruction; design of the ASIP core.
Partition each instruction into micro-operations.
Allocate each micro-operation to a HW module.
Schedule each micro-operation into a pipeline stage.
Trade off performance, cost, and power.

ASIP (IP) microarchitecture
Functions of each instruction
HW functions in each pipeline stage
Data precision
Design corners
HW components
HW sharing
Pipeline
Connections: data in/out, address in/out, control in/out
Where the critical paths are, and how to speed them up
Hardware cost
Power consumption

Microarchitecture component: register
A storage component, or a function isolation device between pipeline stages.
A register is built from D flip-flops only (no other flip-flop types).
[Figure: a D flip-flop whose D input comes from a multiplexer that selects between the input signal and the scan input under control of scan mode; other pins: reset, clock, clock enable, Q output.]

Multiplexer and operand keeper
MUX: selects one of multiple inputs as the output according to its selection control.
Operand keeper: a two-way multiplexer and a register. It can keep its value even when the bus value changes.
[Figure: (a) a four-input multiplexer with select codes 00, 01, 10, 11; (b) an operand keeper built from a two-way multiplexer (select 1/0) and a register.]
Copyright of Linköping University, all rights reserved
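The behavior of both components can be sketched in a few lines of Python. This is a behavioral model only, not RTL; the class name `OperandKeeper`, the `load` signal, and the 16-bit default width are assumptions made for the illustration.

```python
class OperandKeeper:
    """Behavioral model of an operand keeper: a 2-way mux feeding a register."""

    def __init__(self, width=16):
        self.mask = (1 << width) - 1  # assumed data width
        self.q = 0                    # register contents after reset

    def clock(self, bus_value, load):
        # On a clock edge the mux selects the bus (load=1) or feeds the
        # register output back to its input (load=0), which keeps the value.
        self.q = (bus_value & self.mask) if load else self.q
        return self.q


def mux4(inputs, sel):
    """4-to-1 multiplexer: the 2-bit select code picks one of four inputs."""
    return inputs[sel & 0b11]
```

With `load` low, the keeper holds its operand while the bus carries other traffic, which is what allows a datapath input to stay stable across cycles.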

Ripple-carry adder, a simplified view
[Figure: a chain of k full adders (FA). Stage i takes x_i, y_i and the carry c_i, and produces the sum bit s_i and the carry c_(i+1). The carry-in is c_0 = c_in; the final carry c_k = c_out also serves as s_k.]
Advanced adders are designs for carry acceleration.
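A bit-level sketch of the carry chain may help when reading the figure. It is a behavioral Python model, assuming unsigned operands and a default width of k = 16; the function names are chosen for this illustration.

```python
def full_adder(x, y, c):
    """One FA cell: returns (sum bit, carry out) for three input bits."""
    s = x ^ y ^ c
    c_out = (x & y) | (c & (x ^ y))
    return s, c_out


def ripple_carry_add(x, y, c_in=0, k=16):
    """Add two k-bit unsigned values by rippling the carry through k FA cells."""
    carry = c_in
    total = 0
    for i in range(k):  # bit 0 first: each stage waits for the previous carry
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry  # (s[k-1:0], c_out)
```

The loop makes the critical path visible: the carry passes through all k cells in sequence, which is the delay that carry-acceleration designs shorten.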

An 8b x 8b unsigned multiplier
As a primitive, a multiplier could be signed or unsigned.
[Figure: unsigned 8-bit multiplier.]

A single-port 4b SRAM module
[Figure: (a) a memory cell connected to a row line, a column line, and a column-bar line; (b) a 128x4-bit single-port SRAM with a row decoder, a column decoder with R/W circuit, and 4 data in/out bits.]

ASIP top view (using components introduced in this lecture)
[Figure: block diagram with PC and PM, instruction buffer, instruction decoder, AGU, memories M1 and M2, RF, ALU, and MAC.]

Microarchitecture design

Function allocation
[Figure: functions allocated to datapath layers: pre-operations, arithmetic 1, arithmetic 2, selection, common operations.]

HW multiplexing
Possible functions:
1. A + C
2. A + D
3. B + C
4. B + D
5. A * C
6. A * D
7. B * C
8. B * D
9. SAT(A + C)
10. SAT(A + D)
11. SAT(B + C)
12. SAT(B + D)
13. SAT(A * C)
14. SAT(A * D)
15. SAT(B * C)
16. SAT(B * D)
[Figure: a three-layer datapath. Pre-processing: multiplexers MA and MB, driven by Control[0] and Control[1], select opa from A or B and opb from C or D. Kernel processing: ADD and MUL. Post-processing 1: multiplexer MP1, driven by Control[2], selects result1 from ADD or MUL. Post-processing 2: multiplexer MP2, driven by Control[3], selects result1 or its saturated value.]
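The sixteen functions fall out of four control bits, which a small Python model makes concrete. The sketch assumes that Control[0] drives MA, Control[1] drives MB, that a select value of 1 picks the second input (B, D, MUL, saturation), and that saturation clamps to the 16-bit signed range; none of these polarities is stated in the slide text.

```python
def sat16(v):
    """Clamp to the 16-bit two's complement range (assumed result width)."""
    return max(-32768, min(32767, v))


def shared_datapath(a, b, c, d, control):
    """One adder and one multiplier shared by 16 functions via 4 control bits."""
    opa = b if control & 0b0001 else a            # MA (assumed Control[0])
    opb = d if control & 0b0010 else c            # MB (assumed Control[1])
    add, mul = opa + opb, opa * opb               # kernel processing
    result1 = mul if control & 0b0100 else add    # MP1, Control[2]
    return sat16(result1) if control & 0b1000 else result1  # MP2, Control[3]
```

For example, `shared_datapath(1, 2, 3, 4, 0b0101)` evaluates B * C under these assumptions. Four multiplexers replace what would otherwise be sixteen separate datapaths.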

Pipeline scheduling
[Figure: pipeline stages labeled: instruction fetch from PM; instruction decoding and memory addressing; memory; OP fetch; ALU and next PC; MAC multiplication; MAC accumulation.]

Often used legends in flowcharts
Combinational logic flowchart
[Figure: flowchart symbols: start or stop; action or process; decision (choice); case; input or output; document; subroutine; database.]

Sequential logic flowchart
A combinational flowchart specifies F2 <= ... and F3 <= ...; the sequential flowchart that follows it registers those values.
(a) Sequential logic with sync reset. Start: wait for the rising clock edge (clk = '1' and clk'event in VHDL, @(posedge clk) in Verilog), then test rst_b. If rst_b is low: F2_r <= 8'b0; F3_r <= 8'b0. If rst_b is high: F2_r <= F2; F3_r <= F3. Stop.
(b) Sequential logic with async reset. Start: test rst_b first. If rst_b is low: F2_r <= 8'b0; F3_r <= 8'b0. If rst_b is high: wait for the rising clock edge, then F2_r <= F2; F3_r <= F3. Stop.

Design a PC FSM
[Figure: state diagram of the program counter. States: PC <= 0 (reset); Hold, PC <= PC (in loop); default state, PC <= PC + 1; PC <= jump target address (jump taken); PC <= stack (stack pop). Transitions are labeled reset, in loop, to loop, jump taken, stack pop, and else.]
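The next-PC selection can be sketched as a priority chain in Python. The priority order (reset, then jump, then stack pop, then hold) and the 16-bit PC width are assumptions for this illustration; the state diagram does not fix them, and the argument names are hypothetical.

```python
PC_MASK = 0xFFFF  # assumed 16-bit program counter


def next_pc(pc, reset=False, jump_taken=False, jump_target=0,
            stack_pop=False, stack_top=0, hold=False):
    """Return the PC value loaded at the next clock edge."""
    if reset:
        return 0                      # PC <= 0
    if jump_taken:
        return jump_target & PC_MASK  # PC <= jump target address
    if stack_pop:
        return stack_top & PC_MASK    # PC <= stack (e.g. subroutine return)
    if hold:
        return pc                     # PC <= PC (hold in loop)
    return (pc + 1) & PC_MASK         # default state: PC <= PC + 1
```

Writing the FSM as a single selection function mirrors how it maps to hardware: one multiplexer in front of the PC register, with the decoder supplying the select signals.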

Register file

General register file
RF: a general register file is a group of registers that serves as the lowest-level computing and storage buffer. Multiple reads and (one) write can be executed in parallel.
DM: data memory (with a single read/write port) can access one data word at a time; read and write cannot be executed simultaneously.

RF: register file
[Figure: three parts. Write circuit: a multiplexer controlled by ctrl_reg_in selects the write data from the register file, memory 1, memory 2, the ALU, the ports, ..., or the MAC. Store circuit: register 1, register 2, register 3, ..., register n. Read circuit: two multiplexers controlled by ctrl_o_a and ctrl_o_b select the operands OPA and OPB.]
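A behavioral Python sketch of this structure, with two parallel read ports and one write port, is given below. The control names `ctrl_reg_in`, `ctrl_o_a`, and `ctrl_o_b` come from the figure; `write_enable`, `dest`, the source ordering, and the 32 x 16-bit size are assumptions made for the example.

```python
class RegisterFile:
    """Register file model: two parallel read ports, one write port."""

    def __init__(self, num_registers=32, width=16):
        self.regs = [0] * num_registers
        self.mask = (1 << width) - 1

    def read(self, ctrl_o_a, ctrl_o_b):
        # Read circuit: two independent multiplexers deliver OPA and OPB.
        return self.regs[ctrl_o_a], self.regs[ctrl_o_b]

    def clock(self, write_enable, dest, sources, ctrl_reg_in):
        # Write circuit: ctrl_reg_in selects one source (RF, memory, ALU,
        # MAC, ...); only one register is written per clock edge.
        if write_enable:
            self.regs[dest] = sources[ctrl_reg_in] & self.mask
```

Because reads are combinational and the write takes effect at the clock edge, both operands of an instruction can be fetched in the same pipeline step in which an earlier result is written back.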

Physical design: fan-in and fan-out problem
Selecting one operand from the 32 registers in a register file takes five multiplexer stages. Fan-out of the control signal per stage:
First stage: 16*16*2 = 512
Second stage: 16*8*2 = 256
Third stage: 16*4*2 = 128
Fourth stage: 16*2*2 = 64
Fifth stage: 16*1*2 = 32
The output of the fifth stage is the selected operand.
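The stage-by-stage numbers follow one pattern, which the short Python sketch below reproduces for a binary multiplexer tree. Reading the terms as data width, number of 2:1 multiplexers in the stage, and a load factor of 2 is an interpretation; the factor itself is taken from the slide, and the parameter name `loads_per_mux` is hypothetical.

```python
def mux_tree_fanout(num_registers=32, width=16, loads_per_mux=2):
    """Control-signal fan-out for each stage of a binary read-mux tree."""
    fanouts = []
    muxes = num_registers // 2       # 2:1 multiplexers per bit, first stage
    while muxes >= 1:
        fanouts.append(width * muxes * loads_per_mux)
        muxes //= 2                  # each stage halves the candidates
    return fanouts
```

Under these assumptions `mux_tree_fanout()` returns `[512, 256, 128, 64, 32]`, so the first-stage select signal is the one most likely to need buffering in physical design.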

Register file in an ASIP core
[Figure: the ASIP block diagram (PC and PM, instruction buffer, instruction decoder, AGU, M1, M2, RF, ALU, MAC), showing where the RF sits in the core.]

ALU

ALU in general
ALU: Arithmetic and Logic Unit.
Operations: arithmetic, logic, shift/rotate, and others.
Gets operands from the RF and from immediate data.
Sends the result to the RF.
One guard bit: for single-step computing, not for iterative computing.

ALU schematic
[Figure: inputs A[15:0] and B[15:0] pass through the masker, guard, carry-in, and other preprocessing; the preprocessed operands feed the arithmetic unit, the logic unit, and the shift unit; saturation and flag processing produce Result[15:0] and the flags FA/FC, FS, FZ.]

ALU specification
Arithmetic computing
Logic computing
Shift/rotate
Special functions
Corner cases
What HW components are needed
How to share hardware
Data in/out processes
Control signals needed
The critical paths may not be here
The HW cost/power may not be critical

Pre- and post-processing
Select operands: from RF, ID (immediate)
Operand pre-processing:
  Guard: sign extension, guard = sign: [16] = [15]
  Other pre-operations: mask, carry-in
Post-operations:
  Select the result from AU, LU, shift unit, and others
  Generate carry-out or saturate
Flag operation: flag computing and prediction

AU (arithmetic unit) in ALU
[Figure: the ALU schematic again (A[15:0], B[15:0]; masker, guard, carry-in, and other preprocessing; logic unit; shift unit; saturation and flag processing; Result[15:0]; flags FA/FC, FS, FZ), locating the arithmetic unit.]

How to design a full adder in an IP
FAO[17:0] = {A[15], A[15:0], 1'b1} + {B[15], B[15:0], CIN}   (18b full adder)
Result[16:0] <= FAO[17:1]
The full adder primitive may have no carry-in port, so the carry-in is injected through an extra LSB.
One guard bit plus the extra LSB means we need an 18b full adder.
The LSB of the 18b result will not be used.
The MSB of the 18b result will be the guard.
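The trick of appending 1 and CIN as extra LSBs can be checked with a bit-accurate Python sketch: the two LSBs produce a carry into bit 1 exactly when CIN is 1. The function names are chosen for this illustration, and operands are passed as 16-bit patterns.

```python
def guarded_add(a, b, cin):
    """17-bit guarded sum of two 16-bit patterns using a carry-less 18b adder."""
    a &= 0xFFFF
    b &= 0xFFFF
    a17 = ((a >> 15) << 16) | a      # {A[15], A[15:0]}: guard = sign
    b17 = ((b >> 15) << 16) | b      # {B[15], B[15:0]}
    op_a = (a17 << 1) | 1            # {A[15], A[15:0], 1'b1}
    op_b = (b17 << 1) | (cin & 1)    # {B[15], B[15:0], CIN}
    fao = (op_a + op_b) & 0x3FFFF    # 18-bit adder output FAO[17:0]
    return fao >> 1                  # Result[16:0] = FAO[17:1]


def to_signed17(r):
    """Interpret a 17-bit pattern as a two's complement integer."""
    return r - (1 << 17) if r >> 16 else r
```

For instance, `to_signed17(guarded_add(0x7FFF, 0x0001, 0))` gives 32768: the guard bit (0) differs from bit 15 (1), which is the condition a saturation stage can use to detect overflow.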

Example: ALU instruction list
Instruction     Function                               OP   CIN  SAT
ADD SAT         A + B with saturation                  000  00   1
ADD COUT        A + B without saturation               000  00   0
ADD CIN SAT     A + B + Cin with saturation            000  1x   1
ADD CIN COUT    A + B + Cin without saturation         000  1x   0
SUB SAT, CMP    A - B with saturation, and compare     100  01   1
SUB COUT        A - B without saturation               100  01   0
ABS(A)          Absolute operation, saturation         111  00   1
NEG(A)          Negate operation, saturation           101  00   1
INC(A)          Increment and saturation               001  00   1
DEC(A)          Decrement and saturation               011  00   1
AVG (A+B)/2     Average operation, saturation          010  00   1

Example: HW function of each instruction
a. SAT(A + B): adder followed by saturation
b. A + B: adder
c. SAT(A + B + C): adder with carry-in Cin, followed by saturation
d. A + B + C: adder with carry-in Cin
e. SAT(A - B): adder with carry-in 1, followed by saturation
f. A - B: adder with carry-in 1
g. ABS(A): adder using A and the MSB of A
h. NEG(A): adder with second operand 1
i. INC(A): adder with B = 1
j. DEC(A): adder with B = -1
k. Average (A+B): adder followed by an arithmetic right shift (ARS)

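Example: implement without HW sharing
[Figure: eleven separate datapaths, a to k, built from A, B, Cin and the constants 1 and -1, each with its own adder and, where required, a saturation unit (S) or a right shift (>>1); a multiplexer driven by the control signal selects the result.]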

Example: sharing step 1, saturation
[Figure: (a) SAT(A + B) and (b) A + B are merged onto one adder; a multiplexer selects the saturated or the unsaturated sum.]
Share the common part and multiplex the rest.

Example: sharing step 2, carry-in
[Figure: a and b use A + B with Cin = 0; c and d use A + B with Cin = C. The only difference is the carry-in, so one adder is shared: a multiplexer (a/b or c/d) selects 0 or C as the carry-in, and the output multiplexer (a/c or b/d) selects the saturated or the unsaturated result.]
Share the common part and multiplex the rest.

Example: sharing step 3, +/-
[Figure: the shared adder also serves e and f. A multiplexer on the B operand separates abcd from ef; the carry-in multiplexer selects 0 (ab), C (cd), or 1 (ef); the output multiplexer (a/c or b/d) selects the saturated or the unsaturated result.]

Example: the final circuit looks like this
Instruction decoder in the control path:
C1: IF OP=101 C1 <= 01; ELSEIF OP=111 C1 <= 1x; ELSE C1 <= 00
C2: IF OP=111 C2 <= 1xx; ELSEIF OP=100 C2 <= 001; ELSEIF OP=x01 C2 <= 010; ELSEIF OP=011 C2 <= 011; ELSE C2 <= 000
C3: C3 <= CIN
C4: IF OP=010 C4 <= 1x; ELSEIF SAT=0 C4 <= 00; ELSE C4 <= 01
[Figure: simple arithmetic unit in the datapath. C1 (00, 01, 1x) selects the first adder operand, derived from {A[15], A[15:0]}. C2 (000, 001, 010, 011, 1xx) selects the second operand; the inputs shown include {B[15], B[15:0]}, the constants 1 and -1, and {16'b0, A[15]}. C3 selects the carry-in from 0, 1, and C. C4 (00, 01, 1x) selects among the plain sum, the saturated sum (SAT), and the sum shifted right by one (>>1). The unit also outputs FLAG.]
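A functional Python sketch of such a shared arithmetic unit is given below, following the OP, CIN, and SAT encodings of the instruction list. It is a reading of the circuit, not a transcription: in particular, how each multiplexer input is wired (inverted A for NEG, A XOR sign for ABS, inverted B for SUB) is an assumption, as are all function and parameter names.

```python
MASK17 = (1 << 17) - 1


def ext17(x):
    """Sign-extend a 16-bit pattern to 17 bits (guard = sign)."""
    x &= 0xFFFF
    return x | ((x >> 15) << 16)


def to_int17(v):
    """Interpret the low 17 bits as a two's complement integer."""
    v &= MASK17
    return v - (1 << 17) if v >> 16 else v


def arithmetic_unit(op, a, b=0, cin_sel=0b00, sat=1, c=0):
    """One adder shared by ADD, SUB, ABS, NEG, INC, DEC, AVG; returns 16 bits."""
    a17, b17 = ext17(a), ext17(b)
    sign = (a >> 15) & 1
    # C1: first operand (assumed: ~A for NEG, A xor sign for ABS)
    if op == 0b101:
        opa = ~a17 & MASK17
    elif op == 0b111:
        opa = a17 ^ (MASK17 if sign else 0)
    else:
        opa = a17
    # C2: second operand
    if op == 0b111:
        opb = sign                     # {16'b0, A[15]}
    elif op == 0b100:
        opb = ~b17 & MASK17            # assumed: inverted B for SUB
    elif op in (0b001, 0b101):
        opb = 1                        # OP = x01: INC and NEG add 1
    elif op == 0b011:
        opb = MASK17                   # -1 for DEC
    else:
        opb = b17
    # C3: carry-in, CIN = 00 -> 0, 01 -> 1, 1x -> carry flag C
    carry = {0b00: 0, 0b01: 1}.get(cin_sel, c & 1)
    total = to_int17(opa + opb + carry)
    # C4: result selection
    if op == 0b010:
        total >>= 1                    # AVG: arithmetic right shift
    elif sat:
        total = max(-32768, min(32767, total))
    return total & 0xFFFF
```

With the guard bit in place, `arithmetic_unit(0b111, 0x8000)` saturates to 0x7FFF instead of wrapping back to 0x8000, which illustrates why one guard bit is enough for single-step operations.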

Questions to discuss
If there is no guard bit, what will be the result of ABS(A) when A = -1 (fractional)?
In what cases does an ALU output need carry-out, and in what cases does it need saturation?
For what operations do we need carry-in, and when do we not need it?
Write RTL code for the result flags (sign and zero). Shall we use the sign bit or the guard bit as the sign flag?

Review on Unit 3
Concepts: finite precision; microarchitecture; register file; ALU (arithmetic and logic); program flow control.
Skills: system understanding; planning the HW schematic; HW coding.
Points to remember:
The ALU cannot be used for iterative computing.
How to collect instructions and extract functions from the selected instructions, and finally map them onto hardware.
The hardware sharing design process (read my book!).
The write conflict concept.
Get the data at the same pipeline step, with no wait.
Design for HW sharing.
Critical path.
Schematic design: plan hardware for IP reuse and sharing.
Fan-out.
Coding for IP reuse.
How to extract the required ALU/RF control signals from the instruction specification and the binary assembly codes.

Self reading after the lecture
Why microarchitecture design is essential.
Quick read chapters 10/11; read chapter 12.
Think about:
How to specify the microarchitecture on the Y-chart.
How to design an RF with multiple write ports.
How to design an IP module using any kind of RTL primitive and any synthesis tool, based on the design of the ALU.

Exciting time now! Let us discuss
Whatever related to HW you want to discuss.
You will have the chance after each lecture (Fö); do take the chance!
Prepare your questions for the next time.

You are welcome to ask any questions you want; I can answer, or we can discuss together. I want to know what you want.
Dake Liu, Room 556, corridor B, Hus-B, phone 281256, dake.liu@liu.se