Embedded Systems Design: A Unified Hardware/Software Introduction. Chapter 2: Custom single-purpose processors

Similar documents
Embedded Systems Design: A Unified Hardware/Software Introduction. Chapter 2: Custom single-purpose processors

EE414 Embedded Systems Ch 2. Custom Single- Purpose Processors: Hardware

Optional: Building a processor from scratch

Register Transfer Methodology II

Outline. Register Transfer Methodology II. 1. One shot pulse generator. Refined block diagram of FSMD

Digital Design with FPGAs. By Neeraj Kulkarni

Microcomputers. Outline. Number Systems and Digital Logic Review

Recitation Session 6

Lecture 10: Combinational Circuits

END-TERM EXAMINATION

CS222: Processor Design

EEL 4783: Hardware/Software Co-design with FPGAs

BUILDING BLOCKS OF A BASIC MICROPROCESSOR. Part 1 PowerPoint Format of Lecture 3 of Book

Injntu.com Injntu.com Injntu.com R16

Register-Level Design

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES

Introduction to Digital Logic Missouri S&T University CPE 2210 Multipliers/Dividers

Introduction to Verilog design. Design flow (from the book) Hierarchical Design. Lecture 2

Chapter 10. case studies in sequential logic design

CONTENTS CHAPTER 1: NUMBER SYSTEM. Foreword...(vii) Preface... (ix) Acknowledgement... (xi) About the Author...(xxiii)

Digital Circuit Design and Language. Datapath Design. Chang, Ik Joon Kyunghee University

Code No: R Set No. 1

Introduction to Verilog design. Design flow (from the book)

Chapter 4. The Processor

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 2: Custom single-purpose processors.

Systems Programming. Lecture 2 Review of Computer Architecture I

Mark Redekopp, All rights reserved. EE 352 Unit 8. HW Constructs

The Need of Datapath or Register Transfer Logic. Number 1 Number 2 Number 3 Number 4. Numbers from 1 to million. Register

(ii) Simplify and implement the following SOP function using NOR gates:

Mid-Term Exam Solutions

CAD4 The ALU Fall 2009 Assignment. Description

6: Combinational Circuits

Why Should I Learn This Language? VLSI HDL. Verilog-2

Tailoring the 32-Bit ALU to MIPS

Principles of Digital Techniques PDT (17320) Assignment No State advantages of digital system over analog system.

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 3 DLD P VIDYA SAGAR

Reference Sheet for C112 Hardware

Davide Rossi DEI University of Bologna AA

CS8803: Advanced Digital Design for Embedded Hardware

Problem 4 The order, from easiest to hardest, is Bit Equal, Split Register, Replace Under Mask.

ELCT 501: Digital System Design

An Introduction to the Logic. Silicon Chips

Digital Design Using Digilent FPGA Boards -- Verilog / Active-HDL Edition

R10. II B. Tech I Semester, Supplementary Examinations, May

Digital Logic Design Exercises. Assignment 1

HANSABA COLLEGE OF ENGINEERING & TECHNOLOGY (098) SUBJECT: DIGITAL ELECTRONICS ( ) Assignment

COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING QUESTION BANK SUBJECT CODE & NAME: EC 1312 DIGITAL LOGIC CIRCUITS UNIT I

6.1 Combinational Circuits. George Boole ( ) Claude Shannon ( )

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

1. Mark the correct statement(s)

Processor (I) - datapath & control. Hwansoo Han

Computer Architecture (TT 2012)

CS/COE 0447 Example Problems for Exam 2 Spring 2011

10EC33: DIGITAL ELECTRONICS QUESTION BANK

ECEN 468 Advanced Logic Design

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

1. Draw general diagram of computer showing different logical components (3)

Least Common Multiple (LCM)

Week 7: Assignment Solutions

NH 67, Karur Trichy Highways, Puliyur C.F, Karur District UNIT-II COMBINATIONAL CIRCUITS

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Chapter 4. The Processor

DIGITAL ELECTRONICS. Vayu Education of India

Hardware Description Language VHDL (1) Introduction

CS61C : Machine Structures

COMBINATIONAL LOGIC CIRCUITS

ECE 448 Lecture 3. Combinational-Circuit Building Blocks. Data Flow Modeling of Combinational Logic

Music. Numbers correspond to course weeks EULA ESE150 Spring click OK Based on slides DeHon 1. !

The MIPS Processor Datapath

Finite State Machine with Datapath

3. The high voltage level of a digital signal in positive logic is : a) 1 b) 0 c) either 1 or 0

Lab. Course Goals. Topics. What is VLSI design? What is an integrated circuit? VLSI Design Cycle. VLSI Design Automation

In this lecture, we will focus on two very important digital building blocks: counters which can either count events or keep time information, and

Custom single-purpose processors: Hardware. 4.1 Introduction. 4.2 Combinational logic design 4-1

SIR C.R.REDDY COLLEGE OF ENGINEERING, ELURU DEPARTMENT OF INFORMATION TECHNOLOGY LESSON PLAN

Lecture (05) Boolean Algebra and Logic Gates

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Code No: R Set No. 1

UNIT 6 CIRCUIT DESIGN

CSE 140L Final Exam. Prof. Tajana Simunic Rosing. Spring 2008

Arithmetic Logic Unit

SRI SUKHMANI INSTITUTE OF ENGINEERING AND TECHNOLOGY, DERA BASSI (MOHALI)

Chapter 2 Logic Gates and Introduction to Computer Architecture

VALLIAMMAI ENGINEERING COLLEGE. SRM Nagar, Kattankulathur DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING EC6302 DIGITAL ELECTRONICS

REGISTER TRANSFER LANGUAGE

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

CS Spring Combinational Examples - 1

Verilog. What is Verilog? VHDL vs. Verilog. Hardware description language: Two major languages. Many EDA tools support HDL-based design

Chapter 6. CMOS Functional Cells

LABORATORY MANUAL VLSI DESIGN LAB EE-330-F

ECE 448 Lecture 3. Combinational-Circuit Building Blocks. Data Flow Modeling of Combinational Logic

University of Toronto Faculty of Applied Science and Engineering Edward S. Rogers Sr. Department of Electrical and Computer Engineering

Inf2C - Computer Systems Lecture Processor Design Single Cycle

Ch. 5 : Boolean Algebra &

ECE 4514 Digital Design II. Spring Lecture 7: Dataflow Modeling

Format. 10 multiple choice 8 points each. 1 short answer 20 points. Same basic principals as the midterm

COMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital

IA Digital Electronics - Supervision I

ECE 250 / CPS 250 Computer Architecture. Processor Design Datapath and Control

Transcription:

Hardware/Software Introduction Chapter 2: Custom single-purpose processors

Outline Introduction Combinational logic Sequential logic Custom single-purpose processor design RT-level custom single-purpose processor design 2

Introduction Processor Digital circuit that performs a computation tasks Controller and datapath General-purpose: variet of computation tasks Single-purpose: one particular computation task Custom single-purpose: non-standard task A custom single-purpose processor ma be Fast, small, low power But, high NRE, longer time-to-market, less fleible CCD lens Digital camera chip A2D JPEG codec DMA controller CCD preprocessor Microcontroller Piel coprocessor Displa ctrl D2A Multiplier/Accum Memor controller ISA bus interface UART LCD ctrl 3

CMOS transistor on silicon Transistor The basic electrical component in digital sstems Acts as an on/off switch Voltage at gate controls whether current flows from source to drain Don t confuse this gate with a logic gate gate source Conducts if gate= drain IC package IC source gate oide channel drain Silicon substrate 4

CMOS transistor implementations Complementar Metal Oide Semiconductor We refer to logic levels Tpicall is V, is 5V Two basic CMOS tpes nmos conducts if gate= pmos conducts if gate= Hence complementar Basic gates Inverter, NAND, NOR gate source Conducts gate source Conducts if gate= if gate= drain drain nmos pmos F = ' F = ()' F = (+)' inverter NAND gate NOR gate 5

Basic logic gates F = Driver F F F = AND F F F = + OR F F F = XOR F F F = Inverter F F F = ( ) NAND F F F = (+) NOR F F F = XNOR F F 6

Combinational logic design A) Problem description B) Truth table C) Output equations is if a is to, or b and c are. z is if b or c is to, but not both, or if all are. D) Minimized output equations a bc z a bc = a + bc Inputs Outputs a b c z a b c = a'bc + ab'c' + ab'c + abc' + abc z =a'b'c+a'bc'+ab'c+abc'+abc E) Logic Gates z z = ab + b c + bc 7

Combinational components I(m-) I I n I I(log n -) A n B n A n B n A n B S S(log m) n-bit, m Multipleor n log n n Decoder n-bit Adder n n-bit Comparator n bit, m function S ALU n S(log m) O O(n-) O O carr sum less equal greater O O = I if S=.. I if S=.. I(m-) if S=.. O = if I=.. O = if I=.. O(n-) = if I=.. sum = A+B (first n bits) carr = (n+) th bit of A+B less = if A<B equal = if A=B greater= if A>B O = A op B op determined b S. With enable input e all O s are if e= With carr-in input Ci sum = A + B + Ci Ma have status outputs carr, zero, etc. 8

Sequential components n I load clear n-bit Register n shift I n-bit Shift register Q n-bit Counter n Q Q Q = if clear=, I if load= and clock=, Q(previous) otherwise. Q = lsb - Content shifted - I stored in msb Q = if clear=, Q(prev)+ if count= and clock=. 9

Sequential logic design A) Problem Description C) Implementation Model D) State Table (Moore-tpe) You want to construct a clock divider. Slow down our preeisting clock so that ou output a for ever four clock ccles a= B) State Diagram = a= = 3 a= a Combinational logic Q Q State register I I I I Inputs Outputs Q Q a I I a= a= a= a= 2 = = a= Given this implementation model Sequential logic design quickl reduces to combinational logic design

Sequential logic design (cont.) E) Minimized Output Equations F) Combinational Logic I QQ a a I = Q Qa + Qa + QQ I QQ a I = Qa + Q a I a QQ = QQ Q Q I

Custom single-purpose processor basic model eternal control inputs controller datapath control inputs eternal data inputs datapath controller net-state and control logic datapath registers datapath control outputs state register functional units eternal control outputs eternal data outputs controller and datapath a view inside the controller and datapath 2

Eample: greatest common divisor First create algorithm Convert algorithm to comple state machine Known as FSMD: finitestate machine with datapath Can use templates to perform such conversion (a) black-bo view go_i _i GCD d_o _i (b) desired functionalit : int, ; : while () { 2: while (!go_i); 3: = _i; 4: = _i; 5: while (!= ) { 6: if ( < ) 7: = - ; else 8: = - ; } 9: d_o = ; } : 2: 2-J: 5: 6: 7: = - 8: = - -J: 3: 4: 6-J: 5-J: 9: < = _i = _i d_o =!go_i!=!!(!go_i)!(!=)!(<) (c) state diagram 3

State diagram templates Assignment statement a = b net statement Loop statement while (cond) { loop-bodstatements } net statement Branch statement if (c) c stmts else if c2 c2 stmts else other stmts net statement a = b C: cond!cond C: c!c*c2!c*!c2 net statement loop-bodstatements c stmts c2 stmts others J: J: net statement net statement 4

Creating the datapath Create a register for an declared variable Create a functional unit for each arithmetic operation Connect the ports, registers and functional units Based on reads and writes Use multipleors for multiple sources Create unique identifier for each datapath component control input and output : 2: 2-J: 3: 4: 5: 6: 7: = - 8: = - -J: 6-J: 5-J: 9: < = _i = _i d_o =!go_i!=!!(!go_i)!(!=)!(<) _i _i _sel n-bit 2 n-bit 2 _sel _ld : : _ld!= < subtractor 5:!= 6: < 8: - _neq lt_ d_ld Datapath subtractor 7: - 9: d d_o 5

Creating the controller s FSM : 2: 2-J: 3: 4: 5: 6: 7: = - 8: = - 6-J: 5-J: 9: < = _i = _i d_o =!go_i!=!!(!go_i)!(!=)!(<) Controller : 2: 2-J: 5: 6: _neq_!_neq lt_!_lt_ 7: _sel = 8: _sel = _ld = _ld = 5-J: 9: 3: 4:!go_i _sel = _ld = _sel = _ld = d_ld =!!(!go_i) 6-J: go_i Same structure as FSMD Replace comple actions/conditions with datapath configurations _sel _sel _ld _ld 5:!= 6: < _neq lt_ d_ld _i!= n-bit 2 _i : : < n-bit 2 subtractor 8: - Datapath subtractor 7: - 9: d -J: -J: d_o 6

Splitting into a controller and datapath Controller implementation model go_i Combinational logic _sel _sel _ld _ld _neq lt_ Controller : 2: 2-J: 3:!go_i _sel = _ld =!!(!go_i) go_i _sel _sel _ld _ld _i _i n-bit 2 n-bit 2 : : (b) Datapath Q3 I3 Q2 Q Q State register I2 I I d_ld 4: 5: 6: _sel = _ld = _neq_= _neq_= _lt_= _lt_= 7: _sel = 8: _sel = _ld = _ld = 6-J:!= < 5:!= 6: < _neq lt_ d_ld subtractor 8: - subtractor 7: - 9: d d_o 5-J: 9: -J: d_ld = 7

Controller state table for the GCD eample Inputs Outputs Q3 Q2 Q Q _neq _lt_ go_i I3 I2 I I _sel _sel _ld _ld d_ld _ * * * X X * * X X * * X X * * * X X * * * X * * * X * * X X * * X X * * X X * * X X * * * X * * * X * * * X X * * * X X * * * X X * * * X X * * * X X * * * X X * * * X X 8

Completing the GCD custom single-purpose processor design We finished the datapath We have a state table for the net state and control logic All that s left is combinational logic design This is not an optimized design, but we see the basic steps controller net-state and control logic state register datapath registers functional units a view inside the controller and datapath 9

RT-level custom single-purpose processor design We often start with a state machine Rather than algorithm Ccle timing often too central to functionalit Problem Specification Sende r rd_in clock data_in(4) Bridge A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rd_in pulse, into one 8-bit output on data_out along with a rd_out pulse. rd_out data_out(8) Rece iver Eample Bus bridge that converts 4-bit bus to 8-bit bus Start with FSMD Known as register-transfer (RT) level Eercise: complete the design FSMD rd_in= rd_in= WaitFirst4 rd_in= WaitSecond4 Send8Start data_out=data_hi & data_lo rd_out= Bridge RecFirst4Start data_lo=data_in rd_in= rd_in= RecSecond4Start data_hi=data_in rd_in= Send8End rd_out= rd_in= RecFirst4End rd_in= RecSecond4End Inputs rd_in: bit; data_in: bit[4]; Outputs rd_out: bit; data_out:bit[8] Variables data_lo, data_hi: bit[4]; 2

RT-level custom single-purpose processor design (cont ) (a) Controller rd_in= WaitFirst4 rd_in= WaitSecond4 Bridge rd_in= RecFirst4Start data_lo_ld= rd_in= rd_in= RecSecond4Start data_hi_ld= rd_in= RecFirst4End rd_in= RecSecond4End Send8Start data_out_ld= rd_out= Send8End rd_out= rd_in clk data_in(4) to all registers (b) Datapath data_out_ld data_hi_ld data_hi data_out data_lo data_lo_ld rd_out data_out 2

Optimizing single-purpose processors Optimization is the task of making design metric values the best possible Optimization opportunities original program FSMD datapath FSM 22

Optimizing the original program Analze program attributes and look for areas of possible improvement number of computations size of variable time and space compleit operations used multiplication and division ver epensive 23

Optimizing the original program (cont ) original program : int, ; : while () { 2: while (!go_i); 3: = _i; 4: = _i; 5: while (!= ) { 6: if ( < ) 7: = - ; else 8: = - ; } 9: d_o = ; } GCD(42, 8) - 9 iterations to complete the loop replace the subtraction operation(s) with modulo operation in order to speed up program and values evaluated as follows : (42, 8), (43, 8), (26,8), (8,8), (, 8), (2,8), (2,6), (2,4), (2,2). optimized program : int,, r; : while () { 2: while (!go_i); // must be the larger number 3: if (_i >= _i) { 4: =_i; 5: =_i; } 6: else { 7: =_i; 8: =_i; } 9: while (!= ) { : r = % ; : = ; 2: = r; } 3: d_o = ; } GCD(42,8) - 3 iterations to complete the loop and values evaluated as follows: (42, 8), (8,2), (2,) 24

Optimizing the FSMD Areas of possible improvements merge states states with constants on transitions can be eliminated, transition taken is alread known states with independent operations can be merged separate states states which require comple operations (a*b*c*d) can be broken into smaller states to reduce hardware size scheduling 25

Optimizing the FSMD (cont.) : 2: 2-J: 3: int, ;!!(!go_i)!go_i = _i original FSMD eliminate state transitions have constant values merge state 2 and state 2J no loop operation in between them optimized FSMD int, ; 2: 3: 5: go_i = _i = _i!go_i 4: 5: = _i!(!=) merge state 3 and state 4 assignment operations are independent of one another < > 7: = - 8: = - 7: 6: = - 6-J:!= <!(<) 8: = - merge state 5 and state 6 transitions from state 6 can be done in state 5 eliminate state 5J and 6J transitions from each state can be done from state 7 and state 8, respectivel 9: d_o = 5-J: 9: d_o = eliminate state -J transition from state -J can be done directl from state 9 -J: 26

Optimizing the datapath Sharing of functional units one-to-one mapping, as done previousl, is not necessar if same operation occurs in different states, the can share a single functional unit Multi-functional units ALUs support a variet of operations, it can be shared among operations occurring in different states 27

Optimizing the FSM State encoding task of assigning a unique bit pattern to each state in an FSM size of state register and combinational logic var can be treated as an ordering problem State minimization task of merging equivalent states into a single state state equivalent if for all possible input combinations the two states generate the same outputs and transitions to the net same state 28

Summar Custom single-purpose processors Straightforward design techniques Can be built to eecute algorithms Tpicall start with FSMD CAD tools can be of great assistance 29