Custom single-purpose processors: Hardware


Chapter 4: Custom single-purpose processors: Hardware

4.1 Introduction

As mentioned in the previous chapter, a single-purpose processor is a digital system intended to solve a specific computation task. While a manufacturer builds a standard single-purpose processor for use in a variety of applications, we build a custom single-purpose processor to execute a specific task within our embedded system.

An embedded system designer choosing to use a custom single-purpose, rather than a general-purpose, processor to implement part of a system's functionality may achieve several benefits, similar to some of those of the previous chapter. First, performance may be fast, due to fewer clock cycles resulting from a customized datapath, and due to shorter clock cycles resulting from simpler functional units, fewer multiplexors, or simpler control logic. Second, size may be small, due to a simpler datapath and no program memory. In fact, the processor may be faster and smaller than a standard one implementing the same functionality, since we can optimize the implementation for our particular task.

However, because we probably won't manufacture as many of the custom processor as a standard processor, we may not be able to invest as much NRE, unless the embedded system we are building will be sold in large quantities or does not have tight cost constraints. This fact could actually penalize performance and size.

In this chapter, we describe basic techniques for designing custom processors. We start with a review of combinational and sequential design, and then describe a method for converting programs to custom single-purpose processors.

4.2 Combinational logic design

A transistor is the basic electrical component of digital systems. Combinations of transistors form more abstract components called logic gates, which designers primarily use when building digital systems. Thus, we begin with a short description of transistors before discussing logic design. A transistor acts as a simple on/off switch.
One type of transistor (CMOS -- Complementary Metal Oxide Semiconductor) is shown in Figure 4.1(a). The gate (not to be confused with logic gate) controls whether or not current flows from the source to the drain. When a high voltage (typically +5 Volts, which we'll refer to as logic 1) is applied to the gate, the transistor conducts, so current flows. When low voltage (which we'll refer to as logic 0, typically ground, which is drawn as several horizontal lines of decreasing width) is applied to the gate, the transistor does not conduct. We can also build a transistor with the opposite functionality, illustrated in Figure 4.1(b). When logic 0 is applied to the gate, the transistor conducts, and when logic 1 is applied, the transistor does not conduct.

Given these two basic transistors, we can easily build a circuit whose output inverts its gate input, as shown in Figure 4.1(c). When the input x is logic 0, the top transistor conducts (and the bottom does not), so logic 1 appears at the output F. When x is logic 1, the bottom transistor conducts instead, so logic 0 appears at F.

We can also easily build a circuit whose output is logic 1 when at least one of its inputs is logic 0, as shown in Figure 4.1(d). When at least one of the inputs x and y is logic 0, then at least one of the top transistors conducts (and the bottom transistors, taken together, do not connect F to ground), so logic 1 appears at F. If both inputs are logic 1, then neither of the top transistors conducts, but both of the bottom ones do, so logic 0 appears at F. Likewise, we can easily build a circuit whose output is logic 1 when both of its inputs are logic 0, as illustrated in Figure 4.1(e). The three circuits shown implement three basic logic gates: an inverter, a NAND gate, and a NOR gate.

Embedded System Design, Vahid/Givargis
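The switch behavior described above can be checked with a small simulation. The following is a minimal sketch, not the chapter's own material: it assumes the usual CMOS arrangement (pMOS transistors between +5V and F, nMOS transistors between F and ground), and the function names are hypothetical.

```python
def nmos_conducts(gate: int) -> bool:
    """nMOS switch: conducts when its gate is logic 1."""
    return gate == 1


def pmos_conducts(gate: int) -> bool:
    """pMOS switch: conducts when its gate is logic 0."""
    return gate == 0


def inverter(x: int) -> int:
    # Assumed topology: one pMOS on top (pull-up), one nMOS on the bottom.
    return 1 if pmos_conducts(x) else 0


def nand(x: int, y: int) -> int:
    # Assumed topology: two pMOS in parallel on top, two nMOS in series below.
    pull_up = pmos_conducts(x) or pmos_conducts(y)
    pull_down = nmos_conducts(x) and nmos_conducts(y)
    assert pull_up != pull_down  # exactly one network connects to F
    return 1 if pull_up else 0


def nor(x: int, y: int) -> int:
    # Assumed topology: two pMOS in series on top, two nMOS in parallel below.
    pull_up = pmos_conducts(x) and pmos_conducts(y)
    pull_down = nmos_conducts(x) or nmos_conducts(y)
    assert pull_up != pull_down
    return 1 if pull_up else 0


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, "NAND:", nand(x, y), "NOR:", nor(x, y))
```

The internal assertions reflect a property of these three gates: for every input combination, exactly one of the pull-up and pull-down networks conducts, so F is always driven to a definite logic value.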

Figure 4.1: CMOS transistor implementations of some basic logic gates: (a) nMOS transistor (conducts if gate = 1), (b) pMOS transistor (conducts if gate = 0), (c) inverter, F = x', (d) NAND gate, F = (xy)', (e) NOR gate, F = (x + y)'.

Figure 4.2: Basic logic gates, each shown with its symbol, Boolean equation, and truth table: driver, F = x; AND, F = xy; OR, F = x + y; XOR, F = x ⊕ y; inverter, F = x'; NAND, F = (xy)'; NOR, F = (x + y)'; XNOR, F = (x ⊕ y)'.

Digital system designers usually work with logic gates, not transistors. Figure 4.2 describes 8 basic logic gates. Each gate is represented symbolically, with a Boolean equation, and with a truth table. The truth table has inputs on the left, and output on the right. The AND gate outputs 1 if and only if both inputs are 1. The OR gate outputs 1 if and only if at least one of the inputs is 1. The XOR (exclusive-OR) gate outputs 1 if and only if exactly one of its two inputs is 1. The NAND, NOR, and XNOR gates output the complement of AND, OR, and XOR, respectively. As you might have noticed from our transistor implementations, the NAND and NOR gates are actually simpler to build than AND and OR gates.

A combinational circuit is a digital circuit whose output is purely a function of its current inputs; such a circuit has no memory of past inputs. We can apply a simple technique to design a combinational circuit using our basic logic gates, as illustrated in Figure 4.3. We start with a problem description, which describes the outputs in terms of the inputs. We translate that description to a truth table, with all possible combinations of input values on the left, and desired output values on the right. For each output column, we can derive an output equation, with one term per row in which that output is 1. However, we often want to minimize the logic gates in the circuit. We can minimize the output equations by algebraically manipulating the equations. Alternatively, we can use Karnaugh maps, as shown in the figure. Once we've obtained the desired output equations (minimized or not), we can draw the circuit diagram.

Figure 4.3: Combinational logic design.

(a) Problem description: y is 1 if a is equal to 1, or if b and c are both equal to 1. z is 1 if b or c is equal to 1, but not both, or if a, b, and c are all equal to 1.

(b) Truth table:

    Inputs    Outputs
    a b c     y z
    0 0 0     0 0
    0 0 1     0 1
    0 1 0     0 1
    0 1 1     1 0
    1 0 0     1 0
    1 0 1     1 1
    1 1 0     1 1
    1 1 1     1 1

(c) Output equations:

    y = a'bc + ab'c' + ab'c + abc' + abc
    z = a'b'c + a'bc' + ab'c + abc' + abc

(d) Minimized output equations (obtained with Karnaugh maps):

    y = a + bc
    z = ab + b'c + bc'

Although we can design all combinational circuits in the above manner, large circuits would be very complex to design. For example, a circuit with 16 inputs would have 2^16, or 64K, rows in its truth table. One way to reduce the complexity is to use components that are more abstract than logic gates. Figure 4.4 shows several such combinational components. We now describe each briefly.

A multiplexor, sometimes called a selector, allows only one of its m data inputs to pass through to the output O. Thus, a multiplexor acts much like a railroad switch, allowing only one of multiple input tracks to connect to a single output track. If there are m data inputs, then there are log2(m) select lines S, and we call this an m-by-1 multiplexor (m data inputs, one data output). The binary value of S determines which data input passes through; 00...00 means I0 may pass, 00...01 means I1 may pass, 00...10 means I2 may pass, and so on. For example, an 8x1 multiplexor has 8 data inputs and thus 3 select lines. If those three select lines have values of 110, then I6 will pass through to the output. So if I6 is 1, then the output would be 1; if I6 is 0, then the output would be 0. We commonly use a more complex device called an n-bit multiplexor, in which each data input, as well as the output, consists of n lines. Suppose the previous example used a 4-bit 8x1 multiplexor. Thus, if I6 is 0110, then the output would be 0110. Note that n does not affect the number of select lines.
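The truth-table technique above lends itself to a mechanical check. The sketch below, with hypothetical names, rebuilds the truth table from the minterms of the unminimized output equations and confirms that the minimized equations y = a + bc and z = ab + b'c + bc' agree with it on every row. It illustrates the verification step only; it does not perform the Karnaugh-map minimization itself.

```python
from itertools import product

# Rows (a, b, c) for which each output is 1, read off the unminimized
# output equations: one minterm per row with output 1.
Y_MINTERMS = {(0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}
Z_MINTERMS = {(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}


def y_min(a: int, b: int, c: int) -> int:
    """Minimized equation y = a + bc."""
    return a | (b & c)


def z_min(a: int, b: int, c: int) -> int:
    """Minimized equation z = ab + b'c + bc'."""
    not_b, not_c = 1 - b, 1 - c
    return (a & b) | (not_b & c) | (b & not_c)


def truth_table():
    """Return rows (a, b, c, y, z) for all 2^3 input combinations."""
    rows = []
    for a, b, c in product((0, 1), repeat=3):
        y = int((a, b, c) in Y_MINTERMS)
        z = int((a, b, c) in Z_MINTERMS)
        rows.append((a, b, c, y, z))
    return rows


def minimized_matches_table() -> bool:
    """True if the minimized equations reproduce every truth-table row."""
    return all(
        y == y_min(a, b, c) and z == z_min(a, b, c)
        for a, b, c, y, z in truth_table()
    )


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
    print("minimized equations match:", minimized_matches_table())
```

Exhaustive checking works here because three inputs give only 8 rows; the same loop over 16 inputs would visit 65,536 rows, which is the growth the text warns about.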

Figure 4.4: Combinational components (a line marked n means n wires).

    n-bit, m x 1 multiplexor: data inputs I0 ... I(m-1), select inputs S, output O. O = I0 if S = 0..00, I1 if S = 0..01, ..., I(m-1) if S = 1..11.
    log n x n decoder: inputs I0 ... I(log n - 1), outputs O0 ... O(n-1). O0 = 1 if I = 0..00, O1 = 1 if I = 0..01, ..., O(n-1) = 1 if I = 1..11. With enable input e, all O's are 0 if e = 0.
    n-bit adder: inputs A and B, outputs sum and carry. sum = A + B (first n bits), carry = (n+1)'th bit of A + B. With carry-in input Ci, sum = A + B + Ci.
    n-bit comparator: inputs A and B, outputs less, equal, greater. less = 1 if A < B, equal = 1 if A = B, greater = 1 if A > B.
    n-bit, m-function ALU: inputs A and B, select inputs S, output O. O = A op B, with op determined by S. May have status outputs carry, zero, etc.

A decoder converts its binary input I into a one-hot output O. "One-hot" means that exactly one of the output lines can be 1 at a given time. Thus, if there are n outputs, then there must be log2(n) inputs. We call this a log2(n) x n decoder. For example, a 3x8 decoder has 3 inputs and 8 outputs. If the input is 000, then the output O0 will be 1. If the input is 001, then the output O1 would be 1, and so on. A common feature on a decoder is an extra input called enable. When enable is 0, all outputs are 0. When enable is 1, the decoder functions as before.

An adder adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output carry. For example, a 4-bit adder would have a 4-bit A input, a 4-bit B input, a 4-bit sum output, and a 1-bit carry output. If A is 1010 and B is 1001, then sum would be 0011 and carry would be 1.

A comparator compares two n-bit binary inputs A and B, generating outputs that indicate whether A is less than, equal to, or greater than B. If A is 1010 and B is 1001, then less would be 0, equal would be 0, and greater would be 1.

An ALU (arithmetic-logic unit) can perform a variety of arithmetic and logic functions on its n-bit inputs A and B. The select lines S choose the current function; if there are m possible functions, then there must be at least log2(m) select lines. Common functions include addition, subtraction, AND, and OR.
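The behavior of these components can be summarized as small functions. This is a behavioral sketch under stated assumptions, not a gate-level design: values are Python integers, the function names are hypothetical, and the ALU's operation encoding (0 = add, 1 = subtract, 2 = AND, 3 = OR) is chosen here purely for illustration.

```python
def mux(inputs, s):
    """m-by-1 multiplexor: pass the data input whose index equals select value s."""
    return inputs[s]


def decoder(i, n_outputs, enable=1):
    """log2(n) x n decoder: one-hot output list; all 0 when enable is 0."""
    return [int(enable == 1 and k == i) for k in range(n_outputs)]


def adder(a, b, n, carry_in=0):
    """n-bit adder: return (sum, carry), where carry is the (n+1)'th bit."""
    total = a + b + carry_in
    return total & ((1 << n) - 1), (total >> n) & 1


def comparator(a, b):
    """Return (less, equal, greater); exactly one of the three is 1."""
    return int(a < b), int(a == b), int(a > b)


def alu(a, b, s, n):
    """n-bit ALU; the mapping from s to an operation is an assumed encoding."""
    if s == 0:
        result = a + b
    elif s == 1:
        result = a - b
    elif s == 2:
        result = a & b
    elif s == 3:
        result = a | b
    else:
        raise ValueError("unsupported select value")
    return result & ((1 << n) - 1)  # keep only the low n bits


if __name__ == "__main__":
    data = [0, 0, 0, 0, 0, 0, 1, 0]       # only I6 carries a 1
    print(mux(data, 0b110))               # select lines 110 pick I6
    print(decoder(0b001, 8))              # O1 is the single hot output
    print(adder(0b1010, 0b1001, 4))       # 4-bit sum and carry
    print(comparator(0b1010, 0b1001))     # less, equal, greater
```

Note that the width n appears only in the adder and ALU, where results are truncated; for the multiplexor, n-bit data inputs are simply wider integers and the select value is unchanged, matching the remark that n does not affect the number of select lines.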

Figure 4.5: Sequential components.

    n-bit register (inputs I, load, clear; output Q): Q = 0 if clear = 1, I if load = 1 on the clock edge, the previous value otherwise.
    n-bit shift register (inputs I, shift; output Q).
    n-bit counter (inputs count, clear; output Q): Q = 0 if clear = 1, Q(previous) + 1 if count = 1 on the clock edge.

4.3 Sequential logic design

A sequential circuit is a digital circuit whose outputs are a function of the current as well as previous input values. In other words, sequential logic possesses memory. One of the most basic sequential circuits is the flip-flop. A flip-flop stores a single bit. The simplest type of flip-flop is the D flip-flop. It has two inputs: D and clock. When clock is 1, the value of D is stored in the flip-flop, and that value appears at an output Q. When clock is 0, the value of D is ignored; the output Q maintains its value.

Another type of flip-flop is the SR flip-flop, which has three inputs: S, R and clock. When clock is 0, the previously stored bit is maintained and appears at output Q. When clock is 1, the inputs S and R are examined. If S is 1, a 1 is stored. If R is 1, a 0 is stored. If both are 0, there's no change. If both are 1, behavior is undefined. Thus, S stands for set, and R for reset. Another flip-flop type is a JK flip-flop, which is the same as an SR flip-flop except that when both J and K are 1, the stored bit toggles from 1 to 0 or 0 to 1.

To prevent unexpected behavior from signal glitches, flip-flops are typically designed to be edge-triggered, meaning they only pay attention to their non-clock inputs when the clock is rising from 0 to 1, or alternatively when the clock is falling from 1 to 0.

Just as we used more abstract combinational components to implement complex combinational systems, we also use more abstract sequential components for complex sequential systems. Figure 4.5 illustrates several sequential components, which we now describe.

A register stores n bits from its n-bit data input I, with those stored bits appearing at its output Q. A register usually has at least two control inputs, clock and load.
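The flip-flop rules given earlier in this section can be written as next-state functions. The sketch below is an illustration with hypothetical names; it models the rising-edge-triggered variant, so inputs are sampled only when the clock goes from 0 to 1, and it raises an error for the undefined S = R = 1 case rather than guessing a value.

```python
def d_next(q, d):
    """D flip-flop: the stored bit becomes D."""
    return d


def sr_next(q, s, r):
    """SR flip-flop: S sets, R resets, both 0 holds; both 1 is undefined."""
    if s == 1 and r == 1:
        raise ValueError("S = R = 1 is undefined for an SR flip-flop")
    if s == 1:
        return 1
    if r == 1:
        return 0
    return q


def jk_next(q, j, k):
    """JK flip-flop: like SR, except J = K = 1 toggles the stored bit."""
    if j == 1 and k == 1:
        return 1 - q
    return sr_next(q, j, k)


class RisingEdgeFlipFlop:
    """Applies a next-state rule only on a 0-to-1 clock transition."""

    def __init__(self, next_state):
        self.next_state = next_state
        self.q = 0            # stored bit, assumed to start at 0
        self.prev_clock = 0

    def step(self, clock, *inputs):
        if self.prev_clock == 0 and clock == 1:   # rising edge
            self.q = self.next_state(self.q, *inputs)
        self.prev_clock = clock
        return self.q


if __name__ == "__main__":
    ff = RisingEdgeFlipFlop(jk_next)
    for clock in (1, 1, 0, 1):
        print(clock, ff.step(clock, 1, 1))   # toggles only on rising edges
```

Holding the clock at 1 for two consecutive steps changes the stored bit only once, which is the glitch protection that edge triggering is meant to provide.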
For a rising-edge-triggered register, the inputs I are only stored when load is 1 and clock is rising from 0 to 1. The clock input is usually drawn as a small triangle, as shown in the figure. Another common register control input is clear, which resets all bits to 0, regardless of the value of I. Because all n bits of the register can be stored in parallel, we often refer to this type of register as a parallel-load register, to distinguish it from a shift register, which we now describe.

A shift register stores n bits, but these bits cannot be stored in parallel. Instead, they must be shifted into the register serially, meaning one bit per clock edge. A shift register has a one-bit data input I, and at least two control inputs clock and shift. When clock is rising and shift is 1, the value of I is stored in the (n)'th bit, while the (n)'th bit is stored in the (n-1)'th bit, and likewise, until the second bit is stored in the first bit. The first bit is typically shifted out, meaning it appears over an output Q.

A counter is a register that can also increment (add binary 1 to) its stored binary value. In its simplest form, a counter has a clear input, which resets all stored bits to 0, and a count input, which enables incrementing on the clock edge. A counter often also has a parallel load data input and associated control signal. A common counter feature is both up and down counting (incrementing and decrementing), requiring an additional control input to indicate the count direction.

The control inputs discussed above can be either synchronous or asynchronous. A synchronous input's value only has an effect during a clock edge. An asynchronous input's value affects the circuit independent of the clock. Typically, clear control lines are asynchronous.

Sequential logic design can be achieved using a straightforward technique, whose steps are illustrated in Figure 4.11. We again start with a problem description. We translate this description to a state diagram. We describe state diagrams further in a later chapter. Briefly, each state represents the current "mode" of the circuit, serving as the circuit's memory of past input values. The desired output values are listed next to each state. The input conditions that cause a transition from one state to another are shown next to each arc. Each arc condition is implicitly AND'ed with a rising (or falling) clock edge. In other words, all inputs are synchronous. State diagrams can also describe asynchronous systems, but we do not cover such systems in this book, since they are not common.

We will implement this state diagram using a register to store the current state, and combinational logic to generate the output values and the next state. We assign each state a unique binary value, and we then create a truth table for the combinational logic. The inputs for the combinational logic are the state bits coming from the state register, and the external inputs, so we list all combinations of these inputs on the left side of the table. The outputs for the combinational logic are the state bits to be loaded into the register on the next clock edge (the next state), and the external output values, so we list desired values of these outputs for each input combination on the right side of the table.
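The implementation model described above (a state register plus combinational logic) can be simulated directly. The sketch below uses the chapter's clock-divider example, which outputs a 1 once every four counted clock cycles. It assumes a 2-bit state (Q1, Q0) starting at 00, an input a that enables counting, and the minimized equations I1 = Q1'Q0a + Q1a' + Q1Q0', I0 = Q0a' + Q0'a, x = Q1Q0 as read from the sequential design figure; the function names are hypothetical.

```python
def combinational_logic(q1, q0, a):
    """Return (I1, I0, x): next-state bits and the Moore output."""
    not_q1, not_q0, not_a = 1 - q1, 1 - q0, 1 - a
    i1 = (not_q1 & q0 & a) | (q1 & not_a) | (q1 & not_q0)
    i0 = (q0 & not_a) | (not_q0 & a)
    x = q1 & q0               # output depends on the current state only
    return i1, i0, x


def state_table():
    """Enumerate all (Q1, Q0, a) combinations with their (I1, I0, x)."""
    return [
        (q1, q0, a) + combinational_logic(q1, q0, a)
        for q1 in (0, 1) for q0 in (0, 1) for a in (0, 1)
    ]


def run(inputs):
    """Feed one value of a per clock edge; return the output seen in each cycle."""
    q1 = q0 = 0                       # assumed initial state
    outputs = []
    for a in inputs:
        i1, i0, x = combinational_logic(q1, q0, a)
        outputs.append(x)
        q1, q0 = i1, i0               # state register loads on the clock edge
    return outputs


if __name__ == "__main__":
    for row in state_table():
        print(*row)
    print(run([1] * 8))
```

With a held at 1 the state advances 00, 01, 10, 11 and wraps around, so x is 1 in every fourth cycle; with a held at 0 the state, and therefore the output, never changes.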
Because we used a state diagram for which outputs were a function of the current state only, and not of the inputs, we list an external output value only for each possible state, ignoring the external input values. Now that we have a truth table, we proceed with combinational logic design as described earlier, by generating minimized output equations, and then drawing the combinational logic circuit.

4.4 Custom single-purpose processor design

We can apply the above combinational and sequential logic design techniques to build datapath components and controllers. Therefore, we have nearly all the knowledge we need to build a custom single-purpose processor for a given program, since a processor consists of a controller and a datapath. We now describe a technique for building such a processor.

We begin with a sequential program we must implement. Figure 4.13 provides an example based on computing a greatest common divisor (GCD). Figure 4.13(a) shows a black-box diagram of the desired system, having x_i and y_i data inputs and a data output d_o. The system's functionality is straightforward: the output should represent the GCD of the inputs. Thus, if the inputs are 12 and 8, the output should be 4. If the inputs are 13 and 5, the output should be 1. Figure 4.13(b) provides a simple program with this functionality. The reader might trace this program's execution on the above examples to verify that the program does indeed compute the GCD.
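The trace suggested above can be automated. The following sketch transcribes the inner loop of the GCD program (repeated subtraction until x equals y) and records every intermediate pair; the function name and the trace format are hypothetical, and the sketch assumes both inputs are positive, since the loop would not terminate if either were 0.

```python
from math import gcd


def gcd_trace(x_i, y_i):
    """Run the subtraction-based GCD loop; return (result, list of (x, y) pairs)."""
    if x_i <= 0 or y_i <= 0:
        raise ValueError("inputs must be positive")
    x, y = x_i, y_i
    trace = [(x, y)]
    while x != y:
        if x < y:
            y = y - x
        else:
            x = x - y
        trace.append((x, y))
    return x, trace


if __name__ == "__main__":
    for pair in ((12, 8), (13, 5)):
        result, trace = gcd_trace(*pair)
        print(pair, "->", result, trace)
        assert result == gcd(*pair)   # cross-check against the library GCD
```

For inputs 12 and 8 the recorded pairs are (12, 8), (4, 8), (4, 4), so the loop body runs twice before x equals y; inputs 13 and 5 need five subtractions.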

Figure 4.11: Sequential logic design.

(a) Problem description: You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

(b) State diagram: four states, 0 through 3, with output x = 0 in states 0, 1, and 2 and x = 1 in state 3. Each state advances to the next when a = 1 and stays put when a = 0.

(c) Implementation model: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0, together with a state register that loads I1, I0 and drives Q1, Q0.

(d) State table (Moore-type):

    Inputs      Outputs
    Q1 Q0 a     I1 I0 x
    0  0  0     0  0  0
    0  0  1     0  1  0
    0  1  0     0  1  0
    0  1  1     1  0  0
    1  0  0     1  0  0
    1  0  1     1  1  0
    1  1  0     1  1  1
    1  1  1     0  0  1

(e) Minimized output equations:

    I1 = Q1'Q0a + Q1a' + Q1Q0'
    I0 = Q0a' + Q0'a
    x = Q1Q0

(f) Combinational logic.

To begin building our single-purpose processor implementing the GCD program, we first convert our program into a complex state diagram, in which states and arcs may include arithmetic expressions, and these expressions may use external inputs and outputs or variables. In contrast, our earlier state diagrams only included boolean expressions, and these expressions could only use external inputs and outputs, not variables. Thus, this more complex state diagram looks like a sequential program in which statements have been scheduled into states.

Figure 4.12: Templates for creating a state diagram from a program.

    Assignment statement (a = b; next statement): a state with action a = b, followed by an arc to the next statement.
    Loop statement (while (cond) { loop-body-statements } next statement): a condition state C with an arc labeled cond to the loop-body statements and an arc labeled !cond to the next statement; the loop body leads to a join state J, which leads back to C.
    Branch statement (if (c1) c1 stmts else if c2 c2 stmts else other stmts; next statement): a condition state C with arcs labeled c1, !c1*c2, and !c1*!c2 to the c1 stmts, c2 stmts, and other stmts respectively; each branch leads to a join state J, which leads to the next statement.

We can use templates to convert a program to a state diagram, as illustrated in Figure 4.12. First, we classify each statement as an assignment statement, loop statement, or branch (if-then-else or case) statement. For an assignment statement, we create a state with that statement as its action. We add an arc from this state to the state for the next statement, whatever type it may be. For a loop statement, we create a condition state C and a join state J, both with no actions. We add an arc with the loop's condition from the condition state to the first statement in the loop body. We add a second arc with the complement of the loop's condition from the condition state to the next statement after the loop body. We also add an arc from the join state back to the condition state. For a branch statement, we create a condition state C and a join state J, both with no actions. We add an arc with the first branch's condition from the condition state to the branch's first statement. We add another arc with the complement of the first branch's condition AND'ed with the second branch's condition from the condition state to the second branch's first statement. We repeat this for each branch. Finally, we connect the arc leaving the last statement of each branch to the join state, and we add an arc from this state to the next statement's state.

Using this template approach, we convert our GCD program to the complex state diagram of Figure 4.13(c). We are now well on our way to designing a custom single-purpose processor that executes the GCD program.
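The result of applying the templates can be represented as data and executed. In this sketch each state maps to an optional action and a list of (condition, next state) arcs, using the state labels 3 through 9, 5-J, and 6-J from the GCD state diagram. The dictionary layout, the `run` helper, and the terminal "done" state are assumptions made for illustration; the outer `while (1)` loop and the wait on go_i (states 1, 2, 2-J, 1-J) are omitted to keep the example short.

```python
def build_gcd_state_diagram():
    """Map each state to (action or None, [(condition, next_state), ...])."""
    always = lambda v: True
    return {
        "3": (lambda v: v.update(x=v["x_i"]), [(always, "4")]),
        "4": (lambda v: v.update(y=v["y_i"]), [(always, "5")]),
        # Loop template: condition state 5 with arcs cond and !cond.
        "5": (None, [(lambda v: v["x"] != v["y"], "6"),
                     (lambda v: not (v["x"] != v["y"]), "9")]),
        # Branch template: condition state 6 with arcs c1 and !c1.
        "6": (None, [(lambda v: v["x"] < v["y"], "7"),
                     (lambda v: not (v["x"] < v["y"]), "8")]),
        "7": (lambda v: v.update(y=v["y"] - v["x"]), [(always, "6-J")]),
        "8": (lambda v: v.update(x=v["x"] - v["y"]), [(always, "6-J")]),
        "6-J": (None, [(always, "5-J")]),      # join state of the branch
        "5-J": (None, [(always, "5")]),        # join state loops back to 5
        "9": (lambda v: v.update(d_o=v["x"]), [(always, "done")]),
    }


def run(diagram, variables, start="3"):
    """Execute one state per step until the terminal state is reached."""
    state = start
    while state != "done":
        action, arcs = diagram[state]
        if action is not None:
            action(variables)
        # Take the first arc whose condition holds for the current variables.
        state = next(target for cond, target in arcs if cond(variables))
    return variables


if __name__ == "__main__":
    result = run(build_gcd_state_diagram(), {"x_i": 12, "y_i": 8})
    print(result["d_o"])
```

Because the arcs leaving each condition state carry a condition and its complement, exactly one arc is enabled at any time, so the order of the arcs in each list does not affect the result.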
Our next step is to divide the functionality into a datapath part and a controller part, as shown in Figure 4.14. The datapath part should consist of an interconnection of combinational and sequential components. The controller part should consist of a basic state diagram, i.e., one containing only boolean actions and conditions. We construct the datapath through a four-step process:

1. First, we create a register for any declared variable. In the example, these are x and y. We treat an output port as having an implicit variable, so we create a register d and connect it to the output port. We also draw the input and output ports.

2. Second, we create a functional unit for each arithmetic operation in the state diagram. In the example, there are two subtractions, one comparison for less than, and one comparison for inequality, yielding two subtractors and two comparators, as shown in the figure.

3. Third, we connect the ports, registers and functional units. For each write to a variable in the state diagram, we draw a connection from the write's source (an input port, a functional unit, or another register) to the variable's register. For each arithmetic and logical operation, we connect sources to an input of the operation's corresponding functional unit. When more than one source is connected to a register, we add an appropriately-sized multiplexor.

4. Finally, we create a unique identifier for each control input and output of the datapath components.

Figure 4.13: Example program -- Greatest Common Divisor (GCD): (a) black-box view, (b) desired functionality, (c) state diagram.

(a) Black-box view: a GCD block with inputs go_i, x_i, and y_i and output d_o.

(b) Desired functionality:

    0: vectorN x, y;
    1: while (1) {
    2:    while (!go_i);
    3:    x = x_i;
    4:    y = y_i;
    5:    while (x != y) {
    6:       if (x < y)
    7:          y = y - x;
             else
    8:          x = x - y;
          }
    9:    d_o = x;
       }

(c) State diagram: states 1:, 2:, 2-J:, 3: x = x_i, 4: y = y_i, 5:, 6:, 7: y = y - x, 8: x = x - y, 6-J:, 5-J:, 9: d_o = x, and 1-J:. The arcs leaving state 1: are labeled 1 and !1, those leaving state 2: are labeled !go_i and !(!go_i), those leaving state 5: are labeled x != y and !(x != y), and those leaving state 6: are labeled x < y and !(x < y).
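The four steps above can be sketched as a small behavioral model of the resulting datapath. The control and status names (x_sel, y_sel, x_ld, y_ld, d_ld, x_neq_y, x_lt_y) follow the chapter's controller/datapath figure as best they can be read; the multiplexor encoding (select 0 passes the input port, select 1 passes the subtractor) and the `run_gcd` driver, which stands in for a controller by issuing one set of control values per state, are assumptions made for illustration.

```python
class GcdDatapath:
    """Registers x, y, d with 2x1 muxes, two subtractors, and two comparators."""

    def __init__(self, n=8):
        self.mask = (1 << n) - 1      # registers are n bits wide
        self.x = self.y = self.d = 0

    def status(self):
        """Comparator outputs, computed from the current register values."""
        return {"x_neq_y": int(self.x != self.y),
                "x_lt_y": int(self.x < self.y)}

    def clock(self, x_i, y_i, x_sel=0, y_sel=0, x_ld=0, y_ld=0, d_ld=0):
        """One clock edge: evaluate combinational paths, then load registers."""
        x_minus_y = (self.x - self.y) & self.mask             # subtractor for x - y
        y_minus_x = (self.y - self.x) & self.mask             # subtractor for y - x
        x_next = x_minus_y if x_sel else (x_i & self.mask)    # 2x1 mux before x
        y_next = y_minus_x if y_sel else (y_i & self.mask)    # 2x1 mux before y
        d_next = self.x
        # All next values were computed above, so the loads act simultaneously.
        if x_ld:
            self.x = x_next
        if y_ld:
            self.y = y_next
        if d_ld:
            self.d = d_next
        return self.d


def run_gcd(x_i, y_i, n=8):
    """Drive the datapath state by state (inputs assumed positive, below 2**n)."""
    dp = GcdDatapath(n)
    dp.clock(x_i, y_i, x_sel=0, x_ld=1)              # state 3: x = x_i
    dp.clock(x_i, y_i, y_sel=0, y_ld=1)              # state 4: y = y_i
    while dp.status()["x_neq_y"]:                    # state 5
        if dp.status()["x_lt_y"]:                    # state 6
            dp.clock(x_i, y_i, y_sel=1, y_ld=1)      # state 7: y = y - x
        else:
            dp.clock(x_i, y_i, x_sel=1, x_ld=1)      # state 8: x = x - y
    return dp.clock(x_i, y_i, d_ld=1)                # state 9: d_o = x


if __name__ == "__main__":
    print(run_gcd(12, 8), run_gcd(13, 5))
```

The driver touches the datapath only through boolean control inputs and status outputs, which is the separation that lets the controller be designed as an ordinary state machine.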

Figure 4.14: Example after splitting into a controller and a datapath. The datapath contains n-bit 2x1 multiplexors in front of registers x and y, a != comparator (used by state 5) producing x_neq_y, a < comparator (used by state 6) producing x_lt_y, two subtractors computing x - y (state 8) and y - x (state 7), and a register d (state 9) driving d_o. Its control inputs are x_sel, y_sel, x_ld, y_ld, and d_ld. The controller has the same states as the state diagram of Figure 4.13(c), with boolean actions and conditions: states 3 and 8 set x_sel and assert x_ld, states 4 and 7 set y_sel and assert y_ld, state 5 tests x_neq_y, state 6 tests x_lt_y, and state 9 asserts d_ld. The controller implementation model consists of combinational logic with inputs go_i, x_neq_y, x_lt_y, and Q3 through Q0, and outputs x_sel, y_sel, x_ld, y_ld, d_ld, and I3 through I0, together with a 4-bit state register.

Now that we have a complete datapath, we can build a state diagram for our controller. The state diagram has the same structure as the complex state diagram. However, we replace complex actions and conditions by boolean ones, making use of our datapath. We replace every variable write by actions that set the select signals of the multiplexor in front of the variable's register such that the write's source passes through, and we assert the load signal of that register. We replace every logical operation in a condition by the corresponding functional unit control output.

We can then complete the controller design by implementing the state diagram using our sequential design technique described earlier. Figure 4.14 shows the controller implementation model, and Figure 4.15 shows a state table. Note that there are 7 inputs to the controller, resulting in 128 rows for the table. We reduced rows in the state table by using don't cares for some input combinations, but we can still see that optimizing the design using hand techniques could be quite tedious. For this reason, computer-aided design (CAD) tools that automate the combinational as well as sequential logic design can be very helpful; we'll introduce such CAD tools in a later chapter.

Figure 4.15: State table for the GCD example. Inputs: Q3, Q2, Q1, Q0, x_neq_y, x_lt_y, go_i. Outputs: I3, I2, I1, I0, x_sel, y_sel, x_ld, y_ld, d_ld. In the table, * indicates all possible combinations of 0's and 1's, and X indicates don't cares.

Also, note that we could perform significant amounts of optimization to both the datapath and the controller. For example, we could merge functional units in the datapath, resulting in fewer units at the expense of more multiplexors. We could also merge states in the controller. Such optimizations will be discussed in a later chapter. Remember that we could alternatively implement the GCD program by programming a microcontroller, thus eliminating the need for this design process, but possibly yielding a slower and bigger design.

4.5 Summary

Designing a custom single-purpose processor for a given program requires an understanding of various aspects of digital design. Design of a circuit to implement boolean functions requires combinational design, which consists of building a truth table with all possible inputs and desired outputs, optimizing, and drawing a circuit. Design of a circuit to implement a state diagram requires sequential design, which consists of drawing an implementation model with a state register and a combinational logic block, assigning a binary encoding to each state, drawing a state table with inputs and outputs, and repeating our combinational design process for this table.
Finally, design of a single-purpose processor circuit to implement a program requires us to first schedule the program's statements into a complex state diagram, construct a datapath from the diagram, create a new state diagram that replaces complex actions and conditions by datapath control operations, and then design a controller circuit for the new state diagram using sequential design. Because processors can be complex, CAD tools would be a great designer's aid.

4.6 References and further reading

Gajski, Daniel D. Principles of Digital Design. New Jersey: Prentice-Hall, 1997. ISBN 0-13-301144-5. Describes combinational and sequential logic design, with a focus on optimization techniques, CAD, and higher levels of design.

Katz, Randy. Contemporary Logic Design. Redwood City, California: Benjamin/Cummings, 1994. ISBN 0-8053-2703-7. Describes combinational and sequential logic design, with a focus on logic and sequential optimization and CAD.

4.7 Exercises

1. Build a 3-input NAND gate using a minimum number of CMOS transistors.

2. Design a 2-bit comparator (compares two 2-bit words) with a single output "less-than," using the combinational design technique described in the chapter. Start from a truth table, use K-maps to minimize logic, and draw the final circuit.

3. Design a 3-bit counter that counts the following sequence: 1, 2, 4, 5, 7, 1, 2, ... This counter has an output "odd" that is one when the current count value is odd. Use the sequential design technique of the chapter. Start from a state diagram, draw the state table, minimize logic, and draw the final circuit.

4. Compare the GCD custom-processor implementation to a software implementation. (a) Compare the performance. Assume a 100 ns clock for the microcontroller, and a 20 ns clock for the custom processor. Assume the microcontroller uses two-operand instructions, and each instruction requires 4 clock cycles. Estimates for the microcontroller are fine. (b) Estimate the number of gates for the custom design, and compare this to 10,000 gates for a simple 8-bit microcontroller. (c) Compare the custom GCD with the GCD running on a 300 MHz processor with 2-operand instructions and 1 clock cycle per instruction (advanced processors use parallelism to meet or exceed 1 cycle per instruction). (d) Compare the estimated gates with 200,000 gates, a typical number of gates for a modern 32-bit processor.

5. Design a custom single-purpose processor implementing the following program, using the technique of the chapter. Start with a complex state diagram, construct a datapath and a simplified state diagram, and draw the truth table for the controller, but do not complete the design for the controller beyond the truth table.

    input_port U;
    int V;
    for (int i=0; i<32; i++)
       V = V + U*V;