ESE534: Computer Organization. Previously. Today. Preclass. Why Registers? Latches, Registers

Similar documents
ESE534: Computer Organization. Previously. Today. Preclass. Preclass. Latches, Registers. Boolean Logic Gates Arithmetic Complexity of computations

Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1

ESE534: Computer Organization. Tabula. Previously. Today. How often is reuse of the same operation applicable?

Outline. EECS Components and Design Techniques for Digital Systems. Lec 11 Putting it all together Where are we now?

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL. VHDL History. Why VHDL? Introduction to Structured VLSI Design. Very High Speed Integrated Circuit (VHSIC) Hardware Description Language

2000 N + N <100N. When is: Find m to minimize: (N) m. N log 2 C 1. m + C 3 + C 2. ESE534: Computer Organization. Previously. Today.

Sequential Circuits. inputs Comb FFs. Outputs. Comb CLK. Sequential logic examples. ! Another way to understand setup/hold/propagation time

Two HDLs used today VHDL. Why VHDL? Introduction to Structured VLSI Design

Lecture #1: Introduction

structure syntax different levels of abstraction

Here is a list of lecture objectives. They are provided for you to reflect on what you are supposed to learn, rather than an introduction to this

Overview. Ram vs Register. Register File. Introduction to Structured VLSI Design. Recap Operator Sharing FSMD Counters.

Verilog for Synthesis Ing. Pullini Antonio

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis

2000 N + N <100N. When is: Find m to minimize: (N) m. N log 2 C 1. m + C 3 + C 2. ESE534: Computer Organization. Previously. Today.

ESE532: System-on-a-Chip Architecture. Today. Message. Graph Cycles. Preclass 1. Reminder

Control and Datapath 8

Introduction to Digital Logic Missouri S&T University CPE 2210 Registers

ESE534 Computer Organization. Previously. Today. Spatial Programmable. Spatially Programmable. Needs?

Binary Adders. Ripple-Carry Adder

ESE532: System-on-a-Chip Architecture. Today. Message. Real Time. Real-Time Tasks. Real-Time Guarantees. Real Time Demands Challenges

Verilog for High Performance

Lecture 24: Sequential Logic Design. Let s refresh our memory.

L11: Major/Minor FSMs

ESE535: Electronic Design Automation. Today. Example: Same Semantics. Task. Problem. Other Goals. Retiming. Day 23: April 13, 2011 Retiming

ECE 545 Lecture 12. Datapath vs. Controller. Structure of a Typical Digital System Data Inputs. Required reading. Design of Controllers

Quick Introduction to SystemVerilog: Sequental Logic

Music. Numbers correspond to course weeks EULA ESE150 Spring click OK Based on slides DeHon 1. !

ESE534 Computer Organization. Today. Temporal Programmable. Spatial Programmable. Needs? Spatially Programmable

Lecture 4: ISA Tradeoffs (Continued) and Single-Cycle Microarchitectures

Chapter 10. case studies in sequential logic design

Synthesis of Combinational and Sequential Circuits with Verilog

In this lecture, we will go beyond the basic Verilog syntax and examine how flipflops and other clocked circuits are specified.

Design of Embedded DSP Processors

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

Assignment. Last time. Last time. ECE 4514 Digital Design II. Back to the big picture. Back to the big picture

ECEU530. Schedule. ECE U530 Digital Hardware Synthesis. Datapath for the Calculator (HW 5) HW 5 Datapath Entity

Lecture 4: More Lab 1

Finite State Machines

Advanced Synthesis Techniques

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

EEL 4783: HDL in Digital System Design

Review. EECS Components and Design Techniques for Digital Systems. Lec 05 Boolean Logic 9/4-04. Seq. Circuit Behavior. Outline.

Abstraction. Levels of abstraction in a computer system

קורס VHDL for High Performance. VHDL

a, b sum module add32 sum vector bus sum[31:0] sum[0] sum[31]. sum[7:0] sum sum overflow module add32_carry assign

(Refer Slide Time 6:48)

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

Register Transfer Methodology II

Outline. Register Transfer Methodology II. 1. One shot pulse generator. Refined block diagram of FSMD

Register Transfer Level in Verilog: Part I

Lec-6-HW-2-digitalDesign

Abstraction. ENGR Michael S. McCorquodale, Peter M. Chen

I 3 I 2. ! Language of logic design " Logic optimization, state, timing, CAD tools

CS/EE Computer Design Lab Fall 2010 CS/EE T Th 3:40pm-5:00pm Lectures in WEB 110, Labs in MEB 3133 (DSL) Instructor: Erik Brunvand

CS/EE Prerequsites. Hardware Infrastructure. Class Goal CS/EE Computer Design Lab. Computer Design Lab Fall 2010

Computer Architecture. R. Poss

Announcements. Midterm 2 next Thursday, 6-7:30pm, 277 Cory Review session on Tuesday, 6-7:30pm, 277 Cory Homework 8 due next Tuesday Labs: project

Inf2C - Computer Systems Lecture Processor Design Single Cycle

Finite-State Machine (FSM) Design

ESE 150 Lab 07: Digital Logic

Recommended Design Techniques for ECE241 Project Franjo Plavec Department of Electrical and Computer Engineering University of Toronto

CSC 220: Computer Organization Unit 12 CPU programming

Outline. Introduction to Structured VLSI Design. Signed and Unsigned Integers. 8 bit Signed/Unsigned Integers

CS 61C Summer 2012 Discussion 11 State, Timing, and CPU (Solutions)

MODELING LANGUAGES AND ABSTRACT MODELS. Giovanni De Micheli Stanford University. Chapter 3 in book, please read it.

Homework deadline extended to next friday

Lecture 5. Other Adder Issues

ESE535: Electronic Design Automation. Today. Question. Question. Intuition. Gate Array Evaluation Model

Embedded Systems Design Prof. Anupam Basu Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Synthesis of Language Constructs. 5/10/04 & 5/13/04 Hardware Description Languages and Synthesis

ESE532: System-on-a-Chip Architecture. Today. Message. Coding Accelerators. Course Hypothesis. Discussion [open]

RTL Design (Using ASM/SM Chart)

Memory Bandwidth and Low Precision Computation. CS6787 Lecture 9 Fall 2017

EE 4755 Digital Design Using Hardware Description Languages

Today. ESE532: System-on-a-Chip Architecture. Day 3: Memory Scaling. Message. Day 3: Lesson DRAM DRAM

EECS 470 Lab 3. SystemVerilog Style Guide. Department of Electrical Engineering and Computer Science College of Engineering University of Michigan

23. Digital Baseband Design

TSEA44 - Design for FPGAs

Topics. Verilog. Verilog vs. VHDL (2) Verilog vs. VHDL (1)

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM

Verilog. What is Verilog? VHDL vs. Verilog. Hardware description language: Two major languages. Many EDA tools support HDL-based design

Introduction to Digital Logic Missouri S&T University CPE 2210 Multipliers/Dividers

University of Pennsylvania Department of Electrical and System Engineering Computer Organization

Engin 100 (section 250), Winter 2015, Technical Lecture 3 Page 1 of 5. Use pencil!

Administrivia. CSE 370 Spring 2006 Introduction to Digital Design Lecture 9: Multilevel Logic

Many ways to build logic out of MOSFETs

Chapter 6. Logic Design Optimization Chapter 6

Programming with HDLs

Lecture 19 Introduction to Pipelining

Fundamentals of Digital System Design ECE 3700, CPSC 3700

EITF35: Introduction to Structured VLSI Design

Logic Synthesis. EECS150 - Digital Design Lecture 6 - Synthesis

SoC Design for the New Millennium Daniel D. Gajski

ESE535: Electronic Design Automation. Today. EDA Use. Problem PLA. Programmable Logic Arrays (PLAs) Two-Level Logic Optimization

(Refer Slide Time: 00:01:53)

ESE532: System-on-a-Chip Architecture. Today. Message. Preclass 1. Computing Forms. Preclass 1

ESE532: System-on-a-Chip Architecture. Today. Process. Message FIFO. Thread. Dataflow Process Model Motivation Issues Abstraction Recommended Approach

Lecture 7. Summary of two-level combinational-logic. Ways of specifying circuits. Solving combinational design problems. Verilog versus VHDL

Transcription:

ESE534: Computer Organization Previously Day 4: January 25, 2012 Sequential Logic (FSMs, Pipelining, FSMD) Boolean Logic Gates Arithmetic Complexity of computations E.g. area and delay for addition 1 2 Today Sequential Logic Add registers, state Finite-State Machines (FSM) Register Transfer Level (RTL) logic Datapath Reuse Pipelining Latency and Throughput Finite-State Machines with Datapaths (FSMD) Preclass Can we solve the problem entirely using Boolean logic functions? 3 4 Latches, Registers Why Registers? New element is a state element. Canonical instance is a register: remembers the last value it was given until told to change typically signaled by clock Why do we need registers? D Q > 5 6 1

Reuse In general, we want to reuse our components in time not disposable logic How do we guarantee disciplined reuse? To Reuse Logic Make sure all logic completed evaluation Outputs of gates are valid Meaningful to look at them Gates are finished with work and ready to be used again Make sure consumers get value Before being overwritten by new calculation (new inputs) 7 8 Synchronous Logic Model Data starts Inputs to circuit Registers Perform combinational (boolean) logic Outputs of logic Exit circuit Clocked into registers Given long enough clock Think about registers getting values updated by logic on each clock cycle 9 Issues of Timing... many issues in detailed implementation glitches and hazards in logic timing discipline in clocking We re going to (mostly) work above that level this term. Will talk about the delay of logic between registers Watch for these details in ESE370/570 10 Preclass How do we build an adder for arbitrary input width? Preclass What did the addition of state register(s) do for us? 11 12 2

Added Power Process unbounded input with finite logic Ratio input:gates arbitrarily large State is a finite (bounded) representation of what s happened before finite amount of stuff can remember to synopsize the past State allows behavior to depend on past (on context) 13 Finite-State Machine (FSM) (Finite Automata) Logic core Plus registers to hold state 14 FSM Model FSM a model of computations More powerful than Boolean logic functions Both Theoretically practically 15 FSM Abstraction Implementation vs. Abstraction Nice to separate out The abstract function want to achieve The concrete implementation Saw with Boolean logic There are many ways to implement function Want to select the concrete one that minimizes costs FSMs also separate out desired function from implementation 16 Finite State Machine Specifying an FSM Informally: Behavior depends not just on input (as was the case for combinational logic) also depends on state Can be completely different behavior in each state Logic/output now depends on both state and input 17 Logic becomes: if (state=s1) boolean logic for state 1 (including logic for calculate next state) else if (state=s2) boolean logic for state2 if (state=sn) boolean logic for state n 18 3

Specifying FSM What s your favorite way to specify an FSM? Another reason we need to separate the abstract operation from the Specification Implementation 19 St1: goto St2 St2: if (I==0) goto St3 else goto St4 St3: output o0=1 goto St1 St4: output o1=1 goto St2 FSM Specification Could be: behavioral language {Verilog, VHDL, Bluespec} computer language (C) state-transition graph extract from gates + registers 20 State Encoding FSM Equivalence States not (necessarily) externally visible We have freedom in how to encode them assign bits to states Usually want to exploit freedom to minimize implementation costs area, delay, energy (there are algorithms to attack ESE535) 21 Harder than Boolean logic Doesn t have unique canonical form Consider: state encoding not change behavior two equivalent FSMs may not even have the same number of states can deal with infinite (unbounded) input...so cannot enumerate output in all cases No direct correspondence of a truth table 22 FSM Equivalence What does matter? What property needs to hold for two FSMs to be equivalent? 23 FSM Equivalence What matters is external observability FSM outputs same signals in response to every possible input sequence Is it possible to check equivalence over an infinite number of input sequences? Possible? Finite state suggests there is a finite amount of checking required to verify behavior 24 4

FSM Equivalence Flavor Given two FSMs A and B consider the composite FSM AB Inputs wired together Outputs separate Ask: is it possible to get into a composite state in which A and B output different symbols? There is a literature on this 25 Systematic FSM Design Start with specification Can compute Boolean logic for each state If conversion including next state translation Keep state symbolic (s1, s2 ) Assign state encodings Then have combinational logic has current state as part of inputs produces next state as part of outputs Design comb. logic and add state registers 26 RTL Register Transfer Level description Registers + Boolean logic Datapath Reuse Most likely: what you ve written in Verilog, VHDL 27 28 Reuse: Waiting Discipline Example: 4b Ripple Adder Use registers and timing for orderly progression of data How fast can we clock this? Min Clock Cycle: 8 gates A, B to S3 29 30 5

Can we do better? Can we clock faster, reuse elements sooner? Stagger Inputs Correct if expecting A,B[3:2] to be staggered one cycle behind A,B[1:0] and succeeding stage expects S[3:2] staggered from S[1:0] 31 32 Align Data / Balance Paths Speed Good discipline to line up pipe stages in diagrams. 33 How fast can we clock this? Assuming we clock that fast, what is the delay from A,B to S3? S3 A0 34 Pipelining and Timing Pipelining and Timing Once introduce pipelining Clock cycle = rate of reuse Is not the same as the delay to complete a computation 35 Throughput How many results can the circuit produce per unit time If can produce one result per cycle, Reciprocal of clock period Throughput of this design? 36 6

Pipelining and Timing Latency How long does it take to produce one result Product of clock cycle number of clocks between input and output Latency of this design? 37 Example: 4b RA pipe 2 Latency and Throughput: Latency: 8 gates to S3 Throughput: 1 result / 4 gate delays max 38 Throughput vs. Latency Examples where throughput matters? Deeper? Can we do it again? Examples where latency matters? What s our limit? Why would we stop? 39 40 More Reuse Saw could pipeline and reuse FA more frequently Suggests we re wasting the FA part of the time in non-pipelined What is FA3 doing while FA0 is computing? More Reuse (cont.) If we re willing to take 8 gate-delay units, do we need 4 FAs? 3 2 1 0 41 42 7

Ripple Add (pipe view) Bit Serial Addition Can pipeline to FA. What if don t need the throughput? If don t need throughput, reuse FA on SAME addition. Assumes LSB first ordering of input data. 43 44 Bit Serial Addition: Pipelining Latency and throughput? Latency: 8 gate delays 10 for 5 th output bit Throughput: 1 result / 10 gate delays Registers do have time overhead setup, hold time, clock jitter Multiplication Can be defined in terms of addition Ask you to play with implementations and tradeoffs in homework 2 45 46 Compute Function Design Space for Computation Compute: y=ax 2 +Bx +C Assume 47 D(Mpy) > D(Add) E.g. D(Mpy)=24, D(Add)=8 A(Mpy) > A(Add) E.g. A(Mpy)=64, A(Add)=8 48 8

Spatial Quadratic Pipelined Spatial Quadratic Latency? Throughput? Area? Latency? Throughput? Area? A(Reg)=4 D(Quad) = 2*D(Mpy)+D(Add) = 56 Throughput 1/(2*D(Mpy)+D(Add)) = 1/56 A(Quad) = 3*A(Mpy) + 2*A(Add) = 208 49 D(Quad) = 3*D(Mpy) = 72 Throughput 1/D(Mpy) = 1/24 A(Quad) = 3*A(Mpy) + 2*A(Add)+6A(Reg) 50 = 232 Quadratic with Single Multiplier and Adder? We ve seen reuse to perform the same operation pipelining bit-serial, homogeneous datapath We can also reuse a resource in time to perform a different role. Repeated Operations What operations occur multiple times in this datapath? x*x, A*(x*x), B*x (Bx)+c, (A*x*x)+(Bx+c) 51 52 Quadratic Datapath Quadratic Datapath Start with one of each operation (alternatives where build multiply from adds e.g. homework) 53 Multiplier serves multiple roles x*x A*(x*x) B*x Will need to be able to steer data (switch interconnections) 54 9

Quadratic Datapath Quadratic Datapath Multiplier serves multiple roles x*x A*(x*x) B*x Inputs a) x, x*x b) x,a,b Multiplier serves multiple roles x*x A*(x*x) B*x Inputs a) x, x*x b) x,a,b 55 56 Quadratic Datapath Quadratic Datapath Adder serves multiple roles (Bx)+c (A*x*x)+(Bx+c) Inputs one always mpy output C, Bx+C 57 58 Quadratic Datapath Add input register for x 59 Quadratic Control Now, we just need to control the datapath What control? Control: LD x LD x*x MA Select MB Select AB Select LD Bx+C LD Y 60 10

FSMD Quadratic FSMD FSMD = FSM + Datapath Stylization for building controlled datapaths such as this (a pattern) Of course, an FSMD is just an FSM it s often easier to think about as a datapath synthesis, place and route tools have been notoriously bad about discovering/ exploiting datapath structure 61 62 Quadratic FSMD Control Quadratic FSMD Control S0: if (go) LD_X; goto S1 else goto S0 S1: MA_SEL=x,MB_SEL[1:0]=x, LD_x*x goto S2 S2: MA_SEL=x,MB_SEL[1:0]=B goto S3 S3: AB_SEL=C,MA_SEL=x*x, MB_SEL=A goto S4 S4: AB_SEL=Bx+C, LD_Y goto S0 63 S0: if (go) LD_X; goto S1 else goto S0 S1: MA_SEL=x,MB_SEL[1:0]=x, LD_x*x goto S2 S2: MA_SEL=x,MB_SEL[1:0]=B goto S3 S3: AB_SEL=C,MA_SEL=x*x, MB_SEL=A goto S4 S4: AB_SEL=Bx+C, LD_Y goto S0 64 Quadratic FSM D(mux3)=D(mux2)=1 A(mux2)=2 A(mux3)=3 A(QFSM) ~= 10 Latency/Throughput/Area? Latency: 5*(D(MPY)+D(mux3)) = 125 Throughput: 1/Latency = 1/125 Area: A(Mpy)+A(Add)+5*A(Reg) +2*A(Mux2)+A(Mux3)+A(QFSM) = 109 65 Big Ideas [MSB Ideas] Registers allow us to reuse logic Can implement any FSM with gates and registers Pipelining increases parallelism allows reuse in time (same function) Control and Sequencing reuse in time for different functions Can tradeoff Area and Time 66 11

Big Ideas [MSB-1 Ideas] Admin: Reminder RTL specification FSMD idiom HW1 due today (10pm) HW2 due next Wednesday Reading for next week online 67 68 12