ETH, Design of Digital Circuits, SS17 Practice Exercises III

Similar documents
ETH, Design of Digital Circuits, SS17 Review Session Questions I

ETH, Design of Digital Circuits, SS17 Review Session Questions I - SOLUTIONS

CMU Introduction to Computer Architecture, Spring Midterm Exam 1. Date: Wed., 3/6. Legibility & Name (5 Points): Problem 1 (65 Points):

ETH, Design of Digital Circuits, SS17 Practice Exercises II - Solutions

Design of Digital Circuits ( L) ETH Zürich, Spring 2017

Computer Architecture. Lecture 5: Multi-Cycle and Microprogrammed Microarchitectures

1 Tomasulo s Algorithm

1 Tomasulo s Algorithm

Department of Electrical and Computer Engineering The University of Texas at Austin

Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures

CS 2461: Computer Architecture I

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin

11/28/2016. ECE 120: Introduction to Computing. Register Loads Control Updates to Register Values. We Consider Five Groups of LC-3 Control Signals

Department of Electrical and Computer Engineering The University of Texas at Austin

LC-3 Instruction Processing

Department of Electrical and Computer Engineering The University of Texas at Austin

Sequential Logic Design

LC-3 Instruction Processing. (Textbook s Chapter 4)

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin

COSC121: Computer Systems: Review

EEL 5722C Field-Programmable Gate Array Design

CS/ECE 252: INTRODUCTION TO COMPUTER ENGINEERING UNIVERSITY OF WISCONSIN MADISON. Instructor: Rahul Nayar TAs: Annie Lin, Mohit Verma

Design of Digital Circuits ( L) ETH Zürich, Spring 2018

Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Computing Layers

Introduction to Computer Engineering. Chapter 5 The LC-3. Instruction Set Architecture

ECE 411 Exam 1. This exam has 5 problems. Make sure you have a complete exam before you begin.

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013

Department of Electrical and Computer Engineering The University of Texas at Austin

COSC121: Computer Systems: Review

Introduction to Computer Engineering. CS/ECE 252, Fall 2016 Prof. Guri Sohi Computer Sciences Department University of Wisconsin Madison

Computing Layers. Chapter 5 The LC-3

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)

Chapter 5 The LC-3. ACKNOWLEDGEMENT: This lecture uses slides prepared by Gregory T. Byrd, North Carolina State University 5-2

ECE 2300 Digital Logic & Computer Organization. More Finite State Machines

The LC-3 Instruction Set Architecture. ISA Overview Operate instructions Data Movement instructions Control Instructions LC-3 data path

Instruction Set Architecture

EECS 470 Midterm Exam

8-1. Fig. 8-1 ASM Chart Elements 2001 Prentice Hall, Inc. M. Morris Mano & Charles R. Kime LOGIC AND COMPUTER DESIGN FUNDAMENTALS, 2e, Updated.

EE178 Lecture Verilog FSM Examples. Eric Crabill SJSU / Xilinx Fall 2007

15-740/ Computer Architecture, Fall 2011 Midterm Exam II

Synthesis of Language Constructs. 5/10/04 & 5/13/04 Hardware Description Languages and Synthesis

Portland State University ECE 587/687. Superscalar Issue Logic

Advanced Computer Architecture

Programmable machines

CMU Introduction to Computer Architecture, Spring Midterm Exam 2. Date: Wed., 4/17. Legibility & Name (5 Points): Problem 1 (90 Points):

Design of Digital Circuits Lecture 21: GPUs. Prof. Onur Mutlu ETH Zurich Spring May 2017

Chap 6 Introduction to HDL (d)

Computer Architecture Lecture 13: State Maintenance and Recovery. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013

Chapter. Out of order Execution

Department of Electrical and Computer Engineering The University of Texas at Austin

Quick Introduction to SystemVerilog: Sequental Logic

CS152 Computer Architecture and Engineering. Complex Pipelines

15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011

8-1. Fig. 8-1 ASM Chart Elements 2001 Prentice Hall, Inc. M. Morris Mano & Charles R. Kime LOGIC AND COMPUTER DESIGN FUNDAMENTALS, 2e, Updated.

EECS150. Implement of Processor FSMs

Hardware-based Speculation

Control Unit Implementation

Introduction to Computer Engineering. CS/ECE 252, Spring 2017 Rahul Nayar Computer Sciences Department University of Wisconsin Madison

20/08/14. Computer Systems 1. Instruction Processing: FETCH. Instruction Processing: DECODE

Intro. to Computer Architecture Homework 4 CSE 240 Autumn 2005 DUE: Mon. 10 October 2005

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

Lecture 32: SystemVerilog

ECE 411, Exam 1. Good luck!

ECE 4514 Digital Design II. Spring Lecture 13: Logic Synthesis

10/31/2016. The LC-3 ISA Has Three Kinds of Opcodes. ECE 120: Introduction to Computing. ADD and AND Have Two Addressing Modes

ECE 4514 Digital Design II. Spring Lecture 15: FSM-based Control

15-740/ Computer Architecture Lecture 8: Issues in Out-of-order Execution. Prof. Onur Mutlu Carnegie Mellon University

Alexandria University

Course Topics - Outline

Multicycle Approach. Designing MIPS Processor

ECE 551: Digital System *

Computer Architecture: SIMD and GPUs (Part I) Prof. Onur Mutlu Carnegie Mellon University

LC3DataPath ECE2893. Lecture 9a. ECE2893 LC3DataPath Spring / 14

Synthesizable Verilog

Modeling Sequential Circuits in Verilog

EECS 470 Midterm Exam

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes

CC 311- Computer Architecture. The Processor - Control

Computer Organization

EPC6055 Digital Integrated Circuits EXAM 1 Fall Semester 2013

ECE 2300 Digital Logic & Computer Organization. More Verilog Finite State Machines

Verilog Fundamentals. Shubham Singh. Junior Undergrad. Electrical Engineering

Special Microarchitecture based on a lecture by Sanjay Rajopadhye modified by Yashwant Malaiya

Finite-State Machine (FSM) Design

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

COMPUTER ORGANIZATION AND DESIGN

15-740/ Computer Architecture Lecture 10: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/3/2011

VHDL: RTL Synthesis Basics. 1 of 59

Design of Digital Circuits Lecture 15: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2017

15-740/ Computer Architecture Lecture 4: Pipelining. Prof. Onur Mutlu Carnegie Mellon University

Block diagram view. Datapath = functional units + registers

Advanced Computer Architecture

Writing Circuit Descriptions 8

Parallel versus serial execution

Laboratory Exercise 3 Davide Rossi DEI University of Bologna AA

Verilog Coding Guideline

Micro-programmed Control Ch 15

Chapter 15: Design Examples. CPU Design Example. Prof. Soo-Ik Chae 15-1

Transcription:

ETH, Design of Digital Circuits, SS17 Practice Exercises III Instructors: Prof. Onur Mutlu, Prof. Srdjan Capkun TAs: Jeremie Kim, Minesh Patel, Hasan Hassan, Arash Tavakkol, Der-Yeuan Yu, Francois Serre, Victoria Caparros Cabezas, David Sommer, Mridula Singh, Sinisa Matetic, Aritra Dhar, Marco Guarnieri Note: These exercises are not graded and are optional. 1 Potpourri 1.1 Pipelining a) Circle one of A, B, C, D. As pipeline depth increases, the latency to process a single instruction: A. decreases B. increases C. stays the same D. could increase, decrease, or stay the same, depending on... b) Explain your reasoning (in no more than 20 words): Each additional pipeline latch adds processing overhead (additional latency). 1.2 Data Dependencies In a superscalar processor, the hardware detects the data dependencies between concurrently fetched instructions. In contrast, in a VLIW processor, the compiler does. 1.3 TLBs What does a TLB (Translation Lookaside Buffer) cache? Page table entries. 1.4 SIMD Processing Suppose we want to design a SIMD engine that can support a vector length of 16. We have two options: a traditional vector processor and a traditional array processor. Which one is more costly in terms of chip area (circle one)? Explain: The traditional vector processor The traditional array processor Neither An array processor requires 16 functional units for an operation whereas a vector processor requires only 1. Assuming the latency of an addition operation is five cycles in both processors, how long will a VADD (vector add) instruction take in each of the processors (assume that the adder can be fully pipelined and is the same for both processors)? 1/21

For a vector length of 1: The traditional vector processor: 5 cycles The traditional array processor: 5 cycles For a vector length of 4: The traditional vector processor: 8 cycles (5 for the first element to complete, 3 for the remaining 3) The traditional array processor: 5 cycles For a vector length of 16: The traditional vector processor: 20 cycles (5 for the first element to complete, 15 for the remaining 15) The traditional array processor: 5 cycles 2/21

2 GPUs and SIMD We define the SIMD utilization of a program run on a GPU as the fraction of SIMD lanes that are kept busy with active threads during the run of a program. The following code segment is run on a GPU. Each thread executes a single iteration of the shown loop. Assume that the data values of the arrays A, B, and C are already in vector registers so there are no loads and stores in this program. (Hint: Notice that there are 4 instructions in each thread.) A warp in the GPU consists of 64 threads, and there are 64 SIMD lanes in the GPU. for (i = 0; i < 1024768; i++) { if (B[i] < 4444) { A[i] = A[i] * C[i]; B[i] = A[i] + B[i]; C[i] = B[i] + 1; } } (a) How many warps does it take to execute this program? Warps = (Number of threads) / (Number of threads per warp) Number of threads = 2 20 (i.e., one thread per loop iteration). Number of threads per warp = 64 = 2 6 (given). Warps = 2 20 /2 6 = 2 14 (b) When we measure the SIMD utilization for this program with one input set, we find that it is 67/256. What can you say about arrays A, B, and C? Be precise (Hint: Look at the if branch, what can you say about A, B and C?). A: Nothing B: 1 in every 64 of B s elements is less than 4444 C: Nothing (c) Is it possible for this program to yield a SIMD utilization of 100% (circle one)? YES If YES, what should be true about arrays A, B, C for the SIMD utilization to be 100%? Be precise. If NO, explain why not. NO Yes. Either: (1) All of B s elements are greater than or equal to 4444, or (2) All of B s element are less than 4444. (d) Is it possible for this program to yield a SIMD utilization of 25% (circle one)? YES NO 3/21

If YES, what should be true about arrays A, B, and C for the SIMD utilization to be 25%? Be precise. If NO, explain why not. No. The smallest SIMD utilization possible is the same as part (b), 67/256, but this is greater than 25%. 4/21

3 Tomasulo s Algorithm Remember that Tomasulo s algorithm requires tag broadcast and comparison to enable wake-up of dependent instructions. In this question, we will calculate the number of tag comparators and size of tag storage required to implement Tomasulo s algorithm in a machine that has the following properties: 8 functional units where each functional unit has a dedicated separate tag and data broadcast bus 32 64-bit architectural registers 16 reservation station entries per functional unit Each reservation station entry can have two source registers Answer the following questions. Show your work for credit. (a) What is the number of tag comparators per reservation station entry? 8 2 (b) What is the total number of tag comparators in the entire machine? 16 8 2 8 + 8 32 (c) What is the (minimum possible) size of the tag? log(16 8) = 7 (d) What is the (minimum possible) size of the register alias table (or, frontend register file) in bits? 72 32 (64 bits for data, 7 bits for the tag, 1 valid bit) (e) What is the total (minimum possible) size of the tag storage in the entire machine in bits? 7 32 + 7 16 8 2 5/21

4 Out-of-Order Execution In this problem, we will give you the state of the Register Alias Table (RAT) and Reservation Stations (RS) for an out-of-order execution engine that employs Tomasulos algorithm. Your job is to determine the original sequence of five instructions in program order. The out-of-order machine in this problem behaves as follows: The frontend of the machine has a one-cycle fetch stage and a one-cycle decode stage. The machine can fetch one instruction per cycle, and can decode one instruction per cycle. The machine dispatches one instruction per cycle into the reservation stations, in program order. Dispatch occurs during the decode stage. An instruction always allocates the first reservation station that is available (in top-to-bottom order) at the required functional unit. When a value is captured (at a reservation station) or written back (to a register) in this machine, the old tag that was previously at that location is not cleared; only the valid bit is set. When an instruction in a reservation station finishes executing, the reservation station is cleared. Both the adder and multiplier are fully pipelined. An add instruction takes 2 cycles. A multiply instruction takes 4 cycles. When an instruction completes execution, it broadcasts its result. A dependent instructions can begin execution in the next cycle if it has all its operands available. When multiple instructions are ready to execute at a functional unit, the oldest ready instruction is chosen. Initially, the machine is empty. Five instructions then are fetched, decoded, and dispatched into reservation stations. When the final instruction has been fetched and decoded, one instruction has already been written back. Pictured below is the state of the machine at this point, after the fifth instruction has been fetched and decoded: RAT ADD Reg V Tag Value R0 1 13 R1 0 A 8 R2 1 3 R3 1 5 R4 0 X 255 R5 0 Y 12 R6 0 Z 74 R7 1 7 Src 1 Src2 Src 1 Src2 Tag V Value Tag V Value A - 1 5 Z 0 - B C MUL Tag V Value Tag V Value A A 1 8-1 7 B X 0 - - 1 13 C - 1 3-1 8 6/21

(a) Give the five instructions that have been dispatched into the machine, in program order. The source registers for the first instruction can be specified in either order. Give instructions in the following format: opcode destination source1, source2. ADD R1 R2, R3 MUL R4 R1, R7 MUL R5 R4, R0 MUL R6 R2, R1 ADD R1 R3, R6 (b) Now assume that the machine flushes all instructions out of the pipeline and restarts fetch from the first instruction in the sequence above. Show the full pipeline timing diagram below for the sequence of five instructions that you determined above, from the fetch of the first instruction to the writeback of the last instruction. Assume that the machine stops fetching instructions after the fifth instruction. As we saw in class, use F for fetch, D for decode, En to signify the nth cycle of execution for an instruction, and W to signify writeback. You may or may not need all columns shown. Cycle: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Instruction: F D E1 E2 W Instruction: F D E1 E2 E3 E4 W Instruction: F D E1 E2 E3 E4 W Instruction: F D E1 E2 E3 E4 W Instruction: F D E1 E2 W Finally, show the state of the RAT and reservation stations after 10 cycles in the blank figures below. ADD RAT Reg V Tag Value R0 1 13 R1 0 A 8 R2 1 3 R3 1 5 R4 1 X 56 R5 0 Y 12 R6 1 Z 24 R7 1 7 Src 1 Src2 Src 1 Src2 Tag V Value Tag V Value A - 1 5 Z 1 24 B C MUL Tag V Value Tag V Value A B X 1 56-1 13 C 7/21

5 Dataflow Consider the dataflow graph below and answer the following questions. Please restrict your answer to 10 words or less. What does the whole dataflow graph do (10 words or less)? (Hint: Identify what Module 1 and Module 2 perform.) The dataflow graph deadlocks unless the greatest common divisor (GCD) of A and B is the same as the least common multiple (LCM) of A and B. If (GCD(A, B) == LCM(A, B)) then ANSWER = 1 If you assume that A and B are fed as inputs continuously, the dataflow graph finds the least common multiple (LCM) of A and B. 8/21

A B C * C Module 1 < F T T F =0? F T T F F T =0? 1 T F + ANSWER 1 Module 2 Legend B A C A C C Copy Initially C=A Then C=B B C=A-B 9/21

6 Systolic Arrays The following diagram is a systolic array that performs the multiplication of two 4-bit binary numbers. Assume that each adder takes one cycle. (a) How many cycles does it take to perform one multiplication? 9 cycles (b) How many cycles does it take to perform three multiplications? 13 cycles. Note that some input ports have to hold their values for an extra cycle because their nodes need to wait for inputs from the previous nodes. You cannot completely overlap latencies of different nodes. However, because no one was aware of this, we accepted 11 cycles as another possible answer. 10/21

(c) Fill in the following table, which has a list of inputs (a 1-bit binary number) such that the systolic array produces the following outputs, in order: 5 5, 12 9, 11 15. Please refer to the following diagram to use as reference for all the input ports. 5x5 = 101x101 12x9 = 1100x1001 11x15 = 1011x1111 Cycles Row Inputs Column Inputs a 0 a 1 a 2 a 3 a 4 a 5 a 6 b 0 b 1 b 2 b 3 b 4 b 5 b 6 a 7 a 8 a 9 b 7 b 8 b 9 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 3 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 4 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 1 0 5 0 1 0 1 0 0 0 0 1 0 1 0 0 0 1 1 1 1 0 0 6 0 1 0 1 0 0 0 0 1 1 1 0 0 0 1 1 1 1 0 0 7 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 8 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 9 1 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 10 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 1 11 0 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 1 0 0 1 12 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 13 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11/21

7 Mystery Instruction A pesky engineer implemented a mystery instruction on the LC-3b. It is your job to determine what the instruction does. The mystery instruction is encoded as: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1010 DR SR1 0 0 0 0 0 0 The modifications we make to the LC-3b datapath and the microsequencer are highlighted in the attached figures (see the next two pages). We also provide the original LC-3b state diagram, in case you need it. (As a reminder, the selection logic for SR2MUX is determined internally based on the instruction.) The additional control signals are GateTEMP1/1: NO, YES GateTEMP2/1: NO, YES LD.TEMP1/1: NO, LOAD LD.TEMP2/1: NO, LOAD ALUK/3: OR1 (A 0x1), LSHF1 (A<<1), PASSA, PASS0 (Pass value 0), PASS16 (Pass value 16) COND/4: COND 0000 ;Unconditional COND 0001 ;Memory Ready COND 0010 ;Branch COND 0011 ;Addressing mode COND 0100 ;Mystery 1 COND 1000 ;Mystery 2 The microcode for the instruction is given in the table below. State Cond J Asserted Signals 001010 (10) COND 0000 001011 ALUK = PASS0, GateALU, LD.REG, DRMUX = DR (IR[11:9]) 001011 (11) COND 0000 101000 ALUK = PASSA, GateALU, LD.TEMP1, SR1MUX = SR1 (IR[8:6]) 101000 (40) COND 0000 110010 ALUK = PASS16, GateALU, LD.TEMP2 110010 (50) COND 1000 101101 ALUK = LSHF1, GateALU, LD.REG, SR1MUX = DR, DRMUX = DR (IR[11:9]) 111101 (61) COND 0000 101101 ALUK = OR1, GateALU, LD.REG, SR1MUX = DR, DRMUX = DR (IR[11:9]) 101101 (45) COND 0000 111111 GateTEMP1, LD.TEMP1 111111 (63) COND 0100 010010 GateTEMP2, LD.TEMP2 Describe what this instruction does. Bit-reverses the value in SR1 and puts it in DR. 12/21

Code: State 10: DR 0 State 11: TEMP1 value(sr1) State 40: TEMP2 16 State 50: DR = DR << 1 if (TEMP1[0] == 0) goto State 45 else goto State 61 State 61: DR = DR 0x1 State 45: TEMP1 = TEMP1 >> 1 State 63: DEC TEMP2 if (TEMP2 == 0) goto State 18 else goto State 50 13/21

14/21

15/21

C.4. THE CONTROL STRUCTURE 11 IR[11:9] 111 DR IR[11:9] IR[8:6] SR1 DRMUX SR1MUX (a) (b) IR[11:9] N Z P Logic BEN (c) Figure C.6: Additional logic required to provide control signals LC-3b to operate correctly with a memory that takes multiple clock cycles to read or store a value. Suppose it takes memory five cycles to read a value. That is, once MAR contains the address to be read and the microinstruction asserts READ, it will take five cycles before the contents of the specified location in memory are available to be loaded into MDR. (Note that the microinstruction asserts READ by means of three control signals: MIO.EN/YES, R.W/RD, and DATA.SIZE/WORD; see Figure C.3.) Recall our discussion in Section C.2 of the function of state 33, which accesses an instruction from memory during the fetch phase of each instruction cycle. For the LC-3b to operate correctly, state 33 must execute five times before moving on to state 35. That is, until MDR contains valid data from the memory location specified by the contents of MAR, we want state 33 to continue to re-execute. After five clock cycles, th h l t d th d lti i lid d t i MDR th 16/21

MAR <! PC PC <! PC + 2 18, 19 MDR <! M 33 R R IR <! MDR 35 To 8 RTI ADD 32 BEN<! IR[11] & N + IR[10] & Z + IR[9] & P [IR[15:12]] 1011 1010 BR To 11 To 10 DR<! SR1+OP2* set CC DR<! SR1&OP2* set CC 1 5 AND XOR TRAP SHF LEA LDB LDW STW STB JSR JMP 0 [BEN] 0 1 22 PC<! PC+LSHF(off9,1) 9 DR<! SR1 XOR OP2* set CC 12 PC<! BaseR MAR<! LSHF(ZEXT[IR[7:0]],1) 15 4 [IR[11]] R 28 MDR<! M[MAR] R7<! PC R PC<! MDR 30 0 1 20 R7<! PC PC<! BaseR 21 R7<! PC PC<! PC+LSHF(off11,1) 13 DR<! SHF(SR,A,D,amt4) set CC 14 DR<! PC+LSHF(off9, 1) set CC 2 MAR<! B+off6 6 MAR<! B+LSHF(off6,1) 7 MAR<! B+LSHF(off6,1) 3 MAR<! B+off6 29 25 23 24 NOTES B+off6 : Base + SEXT[offset6] PC+off9 : PC + SEXT[offset9] *OP2 may be SR2 or SEXT[imm5] ** [15:8] or [7:0] depending on MAR[0] MDR<! M[MAR[15:1] 0] R R 31 DR<! SEXT[BYTE.DATA] set CC MDR<! M[MAR] 27 R DR<! MDR set CC R MDR<! SR 16 M[MAR]<! MDR R R MDR<! SR[7:0] 17 M[MAR]<! MDR** R R To 19 Figure C.2: A state machine for the LC-3b 17/21

8 Finite State Machines We want to design a Finite State Machine (FSM) that has a one bit input (in) and will detect the sequence 0-1-0. If this sequence is detected, the one bit output (detected) will be set to 1, otherwise this output will remain at 0. Two of your colleagues have designed different state transition diagrams given below. in=0 reset in=0 in=1 in=0 FSM A INIT detected=0 ZERO detected=0 ZEROONE detected=0 OUT detected=1 in=1 in=1 in=1 reset in=0 / detected=0 in=0 / detected=0 in=1 / detected=0 FSM B INIT ZERO ZEROONE in=1 / detected=0 in=0 / detected=1 in=1 / detected=0 (a) Which one of the state diagrams depicts a Moore and which one a Mealy type of FSM? FSM A is a Moore type FSM, the output depends only on the state. FSM B is a Mealy type FSM, the output depends on both the state and inputs (b) For both state transition diagrams state whether or not they are correct. FSM A has a small mistake, for state ZERO it is not clear what will happen when in=0. FSM B is correct. 18/21

(c) Complete the following Verilog module that would implement the state machine as described in the question. You can implement one state transition diagram of your colleagues if that one is correct. 1 module fsm ( input in, input clk, input reset, output reg detected ); 2 3 reg [1:0] next_state, present_ state ; 4 5 parameter INIT = 2 b11 ; 6 parameter ZERO = 2 b00 ; 7 parameter ZEROONE = 2 b01 ; 8 9 always @ (*) 10 begin 11 next_ state <= present_ state ; // default 12 detected <= 1 b0; 13 case ( present_ state ) 14 INIT : next_ state <= in? INIT : ZERO ; 15 ZERO : next_ state <= in? ZEROONE : ZERO ; 16 ZEROONE : if ( in) 17 next_ state <= INIT ; 18 else 19 begin 20 next_ state <= ZERO ; 21 detected <= 1 b1; 22 end 23 default : next_ state <= present_ state ; 24 endcase 25 end 26 27 always @ ( posedge clk, posedge reset ) 28 if ( reset ) present_ state <= INIT ; 29 else present_ state <= next_ state ; 30 31 endmodule 19/21

9 Verilog There are four Verilog code snippets in this section. For each code, first state whether or not there is a mistake. If there is a mistake explain how to correct it. Note: Assume that the behavior as described, is correct (a) 1 module mux2 ( input [1:0] i, output s, input z); 2 assign s= (z)? i [1]: i [0]; 3 endmodule 4 5 module one ( input [3:0] data, input sel1, input sel2, output z ); 6 wire [1:0] temp ; 7 8 mux2 i0 ( data [1:0], sel1, temp [0]); 9 mux2 i1 ( data [3:2], sel1, temp [1]); 10 mux2 final ( temp, sel2, z); 11 endmodule This code is not correct. It uses an ordered instantiation template where the ordering of the pins should correspond to the order they are declared. This itself is not wrong, but the input signals on the second position (sel1,sel2) connect to the output of the instance mux. Either the ordering of mux2 needs to be changed, or the ordering of the elements the instantiation. The better option would be to use a connect by name. (b) 1 module two ( input [3:0] a, input [0:3] b, output reg [3:0] z); 2 always @ ( *) 3 if (a == 0) 4 z={b[0],b[1],b[2],b [3]}; 5 else 6 z=a+b; 7 endmodule The code is syntactically correct. There is a lot of unnecessary code: using one input as [0:3] and the other as [3:0] is not the smartest choice, the entire code could be written as assign z=a+b, but syntactically the code is correct. 20/21

(c) 1 module three (clk, rst, a, b, c, z); 2 input a,b,c,clk, rst ; 3 output reg z; 4 reg q; 5 6 always @ (*) 7 begin 8 q <= a ^ b; 9 if ( c) q <= ~( a^b); 10 end 11 always @ ( negedge clk ) 12 if ( rst ) z= 1 b0; 13 else z= q; 14 endmodule The code is syntactically correct. Once again, it is a bit unconventional. One always block use blocking statements and the other non-blocking statements. Both would work, although it would probably be more efficient to write them otherwise. (d) 1 module four ( input [2:0] sel, output reg [5:0] z); 2 case ( sel ) 3 0: z = 6 b00_ 0000 ; 4 1: z = 6 b00_ 0001 ; 5 2: z = 6 b00_ 0011 ; 6 3: z = 6 b00_ 0111 ; 7 4: z = 6 b00_ 1111 ; 8 5: z = 6 b01_ 1111 ; 9 6: z = 6 b11_ 1111 ; 10 default : z= 6 b00_ 0000 ; 11 endcase 12 endmodule This code is not correct. The case statement is a sequential statement that needs to be within an always statement. 21/21