Fall 2010 EE457 Instructor: Gandhi Puvvada Date: 10/1/2010, Friday in SGM123 Name:

Fall 2010 EE457 Quiz (~ 10%)
Instructor: Gandhi Puvvada
Date: 10/1/2010, Friday in SGM123
Time: 12:00-2:15PM
Calculators and Cadence Verilog guides are allowed; Closed-book, Closed-notes
Name:
Student ID: Do NOT write any student ID or SSN
Total points:    Perfect score:    /

1 ( points) min.

1.1 A and B are negative numbers represented in 2's complement notation. A is 16-bit in size and B is 8-bit in size. For A to be equal to B, all the 8 A bits (A[14:7]) shall be (all zeros / all ones) (and / or) the rest of the 7 A bits (A[6:0]) shall be equal to the corresponding 7 B bits (B[6:0]). For A to be less than B (like in -3 is less than -2), it is enough if any of the 8 A bits (A[14:7]) is a (zero / one). On the other hand, if all those 8 A bits (A[14:7]) are (all zeros / all ones), then, for A to be less than B, we need the 7-bit A (A[6:0]) to be (lower / higher) than the corresponding 7-bit B (B[6:0]). Here we compare the two 7-bit numbers treating them as (signed / unsigned) 7-bit numbers.

1.2 Mealy machine design: Browse through the state diagram on the next page first. Here, we perform serial inspection of bits of A (or bits of A and B) to compare them. A is a 16-bit number, but for this part of the question, B can be anywhere between an 8-bit and a 16-bit number. Here, we are allowed to inspect at a time (in a clock) one bit of A (A[I]) and (simultaneously, if needed) one bit of B (B[J]). I and J are indices into A and B respectively. I is initialized to 15. J is initialized to Jini (Jinitial), which can be anywhere between 7 through 15 (corresponding to the B size of 8-bit to 16-bit). You will need to compare I and J to see when they are equal. Note: Your TA says that, after START is given, you should not take more than 16 clocks. After all, A is 16 bits and B is at most 16 bits. So decrement I and/or J as soon as possible!

1.2.1 Suppose B is an 8-bit number. State an example of A and B (in binary) such that the conclusion is drawn in the least number of clocks. A = ; B = ; How many clocks are spent in the INS_AI_BJ state for the above numbers?
State an example of A and B (in binary) such that the conclusion is drawn in the most number of clocks. A = ; B = ; How many clocks are spent in the INS_AI_BJ state for the above numbers?

1.2.2 Since A and B are negative, there is no point looking at A[15] and B[Jini]. True / False

1.2.3 Since A's size is fixed at 16 bits, if A is equal to B (equal in value, but not necessarily in size), it takes the same number of clocks to compare A and B, irrespective of the size of B. True / False

October 1, 2010 4:01 pm EE457 Quiz - Fall 2010 1 / 9 © Copyright 2010 Gandhi Puvvada
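The bit-field reasoning in 1.1 can be checked with a small model. The following is a minimal Python sketch, not part of the quiz; the function names `to_signed` and `compare_negative` are hypothetical. It assumes A is given as a 16-bit pattern and B as an 8-bit pattern, both with the sign bit set, and it decides the comparison using only A[14:7], A[6:0] and B[6:0].

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a 2's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value & (1 << (width - 1)) else value


def compare_negative(a16: int, b8: int) -> str:
    """Compare a negative 16-bit A with a negative 8-bit B using bit fields only."""
    assert a16 >> 15 == 1 and b8 >> 7 == 1, "both inputs must be negative"
    a_upper = (a16 >> 7) & 0xFF  # A[14:7], the bits B does not have
    a_low = a16 & 0x7F           # A[6:0]
    b_low = b8 & 0x7F            # B[6:0]
    if a_upper != 0xFF:
        # A[14:7] differs from the sign extension of B, so A is below
        # every value an 8-bit negative number can take.
        return "A<B"
    if a_low == b_low:
        return "A==B"
    # The remaining 7 bits are compared as unsigned magnitudes.
    return "A<B" if a_low < b_low else "A>B"


if __name__ == "__main__":
    # -3 as a 16-bit pattern versus -2 as an 8-bit pattern
    print(compare_negative(0xFFFD, 0xFE), to_signed(0xFFFD, 16), to_signed(0xFE, 8))
```

Comparing the result against `to_signed` for many input pairs is one way to confirm the choices circled in 1.1.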

1.2.4 State diagram: Complete the 7 missing state transition conditions for 7 transition arrows. Complete the RTL for the three states. In state C_I_J (compare I and J), you would like to decrement (I / J / I or J / I and J / I unconditionally and J conditionally / neither I nor J). Crucial point: You want to arrive in INS_AI_BJ with the right combination of I and J whether you arrive from C_I_J or INS_AI. If B is an 8-bit number (B[7:0]), the first pair to be compared in INS_AI_BJ is (A[7],B[7] / A[6],B[6] / neither). And if B is a 16-bit number (B[15:0]), the first pair to be compared in INS_AI_BJ is (A[15],B[15] / A[14],B[14] / neither).

[State diagram. Transition labels recoverable from the figure: RESET, START. States:]
Initial: A <= Aini; B <= Bini; I <= 15; J <= Jini;
C_I_J: Compare I, J
INS_AI: Inspect A[I]
INS_AI_BJ: Inspect A[I], B[J]
D_AGTB: Done, A greater than B
D_AEQB: Done, A equal to B
D_ALTB: Done, A less than B

2 ( points) min.

State diagram coding in Verilog (you may refer to the Cadence (Esperan) Verilog guide): Consider the following partial flowchart and the corresponding partial state diagram along with the Verilog code segments written by four students.

[Flowchart and state diagram: starting in S0, test A; if true go to S1. Otherwise test B; if true go to S2. Otherwise test C; if true go to S3, else stay in S0.]

Code #1:
    if (A) state <= S1;
    else if (B) state <= S2;
    else if (C) state <= S3;
    else state <= S0;

Code #2:
    if (!A && !B && !C) state <= S0;
    if (!A && !B && C) state <= S3;
    if (!A && B) state <= S2;
    if (A) state <= S1;
    // there is no else

Code #3:
    if (A) state <= S1;
    else if (!A && B) state <= S2;
    else if (!A && !B && C) state <= S3;
    else if (!A && !B && !C) state <= S0;

Code #4:
    if (A) state <= S1;
    if (!A && B) state <= S2;
    if (!A && !B && C) state <= S3;
    if (!A && !B && !C) state <= S0;

2.1 Notice that code #3 is similar to code #1, except that code #3 is perhaps unnecessarily (but harmlessly) more verbose (like my wife! don't tell her). Code #4 is formed by removing the three occurrences of "else" in code #3. Code #2 is essentially the reverse ordering of code #3. Write "Right" or "Wrong" below for each.
Code #1 ; Code #2 ; Code #3 ; Code #4 ;

2.2 Now consider the incomplete Code #5 along with the Karnaugh map representation of the desired state transitions. If all the three, A, B, C, are true, state gets assigned with S3, gets reassigned with S2 and further reassigned with S1. Since the last assignment prevails over the prior assignments, in this case, state finally goes to S1. Note that there is no if clause leading back to S0. Complete the "if" conditions in code #5. Else state the reason why it cannot be completed.

Code #5:
    if ( ) // write either B or C
        state <= S3;
    if ( ) // write either B or C
        state <= S2;
    if (A)
        state <= S1;

Karnaugh map of the desired next state (rows: A, columns: BC):
         BC=00   BC=01   BC=11   BC=10
A=0      S0      S3      S2      S2
A=1      S1      S1      S1      S1
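The rule quoted in 2.2, that the last non-blocking assignment to state prevails, can be explored outside a Verilog simulator. The following is a minimal Python sketch under stated assumptions: each student's code is modeled as a list of (uses_else, condition, target) entries, `next_state` and `matches_spec` are hypothetical helper names, and the current state is taken to be S0 as in the partial flowchart. It is a behavioral model of the four listings, not a substitute for simulating them.

```python
from itertools import product


def next_state(stmts, a, b, c, current="S0"):
    """Model one clock edge of a block of non-blocking assignments to `state`.

    An entry with uses_else=True is skipped when an earlier entry of the same
    if/else chain already fired. Among the entries that do execute, the last
    assignment wins; if none executes, `state` holds its current value.
    """
    nxt = current
    chain_taken = False
    for uses_else, cond, target in stmts:
        if not uses_else:
            chain_taken = False  # a plain `if` starts a new chain
        if chain_taken:
            continue
        if cond(a, b, c):
            nxt = target
            chain_taken = True
    return nxt


def spec(a, b, c):
    """Priority order read off the flowchart: A, then B, then C."""
    if a:
        return "S1"
    if b:
        return "S2"
    if c:
        return "S3"
    return "S0"


CODE1 = [
    (False, lambda a, b, c: a, "S1"),
    (True, lambda a, b, c: b, "S2"),
    (True, lambda a, b, c: c, "S3"),
    (True, lambda a, b, c: True, "S0"),
]
VERBOSE = [
    (lambda a, b, c: a, "S1"),
    (lambda a, b, c: not a and b, "S2"),
    (lambda a, b, c: not a and not b and c, "S3"),
    (lambda a, b, c: not a and not b and not c, "S0"),
]
CODE3 = [(i > 0, cond, tgt) for i, (cond, tgt) in enumerate(VERBOSE)]
CODE4 = [(False, cond, tgt) for cond, tgt in VERBOSE]
CODE2 = list(reversed(CODE4))


def matches_spec(stmts):
    """True when the code agrees with the flowchart for all 8 input combinations."""
    return all(
        next_state(stmts, a, b, c) == spec(a, b, c)
        for a, b, c in product([False, True], repeat=3)
    )


if __name__ == "__main__":
    for name, code in [("#1", CODE1), ("#2", CODE2), ("#3", CODE3), ("#4", CODE4)]:
        print("Code", name, "matches flowchart:", matches_spec(code))
```

A candidate completion of Code #5 can be tried the same way by building a list of three plain-if entries and passing it to `matches_spec`.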

2.3 Combinational logic coding: The result R is either the sum SUM of A and B (A+B) or the difference DIFF, X minus Y (X-Y), depending on whichever is greater. Assuming that all needed declarations are already made appropriately (as reg or wire), complete the always block below.

[Block diagram. Labels: inputs A, B, X, Y; adder (+) producing SUM; subtractor (-) producing DIFF; comparator with inputs P, Q and output Q>P; GT_Int (GT internal signal); mux (I1, S, Y); output DIFF_GT.]

Out of the three SUM, DIFF, R, we need to assign ______ using (blocking / non-blocking) assignment operator only, whereas we can assign ______ using any one of the two operators.

    always @( )
        SUM      A + B
        DIFF     X - Y
        GT_Int
        DIFF_GT  GT_Int
        if (GT_Int)

Sequential logic coding: The above design is modified to have registered outputs. These registers shall be updated only in the COMP state. Complete the RTL (datapath operations) in the COMP state (COMP case branch). The case statement is in an always block ( always @ (posedge ) ).

[Block diagram. Labels: inputs A, B, X, Y; adder (+) producing SUM; subtractor (-) producing DIFF; comparator with inputs P, Q and output Q>P; GT_Int; muxes (I1, S, Y); registers D Q [7:0]; QCOMP (STATE == COMP), one-hot CU notation; output DIFF_GT.]

Out of the three SUM, DIFF, R, we need to assign ______ using (blocking / non-blocking) assignment operator only, whereas we can assign ______ using any one of the two operators.

    COMP: // COMP state case branch
        SUM      A + B
        DIFF     X - Y
        GT_Int
        DIFF_GT  GT_Int
        if (GT_Int)

3 ( 10 points) 7 min.

Reproduced below is a Fall 2008 Quiz question together with its answer.

Fall 2008 Question. Number systems, adder design: You are looking for a 3-bit adder/subtractor, which can perform addition or subtraction of signed or unsigned 3-bit numbers and produce the appropriate sum/difference together with overflow information. You are given the following 4-bit adder/subtractor chip. Your lab partner connected it to A[2:0], B[2:0], and SUM[2:0] as shown below. He is not sure whether this is so far correct and also he does not know how to proceed with X0 and Y0 (i.e. whether to connect 0,0 or 0,1, or 1,0, or 1,1).

Fall 2008 Q & A

[Figure: 4-bit adder/subtractor chip with inputs X3 X2 X1 X0, Y3 Y2 Y1 Y0, C0 and ADD/SUB, and outputs S3 S2 S1 S0, V, Carry and Rawcarry. Partner's wiring: A2 A1 A0 to X3 X2 X1; B2 B1 B0 to Y3 Y2 Y1; SUM2 SUM1 SUM0 from S3 S2 S1; S0 is N.C (no connection).]

Do you agree with your partner's partial design? Agree / Disagree. If you agree, state what choices work for (X0,Y0) among {(0,0), (0,1), (1,0), (1,1)}. If you disagree, state reasons.
Answer: {(0,0),(1,0)}. Note: (0,1) does not convey the needed 1 to C1 for subtraction. (1,1) conveys an unnecessary 1 to C1 during addition.

Same question as above, but since the S2 pin is broken, we are forced to connect as shown below.

[Figure: the same chip rewired. A2, B2 and SUM2 use bit position 3; A1, B1, SUM1 and A0, B0, SUM0 use bit positions 1 and 0; one pin is marked N.C (no connection); V, Rawcarry, C0 and ADD/SUB as before.]

Basically, the full-adder #2 should act like a transparent medium conveying the incoming C2 as outgoing C3. Select one or more choices of connecting to (X2, Y2).
(i) (0, 0) (ii) (1, 0) (iii) (0, ADD/SUB) (iv) (0, inverted ADD/SUB) (v) ( ) // fill-in

4 ( points) min.

Performance:

4.1 We know that frequency of occurrence in the dynamic trace can easily be different from percentage of time spent. For example, floating point instructions may occur only 10% in frequency but may take 20% of the execution time as they are long. In the following table, there are two unknowns, F and T. Find them. Hint: Consider 100 instructions, clocks spent on them, percentage of clocks spent on C category.

Category   CPI   Frequency   Time Spent
A          2     50%         T %
B          5     (50-F)%     (75-T)%
C          10    F %         25%

4.2 Without changing ISA or frequency, by improving non-floating point instructions by reducing the number of clocks taken to execute them, you will improve (circle all applicable):
(A) ET (Execution Time) (B) Relative MIPS (C) Native MIPS (D) MFLOPS

Without changing ISA or frequency, by improving floating point instructions by reducing the number of clocks taken to execute them, you will improve (circle all applicable):
(A) ET (Execution Time) (B) Relative MIPS (C) Native MIPS (D) MFLOPS
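The hint in 4.1 (take 100 instructions and count clocks per category) can be followed mechanically. The following is a minimal Python sketch, not part of the quiz; `time_shares` and `solve` are hypothetical names, and the search assumes F is a whole number of instructions between 0 and 50. It finds the F for which category C takes 25% of the clocks and then reads T off as category A's share.

```python
from fractions import Fraction

CPI = (2, 5, 10)  # clocks per instruction for categories A, B, C (from the table)


def time_shares(cpi, counts):
    """Percentage of total clocks spent in each category.

    `counts` holds instruction counts per category out of 100 instructions.
    """
    clocks = [c * n for c, n in zip(cpi, counts)]
    total = sum(clocks)
    return [Fraction(k, total) * 100 for k in clocks]


def solve():
    """Search integer F in 0..50 so that category C takes exactly 25% of the time."""
    for f in range(0, 51):
        shares = time_shares(CPI, (50, 50 - f, f))
        if shares[2] == 25:
            return f, shares
    return None


if __name__ == "__main__":
    f, shares = solve()
    t = shares[0]
    # Consistency check against the table: B's share should equal (75 - T)%.
    print("F =", f, "T =", t, "B share =", shares[1], "75 - T =", 75 - t)
```

Using exact fractions avoids rounding when testing for the 25% share; the same `time_shares` helper can be reused to see how the shares shift when a CPI value changes.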

Without changing frequency, by splitting a non-floating point instruction, which was taking 12 clocks, into two non-floating point instructions taking 6 clocks each, you will cause
(i) ET (Execution Time) (to go up / to go down / to remain the same)
(ii) Relative MIPS (to go up / to go down / to remain the same)
(iii) Native MIPS (to go up / to go down / to remain the same)
(iv) MFLOPS (to go up / to go down / to remain the same)

Without changing frequency, by splitting a floating point instruction, which was taking 12 clocks, into two floating point instructions taking 6 clocks each, you will cause
(i) ET (Execution Time) (to go up / to go down / to remain the same)
(ii) Relative MIPS (to go up / to go down / to remain the same)
(iii) Native MIPS (to go up / to go down / to remain the same)
(iv) MFLOPS (to go up / to go down / to remain the same)

5 ( points) min.

Single-cycle CPU: Reproduced on the next two pages is the single-cycle CPU block diagram, unmodified on the first page for your reference and partially modified for your completion on the next page to support this new load word instruction, lw_addu, which is defined as follows.

regular lw: lw rt, offset(rs) ; (rt) <= M[offset + (rs)]
new lw_add: lw_addu rt, offset(rs) ; (rt) <= (rt) + M[offset + (rs)]

After reading the content of the memory, instead of simply depositing the read-content into rt, it adds the read-content to rt and deposits the sum into rt. An additional adder after the memory could be used, but Mr. Trojan told us to use the BTA (Branch Target Address) calculating adder, as it is not required to calculate the BTA when this lw_add is executing. Use two multiplexers to deliver the memory content and the rt content to the BTA_adder and another multiplexer to deliver the BTA adder result as WD (write data) to the register file. Control all the three multiplexers using a new control line called lau (short for lw_addu).

5.1 Complete the block diagram on next to next page and the control signal table below.
Instruction     RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  lau
R-format        1       0       0         1         0        0         0       1       0
lw              0       1       1         1         1        0         0       0       0
sw              X       1       X         0         0        1         0       0       0
beq             X       0       X         0         0        0         1       0       1
v-lw lw_addu    1       0       1         1         1        0         0       0       1

5.2 If the original lw instruction needed 40ns (10ns to fetch the lw instruction, 5ns to fetch source registers, 10ns to calculate the effective address, 10ns to read the memory, and 5ns to write back to the register file), what is your estimate of the time needed by lw_addu?
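The 40ns figure in 5.2 is the sum of the stage delays along the lw path, and an estimate for lw_addu follows from adding the extra addition after the memory read. The following is a minimal Python sketch, not part of the quiz. The delay of the reused BTA adder is not given in the question, so `ASSUMED_BTA_ADDER_NS` is a labeled assumption (set equal to the 10ns address calculation), and multiplexer delays are ignored; the names are hypothetical.

```python
# Stage delays for the regular lw, as listed in question 5.2 (nanoseconds).
LW_STAGES_NS = {
    "instruction fetch": 10,
    "source register read": 5,
    "effective address calculation": 10,
    "memory read": 10,
    "register write back": 5,
}

# Assumption, not stated in the quiz: the BTA adder takes as long as the
# effective address calculation. Multiplexer delays are ignored.
ASSUMED_BTA_ADDER_NS = 10


def lw_delay_ns():
    """Total delay of the regular lw: the stages occur one after another."""
    return sum(LW_STAGES_NS.values())


def lw_addu_delay_ns(adder_ns=ASSUMED_BTA_ADDER_NS):
    """Estimated delay of lw_addu: the extra add sits between the memory
    read and the register write back, so it lengthens the same path."""
    return lw_delay_ns() + adder_ns


if __name__ == "__main__":
    print("lw:", lw_delay_ns(), "ns; lw_addu (assumed adder delay):", lw_addu_delay_ns(), "ns")
```

Passing a different `adder_ns` shows how sensitive the estimate is to the assumed adder delay.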

5.3 The hardware engineers implemented this new lw_addu instruction in the new version of the single-cycle CPU, but the compiler was not redesigned to utilize this new instruction. So the frequency of occurrence of this instruction in the dynamic execution trace is 0% (zero percent). The ET (execution time) of the benchmark (goes up / goes down / remains the same). If you said goes up or goes down, do you have adequate data to calculate the factor of performance change? Yes / No. If yes, the performance has changed by ______

5.4 new lw_add: lw_addu rt, offset(rs) ; (rt) <= (rt) + M[offset + (rs)]
How many source registers? How many destination registers?

[Figure: Single Cycle CPU (unmodified). No need to write on this page.]

[Figure: Single Cycle CPU (modify this as needed)]

6 ( points) min.

Multi-cycle CPU:

6.1 While PCWrite (is needed / isn't needed) in the single-cycle CPU, it (is needed / isn't needed) in the multi-cycle CPU. Choose an appropriate PC-Register design from the following 3 choices. Choice #1 is the cheapest and #3 is the most expensive. You should choose the cheaper design if it satisfies your needs.

#1 PC: Level-sensitive latch
#2 PC: Edge-sensitive bare register (no recirculating mux)
#3 PC: Edge-sensitive register with recirculating mux
(Each choice is drawn as a 32-bit D-Q element, D[31:0] to Q[31:0].)

Your choice of PC for the single-cycle CPU:
Your choice of PC for the multi-cycle CPU:

6.2 Temporary registers are needed if information is produced in one clock and consumed in a later clock. However, we could avoid temporary registers (for example, MDR, Memory Data Register) by ______
Temporary registers have Temp_RegWrite control inputs so that the CU can tell registers when to write. The exception to this rule is ______

6.3 Computing the BTA (for an assumed beq instruction), during the decode state, is (better / worse) than computing the EA (effective address) (for an assumed lw/sw instruction) because ______

Blank Space (scratch pad area)