EECS 151/251A: SPRING 17 MIDTERM 2 SOLUTIONS

University of California College of Engineering
Department of Electrical Engineering and Computer Sciences
J. Rabaey
G. Alexandrov, N. Narevsky, V. Iyer
MoWe 4-5:30pm
Mo, Oct. 2, 6:00-7:30pm

EECS 151/251A: SPRING 17 MIDTERM 2 SOLUTIONS

NAME (Last, First):
SID:

Problem 1 (16):
Problem 2 (17):
Problem 3 (15):
Total (48):

EECS 151/251A: FALL 2017 MIDTERM 2

[PROBLEM 1] Logic and Wire optimization (16 + 1 Pts)

a) A designer at a memory company is in charge of developing the circuitry to drive the wordline of an SRAM module as fast as possible. An initial design is shown below. It consists of an inverting driver and a wordline wire connecting to 256 SRAM cells. The contribution of each cell to the wordline can be modeled as a series resistance R_C of 5 Ω and a load capacitance C_C of 2 fF. Assume at first that the driver is a minimum-sized inverter with an input capacitance C_D of 5 fF and a driver resistance R_D of 10 kΩ. You may assume that γ = 1. In a first step, determine the worst-case propagation delay of the circuit (from In to any cell connected to the wordline). (5 Pts)

[Figure: inverting driver with input In driving the wordline WL, which connects to a row of SRAM cells; each cell contributes R_C and C_C.]

$t_p = 0.69\, R_D\, (C_D + 256\, C_C) + 0.38\, (256\, R_C)(256\, C_C) = 3.57\ \text{ns} + 0.25\ \text{ns} = 3.82\ \text{ns}$

1 pt ln(2)
1 pt (C_D + 256 C_C)
1 pt distributed wire model (e.g. 0.38)
2 pt correct wire delay (proportional to 256^2)

b) One way to reduce the delay is to optimize the driver. Determine the optimal number of stages to minimize the delay. The function of the driver cannot be changed; that is, it has to remain inverting. (2 Pts)

$F = \frac{256 \cdot 2\ \text{fF}}{5\ \text{fF}} = 102.4$

Optimal number of stages: $\log_4(F) = 3.34$

Given that the circuit has to remain inverting, 3 stages is the optimal answer.

1 pt log4(F) calculation
1 pt 3 stages
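The arithmetic in parts a) and b) can be checked with a short script. This is only a sketch that plugs the values from the problem statement into the same lumped-plus-distributed RC expression; the function and variable names are made up for illustration.

```python
import math

# Values from the problem statement, in SI units.
R_D = 10e3       # driver resistance (ohms)
C_D = 5e-15      # driver capacitance (farads)
R_C = 5.0        # wordline resistance per cell (ohms)
C_C = 2e-15      # wordline capacitance per cell (farads)
N_CELLS = 256


def wordline_delay(r_driver, c_self, n_cells=N_CELLS):
    """Return (driver term, wire term) of the propagation delay in seconds."""
    c_wire = n_cells * C_C
    r_wire = n_cells * R_C
    driver_term = 0.69 * r_driver * (c_self + c_wire)  # lumped RC, ln(2) factor
    wire_term = 0.38 * r_wire * c_wire                 # distributed RC line
    return driver_term, wire_term


driver_term, wire_term = wordline_delay(R_D, C_D)
total = driver_term + wire_term

# Part b): electrical fan-out and the unconstrained optimum stage count.
fanout = (N_CELLS * C_C) / C_D
stages = math.log(fanout, 4)

print(f"driver {driver_term * 1e9:.2f} ns, wire {wire_term * 1e9:.2f} ns, "
      f"total {total * 1e9:.2f} ns")
print(f"F = {fanout:.1f}, log4(F) = {stages:.2f}")
```

Because the wire term scales with the square of the number of cells, halving `n_cells` cuts that term by a factor of four.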

c) For the chosen number of stages, size each stage so that the delay is minimized, and determine the new value of the delay. (3 Pts)

Optimal sizing factor: $f = F^{1/3} = 4.68$

Hence: stage 1: 1; stage 2: 4.68; stage 3: 21.88

Delay: the second term (distributed array) is unchanged. The first term (driver) is reduced to:

$0.69 \cdot 3 \cdot 10\ \text{k}\Omega \cdot (5 + 4.68 \cdot 5)\ \text{fF} = 0.59\ \text{ns}$

The total delay is: 0.59 ns + 0.25 ns = 0.84 ns

While the first term still dominates, the two terms are now much closer in magnitude.

2 pt f and sizing
1 pt delay

d) If you did the sizing right, you may observe that the overall delay is now dominated by the memory array. Our designer has come up with two approaches to address this. In a first approach, he decides to drive the WL from both sides (see Figure). Determine how this impacts the worst-case delay and determine the resulting value. To make things easy, keep the drivers the same as in part c). (3 Pts)

[Figure: the wordline WL driven from both ends by identical drivers, each with input In; each cell contributes R_C and C_C.]

Driving from both sides reduces the effective distributed length of the memory array by a factor of 2. In a distributed system this effectively reduces the delay by a factor of 4. The overall delay is now:

$t_p = 0.59\ \text{ns} + 0.25\ \text{ns} / 4 = 0.65\ \text{ns}$

1 pt driver delay
2 pt wire delay reduced by factor of 4
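The numbers in parts c) and d) follow from a few lines of arithmetic. The sketch below reuses the driver-delay expression exactly as the solution writes it (three stages, each loaded by its own capacitance plus f times that capacitance); the variable names are illustrative only.

```python
# Values from the problem statement, in SI units.
R_D = 10e3       # minimum-inverter resistance (ohms)
C_D = 5e-15      # minimum-inverter capacitance (farads)
R_C = 5.0        # wordline resistance per cell (ohms)
C_C = 2e-15      # wordline capacitance per cell (farads)
N_CELLS = 256
N_STAGES = 3     # odd, so the driver stays inverting

fanout = N_CELLS * C_C / C_D
f = fanout ** (1 / N_STAGES)                 # per-stage sizing factor
sizes = [f ** k for k in range(N_STAGES)]    # relative size of each stage

# Driver term as written in the solution: 0.69 * N * R_D * (C_D + f * C_D).
driver_delay = 0.69 * N_STAGES * R_D * (C_D + f * C_D)

# Distributed wire term, unchanged from part a).
wire_delay = 0.38 * (N_CELLS * R_C) * (N_CELLS * C_C)

single_sided = driver_delay + wire_delay
# Driving from both ends halves the line length, so the wire term drops by 4.
double_sided = driver_delay + wire_delay / 4

print([round(s, 2) for s in sizes])
print(f"{single_sided * 1e9:.2f} ns, {double_sided * 1e9:.2f} ns")
```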

e) Another approach that he is considering is to put a metal wire (called a bypass) with lower resistance (located on the higher levels of the interconnect stack) in parallel with the wordline (see Figure). The bypass wire connects to the wordline every 64 cells (4 connections in total, or one connection every 64 cells). You may assume that the capacitance of the bypass wire per cell is equal to that of the wordline, but that its resistance is 10 times lower. Determine again the impact on delay. (3 Pts)

[Figure: driver with input In driving both the wordline WL and a parallel bypass wire that taps into the wordline every 64 cells.]

The longest connection would now be 3 * 64 cells over the bypass wire plus 32 memory cells. This translates into a delay:

$0.38 \left( 192 \cdot \frac{R_C}{10} \cdot 192\, C_C + 32\, R_C \cdot 32\, C_C \right) = 0.018\ \text{ns}$

The total delay is now 0.61 ns.

2 pt correct critical path
1 pt delay calculation

f) Bonus question: Since most of the wordlines of a memory are always at 0, and only one line goes high with every cycle, it seems that the low-to-high transition is the most important one, and making that one faster is the primary goal. Discuss qualitatively how you would change the driver characteristics to exploit this feature. (1 Pt)

Speed up the 0->1 transition by making the PMOS of the final driver stage larger than the normal ratio, actually reducing the switching threshold of the device.
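The bypass-wire estimate in part e) can be reproduced as follows. This sketch mirrors the solution's simplification of adding the distributed delay of the 192-cell bypass segment to that of the 32-cell wordline segment; the 0.59 ns driver delay is taken from part c), and the helper name is hypothetical.

```python
R_C = 5.0        # wordline resistance per cell (ohms)
C_C = 2e-15      # capacitance per cell (farads), same for bypass and wordline
DRIVER_DELAY = 0.59e-9   # three-stage driver delay from part c) (seconds)


def distributed_delay(n_cells, r_per_cell, c_per_cell):
    """Delay of a distributed RC segment spanning n_cells (0.38 * R * C)."""
    return 0.38 * (n_cells * r_per_cell) * (n_cells * c_per_cell)


# Critical path: 3 * 64 cells on the low-resistance bypass, then 32 cells
# on the wordline itself (the midpoint between two bypass connections).
bypass_part = distributed_delay(3 * 64, R_C / 10, C_C)
wordline_part = distributed_delay(32, R_C, C_C)
wire_delay = bypass_part + wordline_part
total = DRIVER_DELAY + wire_delay

print(f"wire {wire_delay * 1e9:.3f} ns, total {total * 1e9:.2f} ns")
```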

[Problem 2] Logical Effort (17 Pts)

a) (4 pts) Implement the function Out = A B + C D + A D E + F with a complex static CMOS gate. Assume that for this process $R_P = \frac{1}{3} R_N$ (i.e. for the same width, a PMOS has one third the resistance of an NMOS). Size your gate such that the worst-case pull-up resistance is equal to the worst-case pull-down resistance, assuming that the minimum transistor width is 1.

1 pt correct pull-up functionality
1 pt correct pull-down functionality
1 pt correct pull-up sizing (multiple answers)
1 pt correct pull-down sizing (multiple answers)
½ pt taken off if inverter had transistor size 1/3
1 pt taken off if p-n ratio was flipped

b) (3 pts) What is the logical effort of this gate from the A, C and E inputs?

For input A: $LE_A = \frac{21\, CR}{4\, CR} = \frac{21}{4} = 5.25$

For input C: $LE_C = \frac{10\, CR}{4\, CR} = \frac{10}{4} = 2.5$

For input E: $LE_E = \frac{13\, CR}{4\, CR} = \frac{13}{4} = 3.25$

1 pt each LE

c) (2 pts) Using the same gate as previous, connect the A and B inputs together without resizing any of the transistors. What is the logical effort for this new input? Calculate the logical effort for both a rising transition and a falling transition.

Rising transition: $LE = \frac{31\, CR}{4\, CR} = \frac{31}{4} = 7.75$

Falling transition: $LE = \frac{31\, C \cdot (7R/8)}{4\, CR} = \frac{217}{32} = 6.78125$

1 pt correct capacitance
1 pt correct pull-up resistance

d) (4 pts) Using the gate from part a) as well as other standard logic gates, what is the path effort of the following chain of gates? All gates are also implemented in this same technology with $R_P = \frac{1}{3} R_N$.

$LE_C = 3.25,\quad LE_{NAND1} = 7/4,\quad LE_{NAND2} = 13/4,\quad LE_{NOR1} = 5/4,\quad LE_{NOR3} = 6/4,\quad B = 4$

$PE = LE_C \cdot LE_{NAND1} \cdot B \cdot LE_{NOR1} \cdot LE_{INV} \cdot LE_{NAND2} \cdot LE_{NOR3} \cdot F = 3.25 \cdot \frac{7}{4} \cdot 4 \cdot \frac{5}{4} \cdot 1 \cdot \frac{13}{4} \cdot \frac{6}{4} \cdot 120 = 16635.9$

1 pt all LE; 1/4 pt off for each one wrong
1 pt correct branching
1 pt correct PE
1 pt calculation

e) (1 pts) What is the EF/stage that minimizes the delay?

$EF = \sqrt[7]{16635.9} = 4.009 \approx 4$
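As a cross-check of parts d) and e), the path effort is the product of the logical efforts, the branching factor and the electrical fan-out, and the stage effort is its N-th root. The sketch below assumes a seven-stage chain (the six gates named in the product plus the final inverter that drives the load, as used in the sizing of part f); the list layout is illustrative only.

```python
from fractions import Fraction as Fr
from math import prod

# Logical effort of each stage, in chain order. The last entry is the
# final inverter that drives the load (an assumption taken from part f).
stage_le = [
    Fr(13, 4),  # complex gate from part a), LE = 3.25
    Fr(7, 4),   # NAND1
    Fr(5, 4),   # NOR1
    Fr(1),      # INV
    Fr(13, 4),  # NAND2
    Fr(6, 4),   # NOR3
    Fr(1),      # INV (output stage)
]
BRANCHING = 4     # B, the branch at the NAND1 output
FANOUT = 120      # F, load capacitance over input capacitance

path_effort = prod(stage_le) * BRANCHING * FANOUT
stage_effort = float(path_effort) ** (1 / len(stage_le))

print(float(path_effort), round(stage_effort, 3))
```

Using exact fractions avoids accumulating rounding error in the product; the seventh root lands very close to 4, which is why the solution rounds EF to 4.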

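f) (3 pts) Compute the sizes for the gates in the chain to minimize the delay.

Size    Value
a       1.23
b       0.702
c       2.24
d       8.96
e       11.02
f       29.33

$\frac{a}{1} \cdot LE_C = EF \Rightarrow a = \frac{4}{3.25} = 1.23$

$\frac{b}{a} \cdot 4 \cdot LE_{NAND1} = EF \Rightarrow b = \frac{4 \cdot 1.23}{1.75 \cdot 4} = 0.702$

$\frac{c}{b} \cdot LE_{NOR1} = EF \Rightarrow c = \frac{4 \cdot 0.702}{1.25} = 2.24$

$\frac{d}{c} \cdot LE_{INV} = EF \Rightarrow d = \frac{4 \cdot 2.24}{1} = 8.96$

$\frac{e}{d} \cdot LE_{NAND2} = EF \Rightarrow e = \frac{4 \cdot 8.96}{3.25} = 11.02$

$\frac{f}{e} \cdot LE_{NOR3} = EF \Rightarrow f = \frac{4 \cdot 11.02}{1.5} = 29.33$

Check at the output: $\frac{C_{load}}{f} \cdot LE_{INV} = EF \Rightarrow \frac{120}{29.33} \cdot 1 \approx 4$

½ pt each size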
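The sizing walk in part f) can be written as a loop that moves from the input toward the load, solving each stage equation for the next input capacitance. This is a sketch with EF rounded to 4 as in the solution; because it carries full precision rather than rounding after every step, its results differ from the tabulated values in the second or third significant digit. The names are hypothetical.

```python
EF = 4.0  # stage effort from part e), rounded as in the solution

# (driving gate, its logical effort, branching factor at its output)
stages = [
    ("complex gate", 3.25, 1),
    ("NAND1", 7 / 4, 4),
    ("NOR1", 5 / 4, 1),
    ("INV", 1.0, 1),
    ("NAND2", 13 / 4, 1),
    ("NOR3", 6 / 4, 1),
]


def size_chain(c_first, stages, ef):
    """Return the input capacitance of each gate that follows a listed stage.

    Each stage satisfies (c_next / c_in) * branch * LE = EF, so
    c_next = EF * c_in / (LE * branch).
    """
    sizes = []
    c_in = c_first
    for _name, le, branch in stages:
        c_in = ef * c_in / (le * branch)
        sizes.append(c_in)
    return sizes


a, b, c, d, e, f = size_chain(1.0, stages, EF)

# Sanity check at the output: the last inverter (LE = 1) drives a load of 120.
final_stage_effort = 120 / f * 1.0

print([round(x, 2) for x in (a, b, c, d, e, f)], round(final_stage_effort, 2))
```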

[Problem 3] Processor Architecture (15 pts)

a) (3 pts) Translate the following snippet of C into RISC-V assembly. Assume that str is stored in memory and register x10 contains its base address. Recall that the char datatype is 1 byte. It is OK if you manipulate the str pointer itself.

    char* str;
    for (i = 0; str[i] != 0; i++) {
        str[i] = str[i] - 32;
    }

There are a few ways to compile this snippet; here's one.

    loop: lb   x1, 0(x10)       # x1 = str[i]
          beq  x1, x0, done     # exit the loop when str[i] == 0
          addi x1, x1, -32      # str[i] - 32
          sb   x1, 0(x10)       # str[i] = str[i] - 32
          addi x10, x10, 1      # advance str pointer
          jal  x0, loop         # jump back to loop (return address discarded)
    done: nop

1 pt branch on loaded byte = 0, jump for loop
1 pt addi -32 and store back into mem
1 pt pointer arithmetic

b) (2 pts) Assume these instructions are executed on a single-cycle RISC-V processor with asynchronous-read IMEM, DMEM, and register file. The DMEM and register file have synchronous writes. You can assume that char* str = {'a', '\0'} (i.e. the string is just a single character followed by a null terminator). How many clock cycles does your code from part a) take to execute? What is the CPI (clock cycles per instruction)?

Since this is a single-cycle processor, each instruction takes a single cycle. Since str is only 1 character long, lines 1-6 will be executed, then lines 1-2, after which the snippet is complete. The sequence of instructions takes 8 clock cycles to complete and the CPI is 1.0.

1 pt # cycles match code in (a)
1 pt CPI = 1
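The cycle count in part b) can be confirmed by stepping through the loop on a toy model. The sketch below is not a RISC-V simulator: it hard-codes the six instructions from part a), treats loaded bytes as unsigned (fine for ASCII), and counts one cycle per executed instruction, as on a single-cycle machine. All names are illustrative.

```python
def run(memory, base):
    """Execute the part a) loop on `memory`; return the instruction count."""
    # Tuple layout: (op, x, y, z). Meaning of the fields depends on the op:
    #   lb/sb : register x, base register y, offset z
    #   addi  : destination x, source y, immediate z
    #   beq   : registers x and y, branch target index z
    #   jal   : link register x, unused y, jump target index z
    program = [
        ("lb", 1, 10, 0),      # x1 = str[i]
        ("beq", 1, 0, 6),      # if x1 == x0, go to done (index 6)
        ("addi", 1, 1, -32),   # x1 = x1 - 32
        ("sb", 1, 10, 0),      # str[i] = x1
        ("addi", 10, 10, 1),   # advance the pointer
        ("jal", 0, 0, 0),      # jump back to index 0, link goes to x0
    ]
    regs = [0] * 32
    regs[10] = base
    pc = 0
    executed = 0
    while pc < len(program):
        op, x, y, z = program[pc]
        executed += 1
        next_pc = pc + 1
        if op == "lb":
            regs[x] = memory[regs[y] + z]
        elif op == "beq":
            if regs[x] == regs[y]:
                next_pc = z
        elif op == "addi":
            regs[x] = regs[y] + z
        elif op == "sb":
            memory[regs[y] + z] = regs[x] & 0xFF
        elif op == "jal":
            next_pc = z
        regs[0] = 0  # x0 is hard-wired to zero
        pc = next_pc
    return executed


mem = bytearray(b"a\x00")
cycles = run(mem, 0)
print(cycles, bytes(mem))
```

With the one-character string the first pass executes all six instructions and the second pass exits after the branch, so the count is 6 + 2.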

c) (3 pts) Let's extend the RISC-V ISA with a custom instruction called sbi.

sbi rs1, rs2, imm will perform Mem[Reg[rs2]] <= Reg[rs1] + imm

What additional hardware and control signals are needed to implement this instruction? (Describe verbally; a diagram is not needed.) The added hardware should be minimal. Please be concise.

The ALU can be used to compute the DMEM write data (Reg[rs1] + imm).
The DMEM address can be fed straight from Reg[rs2].
The only new hardware needed is two muxes:
  o To drive the DMEM write data from the ALU output rather than Reg[rs2]
  o To drive the DMEM address from Reg[rs2] rather than the output of the ALU
Two 1-bit control signals are needed for these muxes; they go high when sbi is being executed.

1 pt no new adder HW
2 pt 1 for each mux

d) (2 pts) Write the C snippet in assembly again, but using the sbi custom instruction. You can just rewrite the lines of your original code that would change.

Replace lines 3 and 4 from part a) with

    sbi x1, x10, -32

e) (5 pts) In the fabrication of any digital circuit, there may be manufacturing defects. One type of defect involves a signal being shorted to GND or VDD (stuck-at-zero or stuck-at-one). For the following stuck signals, specify the RISC-V instructions that will no longer work. Assume the following datapath and control signals.
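The two-mux change from part c) can be modeled in a few lines to show that a single sbi writes the same byte to the same address as the addi and sb pair it replaces in part d). This is a behavioral sketch only; `store_path` and `is_sbi` are hypothetical names for the store datapath and the new control signal.

```python
def store_path(reg_rs1, reg_rs2, imm, is_sbi):
    """Return (DMEM address, DMEM write byte) for a store-type instruction.

    The ALU always computes Reg[rs1] + imm. The two new muxes decide
    whether that sum is used as the address (sb) or as the data (sbi).
    """
    alu_out = reg_rs1 + imm
    addr = reg_rs2 if is_sbi else alu_out   # address mux
    data = alu_out if is_sbi else reg_rs2   # write-data mux
    return addr, data & 0xFF


x1 = ord("a")     # byte loaded from str[i]
x10 = 0x100       # example pointer value, chosen arbitrarily

# Original sequence: addi x1, x1, -32 followed by sb x1, 0(x10).
# For sb, rs1 is the base register (x10) and rs2 holds the data (x1).
original = store_path(reg_rs1=x10, reg_rs2=x1 - 32, imm=0, is_sbi=False)

# Replacement: sbi x1, x10, -32, where rs1 = x1 and rs2 = x10.
custom = store_path(reg_rs1=x1, reg_rs2=x10, imm=-32, is_sbi=True)

print(original, custom)
```

Note that sbi leaves x1 unchanged, which is harmless here because the loop reloads x1 at the top of every iteration.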

i. WBSel[1:0] = 2'd1

Memory read operations won't work: lw, lh, lhu, lb, lbu. Also the link operation of jumps won't work: jal, jalr.

1 pt loads
1 pt jumps

ii. ASel is stuck-at-zero

auipc will no longer work. Branch and jal address calculations won't work. jalr will still work since the new PC is calculated as rs1 + imm.

1 pt auipc
1 pt branch and jal
½ pt taken off if jalr isn't specified as working

iii. BSel is stuck-at-one

Any instructions that use rs2 in the ALU won't work. This includes all R-type instructions. The explicit list is: add, slt, sltu, and, or, xor, sll, srl, sub, sra. Note that branches will still work, as the branch computation unit is separate from the ALU. Also note that stores will still work, since Reg[rs2] is directly passed to the DMEM. To get credit for this question, you should specify the explicit instruction list or claim that R-type instructions in general won't work.

1 pt R-type instructions
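The stuck-at reasoning can be mechanized with a small control table: an instruction class breaks when it needs a value on the stuck signal other than the stuck one. The datapath figure is not reproduced in this text, so the encodings below are assumptions chosen to be consistent with the answers above: WBSel 0 = DMEM, 1 = ALU, 2 = PC + 4; ASel 0 = Reg[rs1], 1 = PC; BSel 0 = Reg[rs2], 1 = immediate. `None` marks a don't-care.

```python
# Required control values per instruction class (assumed encodings).
CONTROL = {
    "R-type":     {"ASel": 0,    "BSel": 0, "WBSel": 1},
    "I-type ALU": {"ASel": 0,    "BSel": 1, "WBSel": 1},
    "load":       {"ASel": 0,    "BSel": 1, "WBSel": 0},
    "store":      {"ASel": 0,    "BSel": 1, "WBSel": None},  # no writeback
    "branch":     {"ASel": 1,    "BSel": 1, "WBSel": None},  # no writeback
    "jal":        {"ASel": 1,    "BSel": 1, "WBSel": 2},
    "jalr":       {"ASel": 0,    "BSel": 1, "WBSel": 2},
    "lui":        {"ASel": None, "BSel": 1, "WBSel": 1},     # ALU passes imm
    "auipc":      {"ASel": 1,    "BSel": 1, "WBSel": 1},
}


def broken_by(signal, stuck_value):
    """List the instruction classes that need `signal` != `stuck_value`."""
    return sorted(
        name
        for name, ctl in CONTROL.items()
        if ctl[signal] is not None and ctl[signal] != stuck_value
    )


print(broken_by("WBSel", 1))
print(broken_by("ASel", 0))
print(broken_by("BSel", 1))
```

Under these assumed encodings the three queries reproduce the answers to i, ii and iii; a datapath with a different mux numbering would need the table adjusted accordingly.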