University of California, Berkeley College of Engineering Department of Electrical Engineering and Computer Science

Similar documents
Midterm I March 3, 1999 CS152 Computer Architecture and Engineering

The MIPS Processor Datapath

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

Processor (I) - datapath & control. Hwansoo Han

CPU Organization (Design)

CSEN 601: Computer System Architecture Summer 2014

CENG 3420 Lecture 06: Datapath

Midterm I October 6, 1999 CS152 Computer Architecture and Engineering

Chapter 4. The Processor

CS3350B Computer Architecture Quiz 3 March 15, 2018

Chapter 4. The Processor

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

CS Computer Architecture Spring Week 10: Chapter

CS 351 Exam 2 Mon. 11/2/2015

Review: Abstract Implementation View

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

Adding Support for jal to Single Cycle Datapath (For More Practice Exercise 5.20)

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

Computer Science 61C Spring Friedland and Weaver. The MIPS Datapath

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

Computer Organization and Structure

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

ECE260: Fundamentals of Computer Engineering

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

LECTURE 5. Single-Cycle Datapath and Control

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. The Processor Designing the datapath

CS61C : Machine Structures

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

ECE232: Hardware Organization and Design

Computer Architecture I Midterm I (solutions)

University of California College of Engineering Computer Science Division -EECS. CS 152 Midterm I

Digital Design & Computer Architecture (E85) D. Money Harris Fall 2007

RISC Processor Design

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

TDT4255 Computer Design. Lecture 4. Magnus Jahre. TDT4255 Computer Design

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

The Processor: Datapath & Control

ECE331: Hardware Organization and Design

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

361 control.1. EECS 361 Computer Architecture Lecture 9: Designing Single Cycle Control

Systems Architecture

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

ECE468 Computer Organization and Architecture. Designing a Single Cycle Datapath

Midterm I March 1, 2001 CS152 Computer Architecture and Engineering

Midterm Questions Overview

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

MIPS-Lite Single-Cycle Control

Lecture #17: CPU Design II Control

EECS 151/251A: SPRING 17 MIDTERM 2 SOLUTIONS

ECE170 Computer Architecture. Single Cycle Control. Review: 3b: Add & Subtract. Review: 3e: Store Operations. Review: 3d: Load Operations

Prerequisite Quiz January 23, 2007 CS252 Computer Architecture and Engineering

ECE369. Chapter 5 ECE369

Chapter 4. The Processor

University of California, Berkeley College of Engineering

Midterm Questions Overview

Machine Organization & Assembly Language

Laboratory 5 Processor Datapath

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

CS 110 Computer Architecture Single-Cycle CPU Datapath & Control

CSE 378 Midterm 2/12/10 Sample Solution

Processor. Han Wang CS3410, Spring 2012 Computer Science Cornell University. See P&H Chapter , 4.1 4

Review of Last Lecture. CS 61C: Great Ideas in Computer Architecture. MIPS Instruction Representation II. Agenda. Dealing With Large Immediates

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

Chapter 5 Solutions: For More Practice


COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

Pipeline design. Mehran Rezaei

UC Berkeley CS61C : Machine Structures

The overall datapath for RT, lw,sw beq instrucution

Grading: 3 pts each part. If answer is correct but uses more instructions, 1 pt off. Wrong answer 3pts off.

CS 61C Fall 2016 Guerrilla Section 4: MIPS CPU (Datapath & Control)

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

Chapter 5: The Processor: Datapath and Control

Lecture 7 Pipelining. Peng Liu.

CS 61C Summer 2016 Guerrilla Section 4: MIPS CPU (Datapath & Control)

CS222: Processor Design

Final Exam Spring 2017

T F The immediate field of branches is sign extended. T F The immediate field of and immediate (andi) and or immediate (ori) is zero extended.

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

University of California, Berkeley College of Engineering

Introduction. Datapath Basics

Ch 5: Designing a Single Cycle Datapath

University of California, Berkeley College of Engineering

EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution

A Processor. Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. See: P&H Chapter , 4.1-3

Single Cycle CPU Design. Mehran Rezaei

Transcription:

University of California, Berkeley College of Engineering Department of Electrical Engineering and Computer Science Spring 2000 Prof. Bob Brodersen Midterm 1 March 15, 2000 CS152: Computer Architecture This midterm consists of four problems, each of which has multiple parts, so budget your time accordingly. The exam is closed-book, but calculators and one sheet of notes are allowed. Good luck! Name SOLUTIONS SID Discussion 1 2 3 4 Total

Name Page 1 of 8 Problem 1: Critical Path and Delay (25 points) Throughout this problem, use the simple linear delay model presented in class. For the circuit below, assume the following delay parameters: NAND: t plh = 0.5ns, t phl = 0.5ns, t plhf = 0.002ns/fF, t phlf = 0.002ns/fF Input capacitance: 100fF Inverter: t plh = 0.2ns, t phl = 0.2ns, t plhf = 0.001ns/fF, t phlf = 0.001ns/fF Input capacitance: 50fF Wiring Capacitance: (Equal for all nodes) 5fF X Y F Z a) What is the worst case delay? Assume there is no delay at the inputs X, Y and Z. The equation for the worst case delay is as follows:.2ns+105ff*.001ns/ff (INVERTER).5ns+205fF*.002ns/fF (NAND1).5ns+205fF*.002ns/fF (NAND2).5ns+105fF*.002ns/fF (NAND3 +.5ns+5fF*.002ns/fF (NAND4) = 3.345 ns Note: There is no delay at the input nodes, and remember to include fan-out and wiring delay! b) Now assume that you want to generate a symbol for the circuit in part (a). Determine the following parameters for your symbol: t plh, t phl, and the load dependant delay (in ns/ff). X Y F Z First the propagation delays: t plh = t phl, = 3.345 ns. This is the same as the critical path from the last part. For the load dependent delay, since we only have a single NAND driving the output, it is the same as the NAND itself: 0.002 ns/ff.

Name Page 2 of 8 Now consider the following circuit and the following parameters: Register: t clk-to-q = 0.6ns, t setup = 0.5ns, t hold = 0.2ns NAND: t plh = 0.5ns, t phl = 0.5ns c) What is the maximum frequency at which this circuit will operate correctly? Ignore any load dependent delay. The critical path is from either register through all three NANDs, which corresponds to the clock-to-q of the first registers plus the propagation delay of the three NANDs plus the setup time of the last register. Therefore, F max = 1/(0.6ns+1.5ns+0.5ns) = 385 MHz d) If the clock signals are skewed so that φ 1 arrives 0.2ns before φ 2 which arrives 0.2ns before φ 3 (such that φ 1 - φ 2 = φ 2 -φ 3 = 0.2ns), what is the maximum frequency at which this circuit will operate correctly? The key is the direction of the skew. In this case, since φ 2 comes after φ 1, the critical path needs to wait for φ 2. However, since the skew is set up such that the clock reaches the inputs earlier than the output register, we can actually increase our F max, since the skew buys us more time. (Draw the timing diagram to prove for yourself.) Therefore, F max = 1/(0.6ns+1.5ns+0.5ns-0.2ns) = 416 MHz e) Now assume that φ 1 and φ 2 arrive at the same time, and that φ 3 arrives later. What is the maximum tolerable skew to ensure that there are no hold time violations? We must have clock-to-q + shortest delay > skew + t hold To prevent hold time violations, we need to make sure that the data on the output of register 3 is stable for at least 0.2ns. This means that if data from registers 1 or 2 changes at the early clock edge, we have to guarantee that no matter what values occur, the output of register 3 does not change, thus the skew needs to be less than 1.4ns.

Name Page 3 of 8 Problem 2: Single-cycle Processors (25 points) The following MIPS code finds the maximum integer within a bounded array, where $4 contains a pointer to the beginning of the array, $5 contains the length of the array and $3 contains the pointer to store the result at the end. (Assuming there is no branch delay slot.) LW $2, 0($4) // assume the first number is the largest ADDI $4, $4, 4 ADDI $5, $5, -1 max: LW $6, 0($4) // load array element and increment pointer ADDI $4, $4, 4 SLT $7, $2, $6 // update $2 if $6 is larger BEQ $7, $0, next ADD $2, $0, $6 next: ADDI $5, $5, -1 // continue the search until end of array BEQ $5, $0, finish J max finish: SW $2, 0($3) // store result The single-cycle datapath and control unit are shown on the next page. Assume that the delay and energy consumption per operation for each functional unit is as follows: Memory (read or write): 3 ns, 3 pj ALU and adder: 2 ns, 2 pj Register file (read or write): 1 ns, 1 pj All other units: 0 ns, 0 pj a) What is the minimum clock cycle time for this processor? The minimum cycle time of the processor is 10ns (or a frequency of 100MHz). b) For an array of length N, what is the range of execution time for this program (e.g. the minimum possible execution time and the worst case execution time)? We asked for a range on this problem since the exact number of instructions is data dependent (a result of the first branch). The range of execution times is a minimum of 10(7N-4) ns and a maximum of 10(8N-5) ns. c) What is the energy consumption (per instruction) for each type of instruction in the program? Assume that components are completely turned off and do not consume energy when they are not needed. The energy consumption of each type of instruction is listed below. LW: 12 pj ADDI/ADD/SLT: 9 pj BEQ: 10 pj SW: 11 pj

Name Page 4 of 8 Diagram and scratch space for Problem 2:

Name Page 5 of 8 Problem 3: Single-cycle Datapath Design (25 points) The task is to design a single cycle processor with the minimum number of functional units that can perform the following standard MIPS instructions plus a new rotate instruction. The rotate instruction does a rotate of $RS to the right by the IMMED value and stores it in $RD (e.g. a 2 shift rotate of a 5 byte word would turn [a b c d e] into [d e a b c]). The blocks that you can use are given below along with their control signals and delay values. All blocks are similar to those we used in class, except for the addition of a 64 bit shifter. Instructions: ADDIU $RD $RS IMMED ADD $RD $RS $RT SRL $RD $RS IMMED Rotate $RD $RS IMMED Components (and control signals): ALU (ALUcontrol, Zero) => 32 bit ALU with Zero status bit ALUcontrol = 00 for ADD, 01 for AND, 10 for SUB, 11 for OR Delay = 4 EXTENDER (Sign/Zero) => Sign extender Delay = 1 MUX (Select) => 2 input mux Delay = 1 MEMORY (WrEnable, Addr) => Ideal memory Delay = 1 REGISTER (Enable) => Clocked register Clk-to-Q Delay = 1 REGISTER FILE (RD, RS, RT, WrEnable) => Register file Read delay = 1, Setup time = 1, Hold time = 1 CONSTANTS => A 32 bit constant can be defined as an input to any block (no delay) SHIFTER (ShAmt) => 64 bit shifter (see symbol below) Input and output is through 2 buses which connect to the upper and lower 32 bits of a 64 bit word. Delay = 1 32 32 Shifter Upper Lower 6 ShAmt 32 32

Name Page 6 of 8 a) Draw the datapath showing all interconnections and components (including the controller). b) What is the critical path? The critical path is stressed on the ADD and ADDIU instructions. It includes the PC, instruction memory, register file, the ALUSrc mux, the ALU, the WrSrc mux, and the setup time for writing to the register file. c) What is the delay of the critical path? The sum of all the delays above is 10. Don t forget the clock-to-q of the PC and the setup time of the register file! d) Show the values of all the control points for each instruction. (The Enable for the PC is given as an example) PCEnable ALUSrc ALUOp Sign/Zero Rotate WrSrc ADDIU 1 0 00 Sign X 0 ADD 1 1 00 X X 0 SRL 1 X XX X 0 1 Rotate 1 X XX X 1 1

Name Page 7 of 8 Problem 4: Multi-cycle Processors (25 points) For this problem you will be working with the multi-cycle datapath components on the next page. All inputs for the functional units are labeled, and registers only have one data input and one data output (you should not draw the clock lines). You will not need to deal with control in this problem, so the control inputs to each block are not shown. a) Given the datapath components on the next page, determine the register transfer language description for each of the standard MIPS instructions in the table below. You do not need to fill in every row. Hint: Do not write to the register file at the end of a cycle (i.e. only write directly from a register, not a functional unit). ADDU LW SW JAL IR Mem(PC); PC PC + 4; IR Mem(PC); PC PC + 4; IR Mem(PC); PC PC + 4; IR Mem(PC); S PC + 4; A Reg(IR[rs]); B Reg(IR[rt]); S A + B; Reg(IR[rd]) S; A Reg(IR[rs]); S A + Ext(imm); M Mem(S); Reg(IR[rt]) M; A Reg(IR[rs]); B Reg(IR[rt]); S A + Ext(imm); Mem(S) B; Reg(31) S; PC PC[31:26] IR[j] 00; b) You ll notice that some components need to be reused during execution of an instruction. Wire the datapath to support all four instructions, adding only muxes as needed. You may provide constants as inputs to any component. Be sure to label special buses, such as instruction fields. You do not need to draw any control signals (including mux select signals) just assume they will be correctly generated in all cases. c) For each instruction in part (a), calculate the CPI and indicate on the table above which operations occur during each cycle. The register transfers in part (a) above are already separated into the operations that can be performed in each clock cycle. This corresponds to the following CPI: ADDU 4, LW 5, SW 4, and JAL 2. d) The table below indicates the worst case delay through each of the functional units used in the datapath. Given these delays, calculate the execution time of this processor for a program consisting of 400,000 adds, 250,000 loads, 250,000 stores, and 100,000 branches. Functional Unit Memory Register File (read) Register File (write) ALU All others Worst-case Delay 50ns 25ns 15ns 30ns 0ns The trick to this question was realizing that cycle time is only affected by the longest path between registers in the datapath. In this case, since there is no setup time or clock-to-q delay, the cycle time will be 50ns. There is no need to have two functional units in series in this datapath, and doing so will only reduce performance. The execution time of the processor will be the total number of cycles required multiplied by the cycle time: Time = (400,000 4 + 250,000 5 + 250,000 4 + 100,000 2) 50ns = 202.5ms

Name Page 8 of 8 PC j 00 PC[31: 28] Addr Memory Out Data IR M rs rt rt rd 31 RegA DataA RegB RegFile RegW DataB DataW imm SignExt A B 4 A B Zero ALU Out S