

The University of Michigan - Department of EECS
EECS 370 Introduction to Computer Organization
Midterm Exam 1
October 22, 2009

Name:
University of Michigan uniqname: (NOT your student ID number!)

Open book, open notes. No laptops, PDAs, cell phones, etc. (calculators are ok).

This exam has 6 sets of questions, 15 pages, and 65 points. Questions vary in difficulty; it is strongly recommended that you do not spend too much time on any one question. For questions where a box is provided, please put your final answer in the box.

Question                          Points
1  Short questions                /6
2  Floating Point Arithmetic      /12
3  Single Cycle Datapath          /12
4  ISA Design                     /15
5  MIPS                           /10
6  Caller/Callee                  /10
Total                             /65

The rules of the Honor Code of the University of Michigan - College of Engineering apply for this exam.

Honor code pledge: I have neither given nor received aid on this examination, nor have I concealed any violations of the Honor Code.

Signature:
(Exams without a signed pledge will not be graded)

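1. Short Answer Questions [6 points]

a) [3 points] The LC2K instruction set lacks a subtract instruction. Show in LC2K assembly how to subtract an operand in register 2 from an operand in register 1, with the result of the subtraction placed in register 3.

    lw   0 4 one     register 4 = 1 ("one" labels a memory word holding 1)
    nand 2 2 2       register 2 = NOT register 2
    add  2 4 2       register 2 = NOT register 2 + 1 = -register 2
    add  1 2 3       register 3 = register 1 - register 2

Negating a two's complement value means inverting its bits and adding 1, so the constant loaded into register 4 must be +1. Note that the sequence overwrites registers 2 and 4.

b) [3 points] Which addressing mode does the following sequence of LC2K instructions emulate? Assume the initial value of register 0 is zero.

    lw 0 3 100
    lw 3 3 0
    lw 3 4 0

    a) Register
    b) Base + displacement
    c) Indirect
    d) Double indirect
    e) PC relative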
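The two instruction sequences in Question 1 can be traced with a short Python sketch. It is only an illustration: the function names and the dictionary used as memory are assumptions, and part (a) assumes register 4 is loaded with the constant +1 that two's complement negation requires.

```python
MASK = 0xFFFFFFFF  # LC2K registers are 32 bits wide


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def lc2k_subtract(r1, r2):
    """Mirror the part (a) sequence: load 1, nand, add, add."""
    r4 = 1                        # lw   0 4 one  (assumed constant +1)
    r2 = ~(r2 & r2) & MASK        # nand 2 2 2    -> bitwise NOT of r2
    r2 = (r2 + r4) & MASK         # add  2 4 2    -> -r2
    return to_signed(r1 + r2)     # add  1 2 3    -> r1 - r2


def run_part_b(memory):
    """Trace the three loads of part (b); return the final value of register 4."""
    r3 = memory[0 + 100]          # lw 0 3 100 : r3 = M[100]
    r3 = memory[r3 + 0]           # lw 3 3 0   : r3 = M[M[100]]
    r4 = memory[r3 + 0]           # lw 3 4 0   : r4 = M[M[M[100]]]
    return r4
```

In the trace of part (b), the value that ends up in register 4 is reached by following two stored pointers beyond the initial access to address 100.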

2. Floating Point Arithmetic [12 points]

We have constructed a new 10-bit floating point format:

    1 bit sign
    3 bit exponent with a bias of 3
    6 bit mantissa (a.k.a. significand)

All other aspects of this format are exactly the same as the standard IEEE floating point studied in class. You must show your work on this problem to be eligible for partial credit. If you need more space, attach another sheet, but label it clearly.

a) [2 points] What is the largest number that can be represented exactly in this format? Give both its floating point and decimal representations.

    Floating point:  S E E E M M M M M M
                     0 1 1 1 1 1 1 1 1 1

    Decimal: 1.111111 * 2^4 = 11111.11 = 31.75

b) [4 points] Convert the following two floating point values to decimal:

    S E E E M M M M M M
    0 1 1 0 0 1 0 0 0 1

    Decimal: 1.010001 * 2^3 = 1010.001 = 10.125

    S E E E M M M M M M
    1 0 1 0 0 1 0 0 0 0

    Decimal: -1.01 * 2^-1 = -0.101 = -0.625

c) [4 points] Multiply the two floating point numbers given in part (b) and report your result in the floating point format:

    Product:  S E E E M M M M M M
              1 1 0 1 1 0 0 1 0 1

    Sign is negative.
    Exponent field is 6 + 2 - 3 = 5, which corresponds to 2^2.
    Mantissa multiplication:

          1.010001
        x 1.01
        ------------
          1.010001       (1.010001 x 1)
        + 0.01010001     (1.010001 x 0.01)
        ------------
          1.10010101

Note that the low order two bits are truncated because of limited mantissa space. Hence, the product is -1.100101 * 2^2 = -110.0101 = -6.3125.

d) [2 points] Because of limitations on the number of bits in the mantissa, floating point calculations often lose precision. What is the absolute difference between your answer for part (c) and the exact product of the two numbers given in part (b)?

The exact product is -6.328125, with magnitude 6.328125. The difference is 0.015625, or 1/64. This can easily be determined by examining the value of the bits truncated during the multiplication: the dropped bits 01 sit at positions 2^-7 and 2^-8 of the mantissa, so they contribute 2^-8 * 2^2 = 2^-6.
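The arithmetic in Question 2 can be checked with a small Python sketch. The function names are illustrative, and the code assumes what the worked answers assume: every bit pattern is treated as a normalized value (no zero, denormal, infinity, or NaN encodings), results are truncated rather than rounded, and the product's exponent stays within the 3-bit field.

```python
from fractions import Fraction

BIAS = 3
MANTISSA_BITS = 6


def decode(bits):
    """Decode a 10-character bit string 'SEEEMMMMMM' as a normalized value."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:4], 2) - BIAS
    significand = 1 + Fraction(int(bits[4:], 2), 1 << MANTISSA_BITS)
    return sign * significand * Fraction(2) ** exponent


def multiply_truncate(a_bits, b_bits):
    """Multiply two encoded values, truncating the significand to 6 bits."""
    sign = int(a_bits[0]) ^ int(b_bits[0])
    # Adding two biased exponents counts the bias twice, so subtract it once.
    exponent = int(a_bits[1:4], 2) + int(b_bits[1:4], 2) - BIAS
    sig_a = (1 << MANTISSA_BITS) | int(a_bits[4:], 2)   # 1.MMMMMM as an integer
    sig_b = (1 << MANTISSA_BITS) | int(b_bits[4:], 2)
    product = sig_a * sig_b                 # has 2 * MANTISSA_BITS fraction bits
    if product >> (2 * MANTISSA_BITS + 1):  # product >= 2.0: renormalize
        product >>= 1
        exponent += 1
    mantissa = (product >> MANTISSA_BITS) & ((1 << MANTISSA_BITS) - 1)
    return f"{sign}{exponent:03b}{mantissa:06b}"


product_bits = multiply_truncate("0110010001", "1010010000")
exact = decode("0110010001") * decode("1010010000")
error = abs(exact - decode(product_bits))
```

Using `Fraction` keeps every value exact, so `error` is the precise amount lost to truncation rather than a binary floating point approximation of it.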

3. Single-Cycle Datapath [12 points]

The figure on page 9 illustrates the single-cycle LC2K architecture discussed in class.

a) [4 points] Assume we want to have a combinational circuit (the box labeled 0? inside the dashed circle) that takes the (32-bit) result of the ALU as input and outputs a single bit, Z, which is 1 if and only if the result of the operation done by the ALU is zero. For example, if you execute add 1 2 3 and registers 1 and 2 respectively hold values 5 and -5, then Z will be 1 in that cycle. Sketch the design for this circuit. You may use basic logic gates (e.g., NAND, AND, OR, Inverter) of any number of inputs.

All 32 bits of the ALU result go into an OR gate, and the output of that gate is inverted (in effect a 32-input NOR).

b) [8 points] Suppose we want to add a new instruction, lbr (loop branch), to LC2K and assign it opcode 7 (111 binary), which was previously unused. lbr has two operands: R and a 16-bit displacement, which are stored in bits 21-19 and 15-0 respectively. (Similar to I-type instructions, except bits 18-16 are not used.) lbr decrements register R by 1 and, if the new value of register R is not 0, branches to PC + 1 + displacement. Note that regardless of whether or not the branch is taken, R should be updated to contain the decremented value.

i. Modify the figure on page 9 to show what extra circuitry must be added to implement this instruction.

ii. The following table shows the contents of the control ROM. Fill in the values for line 7. If you need to add control signals in your design, add a new column for each and fill in the entries of the new column for every line of the ROM. If you include multiplexers (MUXes) with more than one select line, be sure to indicate which line is the high-order and which line is the low-order select input.

Columns C8 and C9 are the new control signals added for lbr.

            C0  C1  C2  C3  C4  C5  C6  C7  |  C8  C9
    Line 0  (this part of the ROM is        |   0   0
    Line 1   not important to us)           |   0   0
    Line 2                                  |   0   0
    Line 3                                  |   0   0
    Line 4                                  |   0   0
    Line 5                                  |   0   0
    Line 6                                  |   0   0
    Line 7   0   1   0   1   0   0   0   X  |   1   1

iii. Provide a brief explanation of the sequence of events that take place in your design to execute lbr.

Add -1 as an input to the MUX before the ALU; the new control signal C8 selects it. Add an AND gate whose inputs are the three opcode bits and the inverted output of Z. Connect the output of this gate to an OR gate along with the BEQ AND gate's output, and route the output of the OR gate to the MUX control signal that the BEQ AND gate's output was originally connected to. Connect bits 21-19 to the write MUX for the register file, and add control signal C9 for that MUX.

Assume: new inputs to a MUX go to the bottom, and new control signals go to the left (and are most significant).

With the proper control signals, R is added to -1 and Z determines if the result is 0. If it is not, and the instruction is lbr, PC + 1 + offset is sent to the PC. R - 1 is sent back to the register file.
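The behavior that the modified datapath has to produce can be summarized in a short Python sketch. This models only the architectural effect of the Z circuit and of lbr, not the control ROM or the wiring; the function names and the list used as a register file are assumptions made for illustration.

```python
MASK = 0xFFFFFFFF  # 32-bit datapath


def zero_flag(alu_result):
    """Z = NOT(OR of all 32 result bits): 1 only when every bit is 0."""
    any_bit_set = 0
    for bit in range(32):
        any_bit_set |= (alu_result >> bit) & 1
    return 1 - any_bit_set


def lbr(pc, regs, r, displacement):
    """Apply lbr to a copy of the register list; return (next_pc, new_regs)."""
    new_regs = list(regs)
    new_regs[r] = (regs[r] - 1) & MASK       # ALU computes R + (-1)
    taken = zero_flag(new_regs[r]) == 0      # branch only if the result is nonzero
    next_pc = pc + 1 + displacement if taken else pc + 1
    return next_pc, new_regs                 # R is written back in both cases
```

For example, with register 1 holding 3, `lbr(10, regs, 1, -4)` leaves 2 in register 1 and continues at address 7; with register 1 holding 1, it leaves 0 and falls through to address 11.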


4. ISA Design [15 points]

You are the Chief Architect at Broken Arrows, a company specializing in low-overhead processor design. The ISA for the company's flagship processor is documented in the tables below. Instructions are 11 bits long. Data and memory addresses are all 8 bits long. The design is a CISC, byte-addressable instruction set. There is a single register, the AllAlone register (called AA), and a stack, to assist with all computation. There are 3 instruction formats, explained below.

R-type Instructions
Format: bits 10-3 unused, bits 2-0 opcode

    Instruction  Opcode  Action
    Pop          001     Pop the [top] value from the stack and store it in AA, resulting in one less value on the stack.
    Push         010     Push the value of AA onto the top of stack, resulting in one more value on the stack.
    Halt         011     Halt the processor.

I-type Instructions
Format: bits 10-3 2's complement immediate (IMM), bits 2-0 opcode

    Instruction  Opcode  Action
    Pushi        000     Push the sign-extended immediate value onto the stack.
    Beqnz        100     Pop the [top] entry from the stack. If it is not zero, start executing at PC+1+IMM, where IMM is the sign-extended immediate value. Otherwise, execute the next instruction at PC+1.
    LoadAdd      101     Form a memory address by popping the [top] stack entry and adding it to the sign-extended immediate value IMM. Now add 1 to the word loaded from memory and then push it onto the stack.
    StoreSub     110     Form a memory address by popping the [top] stack entry and adding it to the sign-extended immediate value IMM. Pop the [top-1] stack entry, subtract one and store it to that address in memory.
    Tadt         111     Tadt stands for Test-And-Divide-by-Two. Form a memory address by popping the [top] stack entry and adding it to the sign-extended immediate value IMM. Compare the value of AA with the value stored at the memory location specified by the address. If the value in AA is less than or equal to the memory value, the value in AA is divided by 2.

Q-type Instructions
Format: bits 10-9 ALU op, bits 8-3 unused, bits 2-0 opcode

    Instruction  Opcode  ALUOp  Action
    StAdd        001     11     Remove the [top] value from the stack, add one and push onto the top of stack.
    StSub        001     01     Remove the [top] value from the stack, subtract one and push onto the top of stack.
    StNand       001     10     Remove the [top] value from the stack, NAND with the value stored in AA and push the result onto the top of stack.

a) [3 points] Translate the following instructions into machine code.

    Assembly      Binary            Hexadecimal
    LoadAdd -4    0111 1110 0101    0x7e5
    Tadt -20      0111 0110 0111    0x767
    StNand        0100 0000 0001    0x401

b) [4 points] The loadadd instruction uses base + displacement memory addressing mode. You are asked to design a new pseudo-instruction loaddir, which uses a direct memory addressing mode. The memory address for loaddir is specified using the 8-bit IMM field as an unsigned address.

(i) What is the range of values that the immediate field (IMM) can encode for the loadadd instruction?

    Solution: -128 [min value]  127 [max value]

(ii) What is the range of values that the immediate field (IMM) can encode for the new loaddir pseudo-instruction?

    Solution: 0 [min value]  255 [max value]
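The encodings in part (a) follow mechanically from the three formats, which the following Python sketch illustrates. The `encode` function and its tables are hypothetical helpers built from the instruction tables in this question; they are not part of any real assembler.

```python
# Opcodes (bits 2-0) taken from the R-type and I-type tables.
R_TYPE = {"Pop": 0b001, "Push": 0b010, "Halt": 0b011}
I_TYPE = {"Pushi": 0b000, "Beqnz": 0b100, "LoadAdd": 0b101,
          "StoreSub": 0b110, "Tadt": 0b111}
# Q-type instructions share opcode 001 and differ in the ALU op (bits 10-9).
Q_TYPE = {"StAdd": 0b11, "StSub": 0b01, "StNand": 0b10}


def encode(name, imm=0):
    """Return the 11-bit machine word for one instruction."""
    if name in Q_TYPE:
        return (Q_TYPE[name] << 9) | 0b001
    if name in I_TYPE:
        if not -128 <= imm <= 127:
            raise ValueError("immediate does not fit in 8 signed bits")
        # Masking with 0xFF yields the 8-bit two's complement pattern.
        return ((imm & 0xFF) << 3) | I_TYPE[name]
    return R_TYPE[name]  # bits 10-3 are unused and left as zero


words = [encode("LoadAdd", -4), encode("Tadt", -20), encode("StNand")]
```

Each word is 11 bits, so printing it as three hexadecimal digits implicitly pads a leading zero bit, which is why the binary column in part (a) starts with 0.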

c) [8 points] Suppose a number (not zero) in 2's complement form is stored at memory address 100. Write a short assembly program using the new ISA design described before to find out if the number is even or odd. If the number is even, do nothing. If the number is odd, shift it right once. Store the result back (in either case) at the memory location 101.

[Note: You cannot assume any data stored in memory, unless specified. Also, you cannot use the pseudo-instruction loaddir.]

Solution (the number at the left of each line is the instruction address):

     0  Pushi 100      // block 1
     1  LoadAdd 0
     2  StSub
     3  Pop
     4  Pushi 1        // block 2
     5  StNand
     6  Pushi 1
     7  StNand         // block 2
     8  Pop            // store one copy of value in AA
     9  StNand
    10  Pushi 100      // block 1
    11  LoadAdd 0
    12  StSub
    13  Pop            // restore the original value back in AA
    14  Beqnz 2
    15  Pushi 1
    16  Beqnz 2
    17  Pushi 100
    18  Tadt 0
    19  Push
    20  StAdd
    21  Pushi 101
    22  StoreSub 0
    23  Halt

5. MIPS [10 points]

Your friend has asked you to debug his MIPS code. His professor has asked him to implement the SAXPY code. SAXPY (Scalar Alpha X Plus Y) is one of the functions in the Basic Linear Algebra Subprograms (BLAS) package, and is a common operation in computations with vector processors. SAXPY is a combination of scalar multiplication and vector addition, as defined by the algorithm below:

    //x starts at mem address = 500
    int x[10] = { 100, 122, 58, 123, 91, 110, 86, 54, 37, 42};
    int y[10] = { 120, 16, 83, 130, 71, 10, 99, 78, 32, 63};
    ...
    int a = 150; // mem address = 700

    main(void) {
        int i;
        for (i=0; i < 10; i++) {
            y[i] = a*x[i] + y[i];
        }
    }

    start:  li   $r5,0
            li   $r10,700
            lw   $r6, 0($r10)
            li   $r7,500
            li   $r1,510
            li   $r4,10
    loop:   lw   $r2,0($r7)
            mult $r2,$r6
            mflo $r2
            lw   $r3,0($r1)
            add  $r3,$r2,$r2
            sw   $r2,0($r1)
            addi $r7,$r7,1
            addi $r1,$r1,1
            addi $r5,$r5,1
            beq  $r4,$r5,loop
            halt

What is wrong with your friend's code? Write down which instruction or instructions are causing the code to fail. Explain what needs to be changed or what needs to be added.

Notes: there may be more than one problem with the above code. The li instruction is a load immediate in which the register gets loaded with an immediate value. When the mult instruction is run, assume no overflow of the 32-bit register.

The first incorrect instruction is li $r1,510. We know that x starts at 500 and goes to 539 (10 ints of 4 bytes each, 40 bytes). Thus y should start at 540 and run through 579, so the instruction should read:

    li $r1,540

The second error is in the add $r3,$r2,$r2 instruction. In MIPS, the destination register is the first register specified. In this code, we are trying to add the a*x[i] term to y[i] and store the result in $r2, which gets saved to memory in the next line. However, this line saves the value of $r2 + $r2 into $r3, which is incorrect. We should change it to:

    add $r2,$r3,$r2    OR    add $r2,$r2,$r3

Two more errors exist in the addi $r7,$r7,1 and addi $r1,$r1,1 lines. These registers hold the addresses from which the data is loaded. Since each int is 4 bytes and memory is byte-addressed, we need to increment the memory address by 4, not by 1. Thus the instructions should be:

    addi $r7,$r7,4
    addi $r1,$r1,4

The final error is the beq instruction. This should read bne, since we are comparing the loop counter to 10, which is the number of elements in the array, and the loop must repeat while the two differ. If it were beq, the program would halt after one iteration. Thus it should read:

    bne $r4,$r5,loop
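The effect of the five corrections can be checked by mirroring the corrected loop in Python, one statement per MIPS instruction. This is a sketch under stated assumptions: memory is modeled as a dictionary keyed by byte address, y is assumed to sit immediately after x at address 540, and `saxpy_corrected` is an illustrative name.

```python
X = [100, 122, 58, 123, 91, 110, 86, 54, 37, 42]
Y = [120, 16, 83, 130, 71, 10, 99, 78, 32, 63]


def saxpy_corrected():
    """Run the corrected loop and return the final contents of y."""
    memory = {700: 150}                        # a lives at address 700
    for i, value in enumerate(X):
        memory[500 + 4 * i] = value            # x occupies 500..539
    for i, value in enumerate(Y):
        memory[540 + 4 * i] = value            # y occupies 540..579

    r5 = 0                                     # li   $r5,0
    r10 = 700                                  # li   $r10,700
    r6 = memory[r10]                           # lw   $r6,0($r10)
    r7 = 500                                   # li   $r7,500
    r1 = 540                                   # li   $r1,540   (was 510)
    r4 = 10                                    # li   $r4,10
    while True:
        r2 = memory[r7]                        # lw   $r2,0($r7)
        r2 = r2 * r6                           # mult $r2,$r6 ; mflo $r2
        r3 = memory[r1]                        # lw   $r3,0($r1)
        r2 = r3 + r2                           # add  $r2,$r3,$r2   (was $r3,$r2,$r2)
        memory[r1] = r2                        # sw   $r2,0($r1)
        r7 += 4                                # addi $r7,$r7,4     (was 1)
        r1 += 4                                # addi $r1,$r1,4     (was 1)
        r5 += 1                                # addi $r5,$r5,1
        if r4 == r5:                           # bne  $r4,$r5,loop  (was beq)
            break                              # fall through to halt
    return [memory[540 + 4 * i] for i in range(10)]
```

The first element comes out as 150 * 100 + 120 = 15120, and the whole result matches the C loop `y[i] = a*x[i] + y[i]`.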

6. Caller/Callee [10 points]

Suppose you are given the following code:

    int foo() {
        int a = 20, b = 10, c, i, j = 3;
        char *p = "Hello World\n";
        i = 0;
        bar(b);
        for (i = 0; i < 17; ++i) {
            c += a;
            b = bar(b);
        }
        c += j;
        return c;
    }

    int bar(int j) {
        int a = j;
        int b = a + 5;
        while (b > a) {
            printf("%d\n", b);
            --b;
        }
        return a;
    }

The architecture that you are using contains 3 caller-saved registers ($1, $2, and $3) and 3 callee-saved registers ($4, $5, and $6).

a) [2 points] Assume the following register assignments for bar( ): a -> $1, b -> $2. What register should be assigned to each variable in foo( ) to minimize the total number of executed save/restore instructions? (NOTE: a single save or restore instruction in code can be executed multiple times if, for example, it resides within a loop.) Fill in the table below.

    Variable   Register in foo( )
    a          $4
    b          $1
    c          $5
    i          $6
    j          $2
    p          $3

b) [2 points] How many save/restore instruction pairs are executed during a single call to foo( ) using the assignments in part a)? Note that you will need to consider bar( ) as well, since we are looking for the total number (ignore printf for the purpose of counting).

For function foo:

    a needs to be saved and restored once. (+1)
    b doesn't need to be saved or restored.
    c needs to be saved and restored once. (+1)
    i needs to be saved and restored once. (+1)
    j needs to be saved and restored once. (+1)
    p doesn't need to be saved or restored.

For function bar, which is called once outside the loop and 17 times inside the loop (18 calls, each making 5 printf calls):

    a needs to be saved and restored 18*5 times. (+90)
    b needs to be saved and restored 18*5 times. (+90)

A total of 184 save/restore pairs are needed, or 368 total instructions.

c) [2 points] Now assume the following register assignments are made for bar( ): a -> $4, b -> $5. Repeat part a) for this configuration.

    Variable   Register in foo( )
    a          $4
    b          $1
    c          $5
    i          $6
    j          $2
    p          $3

No change is needed from part a). It may be tempting to try to re-assign callee-saved registers to unused variables like p so that bar doesn't need to preserve them, but remember that bar can potentially be called from another function that requires those registers to be preserved.

d) [2 points] Repeat part b) with the configuration given in part c).

Function foo is unchanged (4 total save/restore pairs).

For function bar, which is called once outside the loop and 17 times inside the loop:

    a needs to be saved and restored 18*1 times. (+18)
    b needs to be saved and restored 18*1 times. (+18)

A total of 40 save/restore pairs are needed, or 80 total instructions.

e) [2 points] What can you conclude about bar's register assignments and the effect they have on optimizing foo?

Regardless of how bar assigns its variables to registers, the optimal register assignment for foo remains the same.
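The totals in parts b) and d) come from the same counting rule, which the Python sketch below spells out. The helper name and constants are illustrative; the per-call figures are the ones stated in the answers above (two variables in bar, saved around each of 5 printf calls when caller-saved, or once per call when callee-saved).

```python
FOO_PAIRS = 4                 # a, c, i and j are each saved/restored once in foo
BAR_CALLS = 1 + 17            # one call before the loop, 17 inside it
PRINTF_CALLS_PER_BAR = 5      # the while loop in bar runs from a+5 down to a+1
BAR_VARIABLES = 2             # a and b in bar


def total_pairs(saves_per_variable_per_call):
    """Total executed save/restore pairs for one call to foo."""
    bar_pairs = BAR_CALLS * BAR_VARIABLES * saves_per_variable_per_call
    return FOO_PAIRS + bar_pairs


# Part b): bar uses caller-saved registers, so it saves around every printf.
part_b_pairs = total_pairs(PRINTF_CALLS_PER_BAR)
# Part d): bar uses callee-saved registers, so it saves once per call to bar.
part_d_pairs = total_pairs(1)
```

Each pair is one save plus one restore, so the instruction counts are twice the pair counts.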