Last Name (in case pages get detached): UNIVERSITY OF TORONTO FACULTY OF APPLIED SCIENCE AND ENGINEERING FINAL EXAMINATION, APRIL 2011


Page 1 of 17

UNIVERSITY OF TORONTO
FACULTY OF APPLIED SCIENCE AND ENGINEERING
FINAL EXAMINATION, APRIL 2011
ECE243H1 S COMPUTER ORGANIZATION
Exam Type: D
Duration: 2.5 Hours
Profs. Anderson, Enright Jerger, and Steffan

This is a type D exam. You are allowed to use any book/notes and a non-programmable calculator as allowed by the University regulations.

Last Name (Print):
First Name:
Student Number:

    Question   Max. Marks   Marks
    1          15
    2          20
    3          15
    4          20
    5          20
    6          20
    7          30
    Total      140

Please:
State your assumptions. Show your work. Comment your code. Use your time wisely.
The mark value of each question is roughly equivalent to how many minutes it should take to answer.
If you think that assumptions must be made to answer a question, state them clearly. If there are multiple possibilities, comment that there are, explain why, and then provide at least one possible answer and state the corresponding assumptions.

Page 2 of 17

Part 1. [15] Short Answer

a) Give the immediate 16-bit value that is needed to encode the beq r8, r9, branch2 instruction in the following NIOS code:

    branch1: beq r8, r9, branch2
             blt r8, r9, branch1
             add r8, r8, r8
             add r8, r8, r8
             add r9, r9, r9
    branch2: add r8, r8, r0

Answer: 16 (0x0010)

b) Consider the following simplified datapath (as discussed in class) with no forwarding lines:

Write STALL between instructions that contain a data hazard so that the code produces the correct result in the above datapath:

    add r8, r8, r9
    STALL
    sub r9, r8, r10
    add r10, r10, r10
    STALL
    ldw r11, 0(r10)

c) True or False: there are instructions found in CISC machines that are also found in RISC machines.

Answer: True
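The answer to part (a) follows from how Nios II encodes conditional branches: the 16-bit immediate is the byte distance from the instruction after the branch (PC + 4) to the target. The sketch below reproduces that arithmetic. The helper name `branch_imm16` and the concrete addresses are hypothetical, and it assumes branch2 labels the sixth instruction in the listing, which is the placement consistent with the stated answer.

```python
def branch_imm16(branch_addr: int, target_addr: int) -> int:
    """Return the 16-bit immediate field for a Nios II conditional branch.

    The hardware computes target = PC + 4 + sign_extend(IMM16), so the
    immediate is the offset from the instruction following the branch.
    """
    offset = target_addr - (branch_addr + 4)
    if not -32768 <= offset <= 32767:
        raise ValueError("target is out of range for a signed 16-bit offset")
    return offset & 0xFFFF  # two's-complement encoding in 16 bits


# Assumption: beq sits at address 0 and branch2 is five instructions later.
beq_addr = 0x0
branch2_addr = beq_addr + 5 * 4

imm = branch_imm16(beq_addr, branch2_addr)
print(imm, hex(imm))
```

A backward branch produces a negative offset, which is why the helper masks the result to 16 bits instead of returning the raw difference.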

Page 3 of 17

d) The following code sums the top two words on the stack, pops one of the words, and overwrites the other word with the sum; however, the code does not do these operations in that order. In general the code works, but in what case would this code lead to an incorrect result? (NOTE: This is a hard question!)

    addi sp, sp, 4      # move the stack pointer by 4 bytes
    ldw  r8, -4(sp)     # get first operand
    ldw  r9, 0(sp)      # get second operand
    add  r8, r9, r8     # calculate the sum
    stw  r8, 0(sp)      # store the sum

Answer: Moving the stack pointer before loading from it means that an interrupt between the addi and the first ldw could change the value on the stack at -4(sp), because that word is no longer protected by the stack pointer.

e) Fill in the table below, showing the result of each instruction. Assume that R8 = 0xBADACABA.

    Instruction       Result
    slli r9, r8, 4    R9 = 0xADACABA0
    srli r9, r8, 8    R9 = 0x00BADACA
    srai r9, r8, 8    R9 = 0xFFBADACA

f) Caches are organized into blocks to exploit what form of locality? Why does it make sense for an instruction cache to have blocks (instead of single-instruction cache entries)?

Answer: Blocks exploit spatial locality; instructions are often executed sequentially.

g) In virtual memory implementations, why are pages normally so much larger (e.g., 4KB) than cache blocks (e.g., 64B)?

Page 4 of 17

Answer: Hard drives are much slower than main memory, so you must transfer larger chunks to be worth the overhead. Larger pages also reduce the page table size.

Part 2. [20] C to Assembly

The structure used to represent a binary tree is shown in part (a) of the figure below. Part (b) shows the sum_tree function, a C function that accepts a pointer, tree. The tree pointer points to an element in the tree. The function traverses the tree recursively to sum all the elements. Ints and pointers are 4B each. Implement sum_tree in NIOS II assembly. Please use callee-save registers for any temporary registers you may need (and save/restore them).

(a)
    struct node_t {
        int item;
        struct node_t *left;
        struct node_t *right;
    };

(b)
    void sum_tree (struct node_t *tree) {
        if (tree != NULL) {
            sum += tree->item;
            sum_tree(tree->left);
            sum_tree(tree->right);
        }
    }

COPY1: please cross out the copy you do not want graded

    .section .data
    .align 2
    sum: .word 0

    .section .text
    sum_tree:
        # prologue
        addi  sp, sp, -20
        stw   ra, 16(sp)
        stw   r16, 12(sp)
        stw   r17, 8(sp)
        stw   r18, 4(sp)
        movia r16, sum
        # if (tree != NULL) {
        beq   r4, r0, epi
        # sum += tree->item;
        ldw   r17, 0(r4)
        ldw   r18, 0(r16)
        add   r18, r18, r17
        stw   r18, 0(r16)
        # sum_tree(tree->left);
        stw   r4, 0(sp)
        ldw   r4, 4(r4)
        call  sum_tree
        ldw   r4, 0(sp)
        # sum_tree(tree->right);
        ldw   r4, 8(r4)
        call  sum_tree
    epi:
        # epilogue
        ldw   r18, 4(sp)
        ldw   r17, 8(sp)
        ldw   r16, 12(sp)
        ldw   ra, 16(sp)
        addi  sp, sp, 20
        ret

Page 5 of 17

(this is repeated here for your convenience)

(a)
    struct node_t {
        int item;
        struct node_t *left;
        struct node_t *right;
    };

(b)
    void sum_tree (struct node_t *tree) {
        if (tree != NULL) {
            sum += tree->item;
            sum_tree(tree->left);
            sum_tree(tree->right);
        }
    }

COPY2: please cross out the copy you do not want graded

    .section .data
    .align 2
    sum: .word 0

    .section .text
    sum_tree:
        # prologue

        # if (tree != NULL) {

        # sum += tree->item;

        # sum_tree(tree->left);

        # sum_tree(tree->right);

        # epilogue
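To see where the load offsets in an assembly solution for Part 2 come from, it can help to model the tree as words in a flat memory. The sketch below is a reference model only, not part of the exam: the `memory` dictionary, the `make_node` helper, and the node addresses are all hypothetical, while the offsets 0, 4, and 8 follow from the statement that ints and pointers are 4B each.

```python
NULL = 0
ITEM, LEFT, RIGHT = 0, 4, 8  # byte offsets of the fields in struct node_t

memory = {}  # hypothetical memory image: byte address -> 32-bit word


def make_node(addr, item, left=NULL, right=NULL):
    """Lay out one struct node_t at addr and return its address."""
    memory[addr + ITEM] = item
    memory[addr + LEFT] = left
    memory[addr + RIGHT] = right
    return addr


total = 0  # plays the role of the global variable sum


def sum_tree(tree):
    """Mirror of the C function: each memory[...] access is one ldw."""
    global total
    if tree != NULL:
        total += memory[tree + ITEM]      # ldw from 0(tree)
        sum_tree(memory[tree + LEFT])     # ldw from 4(tree), then call
        sum_tree(memory[tree + RIGHT])    # ldw from 8(tree), then call


# Example tree (addresses are arbitrary):
#       5
#      / \
#     3   7
#        /
#       1
n1 = make_node(0x1030, 1)
n3 = make_node(0x1010, 3)
n7 = make_node(0x1020, 7, left=n1)
root = make_node(0x1000, 5, left=n3, right=n7)

sum_tree(root)
print(total)
```

Note that the model reads `tree` again after the first recursive call. In assembly, r4 is not preserved across a call, which is why a solution has to keep its own copy of the pointer on the stack or in a callee-save register.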

Page 6 of 17

Part 3. [15] Stack / Procedure calls

a) Fill in the stack operations to save registers as required by convention in the following subroutine:

    RS232Out:
        # put stack operations here
        addi  sp, sp, -8
        stw   r19, 0(sp)
        stw   r21, 4(sp)

        movia r13, 0x10001010   /* r13 now contains the base addr of UART */
        ldhio r19, 4(r13)       /* Load from the UART */
        beq   r19, r0, Done     /* this branch is taken if no room for data */
        ldbio r8, 0(r4)         /* get value to output */
        stwio r21, 0(r13)
    Done:
        # put stack operations here
        ldw   r21, 4(sp)
        ldw   r19, 0(sp)
        addi  sp, sp, 8
        ret

b) If this subroutine were to be called from code within an interrupt handler, what registers (in addition to those you saved above) would also have to be saved (if any)? Where in the code would you perform this additional saving of registers?

Answer: ra, r4, r8, and r13, saved in the preamble of the ISR.

Page 7 of 17

Part 4. [20] Interrupts

The program and data below have been loaded on a NIOS II system. Assume that as soon as interrupts are enabled, it is possible that some device might request an interrupt. A 5-element array is stored in memory at label array.

    .section .data
    index: .byte 4
    array: .byte 1, 2, 3, 4, 5

    .section .exceptions, "ax"
    h0: movia r8, index
    h1: stbu  r0, 0(r8)
    h2: addi  ea, ea, -4
    h3: eret

    .section .text
    main:
        # interrupts are disabled here
        movia r16, index
        movia r17, array
        movia r18, sum
        # interrupts are enabled here
    f0: ldbu  r19, 0(r16)
    f1: add   r20, r17, r19
    f2: ldbu  r21, 0(r20)
    f3: addi  r21, r21, 1
    f4: addi  r19, r19, -1
    f5: stbu  r19, 0(r16)
    f6: stbu  r21, 0(r20)
    f7: bgt   r19, r0, f0
    f8:

Give the final contents of array for each of the following scenarios:

a) No interrupt occurs:

    array = { 2, 3, 4, 5, 6 }

b) One interrupt occurs before f0:

    array = { 2, 2, 3, 4, 5 }

c) Below, put check marks next to each valid final state for array (for any occurrence of interrupts including (a) and (b) above):

    array = {2, 3, 4, 5, 6}   X
    array = {1, 2, 3, 4, 5}
    array = {2, 3, 4, 5, 5}
    array = {2, 2, 3, 4, 5}   X
    array = {1, 2, 3, 4, 6}
    array = {2, 2, 3, 4, 6}   X
    array = {2, 2, 3, 5, 6}   X
    array = {2, 2, 2, 4, 6}
    array = {2, 2, 4, 5, 6}   X
    array = {1, 2, 3, 5, 6}
    array = {1, 2, 4, 5, 6}

Page 8 of 17

Part 5. [20] CPU Design

You have the following computer structure as described in class:

The datapath is controlled by the following control signals:

    PCout         PC to bus
    PCwrite
    MDRBuswrite   MDR updated from bus
    MEMoutBus     Memory data value to bus
    MARwrite
    Ywrite
    Zwrite
    ZoutBus       Z value to bus
    IRwrite
    RFsel         Select which register is output to bus or written, as determined by RFout and RFwrite
    RFout         Allow output from selected register to bus
    RFwrite       Write to selected register from bus
    ra_outbus     ra register value to bus
    ra_write      Write to ra register from bus
    ALUop         Add, subtract, shift, etc.
    Select        Chooses one of the ALU inputs
    MEMRead       The data from the memory address specified is output to Dout
    MEMWrite      The data on Din is written to the memory address specified

Page 9 of 17

a) Show a cycle-by-cycle timing diagram of how the memory works for a single read. Assume that control signals change only on the falling edge of the clock. Also assume that MDR data is available on the rising edge of the clock directly after the cycle in which the MEMRead is initiated. NOTE: this is a very easy question, we essentially describe the answer in the question; its purpose is to help you get the answer to (b) correct.

[Timing diagram rows: CPU Clock, MARwrite, MEMRead, MDR data available for bus]

b) Fill in the table below to specify the proper control signals to implement a JSR (jump to subroutine) instruction that performs: ra <- pc+2, pc <- mem[pc+1]. Include all cycles, including those to fetch the instruction. State all assumptions. For ALUop and Select, just specify the values. Ex: Select Y, ALUop=Subtract. Assume that memory read information is available the cycle directly after the cycle in which the MEMRead is initiated. The first cycle is started for you (but not necessarily completed). Put an X only where signals are active during the cycle. Some columns and rows may be unused. For full marks your implementation should be as fast as possible.

Page 10 of 17

COPY1: please cross out the copy you do not want graded

Table columns: Signal, then Cycle 1 2 3 4 5 6 7 8 9

    Signal        Entries
    PCout         X (cycles 2 to 9 blocked out)
    PCwrite       X X
    MDRBuswrite
    MEMoutBus     X X
    MARwrite      X X
    Ywrite
    Zwrite        X X
    ZoutBus       X X
    IRwrite       X
    RFsel
    RFout
    RFwrite
    ra_outbus
    ra_write      X
    ALUop         ADD ADD
    Select        1 1
    MEMRead       X X
    MEMWrite

Page 11 of 17

COPY2: please cross out the copy you do not want graded

Table columns: Signal, then Cycle 1 2 3 4 5 6 7 8 9

    Signal        Entries
    PCout         X
    PCwrite
    MDRBuswrite
    MEMoutBus
    MARwrite
    Ywrite
    Zwrite
    ZoutBus
    IRwrite
    RFsel
    RFout
    RFwrite
    ra_outbus
    ra_write
    ALUop
    Select        1
    MEMRead
    MEMWrite

Page 12 of 17

Part 6. [20] Memory Design

Consider the ROM chip above that has the following pins of interest:

    A5..0:   address pins
    D15..0:  data pins, which include built-in tri-state buffers enabled by BE1 and BE0 appropriately
    BE1:     byte enable for the upper byte of output (D15..8)
    BE0:     byte enable for the lower byte of output (D7..0)

You are to build a ROM device out of a number of these ROM chips by completing one copy of the diagrams on the next two pages. Your ROM device and solution should have the following features:

- 256 bytes of ROM storage
- The first byte (e.g., byte0) of the ROM should be mapped to address 0xFA000
- All 256 bytes of the ROM should be mapped to consecutive byte addresses
    - E.g., byte1 is mapped to 0xFA001, byte2 is mapped to 0xFA002, etc.
- Your ROM device should respond properly to byte, half-word, and word loads
- You may use only decoders, AND gates, OR gates, NOT gates, and tri-state buffers
- You do not have to handle illegal combinations of the BE3..0 signals
- You can ignore the ACK line of the bus (it is not shown)

COPY1: please cross out the copy you do not want graded.
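One way to reason about Part 6 before drawing gates is to write the address decode as a function. The sketch below is one possible arrangement, not the exam's official solution. It assumes a 20-bit byte address and a 32-bit data bus with byte enables BE3..0, and it places two chips side by side: a hypothetical "low" chip driving D15..0 and a "high" chip driving D31..16. Each chip holds 64 locations of 2 bytes, so two chips give 2 x 64 x 2 = 256 bytes.

```python
ROM_BASE = 0xFA000   # first byte of the ROM, from the question
CHIP_WORDS = 64      # 6 address pins (A5..0) per chip
CHIP_BYTES = 2       # 16 data pins (D15..0) per chip
NUM_CHIPS = 2        # assumption: two chips side by side form a 32-bit word


def decode(addr: int, be: int):
    """Map a bus access to chip pin values, or None if the ROM is not selected.

    addr: byte address on the bus (the low two bits are covered by BE3..0)
    be:   4-bit mask, bit i set means BEi is asserted
    """
    # Address bits A19..8 must equal 0xFA0 for the 256-byte window.
    if (addr >> 8) != (ROM_BASE >> 8):
        return None
    return {
        "chip_addr": (addr >> 2) & 0x3F,  # bus A7..2 drive chip pins A5..0
        "low_chip": {"BE0": bool(be & 0b0001), "BE1": bool(be & 0b0010)},
        "high_chip": {"BE0": bool(be & 0b0100), "BE1": bool(be & 0b1000)},
    }


print(CHIP_WORDS * CHIP_BYTES * NUM_CHIPS)   # total capacity in bytes
print(decode(0xFA001, 0b0010))               # byte load of byte1
print(decode(0xFA0FC, 0b1111))               # word load of the last word
print(decode(0xFB000, 0b1111))               # outside the ROM window
```

In hardware, the comparison against 0xFA0 corresponds to an AND of the upper address bits (inverted where the pattern has zeros), and each chip's byte enables are that select signal ANDed with the matching bus BE lines.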

Page 13 of 17 COPY2: please cross out the copy you do not want graded.

Page 14 of 17 Part 7. [30] Caches

Page 15 of 17

a) Fill in the blanks in the table below; each row describes a certain cache design by giving the total cache capacity, the cache block size (data only), associativity, and number of tag, set-index, and offset bits. For all rows assume a 32-bit address space. Show your work on this page below for possible part marks.

              Total Capacity   Block Size   Associativity           Tag bits   Set-index bits   Offset bits
    Cache1:   1MB              128B         2-way                   13         12               7
    Cache2:   2KB              32B          1-way (direct-mapped)   21         6                5
    Cache3:   8KB              64B          4-way                   21         5                6

b) Assume a 512B, 2-way set-associative cache with 16B cache blocks and LRU replacement, and the address trace from a set of accesses below.

Page 16 of 17

What is the total number of hits: 8
What is the hit rate: 50%
Show your work below for possible part marks.

ADDRESS TRACE:

    0xFA23 0xFA29 0xFADE 0xEBD0 0xFB2E 0xFAD3 0xFA2A 0xFAD9
    0xFBDE 0xEB23 0xFA21 0xFA20 0xEBDF 0xEB29 0xFBDC 0xFAD1

c) For a cache, is there a sequence of addresses that will perform better if the cache is direct-mapped rather than 2-way set associative? If so, describe the sequence. This is assuming that the capacity of the cache and the size of the blocks are the same in both cases.

Page 17 of 17

(CEDOMIR'S ANSWER)

Cache has four 1-byte blocks (Blocks 0-3, Sets 0-1) and LRU policy. The sequence of addresses is 0, 2, 6, 0:

    Direct-mapped:
    0   MISS, BRING 0 to Block 0
    2   MISS, BRING 2 to Block 2
    6   MISS, BRING 6 to Block 2
    0   HIT

    2-way:
    0   MISS, BRING 0 to Set 0
    2   MISS, BRING 2 to Set 0
    6   MISS, 6 REPLACES 0 in Set 0
    0   MISS
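The three cache answers in Part 7 can be checked with a small model. The sketch below is a minimal LRU set-associative simulator written for this purpose; the function names `field_bits` and `count_hits` are hypothetical, and it assumes power-of-two sizes and a cache that starts empty.

```python
from collections import OrderedDict


def field_bits(capacity, block, ways, addr_bits=32):
    """Return (tag, set-index, offset) bit counts for power-of-two sizes."""
    sets = capacity // (block * ways)
    offset = block.bit_length() - 1   # log2(block)
    index = sets.bit_length() - 1     # log2(sets)
    return addr_bits - index - offset, index, offset


def count_hits(trace, capacity, block, ways):
    """Count hits for a byte-address trace on an initially empty LRU cache."""
    sets = capacity // (block * ways)
    cache = [OrderedDict() for _ in range(sets)]  # per set: tags in LRU order
    hits = 0
    for addr in trace:
        block_number = addr // block
        lines = cache[block_number % sets]
        tag = block_number // sets
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
    return hits


# Part (a): the three table rows.
print(field_bits(1 << 20, 128, 2))
print(field_bits(2 << 10, 32, 1))
print(field_bits(8 << 10, 64, 4))

# Part (b): 512B, 2-way, 16B blocks.
trace = [0xFA23, 0xFA29, 0xFADE, 0xEBD0, 0xFB2E, 0xFAD3, 0xFA2A, 0xFAD9,
         0xFBDE, 0xEB23, 0xFA21, 0xFA20, 0xEBDF, 0xEB29, 0xFBDC, 0xFAD1]
hits = count_hits(trace, 512, 16, 2)
print(hits, hits / len(trace))

# Part (c): four 1-byte blocks, direct-mapped versus 2-way.
seq = [0, 2, 6, 0]
print(count_hits(seq, 4, 1, 1), count_hits(seq, 4, 1, 2))
```

In part (c) the direct-mapped cache wins because address 0 has a block to itself, while in the 2-way cache addresses 0, 2, and 6 all compete for the two lines of Set 0.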