CS 230 Practice Final Exam & Actual Take-home Question


Part I: Assembly and Machine Languages (22 pts)

1. Assume that assembly code for the following variable definitions has already been generated (and initialization of A and length).

   int powerof2;             /* powerof2 = $t0 */
   int decimal;              /* decimal  = $t1 */
   char A[] = "1011011";     /* A = $s0 */
   int length = strlen(A);   /* length = $t2 */
   char *p;                  /* p = $t3 */

Write MIPS assembly code for the following code segment, using the registers specified in the comments above (to make it easier for us to go over in class). Include comments. (10 points)

   powerof2 = 1;
   decimal = 0;
   for ( p = A + length - 1; p >= A; p-- ) {
       if ( *p == '1' )
           decimal += powerof2;
       powerof2 *= 2;
   }

          addi $t9, $zero, 49     # t9 = 49 ('1')
          addi $t0, $zero, 1      # powerof2 = 1
          move $t1, $zero         # decimal = 0
          add  $t3, $s0, $t2      # p = A + length
   LOOP:  addi $t3, $t3, -1       # p-- (orig. A + length - 1)
          slt  $t5, $t3, $s0      # t5 = 1 if p < A
          bne  $t5, $zero, ENDLP  # goto ENDLP if p < A
          lb   $t6, 0($t3)        # t6 = *p (load byte, since *p is a char)
          bne  $t6, $t9, ENDIF    # goto ENDIF if *p != '1'
          add  $t1, $t1, $t0      # decimal += powerof2
   ENDIF: sll  $t0, $t0, 1        # powerof2 *= 2
          j    LOOP               # goto top of LOOP for p--
   ENDLP:

Could also implement this in other ways!
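As a sanity check (a sketch, not part of the exam), the C loop can be run directly; with A = "1011011" it prints 91, which is the value the MIPS code above should leave in $t1 (decimal):

   #include <stdio.h>
   #include <string.h>

   int main(void) {
       int powerof2 = 1;                /* powerof2 = $t0 */
       int decimal  = 0;                /* decimal  = $t1 */
       char A[] = "1011011";            /* A = $s0 */
       int length = (int) strlen(A);    /* length = $t2 */
       char *p;                         /* p = $t3 */

       /* Walk the digit string from the last character back to the first,
          adding the current power of two for each '1' encountered. */
       for (p = A + length - 1; p >= A; p--) {
           if (*p == '1')
               decimal += powerof2;
           powerof2 *= 2;
       }
       printf("%d\n", decimal);         /* prints 91 (= 1011011 in binary) */
       return 0;
   }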

2. Assemble the following MIPS assembly language code to machine language. Write the machine language instructions in decimal, not in binary. I've done the first line as an example. (8 pts)

   swap: add $t1, $a1, $a1      0  5  5  9  0 32
         add $t1, $t1, $t1      0  9  9  9  0 32
         add $t1, $a0, $t1      0  4  9  9  0 32
         lw  $t0, 0($t1)       35  9  8  0
         lw  $t2, 4($t1)       35  9 10  4
         sw  $t2, 0($t1)       43  9 10  0
         sw  $t0, 4($t1)       43  9  8  4
         jr  $ra                0 31  0  0  0  8

3. What is the difference between a Reduced Instruction Set Computer (RISC) and a Complex Instruction Set Computer (CISC)? What are some benefits and disadvantages of each design?

Part II: Numbers (12 pts)

4. Explain why the IEEE single precision representation for 1 is 3F800000.

   0011 1111 1000 0000 0000 0000 0000 0000
   = 0 | 01111111 | 00000000000000000000000
   = +1.0 * 2^(127-127) = +1.0 * 2^0 = 1.0

5. What is the IEEE single precision representation for -2 3/16?

   -2 3/16 = -10.0011 (binary) = -1.00011 * 2^1 = -1.00011 * 2^(128-127)  (so the exponent field is 128)
   1 | 10000000 | 00011000000000000000000 = C00C0000

6. The way a computer represents real numbers is called floating point representation. Why is this term used rather than real number representation?

   "Real number" implies the full, infinite set of real numbers, but we can represent only a limited subset of those in a computer. We cannot accurately or precisely represent all real numbers.
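A quick way to double-check the bit patterns in questions 4 and 5 (a hypothetical sketch, not part of the exam) is to reinterpret each 32-bit word as a float in C, assuming the host uses IEEE 754 single precision (virtually universal):

   #include <stdio.h>
   #include <string.h>
   #include <stdint.h>

   /* Reinterpret a 32-bit pattern as an IEEE single-precision float. */
   static float bits_to_float(uint32_t bits) {
       float f;
       memcpy(&f, &bits, sizeof f);   /* well-defined type pun */
       return f;
   }

   int main(void) {
       printf("%f\n", bits_to_float(0x3F800000u));  /* prints 1.000000 */
       printf("%f\n", bits_to_float(0xC00C0000u));  /* prints -2.187500, i.e. -2 3/16 */
       return 0;
   }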

Part III: Logic Gates (10 pts)

7. Show the truth table for a two-input exclusive-or function and implement it using AND, OR, and NOT gates. (Reminder: The output of the exclusive-or function is true if and only if exactly one of its inputs is true.) (5 pts)

8. Show the gate diagram for a 1-bit ALU that supports the AND and OR functions. There should be two inputs and one output. Explain how this could be expanded to construct an ALU that acts on a full word. (5 pts)
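Question 7 asks for a gate diagram, but the standard two-level AND/OR/NOT realization, XOR(a, b) = (a AND NOT b) OR (NOT a AND b), is easy to sanity-check with a short C loop (a sketch, not part of the exam):

   #include <stdio.h>

   int main(void) {
       printf(" a b | xor\n");
       for (int a = 0; a <= 1; a++)
           for (int b = 0; b <= 1; b++) {
               /* (a AND NOT b) OR (NOT a AND b) */
               int out = (a & !b) | (!a & b);
               printf(" %d %d |  %d\n", a, b, out);
           }
       return 0;
   }

The loop prints the full truth table: the output is 1 exactly when the inputs differ (0 1 and 1 0), matching the reminder in the question.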

Part IV: Datapaths and Pipelining (10 pts)

9. Using the datapath diagram provided for the homework assignment, illustrate the data and control paths used by the instruction lw $s0, 8($sp). (8 pts)

10. In a single-cycle datapath, instructions are executed one at a time, one instruction in each clock tick. In other words, the clock cycle time is set to the time it takes any instruction to fully execute. In both a multi-cycle datapath and a pipelined datapath, a clock tick is the time an instruction spends in any given stage of the datapath. Each stage has a latency associated with it, which is the amount of time required for the stage to do its job. An example set of stages, with latencies, might be:

   IF:  Instruction fetch (200 ps)
   ID:  Instruction decode and register file read (100 ps)
   EX:  Execution or address calculation (e.g., time spent in ALU) (200 ps)
   MEM: Data memory access (225 ps)
   WB:  Write back to register file (100 ps)

In a multi-cycle datapath, instructions are executed one at a time, but go through only the stages that they need. (E.g., R-format instructions do not have to go through MEM, because they don't read or write memory.) In a pipelined architecture, as each instruction moves to the next stage, the following instruction moves into the stage behind it. (8 pts)

a) A stage's latency describes the minimum amount of time an instruction would have to spend in that stage, but, depending on the datapath type, an instruction might spend longer in a stage than its latency requires. Given the latencies described above, what would be the clock cycle time for a single-cycle datapath? For a multi-cycle datapath? For a pipelined datapath?

   Single-cycle: 825 ps (sum of all stages)
   Multi-cycle: 225 ps (slowest stage)
   Pipelined: 225 ps (slowest stage)

b) How long would a sw instruction take in each of these three cases?

   Single-cycle: 825 ps (sum of all stages)
   Multi-cycle: 900 ps (4 * clock cycle: sw needs IF, ID, EX, and MEM, but not WB)
   Pipelined: 1125 ps (5 * clock cycle, since every instruction goes through all 5 stages)

c) How long would a beq instruction take in each of these three cases?

   Single-cycle: 825 ps (sum of all stages)
   Multi-cycle: 675 ps (3 * clock cycle: beq needs only IF, ID, and EX)
   Pipelined: 1125 ps (5 * clock cycle)

d) In a pipelined datapath, why does it make sense to require all instructions to go through all the stages of the datapath, whether they are necessary or not? (E.g., R-format instructions have to go through MEM, even though they don't read or write memory.)

   Consider an lw (which uses all 5 stages) followed immediately by an add (which doesn't seem to need the MEM stage). If the add skipped MEM, both instructions would be doing their WB at the same time, colliding on the register file.

e) If we could split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what would be the new clock cycle time of the processor?

   Split the slowest stage (MEM). The clock cycle could then be sped up to 200 ps, the latency of the next-slowest stages (IF and EX).
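The arithmetic in parts a through c follows one simple pattern; this small C sketch (hypothetical, with the stage latencies above hard-coded) makes it explicit:

   #include <stdio.h>

   int main(void) {
       int latency[5] = {200, 100, 200, 225, 100};  /* IF, ID, EX, MEM, WB */

       int single_cycle = 0, slowest = 0;
       for (int i = 0; i < 5; i++) {
           single_cycle += latency[i];              /* sum of all stages */
           if (latency[i] > slowest)
               slowest = latency[i];                /* clock = slowest stage */
       }

       printf("single-cycle clock:         %d ps\n", single_cycle);  /* 825 */
       printf("multi/pipelined clock:      %d ps\n", slowest);       /* 225 */
       printf("sw, multi-cycle (4 stages): %d ps\n", 4 * slowest);   /* 900 */
       printf("sw, pipelined (5 stages):   %d ps\n", 5 * slowest);   /* 1125 */
       return 0;
   }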

11. Describe each of the following hazard types: (6 pts)

   structural hazards: hardware-related hazards (two instructions need the same piece of hardware at the same time)
   control hazards: hazards relating to branches or jumps (we start executing one instruction, but a beq/bne/j sends execution elsewhere)
   data hazards: an instruction needs data from a previous instruction whose results have not been stored in registers yet

12. Describe two methods for handling hazards: one at the hardware level and one at the software level.

   stalls (need to describe them)
   no-ops (or nop, or "bubble") (need to describe them)
   (You could also describe fancier software methods, such as re-ordering instructions.)

Part V: Memory (24 pts)

13. Which code fragment is a better example of temporal locality? Which is a better example of spatial locality? Consider data only and not the fetching of instructions.

   Fragment 1:                      Fragment 2:
   for ( k = 0; k < 100; k++ )      for ( k = 0; k < y; k++ )
       a[k] = b[k] * 2.5;               sum = sum + z;

   spatial: fragment 1, which goes through a and b in sequential order
   temporal: fragment 2, which references sum and z over and over again

14. What are a cache "hit" and a cache "miss"? How are they detected?

   hit: the data you're looking for is in the cache
   miss: the data is not in the cache, and we must go out to memory to retrieve it
   detection: the high-order bits of the memory address are compared to the tag stored in the cache line. If they match, the data in the cache is the data you want.

15. Consider a 2-way set-associative 8-line cache with one 4-byte word per line. Assume, for illustration purposes, that each byte address contains its own value as data, e.g., address 3 contains a 3, address 16 contains a 16, etc. Show the evolving cache state for the sequence of (byte) address references below, crossing out values as they are replaced with other values. Circle the references that result in a cache hit: 3, 16, 7, 6, 3, 5, 15, 16, 17, 20, 19, 22, 48 (8 pts)

Binary version of those addresses (showing only the low-order 8 bits of each address for simplicity):

    3: 0000 0011  miss
   16: 0001 0000  miss
    7: 0000 0111  miss
    6: 0000 0110  hit (brought in with address 7)
    3: 0000 0011  hit
    5: 0000 0101  hit (brought in with address 7)
   15: 0000 1111  miss
   16: 0001 0000  hit
   17: 0001 0001  hit (brought in with address 16)
   20: 0001 0100  miss
   19: 0001 0011  hit (brought in with address 16)
   22: 0001 0110  hit (brought in with address 20)
   48: 0011 0000  miss

Remember, according to the instructions above, we're pretending that each byte in memory holds the value of its own address; i.e., byte 0 holds int 0, byte 1 holds int 1, byte 33 holds int 33, etc. That way, looking at the data we can see what address it corresponds to.

Final cache state (values replaced during the trace are shown as old -> new):

   Set  Way  V  Tag           Byte 00  01     10     11
   00   0    Y  0000 -> 0011   0->48   1->49  2->50  3->51
   00   1    Y  0001          16      17     18     19
   01   0    Y  0000           4       5      6      7
   01   1    Y  0001          20      21     22     23
   10   0    N
   10   1    N
   11   0    Y  0000          12      13     14     15
   11   1    N
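The hit/miss pattern falls out of the address breakdown: 4-byte lines give 2 offset bits, and 8 lines in 2 ways give 4 sets, hence 2 index bits; everything above that is the tag. A small C sketch (a hypothetical helper, not part of the exam) that splits each reference this way:

   #include <stdio.h>

   int main(void) {
       int refs[] = {3, 16, 7, 6, 3, 5, 15, 16, 17, 20, 19, 22, 48};
       int n = sizeof refs / sizeof refs[0];

       for (int i = 0; i < n; i++) {
           unsigned a = refs[i];
           unsigned offset = a & 0x3;         /* low 2 bits: byte within 4-byte line */
           unsigned set    = (a >> 2) & 0x3;  /* next 2 bits: one of 4 sets */
           unsigned tag    = a >> 4;          /* remaining high bits */
           printf("addr %2u: tag %u, set %u, offset %u\n", a, tag, set, offset);
       }
       return 0;
   }

Two references map to the same cache line exactly when they share both tag and set, which is why 6, 3, and 5 hit after the misses on 3 and 7 brought in their lines.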

16. Describe two replacement algorithms for set-associative memory.

   LRU: least-recently-used (need to describe)
   Random (need to describe)

alt 16. (From an older practice exam): Describe two write policies for writing from cache to memory.

   write-back (need to describe)
   write-through (need to describe)

17. What are two reasons for using virtual memory?

   We have not talked about this in class.

Part VI: Parallel Processing (8 pts)

18. What is a SIMD parallel processor?

   Single-Instruction-Multiple-Data (also called a vector processor): applies the same instruction to multiple data values in parallel.

19. How do the processors in a MIMD distributed memory parallel computer communicate?

   We have not talked about this at all in class.