ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #1 Due Date: At the beginning of lecture, September 22nd, 2010

This homework is to be done individually. Total: 9 questions, 100 points.

1. (8 pts.) Fill in the empty slots in the table by converting numbers into different representations. All numbers are 8 bits in length.

    Binary       Hexadecimal    Unsigned Decimal    Signed Decimal
    0010 0001    21             33                  33
    0110 1111    6F             111                 111
    1010 0110    A6             166                 -90
    1110 0111    E7             231                 -25

2. (6 pts.) Translate the MIPS assembly language into machine language and describe each operation in the Comments column.

    Assembly Language    Machine Language                           Comments
    add $t0, $s1, $s2    000000 10001 10010 01000 00000 100000      # $t0 = $s1 + $s2
    sll $t2, $s7, 4      000000 00000 10111 01010 00100 000000      # $t2 = $s7 << 4
    sw  $t4, 0($t2)      101011 01010 01100 0000000000000000        # store the value in $t4 to memory address ($t2 + 0)
    j   0x092A           000010 00 0000 0000 0000 1001 0010 1010    # PC = {(PC+4)[31:28], 26'h092A, 2'b00}
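For reference, the two's-complement rule behind the table is: when bit 7 is set, the signed value equals the unsigned value minus 256. A minimal C sketch (the sample patterns are taken from the table above) that prints all four representations of an 8-bit value:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* Example 8-bit patterns taken from the table in Question 1. */
        uint8_t values[] = {0x21, 0x6F, 0xA6, 0xE7};

        for (int i = 0; i < 4; i++) {
            uint8_t u = values[i];           /* unsigned interpretation     */
            int8_t  s = (int8_t)u;           /* two's-complement (signed)   */
            printf("binary ");
            for (int b = 7; b >= 0; b--) {   /* print the 8 bits, MSB first */
                printf("%d", (u >> b) & 1);
                if (b == 4) printf(" ");     /* split into nibbles          */
            }
            printf("  hex %02X  unsigned %3u  signed %4d\n", u, u, s);
        }
        return 0;
    }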

3. (6 pts.) In this exercise, you will evaluate the performance difference between two CPU architectures, CISC (complex instruction set computing) and RISC (reduced instruction set computing). Generally speaking, CISC CPUs have more complex instructions than RISC CPUs and therefore need fewer instructions to perform the same tasks. However, one CISC instruction, since it is more complex, typically takes more time to complete than a RISC instruction. Assume that a certain task needs P CISC instructions and 2P RISC instructions, and that one CISC instruction takes 8T ns to complete while one RISC instruction takes 2T ns. Under this assumption, which one has the better performance?

Compare the runtimes of the task:
    CISC: P * 8T ns = 8PT ns
    RISC: 2P * 2T ns = 4PT ns
Therefore, in this case, RISC has the better performance.
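As a quick sanity check of the arithmetic above, here is a tiny C sketch; the values P = 1e6 instructions and T = 1 ns are assumed only for illustration, since the 2x ratio holds for any positive P and T:

    #include <stdio.h>

    int main(void) {
        double P = 1e6;   /* assumed instruction-count parameter */
        double T = 1.0;   /* assumed time parameter, in ns       */

        double cisc_ns = P * (8.0 * T);         /* P instructions, 8T ns each  */
        double risc_ns = (2.0 * P) * (2.0 * T); /* 2P instructions, 2T ns each */

        printf("CISC: %.0f ns, RISC: %.0f ns, ratio = %.1f\n",
               cisc_ns, risc_ns, cisc_ns / risc_ns);   /* ratio = 2.0 */
        return 0;
    }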

4. (15 pts.) Consider the following fragment of C code:

    for (i = 0; i <= 100; i = i + 1)
        a[i] = b[i] + c;

Assume that a and b are arrays of words and that the base address of a is in $a0 and the base address of b is in $a1. Register $t0 is associated with variable i and register $s0 with the value of c. You may also assume that any address constants you need are available to be loaded from memory.
(1) (10 pts.) Write the code for MIPS.
(2) (5 pts.) How many instructions are executed during the running of this code if there are no array out-of-bounds exceptions thrown?
(3) How many memory data references will be made during execution?

(1) One example of MIPS code ($s0 must be preserved because it holds c, so a separate register, $t2, holds the loop bound of 101 iterations):

            add  $t0, $zero, $zero   # i = 0
            addi $t2, $zero, 101     # loop bound: 101 iterations for i = 0..100
    loop:   lw   $t1, 0($a1)         # $t1 = b[i]
            add  $t1, $t1, $s0       # $t1 = b[i] + c
            sw   $t1, 0($a0)         # store $t1 to the address of a[i]
            addi $a0, $a0, 4         # $a0 = address of a[i+1]
            addi $a1, $a1, 4         # $a1 = address of b[i+1]
            addi $t0, $t0, 1         # $t0 = $t0 + 1
            beq  $t0, $t2, finish    # if ($t0 == 101) finish
            j    loop
    finish:

(A different program with the same behavior receives full credit, too.)

(2) The 2 instructions before the loop are executed 1 time; the 7 instructions from loop: through the beq are executed 101 times; the j loop instruction is executed 100 times. Therefore, the total number of instructions executed (in this case) is 2*1 + 7*101 + 1*100 = 809.

(3) Memory data references: one lw and one sw per iteration, so 101 * 2 = 202.
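The counts in (2) and (3) can be reproduced with a small C sketch that simply tallies the dynamic instructions and data references per iteration (a counting model, not a MIPS simulator):

    #include <stdio.h>

    int main(void) {
        long instructions = 2;   /* the two initialization instructions */
        long data_refs    = 0;   /* lw/sw data accesses                 */

        for (int i = 0; i <= 100; i++) {   /* 101 iterations, as in the C loop */
            instructions += 7;   /* lw, add, sw, addi, addi, addi, beq  */
            data_refs    += 2;   /* one lw and one sw per iteration     */
            if (i < 100)
                instructions += 1;  /* j loop runs on all but the last pass */
        }

        printf("instructions = %ld, data references = %ld\n",
               instructions, data_refs);   /* prints 809 and 202 */
        return 0;
    }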

5. (16 pts.) Pseudoinstructions are not part of the MIPS instruction set but often appear in MIPS programs. For each pseudoinstruction in the following table, produce a minimal sequence of actual MIPS instructions that accomplishes the same thing. You may need to use $at for some of the sequences. In the following table, imm refers to a specific number that requires 32 bits to represent.

    Pseudoinstruction    What it accomplishes
    move $t1, $t2        $t1 = $t2
    clear $t0            $t0 = 0
    beq $t1, imm, L      if ($t1 == imm) go to L
    bge $t5, $t3, L      if ($t5 >= $t3) go to L

(Examples; these are not the only solutions.)

    move $t1, $t2:
        add  $t1, $t2, $zero

    clear $t0:
        and  $t0, $zero, $zero

    beq $t1, imm, L:
        lui  $at, imm[31:16]      # upper 16 bits of imm
        ori  $at, $at, imm[15:0]  # lower 16 bits (ori avoids the sign extension that addi would apply)
        beq  $t1, $at, L

    bge $t5, $t3, L:
        slt  $at, $t5, $t3        # $at = 1 if $t5 < $t3, else 0
        beq  $at, $zero, L        # branch when $t5 >= $t3

(Slightly longer correct sequences are fine, too. Students may not use temporary registers other than $at.)
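The reason ori rather than addi is used for the low half is sign extension. The following C sketch mimics the two choices for an assumed example constant 0x00018000, whose low half 0x8000 has bit 15 set:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t imm = 0x00018000u;     /* assumed example constant */
        uint32_t hi  = imm >> 16;       /* imm[31:16] = 0x0001      */
        uint32_t lo  = imm & 0xFFFFu;   /* imm[15:0]  = 0x8000      */

        /* lui + ori: the low half is zero-extended, so the result is exact. */
        uint32_t via_ori = (hi << 16) | lo;

        /* lui + addi: addi sign-extends its 16-bit immediate, so a low half
           with bit 15 set effectively subtracts 0x10000 from the upper part. */
        int32_t  lo_sext  = (lo & 0x8000u) ? (int32_t)lo - 0x10000 : (int32_t)lo;
        uint32_t via_addi = (hi << 16) + (uint32_t)lo_sext;

        printf("target 0x%08X  lui+ori 0x%08X  lui+addi 0x%08X\n",
               imm, via_ori, via_addi);   /* 0x00018000 vs 0x00008000 */
        return 0;
    }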

6. (10 pts.) Consider two different implementations, P1 and P2, of the same instruction set. There are five classes of instructions (A, B, C, D, and E) in the instruction set. P1 has a clock rate of 4 GHz. P2 has a clock rate of 6 GHz. The average number of cycles for each instruction class on P1 and P2 is as follows:

    Class    CPI on P1    CPI on P2
    A        1            2
    B        2            2
    C        3            2
    D        4            4
    E        3            4

(1) (5 pts.) Assume that peak performance is defined as the fastest rate at which a computer can execute any instruction sequence. What are the peak performances of P1 and P2, expressed in instructions per second?
(2) (5 pts.) If the number of instructions executed in a certain program is divided equally among the classes of instructions except for class A, which occurs twice as often as each of the others, how much faster is P2 than P1?

(1) The maximum IPC of P1 and P2 is 1 and 0.5, respectively (set by the lowest-CPI class on each machine). Since IPS = IPC * clock rate:
    IPS for P1 = 1 * 4 GHz = 4e9 instructions per second;
    IPS for P2 = 0.5 * 6 GHz = 3e9 instructions per second.

(2) Average CPI = (2A + B + C + D + E) / (2 + 1 + 1 + 1 + 1), where A through E denote the CPI of each class on the given implementation.
    On P1: CPI = (2 + 2 + 3 + 4 + 3) / 6 = 14/6; average time per instruction = (14/6) / 4 GHz.
    On P2: CPI = (4 + 2 + 2 + 4 + 4) / 6 = 16/6; average time per instruction = (16/6) / 6 GHz.
    (Speed of P2) / (Speed of P1) = (avg. time per instruction on P1) / (avg. time per instruction on P2) = (14/24) / (16/36) = 21/16 = 1.31.
Therefore P2 is 1.31 times faster than P1.
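For reference, a short C sketch that recomputes both parts of this question from the table above:

    #include <stdio.h>

    int main(void) {
        double cpi_p1[5] = {1, 2, 3, 4, 3};    /* classes A..E on P1 */
        double cpi_p2[5] = {2, 2, 2, 4, 4};    /* classes A..E on P2 */
        double clk_p1 = 4e9, clk_p2 = 6e9;     /* clock rates in Hz  */
        double weight[5] = {2, 1, 1, 1, 1};    /* class A occurs twice as often */

        /* Part 1: peak IPS uses the minimum CPI on each machine. */
        double min1 = cpi_p1[0], min2 = cpi_p2[0];
        for (int i = 1; i < 5; i++) {
            if (cpi_p1[i] < min1) min1 = cpi_p1[i];
            if (cpi_p2[i] < min2) min2 = cpi_p2[i];
        }
        printf("peak IPS: P1 = %.2e, P2 = %.2e\n", clk_p1 / min1, clk_p2 / min2);

        /* Part 2: weighted average CPI, then compare time per instruction. */
        double sum1 = 0, sum2 = 0, wsum = 0;
        for (int i = 0; i < 5; i++) {
            sum1 += weight[i] * cpi_p1[i];
            sum2 += weight[i] * cpi_p2[i];
            wsum += weight[i];
        }
        double t1 = (sum1 / wsum) / clk_p1;    /* avg. seconds per instruction on P1 */
        double t2 = (sum2 / wsum) / clk_p2;    /* avg. seconds per instruction on P2 */
        printf("P2 is %.2f times faster than P1\n", t1 / t2);   /* 1.31 */
        return 0;
    }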

7. (14 pts.)
(1) (6 pts.) Assume the following instruction mix for a MIPS-like RISC instruction set: 20% stores, 20% loads, 20% branches, 25% integer arithmetic, 7% integer shift, and 8% integer multiply. Given a program with 200 instructions, and given that load instructions require two cycles, branches require four cycles, integer ALU and store instructions require one cycle, and integer multiplies require ten cycles, compute the overall cycles per instruction (CPI).
(2) (8 pts.) Strength reduction is a common compiler optimization that converts multiplies by a constant integer into a sequence of shift and add instructions with equivalent semantics. Given the same parameters as in (1), consider such a strength-reducing optimization that converts multiplies by a compile-time constant into a sequence of shifts and adds. For this instruction mix, 65% of the multiplies can be converted into shift-add sequences with an average length of 3.5 instructions. Assuming a fixed clock frequency, compute the new CPI and the overall program speedup.

(1) Overall CPI = 200*(20%*1 + 20%*2 + 20%*4 + 25%*1 + 7%*1 + 8%*10) / 200 = 2.52.

(2) Old number of multiplies: 8% * 200 = 16.
    Now 65% of the 16 multiplies are replaced by 3.5 ALU instructions each, so the new number of multiplies is 16 * 35% = 5.6 (on average), and the number of additional ALU instructions is 16 * 65% * 3.5 = 36.4.
    The new total instruction count is 200 - 16 + 5.6 + 36.4 = 226.
    New CPI = [200 * (20%*1 + 20%*2 + 20%*4 + 25%*1 + 7%*1) + (5.6*10 + 36.4*1)] / 226 = 436.4 / 226 = 1.93.
    Speedup = old cycle count / new cycle count (the frequency is fixed) = 504 / 436.4 = 1.15.
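The CPI and speedup figures can be checked with the following C sketch; all mix fractions and cycle counts come from the problem statement:

    #include <stdio.h>

    int main(void) {
        double n = 200.0;                /* program size in instructions */

        /* mix fractions: store, load, branch, arith, shift, multiply */
        double mix[6]    = {0.20, 0.20, 0.20, 0.25, 0.07, 0.08};
        double cycles[6] = {1,    2,    4,    1,    1,    10  };

        /* (1) baseline cycles and CPI */
        double old_cycles = 0;
        for (int i = 0; i < 6; i++)
            old_cycles += n * mix[i] * cycles[i];
        printf("old CPI = %.2f\n", old_cycles / n);             /* 2.52 */

        /* (2) strength reduction: 65% of the multiplies become 3.5 ALU ops each */
        double muls       = n * mix[5];                  /* 16 multiplies       */
        double kept_muls  = muls * 0.35;                 /* 5.6 remain          */
        double new_alu    = muls * 0.65 * 3.5;           /* 36.4 added ALU ops  */
        double new_n      = n - muls + kept_muls + new_alu;        /* 226       */
        double new_cycles = (old_cycles - muls * 10.0)   /* drop old multiply cycles */
                          + kept_muls * 10.0 + new_alu * 1.0;

        printf("new CPI = %.2f, speedup = %.2f\n",
               new_cycles / new_n, old_cycles / new_cycles);    /* 1.93, 1.15 */
        return 0;
    }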

8. (15 pts.)
(1) (10 pts.) You are going to enhance a computer, and there are two possible improvements: either make multiply instructions run four times faster than before, or make memory access instructions run two times faster than before. You repeatedly run a program that takes 100 seconds to execute. Of this time, 20% is used for multiplication, 50% for memory access instructions, and 30% for other tasks. What will the speedup be if you improve only multiplication? What will the speedup be if you improve only memory access? What will the speedup be if both improvements are made?
(2) (5 pts.) You are going to change the program described in (1) so that the percentages are not 20%, 50%, and 30% anymore. Assuming that none of the new percentages is 0, what sort of program would result in a tie (with regard to speedup) between the two individual improvements? Provide both a formula and some examples.

(1) The times spent on multiply and memory access are 20 s and 50 s, respectively.
    Accelerate only multiply: (100 - 20) + 20/4 = 85 sec, a speedup of 100/85 = 1.18;
    Accelerate only memory access: (100 - 50) + 50/2 = 75 sec, a speedup of 100/75 = 1.33;
    Accelerate both multiply and memory access: (100 - 20 - 50) + 20/4 + 50/2 = 60 sec, a speedup of 100/60 = 1.67.

(2) Suppose multiply uses a% of the time and memory access uses b% of the time. For a tie between the two individual improvements we need
    (100 - a) + a/4 = (100 - b) + b/2  =>  b = 1.5a   (with 0 < a, 0 < b, and a + b < 100).
For example, if multiply consumes 20% of the time, then the two improvements have equal effect only when memory access instructions consume 30% of the time.
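A brief C sketch that checks the runtimes and speedups in part (1) and the tie condition b = 1.5a in part (2); the pair a = 20, b = 30 is just the example from the text:

    #include <stdio.h>

    /* Runtime after speeding up one fraction of the original 100 s program. */
    static double new_time(double seconds_improved, double factor) {
        return (100.0 - seconds_improved) + seconds_improved / factor;
    }

    int main(void) {
        double mul = 20.0, mem = 50.0;   /* seconds, from part (1) */

        printf("multiply only: %.0f s (speedup %.2f)\n",
               new_time(mul, 4.0), 100.0 / new_time(mul, 4.0));          /* 85 s */
        printf("memory only:   %.0f s (speedup %.2f)\n",
               new_time(mem, 2.0), 100.0 / new_time(mem, 2.0));          /* 75 s */
        double both = (100.0 - mul - mem) + mul / 4.0 + mem / 2.0;       /* 60 s */
        printf("both:          %.0f s (speedup %.2f)\n", both, 100.0 / both);

        /* Part (2): with a = 20 and b = 1.5 * a = 30, the two single
           improvements yield the same runtime. */
        double a = 20.0, b = 1.5 * a;
        printf("tie check: %.1f s vs %.1f s\n",
               new_time(a, 4.0), new_time(b, 2.0));                      /* 85 vs 85 */
        return 0;
    }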

9. (10 pts.) The following code fragment processes two arrays and produces an important value in register $v0. Assume that each array consists of 2500 words indexed 0 through 2499, that the base addresses of the arrays are stored in $a0 and $a1, respectively, and that their sizes (2500) are stored in $a2 and $a3, respectively. Add comments to the code and describe in one sentence what this code does. Specifically, what will be returned in $v0?

            sll  $a2, $a2, 2         # shift $a2 and $a3 left by 2 (multiply by 4) so they -
            sll  $a3, $a3, 2         # - hold the array sizes in bytes (word-aligned offsets)
            add  $v0, $zero, $zero   # clear $v0
            add  $t0, $zero, $zero   # clear $t0; $t0 is the byte offset into array 1
    outer:  add  $t4, $a0, $t0       # compute the address of the current word of array 1
            lw   $t4, 0($t4)         # load the value of the word in array 1
            add  $t1, $zero, $zero   # clear $t1; $t1 is the byte offset into array 2
    inner:  add  $t3, $a1, $t1       # compute the address of the current word of array 2
            lw   $t3, 0($t3)         # load the value of the word in array 2
            bne  $t3, $t4, skip      # if the two words are equal, -
            addi $v0, $v0, 1         # - increment $v0; otherwise skip
    skip:   addi $t1, $t1, 4         # move to the next word in array 2
            bne  $t1, $a3, inner     # if the end of array 2 is not reached, -
                                     # - continue the inner loop
            addi $t0, $t0, 4         # move to the next word in array 1
            bne  $t0, $a2, outer     # if the end of array 1 is not reached, -
                                     # - continue the outer loop

Comments are added to the right of the program. The code compares every word of array 1 with every word of array 2 and returns in $v0 the number of (i, j) pairs for which word i of array 1 equals word j of array 2, i.e., the count of matching pairs (counting duplicates).
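An equivalent C fragment may make the behavior clearer. This is only a sketch of what the assembly computes; the names array1, array2, and the test data in main are hypothetical stand-ins for the arrays based at $a0 and $a1:

    #include <stdio.h>

    #define N 2500   /* number of words in each array */

    /* Count (i, j) pairs with array1[i] == array2[j], as the MIPS code does. */
    long count_matches(const int array1[N], const int array2[N]) {
        long matches = 0;                   /* corresponds to $v0 */
        for (int i = 0; i < N; i++)         /* outer loop over array 1 */
            for (int j = 0; j < N; j++)     /* inner loop over array 2 */
                if (array1[i] == array2[j])
                    matches++;              /* the addi $v0, $v0, 1 path */
        return matches;
    }

    int main(void) {
        static int a[N], b[N];              /* hypothetical test arrays */
        for (int i = 0; i < N; i++) { a[i] = i % 7; b[i] = i % 5; }
        printf("matching pairs: %ld\n", count_matches(a, b));
        return 0;
    }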