s complement 1-bit Booth s 2-bit Booth s

Similar documents
/ : Computer Architecture and Design Fall 2014 Midterm Exam Solution

EE260: Logic Design, Spring n Integer multiplication. n Booth s algorithm. n Integer division. n Restoring, non-restoring

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING QUESTION BANK

Tutorial 11. Final Exam Review

ECE 313 Computer Organization FINAL EXAM December 13, 2000

Hardware-Based Speculation

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)

Organisasi Sistem Komputer

PART A (22 Marks) 2. a) Briefly write about r's complement and (r-1)'s complement. [8] b) Explain any two ways of adding decimal numbers.

INTELLIGENCE PLUS CHARACTER - THAT IS THE GOAL OF TRUE EDUCATION UNIT-I

Computer Architecture Review. Jo, Heeseung

Cache Organizations for Multi-cores

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ECE 154A Introduction to. Fall 2012

CMSC411 Fall 2013 Midterm 2 Solutions

Good luck and have fun!

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

Computer Organisation CS303

Keywords and Review Questions

CS433 Homework 3 (Chapter 3)

E0-243: Computer Architecture

Lecture 7: Pipelining Contd. More pipelining complications: Interrupts and Exceptions

Chapter 03: Computer Arithmetic. Lesson 09: Arithmetic using floating point numbers

HW1 Solutions. Type Old Mix New Mix Cost CPI

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

OPEN BOOK, OPEN NOTES. NO COMPUTERS, OR SOLVING PROBLEMS DIRECTLY USING CALCULATORS.

JNTUWORLD. 1. Discuss in detail inter processor arbitration logics and procedures with necessary diagrams? [15]

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model.

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)

CS152 Exam #2 Fall Professor Dave Patterson

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes

CPE300: Digital System Architecture and Design

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?

Divide: Paper & Pencil

CMSC411 Fall 2013 Midterm 1

COMP2611: Computer Organization. Data Representation

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

ECE331: Hardware Organization and Design

231 Spring Final Exam Name:

TDT4255 Computer Design. Review Lecture. Magnus Jahre. TDT4255 Computer Design

CO212 Lecture 10: Arithmetic & Logical Unit

Floating-point Arithmetic. where you sum up the integer to the left of the decimal point and the fraction to the right.

EECC551 Exam Review 4 questions out of 6 questions

Computer Organization and Architecture (CSCI-365) Sample Final Exam

Computer Architecture V Fall Practice Exam Questions

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Chapter 5 : Computer Arithmetic

Introduction to Computers and Programming. Numeric Values

ECE Sample Final Examination

Structure of Computer Systems

CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng.

Course Description: This course includes concepts of instruction set architecture,

Handout 2 ILP: Part B

CSE 141 Computer Architecture Summer Session Lecture 3 ALU Part 2 Single Cycle CPU Part 1. Pramod V. Argade

COMPUTER ORGANIZATION AND ARCHITECTURE

Data Representation and Architecture Modeling Year 1 Exam preparation

Final Jeopardy. CS356 Unit 15. Binary Brainteaser 100. Binary Brainteaser 200. Review

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

Signed Multiplication Multiply the positives Negate result if signs of operand are different

Module 2: Computer Arithmetic

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.

Mikko Lipasti Spring 2002 ECE/CS 552 : Introduction to Computer Architecture IN-CLASS MIDTERM EXAM March 14th, 2002

Instruction Pipelining Review

S = 32 2 d kb (1) L = 32 2 D B (2) A = 2 2 m mod 4 (3) W = 16 2 y mod 4 b (4)

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

ISA Instruction Operation

DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY. Department of Computer science and engineering

Floating Point/Multicycle Pipelining in DLX

CS/CoE 1541 Mid Term Exam (Fall 2018).

UNIT I BASIC STRUCTURE OF COMPUTERS Part A( 2Marks) 1. What is meant by the stored program concept? 2. What are the basic functional units of a

Chapter 3: Arithmetic for Computers

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

CS433 Homework 2 (Chapter 3)

SOLUTION. Midterm #1 February 26th, 2018 Professor Krste Asanovic Name:

IEEE Standard for Floating-Point Arithmetic: 754

Computer Arithmetic Ch 8

Computer Arithmetic Ch 8

Hardware-based Speculation

CS146 Computer Architecture. Fall Midterm Exam

Integer Multiplication. Back to Arithmetic. Integer Multiplication. Example (Fig 4.25)

Number Systems and Binary Arithmetic. Quantitative Analysis II Professor Bob Orr

COSC4201 Instruction Level Parallelism Dynamic Scheduling

ECE 3055: Final Exam

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

Chapter 4. The Processor

Instruction-Level Parallelism and Its Exploitation

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

COSC 243. Data Representation 3. Lecture 3 - Data Representation 3 1. COSC 243 (Computer Architecture)

Superscalar Processor

Chapter 4. The Processor

Pipelining. CSC Friday, November 6, 2015

RECAP. B649 Parallel Architectures and Programming

Lecture 10: Floating Point, Digital Design

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Multiple Instruction Issue. Superscalars

Reorder Buffer Implementation (Pentium Pro) Reorder Buffer Implementation (Pentium Pro)

Transcription:

ECE/CS 552 : Introduction to Computer Architecture FINAL EXAM May 12th, 2002 NAME: This exam is to be done individually. Total 6 Questions, 100 points Show all your work to receive partial credit for incorrect solutions 1. (15 Points) Integer Multiplication and Division a) (2 points) What is the simple Booth encoding of the two s complement number 10011101 2? Fill in the table below to encode the number with digits (1,0,-1) in the same manner that the simple Booth multiplier would. b) (3 Points) Fill in the table to encode the number using Modified Booth s algorithm. I.e encode with the digits (-2,-1,0,+1,+2) 1 0 0 1 1 1 0 1 2 s complement 1-bit Booth s 2-bit Booth s

Restoring (5 Points) Number of additions: Mikko Lipasti c) (10 points) Using 4-bit unsigned division, 15 divided by 3 gives a quotient of 5 and a remainder of 0. i.e. 1111 / 0011 = 0101 For the restoring and non restoring division algorithms, how many separate additions and subtractions are required? Number of subtractions: Non-Restoring (5 Points) Number of additions: Number of subtractions:

2. (20 Points) Floating Point Representation and Arithmetic a) (5 points) Consider FPS-10, a 10-bit low-cost floating-point representation that has a sign bit, a 4-bit biased or excess exponent with a bias of 7, and 5 significand bits. Values are normalized to have an implicit leading 1 to the left of the floating point (just as in IEEE 754). With this standard, represent (-0.2) 10. Sign Exponent Significand b) (5 Points) Fill in the following table to specify the range of values that can be represented with FPS-10 when only normalized numbers are allowed, and also when denormalized numbers are allowed. Specify min/max values in both binary and decimal scientific notation. Minimum of Range Maximum of Range Binary Decimal Binary Decimal Case Normalized Numbers Only Denormalized numbers also

c) (10 points) Given A and B below, compute R = A * B using the FPS-10 FP standard. Show the sign bit and exponent of the result, and fill in the table below assuming a 1 bit per cycle multicycle multiplier for computing the significand (you may use a Booth multiplier if you wish, but show your work). A=1.11010 2 x 2 4 (FP binary representation: 0101111010) B=1.00101 2 x 2 2 (FP binary representation: 0100100101) Sign bit of product: Exponent of product: Significand computation: Multiplicand: Multiplier: Step Explanation Product Register 1 2 3 4 5 6 7 8 9 10 Record the properly normalized and rounded result in binary format below: Sign Exponent Significand Show any additional work here:

3. (20 Points) Consider a processor with 32-bit virtual addresses, 4KB pages and 36-bit physical addresses. Assume memory is byte-addressable (i.e. the 32-bit VA specifies a byte in memory). L1 instruction cache: 64 Kbytes, 128 byte blocks, 4-way set associative, indexed and tagged with virtual address. L1 data cache: 32 Kbytes, 64 byte blocks, 2-way set associative, indexed and tagged with physical address, write-back. 4-way set associative TLB with 128 entries in all. Assume the TLB keeps a dirty bit, a reference bit, and 3 permission bits (read, write, execute) for each entry. a) (10 points) Specify the number of offset, index, and tag bits for each of these structures in the table below. Also, compute the total size in number of bit cells for each of the tag and data arrays. Structure Offset bits Index bits Tag bits Size of tag array I-cache D-cache TLB Size of data array

b) (5 Points) Explain why accesses to the data cache would take longer than accesses to the instruction cache. Suggest a lower-latency data cache design with the same capacity and describe how the organization of the cache would have to change to achieve the lower latency. c) (5 points) Assume the architecture requires writes that modify the instruction text (i.e. self-modifying code) to be reflected immediately if the modified instructions are fetched and executed. Explain why it may be difficult to support this requirement with this instruction cache organization.

4. (5 Points) Bypass Network Design Given the following ID, EX, MEM, and WB pipeline configuration, draw all necessary Mux0 and Mux1 bypass paths to resolve RAW data hazards. Assume that load instructions are always separated by at least one independent instruction (possibly a NOP) from any instruction that reads the loaded register (hence you never stall due to a RAW hazard). Src0 Src1 OP ReadReg0 ReadReg1 WrReg ID EX Mux0 Mux1 ALU StData ALUout OP ReadReg0 ReadReg1 WrReg EX MEM Data Cache MemData MEMALUout OP ReadReg0 ReadReg1 WrReg MEM WB

5. (20 points) Given the forwarding paths in problem 4, draw a detailed design for Mux0 and Mux1 that clearly identifies which bypass paths are selected under which control conditions. Identify each input to each mux by the name of the pipeline latch that it is bypassing from. Specify precisely the Boolean equations that are used to control Mux0 and Mux1. Possible inputs to the Boolean equations are: ID.OP, EX.OP, MEM.OP = { load, store, alu, other } ID.ReadReg0, ID.ReadReg1 = [0..31,32] where 32 means a register is not read by this instruction EX.ReadReg0, etc. as in ID stage MEM.ReadReg0, etc. as in ID stage ID.WriteReg, EX.WriteReg, MEM.WriteReg = [0..31,33] where 33 means a register is not written by this instruction Draw Mux0 and Mux1 with labeled inputs; you do not need to show the controls using gates. Simply write out the control equations using symbolic OP comparisons, etc. (e.g. Ctrl1 = (ID.op == load ) & (ID.WriteReg==MEM.ReadReg0)).

6. (20 points) Match the following terms or concepts to the best definition. Term or Concept Best ID Definition Match VLIW t a Dynamic multiple arbitration SIMD b Caused by inadequate space in a cache CC-NUMA c Resolves WAR and WAW hazards in an outof-order processor Cold misses d Broadcasts all cache misses on a bus shared with all other processors SECDED ECC e Resolved when a dirty copy of a cache block is found in the cache of another processor Directory protocol f Size of this structure is proportion to amount of physical memory instead of size of virtual address space Address translation g Requires handshaking to communicate Reorder or commit buffer h Used by the operating system to store address translations in a sparse tree structure Branch predictor i Only stalls execution of data-dependent instructions vs. all instructions Seek and rotational j Machine organization where memory latency varies with the location of memory Capacity misses k Caused by a program s first reference to a memory location DMA l Enables controlled sharing and protection of physical memory pages Hashed page table m Applies the same operation to many data operands at the same time Snoopy coherence n Special-purpose cache for virtual address translations Out-of-order execution core o Occur when multiple addresses map to the same set Asynchronous bus p Resolves control dependences speculatively Conflict misses q Delays caused by mechanical components in a storage device Amdahl s Law r Enables precise exceptions in an out-of-order processor Register renaming s Can maintain coherent caches in a machine with hundreds of CPUs Coherence misses t Executes parallel instructions in lock step u Used by input/output device to transfer data v Illustrates the performance bottleneck caused by serial phases of computation w Makes it possible to build reliable memory even when individual storage cells can fail.