CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS


UNIT-I OVERVIEW & INSTRUCTIONS

1. What are the eight great ideas in computer architecture?
The eight great ideas in computer architecture are:
1. Design for Moore's Law
2. Use abstraction to simplify design
3. Make the common case fast
4. Performance via parallelism
5. Performance via pipelining
6. Performance via prediction
7. Hierarchy of memories
8. Dependability via redundancy

2. What are the five classic components of a computer?
The five classic components of a computer are input, output, memory, datapath, and control, with the last two sometimes combined and called the processor.

3. Define ISA.
The instruction set architecture (ISA), or simply the architecture of a computer, is the interface between the hardware and the lowest-level software. It includes anything programmers need to know to make a binary machine-language program work correctly, including instructions, I/O devices, and so on.

4. Define ABI.
Typically, the operating system encapsulates the details of doing I/O, allocating memory, and other low-level system functions so that application programmers do not need to worry about such details. The combination of the basic instruction set and the operating system interface provided for application programmers is called the application binary interface (ABI).

5. What are the advantages of networked computers?
Networked computers have several major advantages:
- Communication: information is exchanged between computers at high speed.
- Resource sharing: rather than each computer having its own I/O devices, computers on the network can share I/O devices.
- Nonlocal access: by connecting computers over long distances, users need not be near the computer they are using.

6. Define Response Time.
Response time, also called execution time, is the total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.

7. Define Throughput.
Throughput, or bandwidth, is the total amount of work done in a given time.

8. Write the CPU performance equation.
The classic CPU performance equation in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time is:
CPU time = Instruction count x CPI x Clock cycle time

9. If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B?
Performance is the reciprocal of execution time, so Performance(A) / Performance(B) = Execution time(B) / Execution time(A) = 15 / 10 = 1.5. Computer A is 1.5 times faster than B.
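The performance equation in question 8 and the speedup comparison in question 9 can be checked with a short calculation. The instruction count, CPI, and clock cycle values below are made-up illustrations, not figures from the question bank:

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """Classic CPU performance equation:
    seconds = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi * clock_cycle_time

# Hypothetical program: 10 million instructions, CPI of 2, 1 ns clock cycle.
t = cpu_time(10_000_000, 2.0, 1e-9)   # about 0.02 seconds

# Question 9: performance is the reciprocal of execution time.
time_a, time_b = 10.0, 15.0
speedup = time_b / time_a             # 1.5, so A is 1.5 times faster than B
```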

10. What are the basic components of performance?
The basic components of performance and how each is measured are:
- CPU execution time for a program: seconds for the program
- Instruction count: instructions executed for the program
- Clock cycles per instruction (CPI): average number of clock cycles per instruction
- Clock cycle time: seconds per clock cycle

11. Define MIPS.
Million Instructions Per Second (MIPS) is a measurement of program execution speed based on the number of millions of instructions executed per second. MIPS is computed as:
MIPS = Instruction count / (Execution time x 10^6)

12. Define Addressing Modes.
The different ways in which the operands of an instruction are specified are called addressing modes. The MIPS addressing modes are the following:
1. Immediate addressing
2. Register addressing
3. Base or displacement addressing
4. PC-relative addressing
5. Pseudodirect addressing
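The MIPS rating from question 11 follows directly from the formula; the instruction count and execution time below are hypothetical values chosen for illustration:

```python
def mips_rating(instruction_count, execution_time_seconds):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_seconds * 1_000_000)

# Hypothetical: 50 million instructions completing in 0.5 seconds -> 100 MIPS.
rating = mips_rating(50_000_000, 0.5)
```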

UNIT-II ARITHMETIC OPERATIONS

1. Define Moore's Law.
Moore's Law is the observation that integrated-circuit resources (transistor counts) double approximately every two years. It has provided so many more resources that hardware designers can now build much faster multiplication and division hardware: for example, whether the multiplicand is to be added or not is known at the beginning of the multiplication by looking at each of the 32 multiplier bits.

2. What are the floating-point instructions in MIPS?
MIPS supports the IEEE 754 single-precision and double-precision formats with these instructions:
- Floating-point addition
- Floating-point subtraction
- Floating-point multiplication
- Floating-point division
- Floating-point comparison
- Floating-point branch

3. Define Guard and Round.
Guard is the first of two extra bits kept on the right during intermediate calculations of floating-point numbers; it is used to improve rounding accuracy. Round is a method to make the intermediate floating-point result fit the floating-point format; the goal is typically to find the nearest number that can be represented in the format. IEEE 754, therefore, always keeps two extra bits on the right during intermediate additions, called guard and round, respectively.

4. Define ULP.
Units in the last place (ULP) is defined as the number of bits in error in the least significant bits of the significand between the actual number and the number that can be represented.

5. What is meant by sub-word parallelism?
Given that the parallelism occurs within a wide word, such extensions are classified as sub-word parallelism. It is also classified under the more general name of data-level parallelism, and has also been called vector or SIMD, for single instruction, multiple data. The rising popularity of multimedia applications led to arithmetic instructions that support narrower operations that can easily operate in parallel.

7. Multiply 1000₁₀ by 1001₁₀.
1000 x 1001 = 1,001,000.

8. Divide 1,001,010₁₀ by 1000₁₀.
1,001,010 / 1000 = 1001 with remainder 10.

9. What are the steps in floating-point addition?
The steps in floating-point addition are:
1. Align the decimal point of the number that has the smaller exponent.
2. Add the significands.
3. Normalize the sum.
4. Round the result.

10. Write the IEEE 754 floating-point format.
The IEEE 754 single-precision format has a 1-bit sign, an 8-bit biased exponent, and a 23-bit fraction, representing the value (-1)^S x (1 + Fraction) x 2^(Exponent - Bias), with a bias of 127 (the double-precision format uses an 11-bit exponent, a 52-bit fraction, and a bias of 1023). The IEEE 754 floating-point representation is almost always an approximation of the real number.
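The single-precision layout from question 10 (1 sign bit, 8 exponent bits, 23 fraction bits) can be inspected with Python's standard struct module; the example value -0.75 is my own illustration, not from the question bank:

```python
import struct

def float_fields(x):
    # Pack x as an IEEE 754 single-precision value, then split the 32 bits
    # into the sign, biased exponent, and fraction fields.
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF     # stored with a bias of 127
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction

# -0.75 = -1.1 (binary) x 2^-1: sign 1, exponent -1 + 127 = 126,
# fraction 0b100...0 (the leading 1 of the significand is implicit).
fields = float_fields(-0.75)
```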

UNIT-III PROCESSOR AND CONTROL UNIT

1. What is meant by a datapath element?
A datapath element is a unit used to operate on or hold data within a processor. In the MIPS implementation, the datapath elements include the instruction and data memories, the register file, the ALU, and adders.

2. What is the use of the PC register?
The program counter (PC) is the register containing the address of the instruction in the program being executed.

3. What is meant by a register file?
The processor's 32 general-purpose registers are stored in a structure called a register file. A register file is a collection of registers in which any register can be read or written by specifying the number of the register in the file. The register file contains the register state of the computer.

4. What are the two state elements needed to store and access an instruction?
The two state elements are the instruction memory and the program counter (PC); an adder is also needed to compute the next instruction address.

5. Draw the diagram of the portion of the datapath used for fetching an instruction.
(Diagram omitted in this transcription: the PC supplies the read address to the instruction memory, and an adder computes PC + 4 to select the next sequential instruction.)
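The register file described in question 3 can be sketched as a small Python class. This is an illustrative software model, not the MIPS hardware; the zero-register convention in the comment is MIPS-specific:

```python
class RegisterFile:
    """Toy model of a 32-entry register file: any register can be
    read or written by specifying its number."""
    def __init__(self):
        self.regs = [0] * 32

    def read(self, num):
        return self.regs[num]

    def write(self, num, value):
        # In MIPS, register 0 is hardwired to zero; writes to it are ignored.
        if num != 0:
            self.regs[num] = value

rf = RegisterFile()
rf.write(8, 42)   # store 42 in register 8 ($t0 in MIPS convention)
rf.write(0, 99)   # silently ignored
```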

6. Define Sign-Extend.
Sign extension is used to increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger destination data item.

7. What is meant by branch target address?
The branch target address is the address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the MIPS architecture the branch target is given by the sum of the offset field of the instruction and the address of the instruction following the branch.

8. Differentiate branch taken from branch not taken.
A branch taken is a branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional jumps are taken branches. A branch not taken (or untaken branch) is a branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.

9. What is meant by delayed branch?
A delayed branch is a type of branch where the instruction immediately following the branch is always executed, independent of whether the branch condition is true or false.

10. What are the three instruction classes and their instruction formats?
The three instruction classes (R-type, load and store, and branch) use two different instruction formats: R-type instructions use the R-format, while loads, stores, and branches use the I-format.

11. Write the instruction format for the jump instruction.
The destination address for a jump instruction is formed by concatenating the upper 4 bits of the current PC + 4 with the 26-bit address field in the jump instruction and appending 00 as the two low-order bits.

12. What is meant by pipelining?
Pipelining is an implementation technique in which multiple instructions are overlapped in execution. Pipelining improves performance by increasing instruction throughput, as opposed to decreasing the execution time of an individual instruction.

13. What is meant by forwarding?
Forwarding, also called bypassing, is a method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.
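Sign extension (question 6) can be demonstrated directly; the helper below is an illustrative sketch of extending a 16-bit immediate to 32 bits, with the bit widths chosen to match the MIPS case described above:

```python
def sign_extend(value, from_bits=16, to_bits=32):
    """Replicate the high-order sign bit of a from_bits-wide value
    into the upper bits of a to_bits-wide result."""
    sign_bit = 1 << (from_bits - 1)
    # Reinterpret the value as signed, then mask back to to_bits.
    signed = (value & (sign_bit - 1)) - (value & sign_bit)
    return signed & ((1 << to_bits) - 1)

# 0xFFFC is -4 as a 16-bit value; extended, it is still -4 as a 32-bit value.
extended = sign_extend(0xFFFC)   # 0xFFFFFFFC
# A positive value has sign bit 0, so the upper bits stay 0.
positive = sign_extend(0x0004)   # 0x00000004
```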

14. What is a pipeline stall?
A pipeline stall, also called a bubble, is a stall initiated in order to resolve a hazard. The bubble can be seen moving through the later stages of the pipeline.

15. What is meant by branch prediction?
Branch prediction is a method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.
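A common hardware scheme for the branch prediction described in question 15 is a 2-bit saturating counter. This particular scheme is my own illustration; the question bank does not spell out a predictor design:

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken,
    states 2-3 predict taken. From a strong state, two wrong
    predictions in a row are needed to flip the prediction."""
    def __init__(self):
        self.state = 0          # start at strongly not taken

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
p.update(True)   # branch actually taken
p.update(True)   # taken again: counter reaches state 2, now predicts taken
```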

UNIT-IV PARALLELISM

1. What is meant by ILP?
Pipelining exploits the potential parallelism among instructions; this parallelism is called instruction-level parallelism (ILP). There are two primary methods for increasing the potential amount of instruction-level parallelism:
1. Increasing the depth of the pipeline to overlap more instructions.
2. Multiple issue.

2. What is multiple issue? Write any two approaches.
Multiple issue is a scheme whereby multiple instructions are launched in one clock cycle. It is a method for increasing the potential amount of instruction-level parallelism, done by replicating the internal components of the computer so that it can launch multiple instructions in every pipeline stage. The two approaches are:
1. Static multiple issue (decisions made at compile time)
2. Dynamic multiple issue (decisions made at run time)

3. What is meant by speculation?
One of the most important methods for finding and exploiting more ILP is speculation: an approach whereby the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions. For example, we might speculate on the outcome of a branch so that instructions after the branch can be executed earlier.

4. Define Static Multiple Issue.
Static multiple issue is an approach to implementing a multiple-issue processor where many decisions are made by the compiler before execution.

5. Define Issue Slots and Issue Packet.
Issue slots are the positions from which instructions could be issued in a given clock cycle; by analogy, these correspond to positions at the starting blocks for a sprint. An issue packet is the set of instructions that issues together in one clock cycle; the packet may be determined statically by the compiler or dynamically by the processor.

6. Define VLIW.
Very Long Instruction Word (VLIW) is a style of instruction set architecture that launches many operations that are defined to be independent in a single wide instruction, typically with many separate opcode fields.

7. Define Superscalar Processor.
Superscalar is an advanced pipelining technique that enables the processor to execute more than one instruction per clock cycle by selecting them during execution. Dynamic multiple-issue processors are also known as superscalar processors, or simply superscalars.

8. What is meant by loop unrolling?
An important compiler technique for getting more performance from loops is loop unrolling, where multiple copies of the loop body are made. After unrolling, more ILP is available because instructions from different iterations can be overlapped.

9. What is meant by anti-dependence? How is it removed?
An anti-dependence is an ordering forced by the reuse of a name, typically a register, rather than by a true dependence that carries a value between two instructions. It is also called a name dependence. Register renaming, in which registers are renamed by the compiler or hardware, is the technique used to remove anti-dependences.

10. What is the use of a reservation station and a reorder buffer?
A reservation station is a buffer within a functional unit that holds the operands and the operation. A reorder buffer is the buffer that holds results in a dynamically scheduled processor until it is safe to store the results to memory or a register.

11. Differentiate in-order execution from out-of-order execution.
Out-of-order execution is a situation in pipelined execution in which an instruction blocked from executing does not cause the following instructions to wait; the processor still preserves the data flow order of the program. In-order execution requires the instruction fetch and decode unit to issue instructions in order, which allows dependences to be tracked, and requires the commit unit to write results to registers and memory in program fetch order; this conservative mode is called in-order commit.

12. What is meant by hardware multithreading?
Hardware multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion to try to utilize the hardware resources efficiently. To permit this sharing, the processor must duplicate the independent state of each thread. It increases the utilization of a processor.

13. What are the two main approaches to hardware multithreading?
There are two main approaches to hardware multithreading:
- Fine-grained multithreading switches between threads on each instruction, resulting in interleaved execution of multiple threads. This interleaving is often done in round-robin fashion, skipping any threads that are stalled at that clock cycle.
- Coarse-grained multithreading switches threads only on costly stalls, such as last-level cache misses.
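Register renaming (question 9) can be illustrated with a toy renamer. The three-tuple instruction format and the physical-register numbering below are hypothetical choices for this sketch, not part of the question bank:

```python
def rename_destinations(instructions, first_physical=32):
    """Give every destination write a fresh physical register and patch
    later reads, removing anti- (name) dependences between instructions.
    Each instruction is a hypothetical (dest, op, sources) tuple."""
    mapping = {}                  # architectural register -> latest physical name
    next_free = first_physical
    renamed = []
    for dest, op, srcs in instructions:
        srcs = tuple(mapping.get(s, s) for s in srcs)  # read the latest names
        mapping[dest] = next_free                      # fresh name for the write
        renamed.append((next_free, op, srcs))
        next_free += 1
    return renamed

# r2 = r1 + r3 ; r1 = r4 * r5 -- the second write reuses r1 (anti-dependence),
# so without renaming it must not complete before the first instruction reads r1.
program = [(2, 'add', (1, 3)), (1, 'mul', (4, 5))]
out = rename_destinations(program)
# After renaming, the two instructions write distinct registers and may reorder.
```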

UNIT-V MEMORY AND I/O SYSTEMS

1. What are the temporal and spatial localities of reference?
Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon. Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon.

2. Write the structure of the memory hierarchy.
(Figure omitted in this transcription. From fastest and smallest to slowest and largest: processor registers, SRAM cache, DRAM main memory, and secondary storage such as flash or magnetic disk; cost per bit decreases and capacity increases moving down the hierarchy.)

3. What are the various memory technologies?
The various memory technologies are:
1. SRAM semiconductor memory
2. DRAM semiconductor memory
3. Flash semiconductor memory
4. Magnetic disk

4. Differentiate SRAM from DRAM.
SRAMs are simply integrated circuits that are memory arrays with a single access port that can provide either a read or a write. SRAMs have a fixed access time to any datum. SRAMs don't need to be refreshed, so the access time is very close to the cycle time. SRAMs typically use six to eight transistors per bit to prevent the information from being disturbed when read, and need only minimal power to retain the charge in standby mode. In a dynamic RAM (DRAM), the value kept in a cell is stored as a charge in a capacitor; a single transistor is then used to access this stored charge, either to read the value or to overwrite the charge stored there. Because DRAMs use only a single transistor per bit of storage, they are much denser and cheaper per bit than SRAM.

6. Define Rotational Latency.
Rotational latency, also called rotational delay, is the time required for the desired sector of a disk to rotate under the read/write head, usually assumed to be half the rotation time.

7. What is a direct-mapped cache?
A direct-mapped cache is a cache structure in which each memory location is mapped to exactly one location in the cache. Almost all direct-mapped caches use this mapping to find a block:
(Block address) modulo (Number of blocks in the cache)
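The direct-mapped lookup rule from question 7, (Block address) modulo (Number of blocks in the cache), can be expressed directly; the 1200-byte example matches the exercise that follows:

```python
def cache_index(byte_address, block_size_bytes, num_blocks):
    """Map a byte address to its direct-mapped cache block index."""
    block_address = byte_address // block_size_bytes
    return block_address % num_blocks

# With 64 blocks of 16 bytes each, byte address 1200 falls in block address
# 1200 // 16 = 75, which maps to cache index 75 % 64 = 11.
index = cache_index(1200, 16, 64)
```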

8. Consider a cache with 64 blocks and a block size of 16 bytes. To what block number does byte address 1200 map?
The block address is 1200 / 16 = 75, and 75 modulo 64 = 11, so byte address 1200 maps to cache block 11.

10. What are the writing strategies in cache memory?
Write-through is a scheme in which writes always update both the cache and the next lower level of the memory hierarchy, ensuring that data is always consistent between the two. Write-back is a scheme that handles writes by updating values only in the block in the cache, then writing the modified block to the lower level of the hierarchy when the block is replaced.

11. What are the steps to be taken on an instruction cache miss?
The steps to be taken on an instruction cache miss are:
1. Send the original PC value to the memory.
2. Instruct main memory to perform a read and wait for the memory to complete its access.
3. Write the cache entry, putting the data from memory in the data portion, writing the upper bits of the address into the tag field, and turning the valid bit on.
4. Restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache.

12. What is meant by virtual memory?
Virtual memory is a technique that uses main memory as a cache for secondary storage. The two major motivations for virtual memory are to allow efficient and safe sharing of memory among multiple programs, and to remove the programming burdens of a small, limited amount of main memory.

13. Differentiate a physical address from a logical address.
A physical address is an address in main memory. A logical (or virtual) address is a CPU-generated address that corresponds to a location in virtual space and is translated by address mapping to a physical address when memory is accessed.

14. Define Page Fault.
A page fault is an event that occurs when an accessed page is not present in main memory.
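Address translation and page faults (questions 12-14) can be sketched with a toy page table. The 4 KiB page size and the particular virtual-to-physical mapping below are assumptions made for illustration:

```python
PAGE_SIZE = 4096                       # assumed 4 KiB pages

page_table = {0: 7, 1: 3, 5: 2}        # hypothetical virtual page -> physical frame

def translate(virtual_address):
    """Split a virtual address into page number and offset, then map the
    page number through the page table to build the physical address."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn not in page_table:
        # The accessed page is not present in main memory: a page fault.
        raise LookupError(f"page fault on virtual page {vpn}")
    return page_table[vpn] * PAGE_SIZE + offset

# Virtual page 5 maps to physical frame 2, so the offset is preserved:
physical = translate(5 * PAGE_SIZE + 100)   # 2 * 4096 + 100 = 8292
```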