CS 1013 Advance Computer Architecture UNIT I

Size: px
Start display at page:

Download "CS 1013 Advance Computer Architecture UNIT I"

Transcription

1 CS 1013 Advance Computer Architecture UNIT I 1. What are embedded computers? List their characteristics. Embedded computers are computers that are lodged into other devices where the presence of the computer is not immediately obvious. These devices range from everyday machine to handheld digital devices. They have a wide range of processing power and cost. 2. Define Response time and Throughput. Response time is the time between the start and the completion of the event. Also referred to as execution time or latency. Throughput is the total amount of work done in a given amount of time. 3. Mention the use of transaction processing benchmarks It measures the ability of system to handle transactions, which consists of database accesses and updates. An airline reservation system and bank ATM are the examples of TP system. 4. State Amdalh s law. Amdalh s law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of time the faster mode can be used. 5. What are toy benchmarks? Toy benchmarks are typically between 10 and 100 lines of code and produce result the user already knows before running the toy program. E,g puzzle. 6. What is profile based static modeling? In this technique, a dynamic execution profile of the program, which indicates how often each instruction is executed, is maintained. 7. Suppose that we are considering an enhancement to the processor of a server system used for web serving. The new CPU is 10 times faster on computation in the web serving application than the original processor. Assuming that the original CPU is busy with computation 40% of the time and is waiting for I/O 60% of the time. What is the overall speedup gained by incorporating the enhancement? Fraction enhanced = 0.4 Speedup enhanced = 10 Speedup overall = 1/( /10) =1/0.64 = Explain the different types of locality. Temporal locality, states that recently accessed items are likely to be accessed in the near future.

2 Spatial locality, says that items whose addresses are near one another tend to be referenced close together in time. 9. Name the addressing modes used for signal processing? Modulo or circular addressing mode Bit reverse addressing mode. 10. Specify the CPU performance equation. CPU time = Instruction Count x Clock cycle Time x cycles per instruction 11. Explain the hybrid approach for encoding an instruction set? The hybrid approach reduces the variability in size and work of the variable architecture but provide multiple instruction lengths to reduce code size. 12. What are the registers used for MIPS processors. MIPS has 34, 64-bit general purpose registers (GPRs), named R0, R1 R31. GPRs are sometimes called as integer registers. There are also a set of 32 floating point registers (FPRs), named F0,F1.F31, which can hold 32 single precision values and 32 double precision values. 13. Explain the concept behind pipelining. Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. It takes advantage of parallelism that exists among actions needed to execute an instruction. 14. Write about pipe stages and processor cycle. Different steps in an instruction are completed in different parts of different instruction is parallel. Each of these steps is called a pipe stage or pipe segment. The time required between moving an instruction one step down the pipeline is called processor cycle. 15. Explain pipeline hazard and mention the different hazards in pipeline. Hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. Hazards reduce the overall performance from the ideal speedup gained by pipelining. The three classes of hazards are, Structural hazards. Data hazards. Control hazards. 16. Explain the concept of forwarding. Forwarding can be generalized to include passing a result directly to the functional unit that fetches it. The result is forwarded from the pipeline register corresponding to the output of one unit to the input of the same unit. 17. Mention the different schemes to reduce pipeline branch penalties. a. Freeze or flush the pipeline b. Treat every branch as not taken

3 c. Treat every branch as taken d. Delayed branch 18. Consider an unpipelined processor. Assume that it has a 1ns clock cycle and that it uses 4 cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline? The average instruction execution time on an unpipelined processor is = clock cycle x Average CPI = 1 ns x ((40% x 4)+(20 x 4)+(40 x 5)) = 4.4 ns. The average instruction execution time on an pipelined processor is = 1+0.2ns = 1.2ns Speedup = Avg. instruction time unpipelined/ Avg. instruction time pipelined = 4.4/1.2 = 3.7 times. 19. Briefly explain the different conventions for ordering the bytes within a larger object? Little endian byte order puts the byte whose address is x.x000 at the least significant position in the double word. Big endian byte order puts the byte whose address is x.x000 at the most significant position in the double word. 20. When do data hazards arise? Data hazards arise when an instruction depends on the results of a previous instruction in a way that is expressed by the overlapping of instructions in the pipeline. 1. List the various data dependence. Data dependence Name dependence Control Dependence UNIT II 2. What is Instruction Level Parallelism? Pipelining is used to overlap the execution of instructions and improve performance. This potential overlap among instructions is called instruction level parallelism (ILP) since the instruction can be evaluated in parallel. 3. Give an example of control dependence? if p1 {s1;} if p2 {s2;}

4 S1 is control dependent on p1, and s2 is control dependent on p2. 4. What is the limitation of the simple pipelining technique? These technique uses in-order instruction issue and execution. Instructions are issued in program order, and if an instruction is stalled in the pipeline, no later instructions can proceed. 5. Briefly explain the idea behind using reservation station? Reservation station fetches and buffers an operand as soon as available, eliminating the need to get the operand from a register. 6. Give an example for data dependence. Loop: L.D F0,0(R1) ADD.D F4,F0,F2 S.D F4,0(R1) DADDUI R1,R1,#-8 BNE R1,R2, loop 7. Explain the idea behind dynamic scheduling? In dynamic scheduling the hardware rearranges the instruction execution to reduce the stalls while maintaining data flow and exception behavior. 8. Mention the advantages of using dynamic scheduling? It enables handling some cases when dependences are unknown at compile time and it simplifies the compiler. It allows code that was compiled with one pipeline in mind run efficiently on a different pipeline. 9. What are the possibilities for imprecise exception? The pipeline may have already completed instructions that are later in program order than instruction causing exception. The pipeline may have not yet completed some instructions that are earlier in program order than the instructions causing exception. 10. What are multilevel branch predictors? These predictors use several levels of branch-prediction tables together with an algorithm for choosing among the multiple predictors. 11. What are branch-target buffers? To reduce the branch penalty we need to know from what address to fetch by end of IF (instruction fetch). A branch prediction cache that stores the predicted address for the next instruction after a branch is called a branch-target buffer or branch target cache. 12. Briefly explain the goal of multiple-issue processor? The goal of multiple issue processors is to allow multiple instructions to issue in a clock cycle. They come in two flavors: superscalar processors and VLIW processors.

5 13. What is speculation? Speculation allows execution of instruction before control dependences are resolved. 14. Mention the purpose of using Branch history table? It is a small memory indexed by the lower portion of the address of the branch instruction. The memory contains a bit that says whether the branch was recently taken or not. 15. What are super scalar processors? Superscalar processors issue varying number of instructions per clock and are either statically scheduled or dynamically scheduled. 16. Mention the idea behind hardware-based speculation? It combines three key ideas: dynamic branch prediction to choose which instruction to execute, speculation to allow the execution of instructions before control dependences are resolved and dynamic scheduling to deal with the scheduling of different combinations of basic blocks. 17. What are the fields in the ROB? Instruction type Destination field Value field Ready field 18. How many branch selected entries are in a (2,2) predictors that has a total of 8K bits in a prediction buffer? 2 2 x 2 x number of prediction entries selected by the branch = 8K number of prediction entries selected by the branch = 1K 19. What is the advantage of using instruction type field in ROB? The instruction field specifies whether instruction is a branch or a store or a register operation 20. Mention the advantage of using tournament based predictors? The advantage of tournament predictor is its ability to select the right predictor for right branch. UNIT III 1. What is loop unrolling? A simple scheme for increasing the number of instructions relative to the branch and overhead instructions is loop unrolling. Unrolling simply replicates the loop body multiple times, adjusting the loop termination code. 2. When static branch predictors are used?

6 They are used in processors where the expectation is that the branch behavior is highly predictable at compile time. Static predictors are also used to assists dynamic predictors. 3. Mention the different methods to predict branch behavior? Predict the branch as taken Predict on basis of branch direction (either forward or backward) Predict using profile information collected from earlier runs. 4. Explain the VLIW approach? They uses multiple, independent functional units. Rather than attempting to issue multiple, independent instructions to the units, a VLIW packages the multiple operations into one very long instruction. 5. Mention the techniques to compact the code size in instructions? Using encoding techniques Compress the instruction in main memory and expand them when they are read into the cache or are decoded. 6. Mention the advantage of using multiple issue processor? They are less expensive. They have cache based memory system. More parallelism. 7. What are loop carried dependence? They focuses on determining whether data accesses in later iterations are dependent on data values produced in earlier iterations; such a dependence is called loop carried dependence. e.g for(i=1000;i>0;i=i-1) x[i]=x[i]+s; 8. Mention the tasks involved in finding dependences in instructions? Good scheduling of code. Determining which loops might contain parallelism Eliminating name dependence 9. Use the G.C.D test to determine whether dependence exists in the following loop: for(i=1;i<=100;i=i+1) X[2*i+3]=X[2*i]*5.0; Solution: a=2,b=3,c=2,d=0 GCD(a,c)=2 and d-b=-3 Since 2 does not divide -3, no dependence is possible. 10. What is software pipelining? Software pipelining is a technique for reorganizing loops such that each iteration in the software pipelined code is made from instruction chosen from different iterations of the original loop.

7 11. What is global code scheduling? Global code scheduling aims o compact code fragment with internal control structure into the shortest possible sequence that preserves the data and control dependence. Finding a shortest possible sequence is finding the shortest sequence for the critical path. 12. What is trace? Trace selection tries to find a likely sequence of basic blocks whose operations will be put together into a smaller number of instructions; this sequence is called trace. 13. Mention the steps followed in trace scheduling? Trace selection Trace compaction 14. What is superblock? Superblocks are formed by a process similar to that used for traces, but are a form of extended basic block, which are restricted to a single entry point but allow multiple exits. 15. Mention the advantages of predicated instructions? Remove control dependence Maintain data flow enforced by branch Reduce overhead of global code scheduling 16. Mention the limitations of predicated instructions? They are useful only when the predicate can be evaluated early. Predicated instructions may have speed penalty. 17. What is poison bit? Poison bits are a set of status bits that are attached to the result registers written by the speculated instruction when the instruction causes exceptions. The poison bits cause a fault hen a normal instruction attempts to use the register. 18. What are the disadvantages of supporting speculation in hardware? Complexity Additional hardware resources required 19. Mention the methods for preserving exception behavior? Ignore Exception Instructions that never raise exceptions are used Using poison bits Using hardware buffers 20. What is an instruction group? It is a sequence of consecutive instructions with no register data dependence among them. All the instructions in the group could be executed in parallel. An instruction group can be arbitrarily long.

8 UNIT IV 1. What is cache miss and cache hit? When the CPU finds a requested data item in the cache, it is called cache miss. When the CPU does not find that data item it needs in the cache, a cache miss occurs. 2. What is write through and write back cache? Write through- the information is written to both the block in the cache and to the block in the lower level memory. write back- The information is written only to the block in the cahce. The modified cache block is written to main memory only when it is replaced. 3. What is miss rate and miss penalty? Miss rate is the fraction of cache access that result in a miss. Miss penalty depends on the number of misses and clock per miss. 4. Give the equation for average memory access time? Average memory access time= Hit time + Miss rate x Miss penalty 5. What is striping? Spreading multiple data over multiple disks is called striping, which automatically forces accesses to several disks. 6. Mention the problems with disk arrays? When devices increases, dependability increases Disk arrays become unusable after a single failure 7. What is hot spare? Hot spares are extra disks that are not used in normal operation. When failure occurs, an idle hot spare is pressed into service. Thus, hot spares reduce the MTTR. 8. What is mirroring? Disks in the configuration are mirrored or copied to another disk. With this arrangement data on the failed disks can be replaced by reading it from the other mirrored disks. 9. Mention the drawbacks with mirroring? Writing onto the disk is slower Since the disks are not synchronized seek time will be different Imposes 50% space penalty hence expensive. 10. Mention the factors that measure I/O performance measures? Diversity Capacity

9 Response time Throughput Interference of I/o with CPU execution 11. What is transaction time? The sum of entry time, response time and think time is called transaction time. 12. State little s law? Littles law relates the average number of tasks in the system. Average arrival rate of new asks. Average time to perform a task. 13. Give the equation for mean number of tasks in the system? Mean number of arrival in the system = Arrival rate x Mean response time. 14. What is server utilization? Mean number of tasks being serviced divided by service rate Server utilization = Arrival Rate/Server Rate The value should be between 0 and 1 otherwise there would be more tasks arriving than could be serviced. 15. What are the steps to design an I/O system? Naïve cost-performance design and evaluation Availability of naïve design Response time Realistic cost-performance, design and evaluation Realistic design for availability and its evaluation. 16. Briefly discuss about classification of buses? I/O buses - These buses are lengthy ad have any types of devices connected to it. CPU memory buses They are short and generally of high speed. 17. Explain about bus transactions? Read transaction Transfer data from memory Write transaction Writes data to memory 18. What is the bus master? Bus masters are devices that can initiate the read or write transaction. E.g CPU is always a bus master. The bus cn have many masters when there are multiple CPU s and when the Input devices can initiate bus transaction. 19. Mention the advantage of using bus master? It offers higher bandwidth by using packets, as opposed to holding the bus for full transaction. 20. What is spilt transaction? The idea behind this is to split the bus into request and replies, so that the bus can be used in the time between request and the reply

10 UNIT V 1. What are multiprocessors? Mention the categories of multiprocessors? Multiprocessors are used to increase performance and improve availability. The different categories are SISD, SIMD, MISD, MIMD 2. What are threads? These are multiple processors executing a single program and sharing the code and most of their address space. When multiple processors share code and data in the way, they are often called threads. 3. What is cache coherence problem? Two different processors have two different values for the same location. 4. What are the protocols to maintain coherence? Directory based protocol Snooping Protocol 5. What are the ways to maintain coherence using snooping protocol? Write Invalidate protocol Write update or write broadcast protocol 6. What is write invalidate and write update? Write invalidate provide exclusive access to caches. This exclusive caches ensure that no other readable or writeable copies of an item exists when the write occurs. Write update updates all cached copies of a data item when that item is written. 7. What are the disadvantages of using symmetric shared memory? Compiler mechanisms are very limited Larger latency for remote memory access Fetching multiple words in a single cache block will increase the cost. 8. Mention the information in the directory? It keeps the state of each block that are cached. It keeps track of which caches have copies of the block. 9. What the operations that a directory based protocol handle? Handling read miss Handling a write to a shares clean cache block 10. What are the states of cache block? Shared, Uncached, Exclusive

11 11. What are the uses of having a bit vector? When a block is shared, the bit vector indicates whether the processor has the copy of the block. When block is in exclusive state, bit vector keep track of the owner of the block. 12. When do we say that a cache block is exclusive? When exactly one processor has the copy of the cached block, and it has written the block. The processor is called the owner of the block. 13. Explain the types of messages that can be send between the processors and directories? Local node Node where the requests originates Home Node Node where memory location and directory entry of the address resides. Remote Node - The copy of the block in the third node called remote node 14. What is consistency? Consistency says in what order must a processor observe the data writes of another processor. 15. Mention the models that are used for consistency? Sequential consistency Relaxed consistency model 16. What is sequential consistency? It requires that the result of any execution be the same, as if the memory accesses executed by each processor were kept in order and the accesses among different processors were interleaved. 17. What is relaxed consistency model? Relaxed consistency model allows reads and writes to be executed out of order. The three sets of ordering are: W-> R ordering W->W ordering R->W and R-> R ordering. 18. What is multi threading? Multithreading allows multiple threads to share the functional uits of the single processor in an overlapping fashion. 19. What is fine grained multithreading? It switches between threads on each instruction, causing the execution of multiple threads to be interleaved. 20. What is coarse grained multithreading? It switches threads only on costly stalls. Thus it is much less likely to slow down the execution of an individual thread.

12 UNIT I 1. Discuss the different ways how instruction set architecture can be classified? Stack Architecture Accumulator Architecture Register-Memory Architecture Register-Register Architecture 2. Explain about memory addressing and discuss the different addressing modes in instruction set architecture? Little endian, and big endian Register, immediate, displacement, register indirect, Indexed, direct or absolute, Memory indirect, auto increment, auto decrement. Scaled addressing modes 3. Explain the operands and operation used for media and signal processing? Operands - Fixed point, Blocked floating point Operations - Partitioned add operation, SIMD (single instruction multiple data), paired single operation, 4. Explain with examples the various hazards in pipelining? Data hazard, structural hazards, Control hazards. 5. Discuss in detail about data hazards and Explain the technique used to overcome data hazard? Data hazards how does it occur with an example Forwarding technique UNIT II 6. What is instruction-level parallelism? Explain in details about the various dependences caused in ILP? Define ILP Various dependences include Data dependence, Name dependence and control dependence. 7. Discuss about tomasulo s algorithm to over come data hazard using dynamic scheduling? Dynamic scheduling with an example. How data hazard is caused. Architecture Reservation station. 8. Explain how to reduce branch cost with dynamic hardware prediction? Basic branch prediction and prediction buffers Correlating branch predictors Tournament based predictors 9. Explain how hardware based speculation is used to overcome control dependence? Ideas in hardware based speculation dynamic scheduling, speculation, dynamic branch prediction. Architecture and example Reorder buffer. 10. Explain the limitations of ILP?

13 Hardware model Window size and maximum issue count Realistic branch and jump prediction Effects of finite registers Effects of imperfect alias analysis UNIT III 11. Explain loop unrolling with an example? Loop unrolling technique Example 12. Discuss about the VLIW approach? VLIW approach basic idea. Limitations and problems. Technical problem and logistical problem 13. Explain the different techniques to exploit and expose more parallelism using compiler support? Loop-Level parallelism Software pipelining Global code scheduling Trace scheduling and superblocks. 14. Explain how hardware support for exposing more parallelism at compile time? Conditional or predicated instructions Compiler speculation with hardware support 15. Differentiate hardware and software speculation mechanisms? Control flow Exception conditions Bookkeeping UNIT IV 16. Explain the different technique to reduce cache miss penalty? Multiple caches Critical word first and early restart Giving priority to read misses over writes. Merging write buffers Victim caches 17. Explain the different technique to reduce miss rate? Larger block size Larger caches Higher associativity Way prediction and pseudoassociative caches Compiler optimization 18. Discuss how main memory is organized to improve performance? Wider main memory Simple interleaved memory Independent memory banks 19. Explain the various levels of RAID? No redundancy Mirroring

14 Bit-interleaved parity Block- interleaved parity P+Q redundancy 20. Explain the various ways to measure I/O performance? Throughput versus response time Little queuing theory UNIT V 21. Explain the snooping protocol with a state diagram? Basic schemes for enforcing cache coherence Cache coherence definition Snooping protocol An example protocol state diagram 22. Explain the directory based protocol with a state diagram? Directory based cache coherence protocol basics Example directory protocol 23. Define synchronization and explain the different mechanisms employed for synchronization among processors? Basic hardware primitive Implementing locks using coherence Barrier synchronization Hardware and software implementation 24. Discuss about the different models for memory consistency? Sequential consistency model Relaxed consistency model 25. How is multithreading used to exploit thread level parallelism within a processor? Multithreading definition Fine grained multithreading Coarse grained multithreading Simultaneous multithreading

UNIT I (Two Marks Questions & Answers)

UNIT I (Two Marks Questions & Answers) UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-

More information

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 16 MARKS CS 2354 ADVANCE COMPUTER ARCHITECTURE 1. Explain the concepts and challenges of Instruction-Level Parallelism. Define

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : III/VI Section : CSE-1 & CSE-2 Subject Code : CS2354 Subject Name : Advanced Computer Architecture Degree & Branch : B.E C.S.E. UNIT-1 1.

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism

More information

Exploitation of instruction level parallelism

Exploitation of instruction level parallelism Exploitation of instruction level parallelism Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

anced computer architecture CONTENTS AND THE TASK OF THE COMPUTER DESIGNER The Task of the Computer Designer

anced computer architecture CONTENTS AND THE TASK OF THE COMPUTER DESIGNER The Task of the Computer Designer Contents advanced anced computer architecture i FOR m.tech (jntu - hyderabad & kakinada) i year i semester (COMMON TO ECE, DECE, DECS, VLSI & EMBEDDED SYSTEMS) CONTENTS UNIT - I [CH. H. - 1] ] [FUNDAMENTALS

More information

Keywords and Review Questions

Keywords and Review Questions Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain

More information

ILP: Instruction Level Parallelism

ILP: Instruction Level Parallelism ILP: Instruction Level Parallelism Tassadaq Hussain Riphah International University Barcelona Supercomputing Center Universitat Politècnica de Catalunya Introduction Introduction Pipelining become universal

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation Introduction Pipelining become universal technique in 1985 Overlaps execution of

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

CS 654 Computer Architecture Summary. Peter Kemper

CS 654 Computer Architecture Summary. Peter Kemper CS 654 Computer Architecture Summary Peter Kemper Chapters in Hennessy & Patterson Ch 1: Fundamentals Ch 2: Instruction Level Parallelism Ch 3: Limits on ILP Ch 4: Multiprocessors & TLP Ap A: Pipelining

More information

Tutorial 11. Final Exam Review

Tutorial 11. Final Exam Review Tutorial 11 Final Exam Review Introduction Instruction Set Architecture: contract between programmer and designers (e.g.: IA-32, IA-64, X86-64) Computer organization: describe the functional units, cache

More information

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches

More information

計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches

計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches 4.1 Basic Compiler Techniques for Exposing ILP 計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches 吳俊興高雄大學資訊工程學系 To avoid a pipeline stall, a dependent instruction must be

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

TDT 4260 lecture 7 spring semester 2015

TDT 4260 lecture 7 spring semester 2015 1 TDT 4260 lecture 7 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Repetition Superscalar processor (out-of-order) Dependencies/forwarding

More information

Hiroaki Kobayashi 12/21/2004

Hiroaki Kobayashi 12/21/2004 Hiroaki Kobayashi 12/21/2004 1 Loop Unrolling Static Branch Prediction Static Multiple Issue: The VLIW Approach Software Pipelining Global Code Scheduling Trace Scheduling Superblock Scheduling Conditional

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

Control Hazards. Prediction

Control Hazards. Prediction Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

Instruction Level Parallelism (ILP)

Instruction Level Parallelism (ILP) 1 / 26 Instruction Level Parallelism (ILP) ILP: The simultaneous execution of multiple instructions from a program. While pipelining is a form of ILP, the general application of ILP goes much further into

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

Super Scalar. Kalyan Basu March 21,

Super Scalar. Kalyan Basu March 21, Super Scalar Kalyan Basu basu@cse.uta.edu March 21, 2007 1 Super scalar Pipelines A pipeline that can complete more than 1 instruction per cycle is called a super scalar pipeline. We know how to build

More information

EECC551 Exam Review 4 questions out of 6 questions

EECC551 Exam Review 4 questions out of 6 questions EECC551 Exam Review 4 questions out of 6 questions (Must answer first 2 questions and 2 from remaining 4) Instruction Dependencies and graphs In-order Floating Point/Multicycle Pipelining (quiz 2) Improving

More information

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM UNIT III MULTIPROCESSORS AND THREAD LEVEL PARALLELISM 1. Symmetric Shared Memory Architectures: The Symmetric Shared Memory Architecture consists of several processors with a single physical memory shared

More information

Instruction-Level Parallelism and Its Exploitation

Instruction-Level Parallelism and Its Exploitation Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic

More information

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model.

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model. Performance of Computer Systems CSE 586 Computer Architecture Review Jean-Loup Baer http://www.cs.washington.edu/education/courses/586/00sp Performance metrics Use (weighted) arithmetic means for execution

More information

Static vs. Dynamic Scheduling

Static vs. Dynamic Scheduling Static vs. Dynamic Scheduling Dynamic Scheduling Fast Requires complex hardware More power consumption May result in a slower clock Static Scheduling Done in S/W (compiler) Maybe not as fast Simpler processor

More information

Lecture-13 (ROB and Multi-threading) CS422-Spring

Lecture-13 (ROB and Multi-threading) CS422-Spring Lecture-13 (ROB and Multi-threading) CS422-Spring 2018 Biswa@CSE-IITK Cycle 62 (Scoreboard) vs 57 in Tomasulo Instruction status: Read Exec Write Exec Write Instruction j k Issue Oper Comp Result Issue

More information

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to

More information

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? RISC remainder (our assumptions) What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism

More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage

More information

MIPS ISA AND PIPELINING OVERVIEW Appendix A and C

MIPS ISA AND PIPELINING OVERVIEW Appendix A and C 1 MIPS ISA AND PIPELINING OVERVIEW Appendix A and C OUTLINE Review of MIPS ISA Review on Pipelining 2 READING ASSIGNMENT ReadAppendixA ReadAppendixC 3 THEMIPS ISA (A.9) First MIPS in 1985 General-purpose

More information

Dynamic Control Hazard Avoidance

Dynamic Control Hazard Avoidance Dynamic Control Hazard Avoidance Consider Effects of Increasing the ILP Control dependencies rapidly become the limiting factor they tend to not get optimized by the compiler more instructions/sec ==>

More information

Instruction Pipelining Review

Instruction Pipelining Review Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number

More information

Advanced Pipelining and Instruction- Level Parallelism 4

Advanced Pipelining and Instruction- Level Parallelism 4 4 Advanced Pipelining and Instruction- Level Parallelism 4 Who s first? America. Who s second? Sir, there is no second. Dialog between two observers of the sailing race later named The America s Cup and

More information

The basic structure of a MIPS floating-point unit

The basic structure of a MIPS floating-point unit Tomasulo s scheme The algorithm based on the idea of reservation station The reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from

More information

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,

More information

Instruction-Level Parallelism. Instruction Level Parallelism (ILP)

Instruction-Level Parallelism. Instruction Level Parallelism (ILP) Instruction-Level Parallelism CS448 1 Pipelining Instruction Level Parallelism (ILP) Limited form of ILP Overlapping instructions, these instructions can be evaluated in parallel (to some degree) Pipeline

More information

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1 Instruction Issue Execute Write result L.D F6, 34(R2) L.D F2, 45(R3) MUL.D F0, F2, F4 SUB.D F8, F2, F6 DIV.D F10, F0, F6 ADD.D F6, F8, F2 Name Busy Op Vj Vk Qj Qk A Load1 no Load2 no Add1 Y Sub Reg[F2]

More information

UNIT I 1.What is ILP ILP = Instruction level parallelism multiple operations (or instructions) can be executed in parallel

UNIT I 1.What is ILP ILP = Instruction level parallelism multiple operations (or instructions) can be executed in parallel DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY CS2354 ADVANCED COMPUTER ARCHITECTURE PART-A QUESTIONS WITH ANSWERS UNIT I 1.What is ILP ILP = Instruction level parallelism multiple operations

More information

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Announcements UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Inf3 Computer Architecture - 2017-2018 1 Last time: Tomasulo s Algorithm Inf3 Computer

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

CS 2410 Mid term (fall 2018)

CS 2410 Mid term (fall 2018) CS 2410 Mid term (fall 2018) Name: Question 1 (6+6+3=15 points): Consider two machines, the first being a 5-stage operating at 1ns clock and the second is a 12-stage operating at 0.7ns clock. Due to data

More information

University of Toronto Faculty of Applied Science and Engineering

University of Toronto Faculty of Applied Science and Engineering Print: First Name:............ Solutions............ Last Name:............................. Student Number:............................................... University of Toronto Faculty of Applied Science

More information

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Multi-cycle Instructions in the Pipeline (Floating Point)

Multi-cycle Instructions in the Pipeline (Floating Point) Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining

More information

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

Handout 2 ILP: Part B

Handout 2 ILP: Part B Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Basic Compiler Techniques for Exposing ILP Advanced Branch Prediction Dynamic Scheduling Hardware-Based Speculation

More information

Outline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar, VLIW. CPE 631 Session 19 Exploiting ILP with SW Approaches

Outline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar, VLIW. CPE 631 Session 19 Exploiting ILP with SW Approaches Session xploiting ILP with SW Approaches lectrical and Computer ngineering University of Alabama in Huntsville Outline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar,

More information

Improve performance by increasing instruction throughput

Improve performance by increasing instruction throughput Improve performance by increasing instruction throughput Program execution order Time (in instructions) lw $1, 100($0) fetch 2 4 6 8 10 12 14 16 18 ALU Data access lw $2, 200($0) 8ns fetch ALU Data access

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 4A: Instruction Level Parallelism - Static Scheduling Avinash Kodi, kodi@ohio.edu Agenda 2 Dependences RAW, WAR, WAW Static Scheduling Loop-carried Dependence

More information

/ : Computer Architecture and Design Fall Final Exam December 4, Name: ID #:

/ : Computer Architecture and Design Fall Final Exam December 4, Name: ID #: 16.482 / 16.561: Computer Architecture and Design Fall 2014 Final Exam December 4, 2014 Name: ID #: For this exam, you may use a calculator and two 8.5 x 11 double-sided page of notes. All other electronic

More information

Handout 3 Multiprocessor and thread level parallelism

Handout 3 Multiprocessor and thread level parallelism Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed

More information

As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor.

As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor. Hiroaki Kobayashi // As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor. Branches will arrive up to n times faster in an n-issue processor, and providing an instruction

More information

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?) Evolution of Processor Performance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static

More information

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding

More information

CS 614 COMPUTER ARCHITECTURE II FALL 2005

CS 614 COMPUTER ARCHITECTURE II FALL 2005 CS 614 COMPUTER ARCHITECTURE II FALL 2005 DUE : November 9, 2005 HOMEWORK III READ : - Portions of Chapters 5, 6, 7, 8, 9 and 14 of the Sima book and - Portions of Chapters 3, 4, Appendix A and Appendix

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor

More information

Control Hazards. Branch Prediction

Control Hazards. Branch Prediction Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

COSC 6385 Computer Architecture. - Memory Hierarchies (II)

COSC 6385 Computer Architecture. - Memory Hierarchies (II) COSC 6385 Computer Architecture - Memory Hierarchies (II) Fall 2008 Cache Performance Avg. memory access time = Hit time + Miss rate x Miss penalty with Hit time: time to access a data item which is available

More information

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false. CS 2410 Mid term (fall 2015) Name: Question 1 (10 points) Indicate which of the following statements is true and which is false. (1) SMT architectures reduces the thread context switch time by saving in

More information

Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY. Department of Computer science and engineering

DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY. Department of Computer science and engineering DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY Department of Computer science and engineering Year :II year CS6303 COMPUTER ARCHITECTURE Question Bank UNIT-1OVERVIEW AND INSTRUCTIONS PART-B

More information

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,

More information

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING QUESTION BANK

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING QUESTION BANK DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING QUESTION BANK SUBJECT : CS6303 / COMPUTER ARCHITECTURE SEM / YEAR : VI / III year B.E. Unit I OVERVIEW AND INSTRUCTIONS Part A Q.No Questions BT Level

More information

A Cache Hierarchy in a Computer System

A Cache Hierarchy in a Computer System A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the

More information

Computer Science 246 Computer Architecture

Computer Science 246 Computer Architecture Computer Architecture Spring 2009 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Compiler ILP Static ILP Overview Have discussed methods to extract ILP from hardware Why can t some of these

More information

CPU Pipelining Issues

CPU Pipelining Issues CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput

More information

Final Lecture. A few minutes to wrap up and add some perspective

Final Lecture. A few minutes to wrap up and add some perspective Final Lecture A few minutes to wrap up and add some perspective 1 2 Instant replay The quarter was split into roughly three parts and a coda. The 1st part covered instruction set architectures the connection

More information

Cache Optimisation. sometime he thought that there must be a better way

Cache Optimisation. sometime he thought that there must be a better way Cache sometime he thought that there must be a better way 2 Cache 1. Reduce miss rate a) Increase block size b) Increase cache size c) Higher associativity d) compiler optimisation e) Parallelism f) prefetching

More information

Advanced issues in pipelining

Advanced issues in pipelining Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one

More information

Page # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two?

Page # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two? Exploiting ILP through Software Approaches Venkatesh Akella EEC 270 Winter 2005 Based on Slides from Prof. Al. Davis @ cs.utah.edu Let the Compiler Do it Pros and Cons Pros No window size limitation, the

More information

Static Compiler Optimization Techniques

Static Compiler Optimization Techniques Static Compiler Optimization Techniques We examined the following static ISA/compiler techniques aimed at improving pipelined CPU performance: Static pipeline scheduling. Loop unrolling. Static branch

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes

CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.

More information

Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov

Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov Dealing With Control Hazards Simplest solution to stall pipeline until branch is resolved and target address is calculated

More information

(1) Measuring performance on multiprocessors using linear speedup instead of execution time is a good idea.

(1) Measuring performance on multiprocessors using linear speedup instead of execution time is a good idea. 1. (11) True or False: (1) DRAM and Disk access times are rapidly converging. (1) Measuring performance on multiprocessors using linear speedup instead of execution time is a good idea. (1) Amdahl s law

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Four Steps of Speculative Tomasulo cycle 0

Four Steps of Speculative Tomasulo cycle 0 HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly

More information

Alexandria University

Alexandria University Alexandria University Faculty of Engineering Computer and Communications Department CC322: CC423: Advanced Computer Architecture Sheet 3: Instruction- Level Parallelism and Its Exploitation 1. What would

More information

Lecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"

Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3

More information

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson

More information

RAID 0 (non-redundant) RAID Types 4/25/2011

RAID 0 (non-redundant) RAID Types 4/25/2011 Exam 3 Review COMP375 Topics I/O controllers chapter 7 Disk performance section 6.3-6.4 RAID section 6.2 Pipelining section 12.4 Superscalar chapter 14 RISC chapter 13 Parallel Processors chapter 18 Security

More information

Lecture 26: Parallel Processing. Spring 2018 Jason Tang

Lecture 26: Parallel Processing. Spring 2018 Jason Tang Lecture 26: Parallel Processing Spring 2018 Jason Tang 1 Topics Static multiple issue pipelines Dynamic multiple issue pipelines Hardware multithreading 2 Taxonomy of Parallel Architectures Flynn categories:

More information

Chapter 3: Instruction Level Parallelism (ILP) and its exploitation. Types of dependences

Chapter 3: Instruction Level Parallelism (ILP) and its exploitation. Types of dependences Chapter 3: Instruction Level Parallelism (ILP) and its exploitation Pipeline CPI = Ideal pipeline CPI + stalls due to hazards invisible to programmer (unlike process level parallelism) ILP: overlap execution

More information

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr Ti5317000 Parallel Computing PIPELINING Michał Roziecki, Tomáš Cipr 2005-2006 Introduction to pipelining What is this What is pipelining? Pipelining is an implementation technique in which multiple instructions

More information

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case

More information