ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


CS 2354 ADVANCED COMPUTER ARCHITECTURE: 16 MARK QUESTIONS

1. Explain the concepts and challenges of instruction-level parallelism.
   - Define instruction-level parallelism (ILP).
   - Data dependences and hazards:
     o Data dependences
     o Name dependences
     o Data hazards
   - Control dependences
2. Explain dynamic scheduling using Tomasulo's approach.
   - Explain the three steps:
     o Issue
     o Execute
     o Write result
   - Explain the seven fields of a reservation station.
   - Figure: the basic structure of a MIPS floating-point unit using Tomasulo's algorithm.
3. Explain the techniques for reducing branch costs with dynamic hardware prediction.
   - Define basic branch prediction and branch-prediction buffers.
   - Figure: the states in a 2-bit prediction scheme.
   - Correlating branch predictors.
   - Tournament predictors: adaptively combining local and global predictors.
   - Figure: state-transition diagram for a tournament predictor with four states.
4. Explain hardware-based speculation in detail.
   - Define hardware speculation, instruction commit, and the reorder buffer.
   - The four steps involved in instruction execution:
     o Issue
     o Execute
     o Write result
     o Commit
   - Figure: the basic structure of a MIPS FP unit using Tomasulo's algorithm, extended to handle speculation.
   - Multiple issue with speculation.
5. Explain basic compiler techniques for exposing ILP in detail.

   - Basic pipeline scheduling and loop unrolling.
   - Example codes.
   - Using loop unrolling and pipeline scheduling with static multiple issue.
6. Explain static multiple issue using the VLIW approach in detail.
   - Define VLIW.
   - The basic VLIW approach:
     o The registers used.
     o The functional units used.
     o Complex global scheduling.
   - Example code.
   - Technical and logistical problems.
7. Explain advanced compiler support for exposing and exploiting ILP in detail.
   - Detecting and enhancing loop-level parallelism:
     o Finding dependences
     o Eliminating dependent computations
   - Software pipelining: symbolic loop unrolling.
     o Example code fragment.
   - Global code scheduling:
     o Trace scheduling: focusing on the critical path.
     o Superblocks.
     o Example code fragment.
8. Explain hardware support for exposing more parallelism at compile time in detail.
   - Conditional or predicated instructions:
     o Example codes.
   - Compiler speculation with hardware support:
     o Hardware support for preserving exception behavior.
     o Hardware support for memory-reference speculation.
     o Example codes.
9. Explain the Intel IA-64 instruction set architecture in detail.
   - The IA-64 register model.
   - Instruction format and support for explicit parallelism.
   - Instruction-set basics.
   - Predication and speculation support.
   - The Itanium processor:
     o Functional units and instruction issue.
     o Itanium performance.
10. Explain the limitations of ILP.
   - The hardware model.
   - Limitations of the window size and maximum issue count.

   - Effects of realistic branch and jump prediction.
   - Effects of finite registers.
   - Effects of imperfect alias analysis.
11. Explain symmetric shared-memory architecture in detail.
   - Define multiprocessor cache coherence.
   - Basic schemes for enforcing coherence:
     o Define directory-based coherence.
     o Define snooping.
   - Snooping protocols.
   - Basic implementation techniques.
   - An example protocol.
12. Explain the performance of symmetric shared-memory multiprocessors.
   - Define true sharing and false sharing.
   - Performance measurements of the commercial workload.
   - Performance of the multiprogramming and OS workload.
   - Performance for the scientific/technical workload.
13. Explain synchronization in detail.
   - Basic hardware primitives:
     o Define atomic exchange.
     o Define test-and-set, fetch-and-increment, and load-linked/store-conditional instructions.
   - Implementing locks using coherence.
   - Synchronization performance challenges:
     o Barrier synchronization.
     o Code for simple and sense-reversing barriers.
   - Synchronization mechanisms for larger-scale multiprocessors:
     o Software implementations.
     o Hardware primitives.
14. Explain the models of memory consistency.
   - Sequential consistency.
   - Relaxed consistency models:
     o W->R ordering
     o W->W ordering
     o R->W and R->R ordering
15. Explain the performance of symmetric shared-memory and distributed shared-memory multiprocessors.
   - Symmetric shared-memory multiprocessors:
     o Define true sharing and false sharing.
     o Performance measurements of the commercial workload.

     o Performance of the multiprogramming and OS workload.
     o Performance for the scientific/technical workload.
   - Distributed shared-memory multiprocessors:
     o Miss rate.
     o Memory access cost.
16. Explain the techniques for reducing cache miss penalty in detail.
   - First technique: multilevel caches.
   - Second technique: critical word first and early restart.
   - Third technique: giving priority to read misses over writes.
   - Fourth technique: merging write buffers.
   - Fifth technique: victim caches.
17. Explain the techniques for reducing cache miss rate in detail.
   - First technique: larger block size.
   - Second technique: larger caches.
   - Third technique: higher associativity.
   - Fourth technique: way prediction and pseudo-associative caches.
   - Fifth technique: compiler optimizations.
     o Loop interchange
     o Blocking
18. Explain memory technology in detail.
   - DRAM technology.
   - SRAM technology.
   - Embedded processor memory technology: ROM and Flash.
   - Improving memory performance in a standard DRAM chip.
   - Improving memory performance via a new DRAM interface: RAMBUS.
   - Comparing RAMBUS and DDR SDRAM.
19. Explain the types of storage devices.
   - Magnetic disks.
   - The future of magnetic disks.
   - Optical disks.
   - Magnetic tapes.
   - Automated tape libraries.
   - Flash memory.
20. Explain buses, connecting I/O devices to the CPU and memory, in detail.
   - Bus design decisions.
   - Bus standards.
   - Interfacing storage devices to the CPU.
   - Figure: a typical interface of I/O devices and an I/O bus to the CPU-memory bus.

   - Delegating I/O responsibility from the CPU.
21. Explain SMT in detail.
   - Converting thread-level parallelism to instruction-level parallelism.
   - Design challenges in SMT processors.
   - Potential performance advantages of SMT.
22. Explain the CMP architecture.
   - Define the CMP architecture.
   - Explanation.
23. Explain software and hardware multithreading in detail.
   - Software multithreading.
   - Hardware multithreading.
   - Explanation.
24. Explain heterogeneous multi-core processors.
   - Define a multi-core processor.
   - Heterogeneous multi-core processors.
   - Diagram.
25. Explain the IBM Cell processor.
   - Define the Cell processor.
   - Architecture.
   - Explanation.