ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


CS 2354 ADVANCED COMPUTER ARCHITECTURE: 16 MARK QUESTIONS

1. Explain the concepts and challenges of instruction-level parallelism.
   - Define instruction-level parallelism (ILP).
   - Data dependences and hazards:
     o Data dependences
     o Name dependences
     o Data hazards
   - Control dependences
2. Explain dynamic scheduling using Tomasulo's approach.
   - Explain the three steps:
     o Issue
     o Execute
     o Write result
   - Explain the seven fields of a reservation station.
   - Figure: the basic structure of a MIPS floating-point unit using Tomasulo's algorithm.
3. Explain the techniques for reducing branch costs with dynamic hardware prediction.
   - Define basic branch prediction and branch-prediction buffers.
   - Figure: the states in a 2-bit prediction scheme.
   - Correlating branch predictors.
   - Tournament predictors: adaptively combining local and global predictors.
   - Figure: state-transition diagram for a tournament predictor with four states.
4. Explain hardware-based speculation in detail.
   - Define hardware speculation, instruction commit, and the reorder buffer.
   - The four steps involved in instruction execution:
     o Issue
     o Execute
     o Write result
     o Commit
   - Figure: the basic structure of a MIPS FP unit using Tomasulo's algorithm, extended to handle speculation.
   - Multiple issue with speculation.
5. Explain basic compiler techniques for exposing ILP in detail.

   - Basic pipeline scheduling and loop unrolling.
   - Example codes.
   - Using loop unrolling and pipeline scheduling with static multiple issue.
6. Explain static multiple issue using the VLIW approach in detail.
   - Define VLIW.
   - The basic VLIW approach:
     o The registers used.
     o The functional units used.
     o Complex global scheduling.
   - Example code.
   - Technical and logistical problems.
7. Explain advanced compiler support for exposing and exploiting ILP in detail.
   - Detecting and enhancing loop-level parallelism:
     o Finding dependences
     o Eliminating dependent computations
   - Software pipelining: symbolic loop unrolling.
     o Example code fragment.
   - Global code scheduling:
     o Trace scheduling: focusing on the critical path.
     o Superblocks.
     o Example code fragment.
8. Explain hardware support for exposing more parallelism at compile time in detail.
   - Conditional or predicated instructions:
     o Example codes.
   - Compiler speculation with hardware support:
     o Hardware support for preserving exception behavior.
     o Hardware support for memory-reference speculation.
     o Example codes.
9. Explain the Intel IA-64 instruction set architecture in detail.
   - The IA-64 register model.
   - Instruction format and support for explicit parallelism.
   - Instruction-set basics.
   - Predication and speculation support.
   - The Itanium processor:
     o Functional units and instruction issue.
     o Itanium performance.
10. Explain the limitations of ILP.
   - The hardware model.
   - Limitations of the window size and maximum issue count.

   - Effects of realistic branch and jump prediction.
   - Effects of finite registers.
   - Effects of imperfect alias analysis.
11. Explain symmetric shared-memory architecture in detail.
   - Define multiprocessor cache coherence.
   - Basic schemes for enforcing coherence:
     o Define directory-based coherence.
     o Define snooping.
   - Snooping protocols.
   - Basic implementation techniques.
   - An example protocol.
12. Explain the performance of symmetric shared-memory multiprocessors.
   - Define true sharing and false sharing.
   - Performance measurements of the commercial workload.
   - Performance of the multiprogramming and OS workload.
   - Performance for the scientific/technical workload.
13. Explain synchronization in detail.
   - Basic hardware primitives:
     o Define atomic exchange.
     o Define test-and-set, fetch-and-increment, and load-linked/store-conditional instructions.
   - Implementing locks using coherence.
   - Synchronization performance challenges:
     o Barrier synchronization.
     o Code for simple and sense-reversing barriers.
   - Synchronization mechanisms for larger-scale multiprocessors:
     o Software implementations.
     o Hardware primitives.
14. Explain the models of memory consistency.
   - Sequential consistency.
   - Relaxed consistency models:
     o W->R ordering
     o W->W ordering
     o R->W and R->R ordering
15. Explain the performance of symmetric shared-memory and distributed shared-memory multiprocessors.
   - Symmetric shared-memory multiprocessors:
     o Define true sharing and false sharing.
     o Performance measurements of the commercial workload.

     o Performance of the multiprogramming and OS workload.
     o Performance for the scientific/technical workload.
   - Distributed shared-memory multiprocessors:
     o Miss rate.
     o Memory access cost.
16. Explain the techniques for reducing cache miss penalty in detail.
   - First technique: multilevel caches.
   - Second technique: critical word first and early restart.
   - Third technique: giving priority to read misses over writes.
   - Fourth technique: merging write buffers.
   - Fifth technique: victim caches.
17. Explain the techniques for reducing cache miss rate in detail.
   - First technique: larger block size.
   - Second technique: larger caches.
   - Third technique: higher associativity.
   - Fourth technique: way prediction and pseudo-associative caches.
   - Fifth technique: compiler optimizations.
     o Loop interchange
     o Blocking
18. Explain memory technology in detail.
   - DRAM technology.
   - SRAM technology.
   - Embedded processor memory technology: ROM and Flash.
   - Improving memory performance in a standard DRAM chip.
   - Improving memory performance via a new DRAM interface: RAMBUS.
   - Comparing RAMBUS and DDR SDRAM.
19. Explain the types of storage devices.
   - Magnetic disks.
   - The future of magnetic disks.
   - Optical disks.
   - Magnetic tapes.
   - Automated tape libraries.
   - Flash memory.
20. Explain buses, connecting I/O devices to the CPU and memory, in detail.
   - Bus design decisions.
   - Bus standards.
   - Interfacing storage devices to the CPU.
   - Figure: a typical interface of I/O devices and an I/O bus to the CPU-memory bus.

   - Delegating I/O responsibility from the CPU.
21. Explain SMT in detail.
   - Converting thread-level parallelism to instruction-level parallelism.
   - Design challenges in SMT processors.
   - Potential performance advantages of SMT.
22. Explain the CMP architecture.
   - Define the CMP architecture.
   - Explanation.
23. Explain software and hardware multithreading in detail.
   - Software multithreading.
   - Hardware multithreading.
   - Explanation.
24. Explain heterogeneous multi-core processors.
   - Define a multi-core processor.
   - Heterogeneous multi-core processors.
   - Diagram.
25. Explain the IBM Cell processor.
   - Define the Cell processor.
   - Architecture.
   - Explanation.