Advanced Computer Architecture: Contents

Wilfrid Hoover
5 years ago

ADVANCED COMPUTER ARCHITECTURE
FOR M.TECH (JNTU - HYDERABAD & KAKINADA) I YEAR I SEMESTER
(COMMON TO ECE, DECE, DECS, VLSI & EMBEDDED SYSTEMS)

CONTENTS

UNIT - I [CH. - 1] [FUNDAMENTALS OF COMPUTER DESIGN]

INTRODUCTION
THE CHANGING FACE OF COMPUTING AND THE TASK OF THE COMPUTER DESIGNER
  Desktop Computing
  Servers
  Embedded Computers
  The Task of the Computer Designer
TECHNOLOGY TRENDS
  Impact of Technology Trends on the Design of a Microprocessor
  Scaling of Transistor Performance, Wires and Power in Integrated Circuits
COST, PRICE AND THEIR TRENDS
  Impact of Time, Volume and Commodification on Cost
  Cost of Integrated Circuit
  Distribution of Cost in a System
  Cost Versus Price
  Cost Versus Performance

1.5 MEASURING AND REPORTING PERFORMANCE
  Measuring Performance
  Choosing Programs to Evaluate Performance
  Benchmark Suites
  Reporting Performance Results
  Comparing and Summarizing Performance
QUANTITATIVE PRINCIPLES OF COMPUTER DESIGN
  Make the Common Case Fast
  Amdahl's Law
  The CPU Performance Equation
  Measuring and Modeling the Components of the CPU Performance Equation
  Principle of Locality
  Parallelism

UNIT - I [CH. - 2] [INSTRUCTION SET PRINCIPLES AND EXAMPLES]

INTRODUCTION
CLASSIFYING INSTRUCTION SET ARCHITECTURES
  Stack Architecture
  Accumulator Architecture
  General Purpose Register Architecture
  Advantages and Disadvantages of ISAs
MEMORY ADDRESSING
  Interpreting Memory Addresses
  Addressing Modes
ADDRESSING MODES FOR SIGNAL PROCESSING
  Type of Operands for Signal Processing
  Operations for Signal and Media Processing
TYPE AND SIZE OF OPERANDS
OPERATIONS IN INSTRUCTION SET ARCHITECTURES

UNIT - II [CH. - 3] [PIPELINING]

INTRODUCTION
THE BASICS OF A RISC INSTRUCTION SET
  ALU Instructions
  Load and Store Instructions
  Branches and Jumps
A SIMPLE IMPLEMENTATION OF A RISC INSTRUCTION SET
THE CLASSIC FIVE-STAGE PIPELINE FOR A RISC PROCESSOR
BASIC PERFORMANCE ISSUES IN PIPELINING
  Speed-Up (S_N)
  Pipeline Efficiency (η)
  Throughput of the Pipeline Processor (ω)
  Delay (or) Latency
  Optimal Number of Stages (k)
PIPELINE HAZARDS
  Performance of Pipelines with Stalls
  Structural Hazards
  Data Hazards
  Operand Forwarding
  Handling Data Hazards in Software (NOP Instruction)
  Side Effects
  Branch Hazards (or) Control Hazards (or) Instruction Hazards
  Unconditional Branches
  Conditional Branches and Branch Penalties

UNIT - II [CH. - 4] [MEMORY HIERARCHY DESIGN]

INTRODUCTION
  Memory Hierarchies of Desktop, Server and Embedded Computers
REVIEW OF THE ABCS OF CACHES
  Four Memory Hierarchy Questions
  Block Placement
  Block Identification
  Block Replacement
  Write Strategies
CACHE PERFORMANCE
  Improving Cache Performance
REDUCING CACHE MISS PENALTY
  Second Level Caches
  Early Restart and Critical Word First
  Giving Priority to Read Misses Over Writes
  Merging Write Buffer
  Victim Caches
REDUCING CACHE MISS RATE
  Larger Cache Blocks
  Higher Associativity
  Larger Caches
  Way Prediction and Set Associative Caches
  Compiler Optimization
VIRTUAL MEMORY
  Classification of Virtual Memory Systems
  Uses of Virtual Memory
  Comparison of Cache and Virtual Memory
  Four Memory Hierarchy Questions

  Fast Address Translation
  Structure of Translation Lookaside Buffer
  Working of Translation Lookaside Buffer
  Selecting a Page Size
PROTECTION OF VIRTUAL MEMORY
  Protection Processes
  Base and Bound Register
  Permission Flags
  Separate Page Tables
  Rings
  Key and Lock
EXAMPLES OF VIRTUAL MEMORY
  Paged Virtual Memory System (The Alpha Memory Management)
  Working of Alpha Architecture Virtual Memory
  Alpha Page Table Entry (PTE)
  Virtual Address Segment of Alpha System
  Segmented Virtual Memory System Example (Intel Pentium)

UNIT - III [CH. - 5] [INSTRUCTION LEVEL PARALLELISM AND ITS DYNAMIC EXPLOITATION]

INSTRUCTION LEVEL PARALLELISM
  Dependences between Instructions
  Data Dependences
  Name Dependences
  Control Dependences
  Data Hazards
  Read-After-Write (RAW) Hazard
  Write-After-Write (WAW) Hazard
  Write-After-Read (WAR) Hazard

5.2 OVERCOMING DATA HAZARDS WITH DYNAMIC SCHEDULING
  Overview of Dynamic Scheduling
  Dynamic Scheduling Using Scoreboarding
  Dynamic Scheduling Using Tomasulo's Approach
BRANCH PREDICTION
  Dynamic Branch Prediction
  1-Bit Prediction Scheme
  2-Bit Prediction Scheme
  Two Level Adaptive Prediction Scheme
  Correlating Branch Predictors
  Tournament Predictors
  Accuracy of Branch Prediction
  Limitations to the Benefits of Branch Prediction
HIGH PERFORMANCE INSTRUCTION DELIVERY
  Branch Target Buffers
  Integrated Instruction Fetch Units
  Return Address Predictors
HARDWARE BASED SPECULATION
  Design Considerations for Speculative Machine
LIMITATIONS OF ILP
  The Hardware Model
  Factors Affecting ILP
  Effect of Realistic Branch and Jump Prediction
  Effects of Limited Number of Registers
  Effect of Memory Alias Analysis

UNIT - III [CH. - 6] [ILP SOFTWARE APPROACH]

INTRODUCTION
BASIC COMPILER LEVEL TECHNIQUES
  Basic Pipeline Scheduling and Loop Unrolling
  Limitation of Loop Unrolling
STATIC BRANCH PREDICTION
VLIW APPROACH
COMPILER (SOFTWARE) SUPPORT FOR EXPLOITING ILP
  Loop-Level Parallelism (LLP) Analysis
  Loop-Carried Dependence
  Loop-Carried Dependence Detection
  Eliminating Dependent Computations
  Software Pipelining (Symbolic Loop Unrolling)
  Global Code Scheduling
  Trace Scheduling - Focusing on Critical Path
  Superblocks
HARDWARE SUPPORT FOR EXPOSING MORE PARALLELISM AT COMPILE TIME
  Conditional or Predicated Instructions
  Compiler Speculation with Hardware Support
  Hardware Support for Preserving Exception Behaviour
  Hardware Support for Memory Reference Speculation
CROSSCUTTING ISSUES: HARDWARE VERSUS SOFTWARE SOLUTIONS

UNIT - IV [CH. - 7] [MULTIPROCESSORS AND THREAD LEVEL PARALLELISM]

INTRODUCTION
  Comparison of Uniprocessor and Multiprocessor Systems
  Taxonomy of Parallel Architecture
  Single Instruction Stream Single Data Stream (SISD)
  Single Instruction Stream Multiple Data Stream (SIMD)
  Multiple Instruction Stream Single Data Stream (MISD)
  Multiple Instruction Stream Multiple Data Stream (MIMD)
  Models for Communication and Memory Architecture
  Performance Metrics for Communication Mechanisms
  Advantages of Different Communication Mechanisms
  Challenges of Parallel Processing
  Limited Parallelism
  Large Latency of Remote Memory Access
CHARACTERISTICS OF APPLICATION DOMAINS
SYMMETRIC SHARED-MEMORY ARCHITECTURES
  Cache Coherence
  Enforcing Coherence
  Cache Coherence Protocols
  Bus Snooping Protocols
  Write Invalidate Protocol
  Write Broadcast (Write Update) Protocol
  Differences between Write Invalidate and Write Update Protocols

  Implementation of Write Invalidate Protocol
  Performance of Symmetric Shared Memory Multiprocessors
  Performance of Multiprogramming and OS Workload
DISTRIBUTED SHARED-MEMORY ARCHITECTURE
  Directory Based Cache Coherence Protocols
  Directory Operation
  Performance Issues
SYNCHRONIZATION
  Basic Hardware Primitives To Attain Synchronization
  Implementing Spin Locks Using Coherence
  Synchronization Performance Challenges
  Barrier Synchronization
  Sense Reversing Barrier
  Synchronization Mechanisms for Large-Scale Multiprocessors

UNIT - V [CH. - 8] [INTERCONNECTION NETWORKS AND CLUSTERS]

INTRODUCTION TO INTERCONNECTION NETWORKS
  Internetworking Challenges
  Generic Types of Internetworks
A SIMPLE NETWORK
  Performance Parameters of Internetworks
INTERCONNECTION NETWORK MEDIA
  Twisted Pair Cable
  Coaxial Cable
  Fiber Optic Cable
  Connecting Fiber Optics
  Wavelength Division Multiplexing (WDM)

8.4 CONNECTING MORE THAN TWO COMPUTERS
  Shared Media Versus Switched Media
  Connection-Oriented Versus Connectionless Communication
  Routing
  Congestion Control
PRACTICAL ISSUES IN INTERCONNECTING NETWORKS
EXAMPLES OF INTERCONNECTION NETWORKS
  Ethernet: The Local Area Network
  Storage Area Network: Infiniband
  Wide Area Network: ATM
CLUSTERS
  Performance Challenges of Clusters
  Advantages of Clusters
  Popularity of Clusters
DESIGNING A CLUSTER
THE INTEL IA-64 ARCHITECTURE AND ITANIUM PROCESSOR
ILP IN THE EMBEDDED AND MOBILE MARKETS
FALLACIES AND PITFALLS

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

16 MARKS
CS 2354 ADVANCED COMPUTER ARCHITECTURE
1. Explain the concepts and challenges of Instruction-Level Parallelism. Define