Wenisch Final Review. Fall 2007 Prof. Thomas Wenisch EECS 470. Slide 1


Final Review Fall 2007 Prof. Thomas Wenisch http://www.eecs.umich.edu/courses/eecs470 Slide 1

Announcements
Exam is Monday, 12/17, 4-6, in this room
I recommend bringing a scientific calculator
Closed book/notes
Slide 2

Stuff from the first half
Parallelism, locality, amortization, memoization
Amdahl's Law
Iron Law
Calculating speedup
Instruction-level parallelism
Performance impact of in-order vs. OoO
General HW structure of predictors (not branch predictors specifically)
Slide 3
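Speedup questions like these usually reduce to plugging numbers into Amdahl's Law. A minimal sketch in Python (the function name and example numbers are mine, not from the slides):

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of execution time
    is sped up by a factor s (Amdahl's Law)."""
    return 1.0 / ((1.0 - f) + f / s)

# Example: 80% of the work parallelized across 16 cores
print(amdahl_speedup(0.8, 16))  # 4.0 -- far below the 16x ideal
```

The serial 20% dominates: even with infinitely many cores, speedup is capped at 1/0.2 = 5.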

Memory Speculation
What semantics does the LSQ have to guarantee?
How does non-speculative store-to-load forwarding work? What hardware does it require?
What are the implications of speculative loads?
What is the purpose of dependence prediction? How does it work?
What is the purpose of a store buffer?
How can you break the dataflow ILP limit?
Can you draw a simple HW diagram for value prediction?
Slide 4
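The non-speculative store-to-load forwarding asked about here can be sketched as a scan of the store queue. This is a toy model (my own naming; full-address matches only, and it assumes all older store addresses are already known):

```python
def load_value(load_addr, older_stores, memory):
    """older_stores: (addr, data) pairs for stores older than the load,
    oldest first. Forward from the youngest matching store; otherwise
    read memory. Real LSQs must also handle unknown store addresses
    and partially overlapping accesses."""
    for addr, data in reversed(older_stores):
        if addr == load_addr:
            return data  # store-to-load forwarding hit
    return memory[load_addr]  # no older store matches
```

The youngest-first scan is what gives the load the most recent value in program order.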

Basic Caches
Formula for effective access time for a 1-level cache? What about a 2-level cache?
Associativity, block size, cache size
Local vs. global hit/miss ratios
Causes of cache misses (classification)
Writeback vs. write-through, allocate vs. no-allocate
Temporal vs. spatial locality
Slide 5
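The effective (average) access time formulas can be written out directly; here is a sketch with made-up example latencies (not from the slides):

```python
def amat_1level(t_hit, miss_rate, miss_penalty):
    """Effective access time for a single-level cache."""
    return t_hit + miss_rate * miss_penalty

def amat_2level(t_l1, m_l1, t_l2, m_l2_local, t_mem):
    """m_l2_local is the L2 *local* miss ratio; the global L2
    miss ratio would be m_l1 * m_l2_local."""
    return t_l1 + m_l1 * (t_l2 + m_l2_local * t_mem)

# Example: 1-cycle L1, 5% L1 misses, 10-cycle L2,
# 20% local L2 misses, 100-cycle memory
print(amat_2level(1, 0.05, 10, 0.2, 100))  # 2.5 cycles
```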

Improving Cache Performance: Summary
Miss rate: large block size, higher associativity, victim caches, hardware/software prefetching, compiler optimizations
Miss penalty: give priority to read misses over writes/writebacks, subblock placement, early restart and critical word first, non-blocking caches, multi-level caches
Hit time (difficult?): small and simple caches, avoiding translation during L1 indexing
Slide 6

More Cache Issues
What is inclusion? How do you implement it?
How to implement a non-blocking cache?
Bandwidth enhancements: multi-porting, multiple cache copies, virtual multiporting, multi-banking, line buffer
Slide 7

Prefetching
Software vs. hardware prefetching
Instruction prefetching
Stride-based prefetching
Stream buffers
Run-ahead prefetching
Correlation-based prefetching
Slide 8
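Stride-based prefetching can be sketched with a PC-indexed reference prediction table: remember the last address and stride per load PC, and prefetch ahead once the stride repeats. A toy version (the naming and the confirm-on-repeat policy are my simplifications of real designs):

```python
class StridePrefetcher:
    def __init__(self):
        self.table = {}  # pc -> (last_addr, last_stride)

    def access(self, pc, addr):
        """Observe one load; return an address to prefetch, or None."""
        prefetch = None
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            if stride == last_stride and stride != 0:
                prefetch = addr + stride  # stride confirmed twice in a row
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, 0)  # first sighting of this PC
        return prefetch
```

Accesses 100, 108, 116 from the same PC train an 8-byte stride; the third access triggers a prefetch of 124.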

Virtual Memory
Why do we have it? (protection, paging)
Base/bound, segmented VM, paged VM
How is VM management different from cache management?
Page table entries
Page table designs (top-down, bottom-up, inverted)
TLB designs
VIVT, VIPT, PIPT caches
Dealing with synonyms
Slide 9
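Paged translation splits a virtual address into a virtual page number (VPN) and a page offset; a single-level sketch follows (4 KB pages assumed; real designs use the multi-level or inverted tables listed above, with a TLB in front):

```python
PAGE_BITS = 12  # 4 KB pages (assumption for the example)

def translate(vaddr, page_table):
    """page_table: dict mapping VPN -> PPN, a flat stand-in for the
    top-down / bottom-up / inverted designs on the slide."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    if vpn not in page_table:
        raise KeyError("page fault")  # the OS would handle this
    return (page_table[vpn] << PAGE_BITS) | offset
```

Note the offset bits pass through untranslated, which is what lets a VIPT cache start indexing in parallel with the TLB lookup.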

Main Memory
SRAM vs. DRAM
Multiple memory banks
Bank interleaving
Slide 10
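Low-order bank interleaving maps consecutive blocks to consecutive banks so sequential accesses can overlap; a sketch (the block size and bank count are example values, not from the slides):

```python
def bank_of(addr, n_banks=4, block_bytes=64):
    """Low-order interleaving: the bank index comes from the
    address bits just above the block offset."""
    return (addr // block_bytes) % n_banks

# Four sequential 64-byte blocks land in four different banks
print([bank_of(a) for a in (0, 64, 128, 192)])  # [0, 1, 2, 3]
```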

Software ILP
List scheduling
Speculative code motion (especially implications)
Classic optimizations (know what these are, be able to give an example): copy propagation, constant folding, strength reduction, common subexpressions, dead code elimination, induction variable elimination, inlining, loop unrolling, loop-invariant code motion
Profile-driven optimization (esp. implications of using profiles)
Trace scheduling & compensation code
Slide 11
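As an example of one classic optimization, strength reduction replaces a per-iteration multiply by the induction variable with a running addition. Sketched in Python (compilers do this on IR; these functions just show the before/after behavior):

```python
# Before: multiply by the induction variable every iteration
def addrs_naive(base, stride, n):
    return [base + i * stride for i in range(n)]

# After strength reduction: the multiply becomes a repeated add
def addrs_reduced(base, stride, n):
    out, addr = [], base
    for _ in range(n):
        out.append(addr)
        addr += stride  # cheap add replaces i * stride
    return out
```

Both produce the same address sequence; the reduced form trades a multiply per iteration for an add.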

Binary Translation / Virtualization
Why is it useful? Where is it used today? (VMware, Java, Transmeta Crusoe)
How does it work?
What are some difficult corner cases to handle?
Static vs. dynamic
Slide 12

Power
Why do power and energy matter for various markets?
Dynamic, leakage, short-circuit power
Power ~ ½ C V² A f
Performance ~ f ~ V
Know how to compare voltage/frequency scaling to other techniques
Power vs. energy
PDP, EDP, ED²P
Slide 13
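The two proportionalities above make the voltage/frequency-scaling comparison concrete: if f scales with V, scaling both by a factor k cuts dynamic power by k³ but energy per task only by k², since runtime grows by 1/k. A sketch with made-up parameters (not from the slides):

```python
def dynamic_power(C, V, A, f):
    """Dynamic power ~ 1/2 * C * V^2 * A * f."""
    return 0.5 * C * V**2 * A * f

k = 0.8  # scale V and f together (assumes Performance ~ f ~ V)
base = dynamic_power(1e-9, 1.2, 0.5, 2e9)
scaled = dynamic_power(1e-9, 1.2 * k, 0.5, 2e9 * k)
print(scaled / base)        # k**3 ~ 0.512: about half the power
print((scaled / base) / k)  # k**2 ~ 0.64: energy ratio, since time grows by 1/k
```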

Data-Level Parallelism
What are vectors?
What hardware is needed for vector CPUs?
How do vectors interact with caches?
Strided/indexed scatter/gather memory accesses
Masks and chaining
Slide 14
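Strided and indexed (gather) accesses can be sketched in scalar Python to show what the vector memory unit does per element (the names and toy memory are mine):

```python
def gather_strided(mem, base, stride, n):
    """One element every `stride` locations starting at base."""
    return [mem[base + i * stride] for i in range(n)]

def gather_indexed(mem, base, indices):
    """Indexed gather: an index vector selects the elements."""
    return [mem[base + idx] for idx in indices]

mem = list(range(100))  # toy flat memory
print(gather_strided(mem, 0, 10, 4))       # [0, 10, 20, 30]
print(gather_indexed(mem, 50, [3, 1, 7]))  # [53, 51, 57]
```

A scatter is the same address computation with stores instead of loads; masks simply skip disabled elements.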

Thread-Level Parallelism
Shared memory architectures
NUMA vs. UMA
Bus vs. point-to-point interconnects
Advantages and limitations of busses
MESI cache coherence protocol
Purpose of a directory
Slide 15

Multithreading
Advantages/disadvantages of:
Superscalar
Chip multiprocessor
Coarse-grain multithreading (switch on miss)
Fine-grain multithreading (round-robin every cycle)
Simultaneous multithreading
How do these interact with cache hierarchies?
What kinds of programs work best on each of these?
What changes to the microarchitecture are required?
Slide 16