Loops and Locality, with an introduction to the memory hierarchy. COMP 506, Rice University.

1 Loops and Locality, with an introduction to the memory hierarchy
COMP 506, Rice University, Spring 2017
[Figure: compiler structure: source code → Front End → IR → Optimizer → IR → Back End → target code]
Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 506 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.
Most of this material is not in EaC2e.

2 Optimization (from Lecture 14)
Compilers operate at multiple granularities, or scopes.
Local techniques
  Work on a single basic block
  A basic block is a maximal-length sequence of straight-line code
Regional techniques
  Consider multiple blocks, but less than a whole procedure
  Single loop, loop nest, dominator region, ...
Intraprocedural, or global, techniques
  Operate on an entire procedure (but just one)
  Common unit of compilation
Interprocedural, or whole-program, techniques
  Operate on > 1 procedure, up to the whole program
  Logistical issues related to accessing the code (optimize in the linker?)

3 The Opportunities: Loop Optimization
Compilers have always focused on loops.
  They have higher execution counts than code outside loops
  They have repeated operations and related operations
  Much of the real work of computing takes place inside loops
There are several effects to attack.
  Loop overhead: decrease the control-structure cost for each iteration
  Locality
    Spatial locality: use of co-resident data
    Temporal locality: reuse of the same data at different times
  Parallelism (COMP 515): move loops with independent operations to the inner or outer position [1]
[1] Innermost loop makes sense for vector machines; outermost loop makes sense for multiprocessors.

4 Eliminating Overhead: Loop Unrolling (the oldest trick in the book)
To reduce overhead, replicate the body. Overhead is the increment, test, and branch.
    do i = 1 to 100 by 1
      a(i) = a(i) + b(i)
becomes (unroll by 4)
    do i = 1 to 100 by 4
      a(i) = a(i) + b(i)
      a(i+1) = a(i+1) + b(i+1)
      a(i+2) = a(i+2) + b(i+2)
      a(i+3) = a(i+3) + b(i+3)
Sources of improvement
  Less overhead per useful operation
  Longer basic blocks for local optimization
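The unroll-by-4 rewrite above can be checked with a small sketch. This is an illustration only, not code from the lecture: it uses Python with 0-based lists, the function names are made up for this example, and it assumes the trip count is a multiple of 4, as with the slide's fixed bound of 100. Python shows the semantics of the transformation, not its performance effect.

```python
def add_rolled(a, b):
    """Reference loop: one increment, test, and branch per addition."""
    out = list(a)
    for i in range(len(out)):
        out[i] = out[i] + b[i]
    return out


def add_unrolled_by_4(a, b):
    """Same computation with the loop body replicated four times.

    Assumption: len(a) is a multiple of 4 (the slide uses a bound of 100).
    """
    out = list(a)
    if len(out) % 4 != 0:
        raise ValueError("this sketch handles only trip counts divisible by 4")
    for i in range(0, len(out), 4):  # one increment/test/branch per 4 additions
        out[i] = out[i] + b[i]
        out[i + 1] = out[i + 1] + b[i + 1]
        out[i + 2] = out[i + 2] + b[i + 2]
        out[i + 3] = out[i + 3] + b[i + 3]
    return out
```

Both functions return the same list for any inputs whose length is divisible by 4; the unrolled version simply executes a quarter as many loop-control steps.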

5 Eliminating Overhead: Loop Unrolling With Unknown Bounds
Generate extra loops to handle cases smaller than the unroll factor.
    do i = 1 to n by 1
      a(i) = a(i) + b(i)
becomes (unroll by 4)
    i = 1
    while (i+3 <= n) do
      a(i) = a(i) + b(i)
      a(i+1) = a(i+1) + b(i+1)
      a(i+2) = a(i+2) + b(i+2)
      a(i+3) = a(i+3) + b(i+3)
      i = i + 4
    while (i <= n) do
      a(i) = a(i) + b(i)
      i = i + 1
The while loop needs an explicit update for the variable i.
You will find code like this in the BLAS and in BitBlt.
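A minimal sketch of the unknown-bounds case, under the assumption of 0-based Python lists (so the guard `i + 3 < n` here plays the role of the 1-based test on the slide). The function name is hypothetical.

```python
def add_unrolled_any_n(a, b):
    """Unroll by 4, then finish the leftover 0 to 3 iterations one at a time."""
    out = list(a)
    n = len(out)
    i = 0
    # Main loop: runs while a full group of four elements remains.
    while i + 3 < n:
        out[i] = out[i] + b[i]
        out[i + 1] = out[i + 1] + b[i + 1]
        out[i + 2] = out[i + 2] + b[i + 2]
        out[i + 3] = out[i + 3] + b[i + 3]
        i = i + 4  # explicit update of the induction variable
    # Cleanup loop: handles the remaining n mod 4 elements.
    while i < n:
        out[i] = out[i] + b[i]
        i = i + 1
    return out
```

The cleanup loop is what makes the transformation correct for every n, including n smaller than the unroll factor.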

6 Eliminating Overhead: One Other Use For Unrolling
Eliminate copies at the end of a loop.
    t1 = b(0)
    do i = 1 to 100
      t2 = b(i)
      a(i) = a(i) + t1 + t2
      t1 = t2
becomes (unroll by 2 and rename)
    t1 = b(0)
    do i = 1 to 100 by 2
      t2 = b(i)
      a(i) = a(i) + t1 + t2
      t1 = b(i+1)
      a(i+1) = a(i+1) + t2 + t1
More complex cases
  Multiple cycles of cross-iteration copies
  Use the LCM of the cycle lengths as the unroll factor
  Result has been rediscovered many times [214]
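The copy-elimination rewrite can be sketched as follows. This is an illustrative check, not the lecture's code; it assumes lists indexed 0 through 100 so that b(0) and a(1..100) line up with the slide, and the function names are invented for the example.

```python
def sum_with_copy(a, b):
    """Original loop: t1 carries b(i-1) into the next iteration through a copy."""
    out = list(a)
    t1 = b[0]
    for i in range(1, len(out)):
        t2 = b[i]
        out[i] = out[i] + t1 + t2
        t1 = t2  # register-to-register copy executed on every iteration
    return out


def sum_unrolled_renamed(a, b):
    """Unrolled by 2 with the roles of t1 and t2 swapped in the second copy.

    Assumption: the trip count (len(a) - 1) is even, as with the bound of 100.
    """
    out = list(a)
    n = len(out)
    if (n - 1) % 2 != 0:
        raise ValueError("this sketch needs an even trip count")
    t1 = b[0]
    for i in range(1, n, 2):
        t2 = b[i]
        out[i] = out[i] + t1 + t2
        t1 = b[i + 1]  # loads directly into t1, so no copy is needed
        out[i + 1] = out[i + 1] + t2 + t1
    return out
```

The copy has a cycle length of 2 (t1 and t2 trade roles every iteration), which is why unrolling by 2 removes it.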

7 Locality-Driven Improvement: Loop Fusion
Two loops iterate over the same iteration space. Convert them into a single loop.
    do i = 1 to n
      c(i) = a(i) + b(i)
    do j = 1 to n
      d(j) = a(j) * e(j)
becomes (fuse)
    do i = 1 to n
      c(i) = a(i) + b(i)
      d(i) = a(i) * e(i)
Advantages
  Fewer total operations (lower overhead)
  Longer basic blocks for local optimization & scheduling
  Can convert reuse between loops to reuse within a loop

8 Locality-Driven Improvement: Loop Fusion
    do i = 1 to n
      c(i) = a(i) + b(i)
    do j = 1 to n
      d(j) = a(j) * e(j)
becomes (fuse)
    do i = 1 to n
      c(i) = a(i) + b(i)
      d(i) = a(i) * e(i)
This transformation is safe if and only if the fused loop does not change the values used or defined by any statement in either loop.

9 Locality-Driven Improvement: Loop Fusion
    do i = 1 to n
      c(i) = a(i) + b(i)
    do j = 1 to n
      d(j) = a(j) * e(j)
Before fusion: for large enough arrays, a(x) will not be in the cache by the time the second loop tries to reuse it.
    do i = 1 to n
      c(i) = a(i) + b(i)
      d(i) = a(i) * e(i)
After fusion: a(x) will almost certainly be in the cache at the second use.
Safety is expressed in terms of dependences: essentially, the same values flow to the same places.
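The fusion example and its safety condition can be illustrated with a short sketch. The first pair of functions mirrors the slide's loops; the second pair is a hypothetical counterexample (not from the lecture) in which fusing changes a value that a statement reads, so fusion is unsafe. All names are assumptions made for this example.

```python
def separate_loops(a, b, e):
    """Two loops over the same iteration space, as on the slide."""
    n = len(a)
    c, d = [0] * n, [0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    for j in range(n):
        d[j] = a[j] * e[j]
    return c, d


def fused_loop(a, b, e):
    """Fused version: a[i] is reused within a single iteration."""
    n = len(a)
    c, d = [0] * n, [0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
        d[i] = a[i] * e[i]
    return c, d


def shifted_read_separate(a, b):
    """First loop writes all of a; second loop then reads a[i+1]."""
    a = list(a)
    n = len(a)
    c = [0] * (n - 1)
    for i in range(n):
        a[i] = b[i]
    for i in range(n - 1):
        c[i] = a[i + 1]  # sees the value written by the first loop
    return c


def shifted_read_fused(a, b):
    """Naively fused version: a[i+1] is read before it has been written."""
    a = list(a)
    n = len(a)
    c = [0] * (n - 1)
    for i in range(n - 1):
        a[i] = b[i]
        c[i] = a[i + 1]  # sees the OLD value, so the result changes
    a[n - 1] = b[n - 1]
    return c
```

In the first pair no statement's inputs change, so the results match. In the second pair the fused loop reverses the order of a write and a later read, which is exactly the kind of change the safety condition rules out.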

10 Locality-Driven Improvement: Loop Distribution (or fission)
A single loop with multiple independent statements can be transformed into multiple independent loops.
    do i = 1 to n              (reads b, c, e, f, h, & k; writes a, d, & g)
      a(i) = b(i) + c(i)
      d(i) = e(i) * f(i)
      g(i) = h(i) - k(i)
becomes (fission)
    do i = 1 to n              (reads b & c; writes a)
      a(i) = b(i) + c(i)
    do i = 1 to n              (reads e & f; writes d)
      d(i) = e(i) * f(i)
    do i = 1 to n              (reads h & k; writes g)
      g(i) = h(i) - k(i)
Advantages
  Loops in the transformed code can have a smaller cache footprint
  More reuse in the cache leads to faster execution
  Enables other transformations, such as vectorization
Distribution is safe if all the statements that form a cycle in the dependence graph end up in the same loop (see COMP 515).
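As a rough sketch of distribution, the functions below mirror the slide's single loop and its three-loop form. The names are hypothetical, and the example assumes the three statements are independent, which is what makes the split safe here.

```python
def single_loop(b, c, e, f, h, k):
    """One loop touching nine arrays per iteration."""
    n = len(b)
    a, d, g = [0] * n, [0] * n, [0] * n
    for i in range(n):
        a[i] = b[i] + c[i]
        d[i] = e[i] * f[i]
        g[i] = h[i] - k[i]
    return a, d, g


def distributed_loops(b, c, e, f, h, k):
    """Three loops, each touching only three arrays per iteration."""
    n = len(b)
    a, d, g = [0] * n, [0] * n, [0] * n
    for i in range(n):
        a[i] = b[i] + c[i]
    for i in range(n):
        d[i] = e[i] * f[i]
    for i in range(n):
        g[i] = h[i] - k[i]
    return a, d, g
```

Each distributed loop streams through three arrays instead of nine, which is the smaller cache footprint the slide refers to.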

11 Locality-Driven Improvement: Loop Interchange Reorders Loops To Improve Locality
Swap inner and outer loops to rearrange the iteration space.
    do i = 1 to 50
      do j = 1 to 100
        a(i,j) = b(i,j) * c(i,j)
becomes (interchange)
    do j = 1 to 100
      do i = 1 to 50
        a(i,j) = b(i,j) * c(i,j)
Effect
  Improves spatial reuse by using more elements per cache line
  Goal is to get as much reuse into the inner loop as possible

12 Locality-Driven Improvement: Loop Interchange Reorders Loops To Improve Locality
    do i = 1 to 50
      do j = 1 to 100
        a(i,j) = b(i,j) * c(i,j)
In Fortran's column-major order, a(4,4) would lay out as
    1,1 2,1 3,1 4,1 | 1,2 2,2 3,2 4,2 | 1,3 2,3 3,3 4,3 | 1,4 2,4 3,4 4,4
With j as the inner loop, consecutive accesses jump from column to column: as little as 1 element is used per cache line.

13 Locality-Driven Improvement: Loop Interchange Reorders Loops To Improve Locality
After interchange, the direction of iteration is changed.
    do j = 1 to 100
      do i = 1 to 50
        a(i,j) = b(i,j) * c(i,j)
    1,1 2,1 3,1 4,1 | 1,2 2,2 3,2 4,2 | 1,3 2,3 3,3 4,3 | 1,4 2,4 3,4 4,4
The inner loop now runs down the cache lines.
This is the root cause of the speed difference in the array example from the 1st COMP 506 lecture.

14 Locality-Driven Improvement: Loop Interchange Reorders Loops To Improve Locality
    do i = 1 to 50
      do j = 1 to 100
        a(i,j) = b(i,j) * c(i,j)
becomes
    do j = 1 to 100
      do i = 1 to 50
        a(i,j) = b(i,j) * c(i,j)
If arrays are stored in row-major order, the same effects occur with the opposite order of loops and subscripts.
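A toy model can make the effect of interchange on a column-major array concrete. The sketch below counts how often consecutive accesses fall on different cache lines for each loop order. It is a simplification under stated assumptions: 8 elements per cache line (an assumed value, not from the lecture), an array that starts on a line boundary, 0-based indices, and no modeling of cache capacity. The function names are invented for the example.

```python
def column_major_offset(i, j, rows):
    """0-based element offset of a(i, j) in a column-major array with `rows` rows."""
    return j * rows + i


def count_line_switches(order, rows, cols, elems_per_line):
    """Count consecutive accesses that land on different cache lines.

    order == "i_outer": i is the outer loop and j the inner loop.
    order == "j_outer": j is the outer loop and i the inner loop.
    """
    if order == "i_outer":
        accesses = [(i, j) for i in range(rows) for j in range(cols)]
    elif order == "j_outer":
        accesses = [(i, j) for j in range(cols) for i in range(rows)]
    else:
        raise ValueError("order must be 'i_outer' or 'j_outer'")
    switches = 0
    previous_line = None
    for i, j in accesses:
        line = column_major_offset(i, j, rows) // elems_per_line
        if previous_line is not None and line != previous_line:
            switches += 1
        previous_line = line
    return switches
```

For the slide's 50 by 100 array, the original order (j inner) changes cache lines on every one of its 4,999 consecutive access pairs, while the interchanged order (i inner) walks memory sequentially and changes lines only 624 times under these assumptions.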

15 Locality-Driven Improvement: Loop Permutation Generalizes Interchange to Multiple Loops
Interchange (2 loops) is the degenerate case of two perfectly nested loops. In more general settings, the transformation is called permutation.
Safety
  Permutation is safe iff no dependences are reversed
  That is, the flow of data from definitions to uses is preserved
Effects
  Change the order of access and the order of computation
  Move accesses closer together in time ⇒ increased temporal locality
  Move computations further apart in time ⇒ cover pipeline latencies

16 The Big Picture
Loop optimizations can radically change locality.
  For programs that are memory bound, loop optimization is the primary way to find improvements
  Change the order of iteration, change the patterns of memory accesses
Safety conditions and opportunities
  The formal statements of the safety conditions typically involve dependence analysis (see COMP 515)
  Safety is expressed in terms of dependences: essentially, the same values flow to the same places
  Many formulations of the transformations
    Polyhedral analysis
    Unimodular transformations
    Ad-hoc and one-off techniques
Improving memory-bound programs is possible, but takes some knowledge.
Most run-of-the-mill compilers do not perform optimizations this complex.

17 Address Space Layout
We have seen this drawing several times in COMP 506.
Most language runtimes lay out the address space in a similar way.
[Figure: one virtual address space with regions for code, globals, heap, growth space for stacks, and stacks]
  Pieces (stack, heap, code, & globals) may move, but all will be there
  Stack and heap grow toward each other (if the heap grows)
  Arrays live on one of the stacks, in the global area, or in the heap
The picture shows one virtual address space. The hardware supports one virtual address space per process.
How does a virtual address space map into physical memory?
[Figure: Java memory layout]

18 How Does Address Space Mapping Work? The Big Picture
[Figure: compiler's view: several virtual address spaces, each containing code, static & global data, heap, and stack. OS view: the virtual address spaces are mapped through the TLB. 1980 hardware view: a single physical address space running from 0 to high.]
The TLB is an address cache used by the OS to speed virtual-to-physical address translation. A processor may have > 1 level of TLB.

19 More Address Space Mapping
Of course, the hardware view is no longer that simple.
[Figure: main memory (addresses 0 to high) feeds an L2 cache holding data & code; each processor core has its own registers and separate L1 data and code caches; the TLB sits alongside the caches]
Cache structure matters for performance, not correctness.
Many processors now include L3 caches; L4 caches are on their way.

20 Cache Memory
[Figure: core with registers and TLB; L1 split into data and code; L2 and L3 each hold data & code; annotation: typically shared among 2 cores]
Modern hardware features multiple levels of cache & of TLB.
  L1 is typically private to a core
  L2 (and beyond) is typically shared between cores and between code (I) and data (D)
Most caches are inclusive
  Item in L1 ⇒ in L2 ⇒ in L3
  Some are exclusive (L1 not in L2)
Most caches are set associative
  2, 4, or 8 way
TLBs are also associative
  Little documentation
  Difficult to detect or measure

21 Cache Memory
The primary function of a cache is to provide fast memory near the core.
  L1 is a couple of cycles and small
  L2 is slower than L1 and larger; L3 is slower and larger, ...
This laptop (Core i7)
  Level   Latency     Size
  L1      5 cycles    32 KB
  L2      13 cycles   256 KB
  L3      36 cycles   4,096 KB

22 Cache Memory
The primary function of a cache is to provide fast memory near the core.
  L1 is a couple of cycles and small
  L2 is slower than L1 and larger; L3 is slower, again, and larger
The other function of a cache is to map addresses.
  Cache is organized into blocks, or lines
  Each line consists of a tag and a set of words
[Figure: a cache block, or line: a tag followed by a set of words]

23 Cache Memory
The primary function of a cache is to provide fast memory near the core. The other function of a cache is to map addresses.
  Cache is organized into blocks, or lines
  Each line consists of a tag and a set of words
  A full cache is a set of lines
  An address maps into 3 parts: tag, index, and offset
  The index is a many-to-one map
[Figure: address divided into tag | index | offset]
To make good use of cache memory, the code must reuse values.
  Spatial reuse refers to the use of more than one word in a line
  Temporal reuse refers to reuse of the same word over time
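The three-part address split can be written down directly. This is a minimal sketch, assuming byte addresses held in Python integers, with `s` index bits and `o` offset bits (the names used later in the lecture); `split_address` is a hypothetical helper name.

```python
def split_address(addr, s, o):
    """Split an address into (tag, index, offset).

    The low `o` bits are the offset within a line, the next `s` bits are
    the index, and all remaining high bits form the tag.
    """
    offset = addr & ((1 << o) - 1)
    index = (addr >> o) & ((1 << s) - 1)
    tag = addr >> (s + o)
    return tag, index, offset
```

For example, with s = 6 and o = 6, `split_address(0x12345, 6, 6)` yields tag 18, index 13, and offset 5. Any address that differs from it only in the tag bits maps to the same index, which is why the index is a many-to-one map.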

24 Cache Memory
Caches differ in how they apportion the tag and index bits.
A direct-mapped cache has one line per index.
  Cache lookup is simple
  The index bits are an ordinal index into the set of lines
[Figure: direct-mapped cache. The address is divided into tag (t bits), index (s bits), and offset (o bits). The index selects one of lines 0 through 2^s - 1, and the tag stored in that line is compared with the tag from the address: do the tags match?]
A direct-mapped cache has 2^s lines. Capacity is the sum of the sizes of the lines.
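A direct-mapped lookup can be modeled in a few lines. The class below is a toy sketch, not a description of any real hardware: it tracks only the tag stored in each line, ignores the data words, and always replaces the resident line on a miss. The class name and interface are assumptions for this example.

```python
class DirectMappedCache:
    """Toy direct-mapped cache with 2**s lines of 2**o bytes each."""

    def __init__(self, s, o):
        self.s = s
        self.o = o
        self.tags = [None] * (1 << s)  # one stored tag per line; None = empty

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        index = (addr >> self.o) & ((1 << self.s) - 1)
        tag = addr >> (self.s + self.o)
        if self.tags[index] == tag:  # do the tags match?
            return True
        self.tags[index] = tag  # miss: the new line displaces the old one
        return False
```

With s = 2 and o = 4 (four 16-byte lines), addresses 0 and 64 share index 0 but have different tags, so alternating between them misses every time even though the other three lines sit empty.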

25 Cache Memory
Caches differ in how they apportion the tag and index bits.
A set-associative cache has multiple lines per index.
  The index maps to a set; lookup matches tags within the set
  Small content-addressable memory [1] for each set
[Figure: 2-way set-associative cache with sets 0 through 2^s - 1, each holding Way 0 and Way 1; the address is divided into tag, index, and offset]
A set-associative cache has 2^s sets. For a given total size, s is smaller than in a direct-mapped cache. The tag is longer; the index is shorter.
[1] Sometimes called associative memory.
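The set-associative lookup can be sketched the same way. The replacement policy is an assumption here: the slide does not name one, so this toy model uses least-recently-used (LRU) eviction within each set. The class name and interface are likewise invented for the example.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache with 2**s sets, `ways` lines per set, 2**o bytes per line.

    Assumption: LRU replacement within a set (not specified by the lecture).
    """

    def __init__(self, s, o, ways):
        self.s = s
        self.o = o
        self.ways = ways
        # Each set maps resident tags to a placeholder, ordered oldest first.
        self.sets = [OrderedDict() for _ in range(1 << s)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        index = (addr >> self.o) & ((1 << self.s) - 1)
        tag = addr >> (self.s + self.o)
        resident = self.sets[index]
        if tag in resident:  # associative search on the tags in the set
            resident.move_to_end(tag)  # mark as most recently used
            return True
        if len(resident) == self.ways:
            resident.popitem(last=False)  # evict the least recently used tag
        resident[tag] = True
        return False
```

A 2-way cache with s = 1 and o = 4 has the same four 16-byte lines as a direct-mapped cache with s = 2, but addresses 0 and 64 can now reside in the same set at the same time.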

26 What Happens on a Load?
The hardware must find the data in this complex hierarchy.
  Assume that the address is in a register, e.g. load r0 => r1
  Assume a set-associative cache
  Assume cache tags are virtual addresses
Sequence of events for a load
  1. Processor looks in the L1 cache
     Index maps to a set, then an associative search on the tags in the set
     If found (a cache hit), return the value; otherwise
  2. Processor looks in the L2 cache
     Index maps to a set, then an associative search on the tags in the set
     If found (a cache hit), return the value; otherwise
  3. And so on

27 What Happens on a Load?
What about virtual-to-physical address translation?
  The address in the load is a virtual address
  If the load misses in all caches, we need a physical address
Caches can be designed to operate on virtual or physical addresses.
  L1 is typically indexed by virtual addresses
  L2 and above are typically indexed by physical addresses
Physically addressed cache ⇒ virtual address translation during lookup
  Involves understanding the map from virtual pages to physical pages
  Involves cooperation between the hardware and the operating system
  Worst-case behavior involves walking the page tables (often locked in L2 or L3)
Design of virtual memory systems is covered in a good OS course.

28 Cache Memory
[Figure: core with registers and TLB; L1 split into data and code; L2 and L3 each hold data & code; annotation: typically shared among 2 cores]
The TLB plays a key role in virtual-to-physical address mapping.
  Small cache that maps virtual addresses to physical addresses
  Holds a subset of the (active) pages that are in virtual memory
  Tag is v-addr, content is p-addr
A physically tagged cache must translate v-addr to p-addr.
  TLB hit ⇒ the access can continue
  TLB miss ⇒ search to bring the page into the TLB, then continue (or reissue) the access
A page fault on the way to an L1 lookup is a lot of delay.
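The TLB's hit and miss paths can be sketched as a small cache in front of a page table. Everything concrete here is an assumption made for illustration: 4,096-byte pages, a page table held in a Python dict from virtual page number to physical frame number, a 4-entry TLB, and LRU replacement. A missing page-table entry stands in for a page fault.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumption: 4,096-byte pages


class TinyTLB:
    """Toy TLB: maps virtual page numbers to physical frame numbers."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # dict: virtual page -> physical frame
        self.capacity = capacity
        self.entries = OrderedDict()  # cached translations, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Return the physical address for a virtual address."""
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.entries:  # TLB hit: the access can continue
            self.entries.move_to_end(vpn)
            self.hits += 1
        else:  # TLB miss: consult the page table, then fill the TLB
            self.misses += 1
            if vpn not in self.page_table:
                raise KeyError("page fault: virtual page %d is not mapped" % vpn)
            if len(self.entries) == self.capacity:
                self.entries.popitem(last=False)  # evict the oldest entry
            self.entries[vpn] = self.page_table[vpn]
        return (self.entries[vpn] << PAGE_BITS) | offset
```

The page offset passes through translation unchanged; only the page number is replaced by the frame number. That property is what the parallel-lookup design at the end of the lecture relies on.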

29 Cache Memory
The TLB plays a key role in virtual-to-physical address mapping.
  Small cache that maps virtual addresses to physical addresses
  Holds a subset of the (active) pages that are in virtual memory
  Tag is v-addr, content is p-addr
Most processors use a virtually tagged L1 cache, with physical tags in upper-level caches.
  Removes the TLB's role in L1 lookup
  The TLB can be as fast as L1, so it is not a problem for L2 and beyond
  Physical tags are smaller than virtual tags ⇒ fewer gates, less area, lower power consumption

30 What Happens on a Load?
[Figure: address divided into tag (t bits), index (s bits), and offset (o bits)]
Careful design can let the TLB lookup & the index set lookup run in parallel.
  By playing with the sizes of t, s, and o, the cache designer can separate index lookup from virtual-to-physical translation
  If s + o ≤ log2(pagesize), then the index and offset bits are the same in physical & virtual addresses
  If s + o ≤ log2(pagesize), then the processor can start both the L1 lookup to find the set and the TLB lookup to translate the address at the same time
  By the time it has found the set, it should have the tag from the physical address (unless the lookup misses in the TLB)
  In effect, associativity lets cache capacity grow without increasing the number of bits in the index field of the address
Do manufacturers play this game? Absolutely.
  My laptop has a 32,768 byte L1 cache, with 64 byte lines, for 512 lines. It is 8-way set associative, which means 64 sets. Thus, s = 6, o = 6, and s + o = 12 bits; 2^12 = 4,096 bytes, which is the page size.
  With my laptop's cache parameters, a 4-way associative cache would need 32 byte lines to keep s + o
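The laptop arithmetic can be reproduced with a short calculation. This sketch assumes power-of-two sizes throughout; the function names are hypothetical, and the numbers passed in are the ones quoted on the slide (32,768-byte L1, 64-byte lines, 8 ways, 4,096-byte pages).

```python
def cache_geometry(capacity_bytes, line_bytes, ways):
    """Return (lines, sets, s, o) for a set-associative cache.

    Assumes capacity, line size, and set count are all powers of two.
    """
    lines = capacity_bytes // line_bytes
    sets = lines // ways
    s = sets.bit_length() - 1  # log2(sets): number of index bits
    o = line_bytes.bit_length() - 1  # log2(line size): number of offset bits
    return lines, sets, s, o


def index_fits_in_page_offset(capacity_bytes, line_bytes, ways, page_bytes):
    """True if s + o <= log2(pagesize), so set lookup can overlap the TLB."""
    _, _, s, o = cache_geometry(capacity_bytes, line_bytes, ways)
    page_bits = page_bytes.bit_length() - 1
    return s + o <= page_bits
```

`cache_geometry(32768, 64, 8)` gives 512 lines, 64 sets, s = 6, and o = 6, so s + o = 12 = log2(4096) and the condition holds with no bits to spare.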


CS 293S Parallelism and Dependence Theory CS 293S Parallelism and Dependence Theory Yufei Ding Reference Book: Optimizing Compilers for Modern Architecture by Allen & Kennedy Slides adapted from Louis-Noël Pouche, Mary Hall End of Moore's Law

More information

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java) COMP 412 FALL 2017 Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java) Copyright 2017, Keith D. Cooper & Zoran Budimlić, all rights reserved. Students enrolled in Comp 412 at Rice

More information

Memory Management. Goals of this Lecture. Motivation for Memory Hierarchy

Memory Management. Goals of this Lecture. Motivation for Memory Hierarchy Memory Management Goals of this Lecture Help you learn about: The memory hierarchy Spatial and temporal locality of reference Caching, at multiple levels Virtual memory and thereby How the hardware and

More information

Optimising for the p690 memory system

Optimising for the p690 memory system Optimising for the p690 memory Introduction As with all performance optimisation it is important to understand what is limiting the performance of a code. The Power4 is a very powerful micro-processor

More information

Just-In-Time Compilers & Runtime Optimizers

Just-In-Time Compilers & Runtime Optimizers COMP 412 FALL 2017 Just-In-Time Compilers & Runtime Optimizers Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved.

More information

CS 433 Homework 4. Assigned on 10/17/2017 Due in class on 11/7/ Please write your name and NetID clearly on the first page.

CS 433 Homework 4. Assigned on 10/17/2017 Due in class on 11/7/ Please write your name and NetID clearly on the first page. CS 433 Homework 4 Assigned on 10/17/2017 Due in class on 11/7/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.

More information

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1 Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L16-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:

More information

CSC D70: Compiler Optimization Memory Optimizations

CSC D70: Compiler Optimization Memory Optimizations CSC D70: Compiler Optimization Memory Optimizations Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry, Greg Steffan, and

More information

CS399 New Beginnings. Jonathan Walpole

CS399 New Beginnings. Jonathan Walpole CS399 New Beginnings Jonathan Walpole Memory Management Memory Management Memory a linear array of bytes - Holds O.S. and programs (processes) - Each cell (byte) is named by a unique memory address Recall,

More information

Cache Performance (H&P 5.3; 5.5; 5.6)

Cache Performance (H&P 5.3; 5.5; 5.6) Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st

More information

Pipelining Exercises, Continued

Pipelining Exercises, Continued Pipelining Exercises, Continued. Spot all data dependencies (including ones that do not lead to stalls). Draw arrows from the stages where data is made available, directed to where it is needed. Circle

More information

Lec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements

Lec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements Lec 13: Linking and Memory Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University PA 2 is out Due on Oct 22 nd Announcements Prelim Oct 23 rd, 7:30-9:30/10:00 All content up to Lecture on Oct

More information

Princeton University. Computer Science 217: Introduction to Programming Systems. The Memory/Storage Hierarchy and Virtual Memory

Princeton University. Computer Science 217: Introduction to Programming Systems. The Memory/Storage Hierarchy and Virtual Memory Princeton University Computer Science 27: Introduction to Programming Systems The Memory/Storage Hierarchy and Virtual Memory Goals of this Lecture Help you learn about: Locality and caching The memory

More information

This lecture. Virtual Memory. Virtual memory (VM) CS Instructor: Sanjeev Se(a

This lecture. Virtual Memory. Virtual memory (VM) CS Instructor: Sanjeev Se(a Virtual Memory Instructor: Sanjeev Se(a This lecture (VM) Overview and mo(va(on VM as tool for caching VM as tool for memory management VM as tool for memory protec(on Address transla(on 2 Virtual Memory

More information

Simone Campanoni Loop transformations

Simone Campanoni Loop transformations Simone Campanoni simonec@eecs.northwestern.edu Loop transformations Outline Simple loop transformations Loop invariants Induction variables Complex loop transformations Simple loop transformations Simple

More information

Global Register Allocation via Graph Coloring

Global Register Allocation via Graph Coloring Global Register Allocation via Graph Coloring Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission

More information

SE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan Memory Hierarchy

SE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan Memory Hierarchy SE-292 High Performance Computing Memory Hierarchy R. Govindarajan govind@serc Memory Hierarchy 2 1 Memory Organization Memory hierarchy CPU registers few in number (typically 16/32/128) subcycle access

More information

Lecture 9 Basic Parallelization

Lecture 9 Basic Parallelization Lecture 9 Basic Parallelization I. Introduction II. Data Dependence Analysis III. Loop Nests + Locality IV. Interprocedural Parallelization Chapter 11.1-11.1.4 CS243: Parallelization 1 Machine Learning

More information

Lecture 9 Basic Parallelization

Lecture 9 Basic Parallelization Lecture 9 Basic Parallelization I. Introduction II. Data Dependence Analysis III. Loop Nests + Locality IV. Interprocedural Parallelization Chapter 11.1-11.1.4 CS243: Parallelization 1 Machine Learning

More information

Virtual Memory. Motivations for VM Address translation Accelerating translation with TLBs

Virtual Memory. Motivations for VM Address translation Accelerating translation with TLBs Virtual Memory Today Motivations for VM Address translation Accelerating translation with TLBs Fabián Chris E. Bustamante, Riesbeck, Fall Spring 2007 2007 A system with physical memory only Addresses generated

More information

CSE P 501 Compilers. Loops Hal Perkins Spring UW CSE P 501 Spring 2018 U-1

CSE P 501 Compilers. Loops Hal Perkins Spring UW CSE P 501 Spring 2018 U-1 CSE P 501 Compilers Loops Hal Perkins Spring 2018 UW CSE P 501 Spring 2018 U-1 Agenda Loop optimizations Dominators discovering loops Loop invariant calculations Loop transformations A quick look at some

More information

Today: Segmentation. Last Class: Paging. Costs of Using The TLB. The Translation Look-aside Buffer (TLB)

Today: Segmentation. Last Class: Paging. Costs of Using The TLB. The Translation Look-aside Buffer (TLB) Last Class: Paging Process generates virtual addresses from 0 to Max. OS divides the process onto pages; manages a page table for every process; and manages the pages in memory Hardware maps from virtual

More information

Optimizer. Defining and preserving the meaning of the program

Optimizer. Defining and preserving the meaning of the program Where are we? Well understood Engineering Source Code Front End IR Optimizer IR Back End Machine code Errors The latter half of a compiler contains more open problems, more challenges, and more gray areas

More information

Module 16: Data Flow Analysis in Presence of Procedure Calls Lecture 32: Iteration. The Lecture Contains: Iteration Space.

Module 16: Data Flow Analysis in Presence of Procedure Calls Lecture 32: Iteration. The Lecture Contains: Iteration Space. The Lecture Contains: Iteration Space Iteration Vector Normalized Iteration Vector Dependence Distance Direction Vector Loop Carried Dependence Relations Dependence Level Iteration Vector - Triangular

More information

COSC3330 Computer Architecture Lecture 20. Virtual Memory

COSC3330 Computer Architecture Lecture 20. Virtual Memory COSC3330 Computer Architecture Lecture 20. Virtual Memory Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston Virtual Memory Topics Reducing Cache Miss Penalty (#2) Use

More information

Memories. CPE480/CS480/EE480, Spring Hank Dietz.

Memories. CPE480/CS480/EE480, Spring Hank Dietz. Memories CPE480/CS480/EE480, Spring 2018 Hank Dietz http://aggregate.org/ee480 What we want, what we have What we want: Unlimited memory space Fast, constant, access time (UMA: Uniform Memory Access) What

More information

Memory Cache. Memory Locality. Cache Organization -- Overview L1 Data Cache

Memory Cache. Memory Locality. Cache Organization -- Overview L1 Data Cache Memory Cache Memory Locality cpu cache memory Memory hierarchies take advantage of memory locality. Memory locality is the principle that future memory accesses are near past accesses. Memory hierarchies

More information

CIS Operating Systems Memory Management Cache and Demand Paging. Professor Qiang Zeng Spring 2018

CIS Operating Systems Memory Management Cache and Demand Paging. Professor Qiang Zeng Spring 2018 CIS 3207 - Operating Systems Memory Management Cache and Demand Paging Professor Qiang Zeng Spring 2018 Process switch Upon process switch what is updated in order to assist address translation? Contiguous

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 20 Main Memory Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Pages Pages and frames Page

More information

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1 Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L20-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:

More information

15 Sharing Main Memory Segmentation and Paging

15 Sharing Main Memory Segmentation and Paging Operating Systems 58 15 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per

More information

virtual memory Page 1 CSE 361S Disk Disk

virtual memory Page 1 CSE 361S Disk Disk CSE 36S Motivations for Use DRAM a for the Address space of a process can exceed physical memory size Sum of address spaces of multiple processes can exceed physical memory Simplify Management 2 Multiple

More information

Memory Management. Dr. Yingwu Zhu

Memory Management. Dr. Yingwu Zhu Memory Management Dr. Yingwu Zhu Big picture Main memory is a resource A process/thread is being executing, the instructions & data must be in memory Assumption: Main memory is infinite Allocation of memory

More information

Operating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015

Operating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015 Operating Systems 09. Memory Management Part 1 Paul Krzyzanowski Rutgers University Spring 2015 March 9, 2015 2014-2015 Paul Krzyzanowski 1 CPU Access to Memory The CPU reads instructions and reads/write

More information

Advanced Compiler Construction Theory And Practice

Advanced Compiler Construction Theory And Practice Advanced Compiler Construction Theory And Practice Introduction to loop dependence and Optimizations 7/7/2014 DragonStar 2014 - Qing Yi 1 A little about myself Qing Yi Ph.D. Rice University, USA. Associate

More information

Virtual Memory: Concepts

Virtual Memory: Concepts Virtual Memory: Concepts Instructor: Dr. Hyunyoung Lee Based on slides provided by Randy Bryant and Dave O Hallaron Today Address spaces VM as a tool for caching VM as a tool for memory management VM as

More information

CS 61C: Great Ideas in Computer Architecture. Virtual Memory

CS 61C: Great Ideas in Computer Architecture. Virtual Memory CS 61C: Great Ideas in Computer Architecture Virtual Memory Instructor: Justin Hsia 7/30/2012 Summer 2012 Lecture #24 1 Review of Last Lecture (1/2) Multiple instruction issue increases max speedup, but

More information

16 Sharing Main Memory Segmentation and Paging

16 Sharing Main Memory Segmentation and Paging Operating Systems 64 16 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 4A: Instruction Level Parallelism - Static Scheduling Avinash Kodi, kodi@ohio.edu Agenda 2 Dependences RAW, WAR, WAW Static Scheduling Loop-carried Dependence

More information

198:231 Intro to Computer Organization. 198:231 Introduction to Computer Organization Lecture 14

198:231 Intro to Computer Organization. 198:231 Introduction to Computer Organization Lecture 14 98:23 Intro to Computer Organization Lecture 4 Virtual Memory 98:23 Introduction to Computer Organization Lecture 4 Instructor: Nicole Hynes nicole.hynes@rutgers.edu Credits: Several slides courtesy of

More information

Transla'on Out of SSA Form

Transla'on Out of SSA Form COMP 506 Rice University Spring 2017 Transla'on Out of SSA Form Benoit Boissinot, Alain Darte, Benoit Dupont de Dinechin, Christophe Guillon, and Fabrice Rastello, Revisi;ng Out-of-SSA Transla;on for Correctness,

More information

CIS Operating Systems Memory Management Cache. Professor Qiang Zeng Fall 2017

CIS Operating Systems Memory Management Cache. Professor Qiang Zeng Fall 2017 CIS 5512 - Operating Systems Memory Management Cache Professor Qiang Zeng Fall 2017 Previous class What is logical address? Who use it? Describes a location in the logical memory address space Compiler

More information

Opera&ng Systems ECE344

Opera&ng Systems ECE344 Opera&ng Systems ECE344 Lecture 8: Paging Ding Yuan Lecture Overview Today we ll cover more paging mechanisms: Op&miza&ons Managing page tables (space) Efficient transla&ons (TLBs) (&me) Demand paged virtual

More information

Class Information INFORMATION and REMINDERS Homework 8 has been posted. Due Wednesday, December 13 at 11:59pm. Third programming has been posted. Due Friday, December 15, 11:59pm. Midterm sample solutions

More information

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code Traditional Three-pass Compiler Source Code Front End IR Middle End IR Back End Machine code Errors Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce

More information

Memory. From Chapter 3 of High Performance Computing. c R. Leduc

Memory. From Chapter 3 of High Performance Computing. c R. Leduc Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

CS 152 Computer Architecture and Engineering. Lecture 9 - Address Translation

CS 152 Computer Architecture and Engineering. Lecture 9 - Address Translation CS 152 Computer Architecture and Engineering Lecture 9 - Address Translation Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!

More information

Habanero Extreme Scale Software Research Project

Habanero Extreme Scale Software Research Project Habanero Extreme Scale Software Research Project Comp215: Garbage Collection Zoran Budimlić (Rice University) Adapted from Keith Cooper s 2014 lecture in COMP 215. Garbage Collection In Beverly Hills...

More information

CS 61C: Great Ideas in Computer Architecture Direct- Mapped Caches. Increasing distance from processor, decreasing speed.

CS 61C: Great Ideas in Computer Architecture Direct- Mapped Caches. Increasing distance from processor, decreasing speed. CS 6C: Great Ideas in Computer Architecture Direct- Mapped s 9/27/2 Instructors: Krste Asanovic, Randy H Katz hdp://insteecsberkeleyedu/~cs6c/fa2 Fall 2 - - Lecture #4 New- School Machine Structures (It

More information

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

USC 227 Office hours: 3-4 Monday and Wednesday  CS553 Lecture 1 Introduction 4 CS553 Compiler Construction Instructor: URL: Michelle Strout mstrout@cs.colostate.edu USC 227 Office hours: 3-4 Monday and Wednesday http://www.cs.colostate.edu/~cs553 CS553 Lecture 1 Introduction 3 Plan

More information

This Unit: Main Memory. Virtual Memory. Virtual Memory. Other Uses of Virtual Memory

This Unit: Main Memory. Virtual Memory. Virtual Memory. Other Uses of Virtual Memory This Unit: Virtual Application OS Compiler Firmware I/O Digital Circuits Gates & Transistors hierarchy review DRAM technology A few more transistors Organization: two level addressing Building a memory

More information

Cache Memories. Topics. Next time. Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance

Cache Memories. Topics. Next time. Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance Cache Memories Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance Next time Dynamic memory allocation and memory bugs Fabián E. Bustamante,

More information