CSE373: Data Structures & Algorithms
Lecture 12: Amortized Analysis and Memory Locality
Lauren Milne, Spring 2015

Announcements
Homework 3 is due on Wednesday at 11pm. Catie is back Monday.

Amortized Analysis
In amortized analysis, the time required to perform a sequence of data structure operations is averaged over all the operations performed. It is typically used to show that the average cost of an operation is small over a sequence of operations, even though a single operation can cost a lot.

Amortized Analysis
Recall our plain-old stack implemented as an array that doubles its size if it runs out of room. Can we claim push is O(1) time if resizing is O(n) time? We can't, but we can claim it's an O(1) amortized operation.

Amortized Complexity
We get an upper bound T(n) on the total time for a sequence of n operations. The average time per operation is then T(n)/n, which is also the amortized time per operation. If a sequence of n operations takes O(n · f(n)) time, we say the amortized runtime is O(f(n)).
If n operations take O(n), what is the amortized time per operation? O(1) per operation.
If n operations take O(n³), what is the amortized time per operation? O(n²) per operation.
The worst-case time for a single operation can be larger than f(n), but the amortized guarantee ensures that the average time per operation for any sequence is O(f(n)).

Building Up Credit
We can think of preceding cheap operations as building up credit that can be used to pay for later expensive operations. Because any sequence of operations must be under the bound, enough cheap operations must come first; otherwise a prefix of the sequence would violate the bound.

Example #1: Resizing Stack
A stack implemented with an array where we double the size of the array if it becomes full. Claim: any sequence of push/pop/isEmpty is amortized O(1) per operation. We need to show that any sequence of M operations takes time O(M).
Recall that the non-resizing work is O(M) (i.e., M · O(1)). The resizing work is proportional to the total number of element copies we do when resizing. So it suffices to show that after M operations, we have done < 2M total element copies (so the average number of copies per operation is bounded by a constant).
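To make the doubling concrete, here is a minimal Java sketch of such a stack. The class and constant names (ResizingStack, INITIAL_SIZE) are illustrative, not the lecture's code:

    import java.util.EmptyStackException;

    class ResizingStack<E> {
        private static final int INITIAL_SIZE = 16;    // illustrative starting capacity
        private Object[] data = new Object[INITIAL_SIZE];
        private int size = 0;

        void push(E x) {
            if (size == data.length) {                 // full: double the array, O(n) copy
                Object[] bigger = new Object[2 * data.length];
                System.arraycopy(data, 0, bigger, 0, size);
                data = bigger;
            }
            data[size++] = x;                          // the O(1) common case
        }

        @SuppressWarnings("unchecked")
        E pop() {
            if (size == 0) throw new EmptyStackException();
            return (E) data[--size];
        }

        boolean isEmpty() { return size == 0; }
    }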

Amount of Copying
Claim: after M operations, we have done < 2M total element copies.
Let n be the size of the array after M operations. Each resize up to size k copies k/2 elements, and the array sizes form a geometric series, so we have done a total of n/2 + n/4 + n/8 + ... + INITIAL_SIZE < n element copies. Because we must have done at least enough push operations to cause resizing up to size n, M ≥ n/2. So 2M ≥ n > the number of element copies.

Other Approaches
If the array grows by a constant amount (say 1000), operations are not amortized O(1): after O(M) operations, you may have done Θ(M²) copies.
If the array shrinks when it is 1/2 empty, operations are not amortized O(1). Terrible case: pop once and shrink, push once and grow, pop once and shrink, ...
If the array shrinks when it is 3/4 empty, operations are amortized O(1). The proof is more complicated, but the basic idea remains: by the time an expensive operation occurs, many cheap ones have occurred.
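As a sketch of the 3/4-empty rule, extending the ResizingStack sketch above (again, the details are illustrative, not the lecture's code), pop would shrink like this:

    @SuppressWarnings("unchecked")
    E pop() {
        if (size == 0) throw new EmptyStackException();
        E x = (E) data[--size];
        data[size] = null;                             // let the GC reclaim the element
        // Shrink by half only when the array is 3/4 empty: after a shrink the
        // array is half full, so many cheap pushes/pops must precede the next resize.
        if (size > 0 && size <= data.length / 4 && data.length > INITIAL_SIZE) {
            Object[] smaller = new Object[data.length / 2];
            System.arraycopy(data, 0, smaller, 0, size);
            data = smaller;
        }
        return x;
    }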

Example #2: Queue with Two Stacks
A clever and simple queue implementation using only stacks:

    import java.util.Stack;

    class Queue<E> {
        Stack<E> in  = new Stack<E>();
        Stack<E> out = new Stack<E>();

        void enqueue(E x) {
            in.push(x);
        }

        E dequeue() {
            if (out.isEmpty()) {
                while (!in.isEmpty()) {
                    out.push(in.pop());
                }
            }
            return out.pop();
        }
    }

enqueue A, B, C:   in = [C B A] (top to bottom),   out = []

dequeue: out is empty, so everything moves from in to out; returns A.   in = [],   out = [B C]

enqueue D, E:   in = [E D],   out = [B C]

dequeue twice: returns B, then C.   in = [E D],   out = []

dequeue again: out is empty, so E and D move over; returns D.   in = [],   out = [E]
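Assuming the slide's Queue class is compiled as above (a local class named Queue, shadowing java.util.Queue), the whole trace can be reproduced with a short driver:

    public class QueueTrace {
        public static void main(String[] args) {
            Queue<String> q = new Queue<>();
            q.enqueue("A"); q.enqueue("B"); q.enqueue("C");
            System.out.println(q.dequeue());   // A  (out was empty: C, B, A move to out)
            q.enqueue("D"); q.enqueue("E");
            System.out.println(q.dequeue());   // B
            System.out.println(q.dequeue());   // C
            System.out.println(q.dequeue());   // D  (out empty again: E, D move to out)
        }
    }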

Analysis
dequeue is not O(1) worst case, because out might be empty while in has lots of items. But if the stack operations are (amortized) O(1), then any sequence of queue operations is amortized O(1). The total amount of work done per element is 1 push onto in, 1 pop off of in, 1 push onto out, and 1 pop off of out. When you reverse n elements, there were n earlier O(1) enqueue operations to average with.

When is Amortized Analysis Useful?
When the average cost per operation is all we care about (i.e., the sum over all operations), amortized bounds are perfectly fine. If we need every operation to finish quickly (e.g., in a web server), amortized bounds may be too weak.

Not Always So Simple
Proofs for amortized bounds can be much more complicated. Example: splay trees are dictionaries with amortized O(log n) operations; see Chapter 4.5 if curious. For more complicated examples, the proofs need much more sophisticated invariants and potential functions to describe how earlier cheap operations build up "energy" or "money" to pay for later expensive operations; see Chapter 11 if curious. But complicated proofs have nothing to do with the code (which may be easy!).

Switching Gears
Memory hierarchy and locality.

Why do we need to know about the memory hierarchy and locality?
One of the assumptions that Big-O makes is that all operations take the same amount of time. Is this really true?

Definitions
A cycle (for our purposes) is the time it takes to execute a single simple instruction (e.g., adding two registers together). Memory latency is the time it takes to access memory.

Time to access:

    CPU registers:       ~16-64+ registers    1 ns per instruction
    Cache (SRAM):        8 KB - 4 MB          2-10 ns
    Main memory (DRAM):  2-10 GB              40-100 ns
    Disk:                many GB              a few milliseconds (5-10 million ns)

What does this mean?
It is much faster to do:            than:
    5 million arithmetic ops        1 disk access
    2500 L2 cache accesses          1 disk access
    400 main memory accesses        1 disk access

Why are computers built this way? Physical realities (speed of light, closeness to the CPU) and cost (price per byte of different storage technologies). Under the right circumstances, this kind of hierarchy can simulate storage with the access time of the highest (fastest) level and the size of the lowest (largest) level.


Processor-Memory Performance Gap (figure)

What can be done?
Goal: attempt to reduce the number of accesses to slower levels.

So, what can we do?
The hardware automatically moves data from main memory into the caches for you, replacing items already there. Algorithms are much faster if the data fits in cache (it often does). Disk accesses are done by software (e.g., asking the operating system to open a file, or a database to access some records). So most code "just runs," but sometimes it's worth designing algorithms and data structures with knowledge of the memory hierarchy. To do this, we need to understand locality.

Locality
Temporal locality (locality in time): if an item (a location in memory) is referenced, that same location will tend to be referenced again soon.
Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.

How does data move up the hierarchy?
Moving data up the hierarchy is slow because of latency (think distance to travel). Since we're making the trip anyway, we might as well carpool: get a block of data in the same time we could get a byte. The hardware sends nearby memory because it's easy, and because it's likely to be asked for soon (think fields/arrays): spatial locality. Once a value is in cache, we may as well keep it around for a while; a value accessed once is more likely to be accessed again in the near future (as opposed to some random other value): temporal locality.

Cache Facts
Definitions:
Cache hit: the address requested is in the cache.
Cache miss: the address requested is NOT in the cache.
Block or page size: the number of contiguous bytes moved from disk to memory.
Cache line size: the number of contiguous bytes moved from memory to cache.

Examples
Temporal locality (the same value referenced repeatedly):
    x = a + 6      miss
    y = a + 5      hit
    z = 8 * a      hit

Spatial locality (nearby values referenced in turn):
    x = a[0] + 6   miss
    y = a[1] + 5   hit
    z = 8 * a[2]   hit

Locality and Data Structures
Which has (at least the potential for) better spatial locality, arrays or linked lists? Consider traversing the elements.

An array's elements are contiguous in memory:

    address:  100   101   102   103   104   105   106
    value:      1     2     3     4     5     6     7
              a[0]  a[1]  a[2]  a[3]  a[4]  a[5]  a[6]
              miss  hit   hit   hit   miss  hit   hit
              |---- cache line -----|---- cache line -----|

We only miss on the first item in each cache line.

A linked list's nodes can be scattered anywhere in memory:

    address:  100-101    300-301    50-51      62-63
    value:        1          2         3          4
              miss hit   miss hit   miss hit   miss hit

We miss on every item (unless more than one randomly happens to be in the same cache line).
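A rough way to see the difference is to sum the same n values stored contiguously and stored as linked nodes. This is a sketch, not a careful benchmark (no JIT warmup, and LinkedList<Integer> also pays for boxing), but the cache effect dominates on typical machines:

    import java.util.LinkedList;

    public class LocalityDemo {
        public static void main(String[] args) {
            int n = 10_000_000;
            int[] array = new int[n];
            LinkedList<Integer> list = new LinkedList<>();
            for (int i = 0; i < n; i++) { array[i] = i; list.add(i); }

            long t0 = System.nanoTime();
            long arraySum = 0;
            for (int i = 0; i < n; i++) arraySum += array[i];  // contiguous: mostly hits
            long t1 = System.nanoTime();

            long listSum = 0;
            for (int x : list) listSum += x;                   // scattered nodes: many misses
            long t2 = System.nanoTime();

            System.out.printf("array: %.1f ms (sum=%d)%n", (t1 - t0) / 1e6, arraySum);
            System.out.printf("list:  %.1f ms (sum=%d)%n", (t2 - t1) / 1e6, listSum);
        }
    }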