Chapter 2. Parallel Hardware and Parallel Software. An Introduction to Parallel Programming. The Von Neumann Architecture

Copyright 2010, Elsevier Inc. All rights Reserved

Betty Holberton

Multitasking: gives the illusion that a single-processor system is running multiple programs simultaneously. Each process takes turns running for a time slice; after its time is up, it waits until it has a turn again (a context switch).

Multithreading: similar to multitasking, but within a single process. Originally a method for hiding memory latency. Example: convert the dot product of two vectors x, y of length n into a four-threaded computation.

Serial version:

    dp = 0;
    for (i = 0; i < n; i++)
        dp += x[i] * y[i];

Split into four partial products:

    dp = 0;
    for (int k = 0; k < 4; k++)
        partialprod(k, k*n/4, n/4);
    for (int i = 0; i < 4; i++)
        dp += pdp[i];

    void partialprod(int k, int a, int b) {
        pdp[k] = 0;
        for (i = a; i < a + b; i++)
            pdp[k] += x[i] * y[i];
    }

The same computation with explicit thread creation:

    dp = 0;
    for (int k = 0; k < 4; k++)
        createthread(partialprod(k, k*n/4, n/4));
    waitonthreadstocomplete();
    for (int i = 0; i < 4; i++)
        dp += pdp[i];

    void partialprod(int k, int a, int b) {
        pdp[k] = 0;
        for (i = a; i < a + b; i++)
            pdp[k] += x[i] * y[i];
    }
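The createthread/waitonthreadstocomplete pseudocode above could be realized with POSIX threads. A minimal sketch, assuming constant inputs so the result is easy to check (fill_inputs and threaded_dot are illustrative names, not from the slides):

```c
#include <pthread.h>

#define N 1000
#define NTHREADS 4

double x[N], y[N];       /* input vectors                      */
double pdp[NTHREADS];    /* one partial dot product per thread */

/* Illustrative helper: constant inputs make the result checkable. */
void fill_inputs(double a, double b) {
    for (int i = 0; i < N; i++) { x[i] = a; y[i] = b; }
}

/* Thread body: each thread sums its quarter of the index range,
   as partialprod does on the slide. */
void *partialprod(void *arg) {
    int k = *(int *)arg;
    int start = k * (N / NTHREADS);
    pdp[k] = 0.0;
    for (int i = start; i < start + N / NTHREADS; i++)
        pdp[k] += x[i] * y[i];
    return NULL;
}

/* Fork the threads, join them, then combine the partial sums. */
double threaded_dot(void) {
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    for (int k = 0; k < NTHREADS; k++) {
        ids[k] = k;
        pthread_create(&tid[k], NULL, partialprod, &ids[k]);
    }
    for (int k = 0; k < NTHREADS; k++)
        pthread_join(tid[k], NULL);   /* join: wait for completion */
    double dp = 0.0;
    for (int k = 0; k < NTHREADS; k++)
        dp += pdp[k];
    return dp;
}
```

Note that the threads write to disjoint entries of pdp, so no locking is needed until the master thread combines the partial sums after the joins.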

Thread support differs by operating system: Windows and Solaris are slightly different, versions of Linux also know about threads, and kernel threads (threads managed by the OS itself) are a special case.

Terminology: the thread that starts the others is the master thread; starting a thread is called forking, and terminating (waiting for) a thread is called joining. See Figure 2.2.

Principle of locality: an access of one memory location tends to be followed by an access of a nearby location (spatial locality) or by another access of the same location in the near future (temporal locality).

Principle of locality, in code:

    float z[1000];
    sum = 0.0;
    for (i = 0; i < 1000; i++)
        sum += z[i];

Questions: Is my data in the cache? How do I find it? What if it is not in the cache? Two broad solutions: a cache item can go anywhere in the cache (associative), which requires some method of looking it up; or each memory location maps to only certain cache locations (direct), which requires a mapping rule.

Cache mappings: Fully associative, where a new line can be placed at any location in the cache. Direct mapped, where each cache line has a unique location in the cache to which it will be assigned. n-way set associative, where each cache line can be placed in one of n different locations in the cache.
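The direct-mapped rule can be written down concretely. A sketch assuming illustrative sizes of 16-byte lines and 8 cache lines (neither number comes from the slides):

```c
#include <stdint.h>

/* Illustrative sizes, not from the slides: 16-byte lines, 8 lines. */
#define LINE_SIZE 16u
#define NUM_LINES 8u

/* Which cache line a byte address is forced into. */
uint32_t line_index(uint32_t addr) { return (addr / LINE_SIZE) % NUM_LINES; }

/* The tag stored alongside the line, identifying which memory
   block currently occupies it. */
uint32_t tag_of(uint32_t addr) { return addr / (LINE_SIZE * NUM_LINES); }
```

With these sizes, addresses 0 and 128 both map to line 0; only the tag tells them apart.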

When a line from memory can be mapped to several different locations in the cache, we also need to decide which cached line should be replaced, or evicted, on a miss. Common policies: First In First Out (FIFO), Least Recently Used (LRU), Least Frequently Used (LFU).
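LRU, for instance, can be sketched in software, with timestamps standing in for the recency bookkeeping that real hardware only approximates (a toy fully associative cache; all names are illustrative):

```c
#define LINES 4

/* Toy fully associative cache: whole addresses serve as tags, and a
   timestamp per line stands in for the LRU recency bookkeeping. */
long tag[LINES];
long last_used[LINES];
int  valid[LINES];
long now;

/* Returns 1 on a hit; on a miss, fills the least recently used
   line (or an empty one) and returns 0. */
int access_cache(long addr) {
    ++now;
    for (int i = 0; i < LINES; i++)
        if (valid[i] && tag[i] == addr) {
            last_used[i] = now;              /* hit: refresh recency */
            return 1;
        }
    int victim = 0;
    for (int i = 0; i < LINES; i++) {
        if (!valid[i]) { victim = i; break; }              /* empty line first */
        if (last_used[i] < last_used[victim]) victim = i;  /* else evict LRU   */
    }
    valid[victim] = 1;
    tag[victim] = addr;
    last_used[victim] = now;
    return 0;
}
```

After filling the four lines with addresses 1 to 4 and then touching 1 again, a fifth address evicts address 2, the least recently used line.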


Cache lines: a cache does not store a single address's worth of data at a time; each cache line holds a block of data from several consecutive addresses.

    for (j = 0; j < 1000; j++) {
        column_sum[j] = 0.0;
        for (i = 0; i < 1000; i++)
            column_sum[j] += b[i][j];
    }

Because C stores b row by row, the inner loop here jumps 1000 elements between consecutive accesses to b, so it gets little benefit from cache lines.
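For contrast, swapping the loops so that the inner loop walks along a row of b touches consecutive addresses and uses every element of each fetched cache line. A runnable sketch (the fill helper and function name are illustrative):

```c
#define M 1000

double b[M][M];          /* the matrix from the slide  */
double column_sum[M];    /* one running sum per column */

/* Illustrative helper: fill b with a constant so the expected
   sums are easy to check. */
void fill_b(double v) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            b[i][j] = v;
}

/* Same column sums, but the inner loop walks along a row of b:
   consecutive iterations touch consecutive addresses, so every
   element of each fetched cache line gets used. */
void column_sums_rowwise(void) {
    for (int j = 0; j < M; j++)
        column_sum[j] = 0.0;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++)
            column_sum[j] += b[i][j];
}
```

Both orderings compute identical results; only the memory access pattern, and hence the cache behavior, differs.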

Writing to cache: when a CPU writes data to cache, the value in cache may become inconsistent with the value in main memory. Write-through caches handle this by updating the data in main memory at the time it is written to cache. Write-back caches instead mark the cached data as dirty; when the cache line is replaced by a new line from memory, the dirty line is written back to memory.
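The write-back mechanism can be sketched with a single cache line standing in for the whole cache (all names are illustrative):

```c
/* A single cache line standing in for the whole cache. */
int line_data;     /* value held in the cache       */
int memory_data;   /* value held in main memory     */
int dirty;         /* has the line been written to? */

/* Write-back: a CPU write touches only the cache and sets the
   dirty bit; main memory is left stale for now. */
void cache_write(int v) {
    line_data = v;
    dirty = 1;
}

/* On eviction, a dirty line is flushed to main memory. */
void evict_line(void) {
    if (dirty) {
        memory_data = line_data;
        dirty = 0;
    }
}
```

Between the write and the eviction, main memory still holds the old value, which is exactly the inconsistency the slide describes.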

Levels of Data Access

Virtual memory: if we run a very large program, or one that accesses very large data sets, all of its instructions and data may not fit into main memory. Virtual memory functions as a cache for secondary storage: exploiting spatial and temporal locality, it keeps only the active parts of running programs in main memory.

Swap space: the idle parts are kept in a block of secondary storage. Pages: the blocks of data and instructions that are swapped. They are relatively large; most systems use a fixed page size, currently ranging from 4 to 16 kilobytes.

Virtual page numbers: when a program is compiled, its pages are assigned virtual page numbers. When the program runs, a page table is created that maps each virtual page number to a physical address; the page table is then used to translate virtual addresses into physical addresses.
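With an assumed 4 KiB page size (the slides do not fix one), translation splits a virtual address into a virtual page number, which is looked up in the page table, and an offset, which is carried over unchanged. A toy sketch, with a hypothetical four-entry table:

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed 4 KiB pages: 12 offset bits */

/* Toy page table: virtual page k resides in physical page
   page_table[k]. */
uint64_t page_table[4] = {7, 3, 0, 5};

/* Virtual page number: the high-order bits of the address. */
uint64_t vpn(uint64_t vaddr) { return vaddr / PAGE_SIZE; }

/* Offset within the page: the low-order 12 bits, unchanged by
   translation. */
uint64_t page_offset(uint64_t vaddr) { return vaddr % PAGE_SIZE; }

/* Physical address: look up the physical page, reattach the offset. */
uint64_t translate(uint64_t vaddr) {
    return page_table[vpn(vaddr)] * PAGE_SIZE + page_offset(vaddr);
}
```

For example, virtual address 0x1234 has virtual page number 1 and offset 0x234; with this table it translates to physical address 0x3234.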

Translation-lookaside buffer (TLB): consulting the page table on every access has the potential to significantly increase a program's overall run time, so the processor includes a special address-translation cache. The TLB caches a small number of page-table entries (typically 16 to 512) in very fast memory. Page fault: attempting to access a page that is valid in the page table but currently stored only on disk.

Instruction Level Parallelism (ILP): attempts to improve processor performance by having multiple processor components, or functional units, executing instructions simultaneously. Pipelining: functional units are arranged in stages. Multiple issue: multiple instructions can be initiated simultaneously.

Pipelining example: add the floating point numbers 9.87 x 10^4 and 6.54 x 10^3.
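Worked out by hand, the stages perform roughly the following arithmetic (a sketch; the exact stage boundaries are spelled out two slides down):

```latex
  9.87 \times 10^4 + 6.54 \times 10^3
= 9.87 \times 10^4 + 0.654 \times 10^4  % shift to the larger exponent
= 10.524 \times 10^4                    % add the significands
= 1.0524 \times 10^5                    % normalize
\approx 1.05 \times 10^5                % round to three significant digits
```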

Pipelining example:

    float x[1000], y[1000], z[1000];
    for (int i = 0; i < 1000; i++)
        z[i] = x[i] + y[i];

Assume each operation takes one nanosecond and each addition requires 7 operations. Without pipelining, this loop takes about 7000 nanoseconds.

Pipelining: divide the floating point adder into 7 separate pieces of hardware, or functional units. The first unit fetches two operands, the second compares exponents, and so on, with the output of each functional unit feeding the next. The numbers in the table are the subscripts of the operands/results.

Pipelining example, continued:

    float x[1000], y[1000], z[1000];
    for (int i = 0; i < 1000; i++)
        z[i] = x[i] + y[i];

Each operation still takes one nanosecond, and each addition still takes 7 operations (7 ns). How long will the 1000 additions take? The first result appears after 7 ns, and one more completes every nanosecond after that: 7 + 999 = 1006 ns with pipelining!
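The timing argument generalizes: once a pipeline with s stages is full, one result completes per cycle, so n operations take s + n - 1 cycles instead of n * s. A small sketch, assuming one operation per nanosecond (function names are illustrative):

```c
/* Pipelined: the first result takes all s stages; after that the
   pipeline is full and one result completes every nanosecond. */
long pipelined_ns(long n, long s) { return s + n - 1; }

/* Unpipelined: every operation pays for all s stages in sequence. */
long serial_ns(long n, long s) { return n * s; }
```

For the loop on the slide (n = 1000, s = 7) this gives 1006 ns versus 7000 ns, a speedup approaching s for large n.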

Multiple issue: multiple-issue processors replicate functional units and try to simultaneously execute different instructions in a program.

    for (i = 0; i < 1000; i++)
        z[i] = x[i] + y[i];

With two floating point adders, for example, adjacent iterations could execute at the same time.

Multiple issue: VLIW (very long instruction word), where functional units are scheduled at compile time (static scheduling); and superscalar, where functional units are scheduled at run time (dynamic scheduling).

Speculation: in order to make use of multiple issue, the system must find instructions that can be executed simultaneously; with speculation, the compiler or processor guesses an outcome and executes instructions on that basis.

    z = x + y;
    if (z > 0)
        w = x;
    else
        w = y;

Here the system might predict that z > 0 and speculatively execute w = x, discarding the result if the prediction turns out to be wrong.