IBM PSSC Montpellier Customer Center. Blue Gene/P ASIC IBM Corporation
|
|
- Charlotte Jennings
- 6 years ago
- Views:
Transcription
1 Blue Gene/P ASIC
2 Memory Overview/Considerations No virtual Paging only the physical memory (2-4 GBytes/node) In C, C++, and Fortran, the malloc routine returns a NULL pointer when users request more memory than the physical memory available. We recommend you always check malloc() return values for validity.
3 Memory Overview/Considerations Avoid instructions when prefetching data in the L1 cache. Using the processor, you can concurrently fill in three L1 cache lines. It is mandatory to reduce the number of prefetching streams to three or less Applications tuned for Power 4-7 with more than 4 prefetching engines will choke BG/P memory. To take advantage of the single-instruction, multiple-data (SIMD) instructions, it is essential to keep the data in the L1 cache as much as possible The optimization of the applications must be based on the 32 KB of the L1 cache The benefits of the SIMD instructions can be canceled out if data does not fit in the L1 cache.
4 L1 Cache IBM PSSC Montpellier Customer Center Architecture 32kB I-cache + 32KB D-cache 32B line, 64 way-set-associative, 16 sets Round-robin replacement Write-through 2 buses towards the L2, 128 bits width 3 lines fetch buffer Performance L1 load hit 8B/cycle, 4 cycle latency (floating point) L1 load miss, L2 hit 4.6B/cycle, 12 cycle latency Store Write through, limited by external logic to about one request every 2.9 cycles (about 5.5B/cycle peak) Important Avoid instructions prefetching in L1. Reduce the number of prefetch streams below 3 Programmer can use the XL compiler directives or assembler instruction (dcbt) Without an intensive reused of the L1 and register, memory subsystem is not allowed to feed the double FPU
5 L2 Cache IBM PSSC Montpellier Customer Center 3 independent PPC ports Instruction read Data Read L2 Prefetch Data write L3 arbiter Switch function 1 read and 1 write switched to each L3 every 425MHz cycle L2 boosts the overall performance and does not require any special software provisions. PPC450 L1 Instr. Cache L1 Data Cache L1 Instr. Cache L1 Data Cache L2 Cache L2 IC RD L2 DC RD L2 DC WR L2 IC RD L2 DC RD L2 DC WR L3 Arbiter to L3 0 to L3 1 16B@850MHz
6 L2 Data Prefetch Prefetch engine 128B line prefetch 15 entry fully associative prefetch buffer 1 or 2 line deep sequential prefetching Supports up to 7 streams Supports strides less than 128B 4.6B/cycle, 12 cycle latency Data PPC450 Address L2 DATA READ / PREFETCH b line buffers 15 stream engines stream detector 8 miss history data from other units L3 data address
7 L3 Cache per port 8 read requests 4 write requests L3 cache 0 4 edram banks per chip, each containing independent Core0/1 edram directory 15 entry 128B-wide write combining buffer edram Hit under Miss resolution Limit defined by request s and write buffer Up to 8 read misses per port Up to 15 write misses per write combining buffer Limitation: banking conflict (possibility to configure dedicated L3/core need IBM Lab support) Core2/3 DMA L3 cache 1 edram edram
8 DRAM Controller 20 8b-wide DDR2 DRAM modules per controller 4 module-internal banks (4x512 MB) Command reordering based on bank availablility Theoretical bandwidth: 128 Bytes/16 cycles the L3 cache edram edram DDR controller Miss handler 16B@425MHz
9 Performance Latencies L1: 4 cycles L2: 12 cycles L3: 50 cycles DDR: 104 cycles Bandwidth/peak Reg <-> L1 cache: 8B/cycle L1 <-> L2: 4.6B/cycle L2 <-> L3: 16B/cycle per core L3 <-> DDR: less than 16B/cycle per chip
10 Bottlenecks L2 L3 switch Not a full core to L3 bank crossbar rate and bandwidth limited if two cores of one dual processor group access the same L3 cache Banking 4 banks on 512Mb DDR modules Burst-8 transfer (128B): 16 cycles Page open, access, precharge: 64 cycles Peak bandwidth only achievable if accessing 3 other banks before accessing the same bank again
11 Main Memory Banking Optimization Example For sequential access, two arrays used in a single operation must not be aligned on the same bank parameter (n= ) real(8) x(n), y(n), w(n).. do j=1,n x(j) = x(j) + y(j)*w() Enddo BG memory tuning parameter (n= , offset=16) real(8) x(n+offset), y(n+2*offset), w(n).. do j=1,n x(j) = x(j) + y(j)*w() Enddo
12 CNK Memory Usage Static TLB mappings Only 64 mappings per core Kernel mapped protected App R/O text & data protected Alignment restrictions Static TLB benefits no penalty for random data access patterns greatly simplifies DMA An implementation of mmap that uses a dynamically assigned physical memory would be required to take page translations in order to map the current virtual addresses to physical addresses. At this time, we have no plans to implement such a TLB miss handler.
13 CNK Memory Design (Memory leaks) The design points of CNK: to have a statically partitioned physical memory Reasons No memory translation exceptions If a virtual mapping is not known by the processor, the hardware issues an exception. The operating system is then required to install the mapping in the hardware (if it exists), or fail with a segfault (if it does not exist). Processing this exception can take a.25-1usec penalty per translation, depending on how sophisiticated the miss handler needs to be. For an application that performs a lot of pointer chasing or pseudo-random memory accesses, this effectively would result in a.25-1usec latency for every memory access. Determinisitic performance from run-to-run locking contention between processes Determinisitic out-of-memory e.g., process 0 will be able to use the same memory footprint from run-to-run because process 0 and processes 1-3 are not competing for the same memory allocations. No OS noise e.g., if nodes enter a barrier, and node 0 takes a translation fault, delaying its participating in the barrier by 1usec. The application just lost msec of "useful work".
14 CNK Memory Usage an application stores data in memory, it can be classified as follows: data = Initialized static and common variables bss = Uninitialized static and common variables heap = Controlled allocatable arrays stack = Controlled automatic arrays and variables You can use the Linux size command to gain an idea of the memory size of the program. However, the size command does not provide any information about the runtime memory The stack section starts from the top, at address 7fffd31c in SMP Node Mode (approximately 2 GB) and at address 403fd31c in Dual Node Mode (approximately1 GB) and 209fd31c in Virtual Node Mode (approximately 512 MB)
15 CNK / Shared Memory Support Shared Memory is supported in Virtual and Dual mode Normal theme: do it the Linux way shm_open() standard interface Allocate: fd = shm_open( SHM_FILE, O_RDWR, 0600 ); ftruncate( fds[0], MAX_SHARED_SIZE ); shmptr1 = mmap( NULL, MAX_SHARED_SIZE, PROT_READ PROT_WRITE, MAP_SHARED, fd, 0); Deallocate: munmap(shmptrl, MAX_SHARED_SIZE); close(fd) shm_unlink(shm_file);
16 CNK / Persistent Memory Persistent memory is process memory that retains its contents from job to job. To allocate persistent memory, the environment variable BG_PERSISTMEMSIZE=X must be specified, where X is the number of 1024*1024 bytes to be allocated for use as persistent memory. In order for the persistent memory to be maintained across jobs, all job submissions must specify the same value for BG_PERSISTMEMSIZE. The contents of persistent memory can be re-initialized during job startup by either changing the value of BG_PERSISTMEMSIZE or by specifying the environment variable BG_PERSISTMEMRESET=1. The following new kernel function was added to support persistent memory: persist_open()
Virtual Memory Nov 9, 2009"
Virtual Memory Nov 9, 2009" Administrivia" 2! 3! Motivations for Virtual Memory" Motivation #1: DRAM a Cache for Disk" SRAM" DRAM" Disk" 4! Levels in Memory Hierarchy" cache! virtual memory! CPU" regs"
More informationRandom-Access Memory (RAM) Systemprogrammering 2007 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics
Systemprogrammering 27 Föreläsning 4 Topics The memory hierarchy Motivations for VM Address translation Accelerating translation with TLBs Random-Access (RAM) Key features RAM is packaged as a chip. Basic
More informationvirtual memory Page 1 CSE 361S Disk Disk
CSE 36S Motivations for Use DRAM a for the Address space of a process can exceed physical memory size Sum of address spaces of multiple processes can exceed physical memory Simplify Management 2 Multiple
More informationRandom-Access Memory (RAM) Systemprogrammering 2009 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics! The memory hierarchy
Systemprogrammering 29 Föreläsning 4 Topics! The memory hierarchy! Motivations for VM! Address translation! Accelerating translation with TLBs Random-Access (RAM) Key features! RAM is packaged as a chip.!
More informationMotivations for Virtual Memory Virtual Memory Oct. 29, Why VM Works? Motivation #1: DRAM a Cache for Disk
class8.ppt 5-23 The course that gives CMU its Zip! Virtual Oct. 29, 22 Topics Motivations for VM Address translation Accelerating translation with TLBs Motivations for Virtual Use Physical DRAM as a Cache
More informationVirtual Memory: Systems
Virtual Memory: Systems 5-23: Introduction to Computer Systems 8 th Lecture, March 28, 27 Instructor: Franz Franchetti & Seth Copen Goldstein Recap: Hmmm, How Does This Work?! Process Process 2 Process
More informationVirtual Memory Oct. 29, 2002
5-23 The course that gives CMU its Zip! Virtual Memory Oct. 29, 22 Topics Motivations for VM Address translation Accelerating translation with TLBs class9.ppt Motivations for Virtual Memory Use Physical
More informationELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II
ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,
More informationCISC 360. Virtual Memory Dec. 4, 2008
CISC 36 Virtual Dec. 4, 28 Topics Motivations for VM Address translation Accelerating translation with TLBs Motivations for Virtual Use Physical DRAM as a Cache for the Disk Address space of a process
More informationReducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip
Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off
More informationvirtual memory. March 23, Levels in Memory Hierarchy. DRAM vs. SRAM as a Cache. Page 1. Motivation #1: DRAM a Cache for Disk
5-23 March 23, 2 Topics Motivations for VM Address translation Accelerating address translation with TLBs Pentium II/III system Motivation #: DRAM a Cache for The full address space is quite large: 32-bit
More informationComputer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic
More informationEarly experience with Blue Gene/P. Jonathan Follows IBM United Kingdom Limited HPCx Annual Seminar 26th. November 2007
Early experience with Blue Gene/P Jonathan Follows IBM United Kingdom Limited HPCx Annual Seminar 26th. November 2007 Agenda System components The Daresbury BG/P and BG/L racks How to use the system Some
More informationPowerPC 740 and 750
368 floating-point registers. A reorder buffer with 16 elements is used as well to support speculative execution. The register file has 12 ports. Although instructions can be executed out-of-order, in-order
More informationVirtual Memory. Motivations for VM Address translation Accelerating translation with TLBs
Virtual Memory Today Motivations for VM Address translation Accelerating translation with TLBs Fabián Chris E. Bustamante, Riesbeck, Fall Spring 2007 2007 A system with physical memory only Addresses generated
More informationCache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance
6.823, L11--1 Cache Performance and Memory Management: From Absolute Addresses to Demand Paging Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Performance 6.823,
More informationIntel P The course that gives CMU its Zip! P6/Linux Memory System November 1, P6 memory system. Review of abbreviations
15-213 The course that gives CMU its Zip! P6/Linux ory System November 1, 01 Topics P6 address translation Linux memory management Linux fault handling memory mapping Intel P6 Internal Designation for
More informationBlue Gene/P Universal Performance Counters
Blue Gene/P Universal Performance Counters Bob Walkup (walkup@us.ibm.com) 256 counters, 64 bits each; hardware unit on the BG/P chip 72 counters are in the clock-x1 domain (ppc450 core: fpu, fp load/store,
More informationPortland State University ECE 588/688. Cray-1 and Cray T3E
Portland State University ECE 588/688 Cray-1 and Cray T3E Copyright by Alaa Alameldeen 2014 Cray-1 A successful Vector processor from the 1970s Vector instructions are examples of SIMD Contains vector
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity
More informationPorting Applications to Blue Gene/P
Porting Applications to Blue Gene/P Dr. Christoph Pospiech pospiech@de.ibm.com 05/17/2010 Agenda What beast is this? Compile - link go! MPI subtleties Help! It doesn't work (the way I want)! Blue Gene/P
More informationPentium/Linux Memory System March 17, 2005
15-213 The course that gives CMU its Zip! Topics Pentium/Linux Memory System March 17, 2005 P6 address translation x86-64 extensions Linux memory management Linux page fault handling Memory mapping 17-linuxmem.ppt
More information6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU
1-6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU Product Overview Introduction 1. ARCHITECTURE OVERVIEW The Cyrix 6x86 CPU is a leader in the sixth generation of high
More informationCSC266 Introduction to Parallel Computing using GPUs Optimizing for Caches
CSC266 Introduction to Parallel Computing using GPUs Optimizing for Caches Sreepathi Pai October 4, 2017 URCS Outline Cache Performance Recap Data Layout Reuse Distance Besides the Cache Outline Cache
More information@2010 Badri Computer Architecture Assembly II. Virtual Memory. Topics (Chapter 9) Motivations for VM Address translation
Virtual Memory Topics (Chapter 9) Motivations for VM Address translation 1 Motivations for Virtual Memory Use Physical DRAM as a Cache for the Disk Address space of a process can exceed physical memory
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationMemory System Case Studies Oct. 13, 2008
Topics 15-213 Memory System Case Studies Oct. 13, 2008 P6 address translation x86-64 extensions Linux memory management Linux page fault handling Memory mapping Class15+.ppt Intel P6 (Bob Colwell s Chip,
More informationOptimising for the p690 memory system
Optimising for the p690 memory Introduction As with all performance optimisation it is important to understand what is limiting the performance of a code. The Power4 is a very powerful micro-processor
More informationAdvanced Computer Architecture
Advanced Computer Architecture 1 L E C T U R E 2: M E M O R Y M E M O R Y H I E R A R C H I E S M E M O R Y D A T A F L O W M E M O R Y A C C E S S O P T I M I Z A T I O N J A N L E M E I R E Memory Just
More informationVirtual Memory. Virtual Memory
Virtual Memory Virtual Memory Main memory is cache for secondary storage Secondary storage (disk) holds the complete virtual address space Only a portion of the virtual address space lives in the physical
More informationPage Which had internal designation P5
Intel P6 Internal Designation for Successor to Pentium Which had internal designation P5 Fundamentally Different from Pentium 1 Out-of-order, superscalar operation Designed to handle server applications
More informationSolutions for Chapter 7 Exercises
olutions for Chapter 7 Exercises 1 olutions for Chapter 7 Exercises 7.1 There are several reasons why you may not want to build large memories out of RAM. RAMs require more transistors to build than DRAMs
More informationMainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation
Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer
More informationMainstream Computer System Components
Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationComputer Systems A Programmer s Perspective 1 (Beta Draft)
Computer Systems A Programmer s Perspective 1 (Beta Draft) Randal E. Bryant David R. O Hallaron August 1, 2001 1 Copyright c 2001, R. E. Bryant, D. R. O Hallaron. All rights reserved. 2 Contents Preface
More informationFoundations of Computer Systems
8-6 Foundations of Computer Systems Lecture 5: Virtual Memory Concepts and Systems October 8, 27 8-6 SE PL OS CA Required Reading Assignment: Chapter 9 of CS:APP (3 rd edition) by Randy Bryant & Dave O
More informationVirtual Memory 1. To do. q Segmentation q Paging q A hybrid system
Virtual Memory 1 To do q Segmentation q Paging q A hybrid system Address spaces and multiple processes IBM OS/360 Split memory in n parts (possible!= sizes) A process per partition Program Code Heap Operating
More informationMemory. From Chapter 3 of High Performance Computing. c R. Leduc
Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor
More informationIBM Blue Gene/Q solution
IBM Blue Gene/Q solution Pascal Vezolle vezolle@fr.ibm.com Broad IBM Technical Computing portfolio Hardware Blue Gene/Q Power Systems 86 Systems idataplex and Intelligent Cluster GPGPU / Intel MIC PureFlexSystems
More informationP6 memory system P6/Linux Memory System October 31, Overview of P6 address translation. Review of abbreviations. Topics. Symbols: ...
15-213 P6/Linux ory System October 31, 00 Topics P6 address translation Linux memory management Linux fault handling memory mapping DRAM bus interface unit instruction fetch unit processor package P6 memory
More informationI/O Handling. ECE 650 Systems Programming & Engineering Duke University, Spring Based on Operating Systems Concepts, Silberschatz Chapter 13
I/O Handling ECE 650 Systems Programming & Engineering Duke University, Spring 2018 Based on Operating Systems Concepts, Silberschatz Chapter 13 Input/Output (I/O) Typical application flow consists of
More informationSystems Programming and Computer Architecture ( ) Timothy Roscoe
Systems Group Department of Computer Science ETH Zürich Systems Programming and Computer Architecture (252-6-) Timothy Roscoe Herbstsemester 26 AS 26 Virtual Memory 8: Virtual Memory Computer Architecture
More informationIntel P6 (Bob Colwell s Chip, CMU Alumni) The course that gives CMU its Zip! Memory System Case Studies November 7, 2007.
class.ppt 15-213 The course that gives CMU its Zip! ory System Case Studies November 7, 07 Topics P6 address translation x86-64 extensions Linux memory management Linux fault handling ory mapping Intel
More informationCS 261 Fall Mike Lam, Professor. Virtual Memory
CS 261 Fall 2016 Mike Lam, Professor Virtual Memory Topics Operating systems Address spaces Virtual memory Address translation Memory allocation Lingering questions What happens when you call malloc()?
More informationContent. Execution Modes SMP, DUAL, VN Partition Types HPC VS HTC HTC Compute Node Linux (CNL) IBM PSSC Montpellier Customer Center
Content IB SSC ontpellier Customer Center Execution odes S, DUAL, VN artition Types HC VS HTC HTC Compute Node Linux (CNL) Execution odes ossibilities Single Node / ulti Node 1, 2 or 4 rocesses per Node
More informationChapter 5B. Large and Fast: Exploiting Memory Hierarchy
Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,
More informationCUDA Performance Optimization. Patrick Legresley
CUDA Performance Optimization Patrick Legresley Optimizations Kernel optimizations Maximizing global memory throughput Efficient use of shared memory Minimizing divergent warps Intrinsic instructions Optimizations
More informationUnderstanding the Design of Virtual Memory. Zhiqiang Lin
CS 6V8-05: System Security and Malicious Code Analysis Understanding the Design of Virtual Memory Zhiqiang Lin Department of Computer Science University of Texas at Dallas February 27 th, 202 Outline Basic
More informationChapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs
Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple
More informationBlue Gene/Q. Hardware Overview Michael Stephan. Mitglied der Helmholtz-Gemeinschaft
Blue Gene/Q Hardware Overview 02.02.2015 Michael Stephan Blue Gene/Q: Design goals System-on-Chip (SoC) design Processor comprises both processing cores and network Optimal performance / watt ratio Small
More information14 May 2012 Virtual Memory. Definition: A process is an instance of a running program
Virtual Memory (VM) Overview and motivation VM as tool for caching VM as tool for memory management VM as tool for memory protection Address translation 4 May 22 Virtual Memory Processes Definition: A
More information1. PowerPC 970MP Overview
1. The IBM PowerPC 970MP reduced instruction set computer (RISC) microprocessor is an implementation of the PowerPC Architecture. This chapter provides an overview of the features of the 970MP microprocessor
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationVirtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1
Virtual Memory Patterson & Hennessey Chapter 5 ELEC 5200/6200 1 Virtual Memory Use main memory as a cache for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationChapter Seven Morgan Kaufmann Publishers
Chapter Seven Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored as a charge on capacitor (must be
More informationStructure of Computer Systems
222 Structure of Computer Systems Figure 4.64 shows how a page directory can be used to map linear addresses to 4-MB pages. The entries in the page directory point to page tables, and the entries in a
More informationVirtual Memory. Today. Segmentation Paging A good, if common example
Virtual Memory Today Segmentation Paging A good, if common example Virtual memory system Goals Transparency Programs should not know that memory is virtualized; the OS +HW multiplex memory among processes
More informationVirtual Memory. Alan L. Cox Some slides adapted from CMU slides
Alan L. Cox alc@rice.edu Some slides adapted from CMU 5.23 slides Objectives Be able to explain the rationale for VM Be able to explain how VM is implemented Be able to translate virtual addresses to physical
More informationECE 4750 Computer Architecture, Fall 2017 T03 Fundamental Memory Concepts
ECE 4750 Computer Architecture, Fall 2017 T03 Fundamental Memory Concepts School of Electrical and Computer Engineering Cornell University revision: 2017-09-26-15-52 1 Memory/Library Analogy 2 1.1. Three
More informationSE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan
SE-292 High Performance Computing Memory Hierarchy R. Govindarajan govind@serc Reality Check Question 1: Are real caches built to work on virtual addresses or physical addresses? Question 2: What about
More informationChapter 10: Virtual Memory. Lesson 05: Translation Lookaside Buffers
Chapter 10: Virtual Memory Lesson 05: Translation Lookaside Buffers Objective Learn that a page table entry access increases the latency for a memory reference Understand that how use of translationlookaside-buffers
More informationCS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory
CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5
More informationMaster Informatics Eng.
Advanced Architectures Master Informatics Eng. 207/8 A.J.Proença The Roofline Performance Model (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 207/8 AJProença, Advanced Architectures,
More information18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013
18-447: Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013 Reminder: Homework 5 (Today) Due April 3 (Wednesday!) Topics: Vector processing,
More informationReducing Miss Penalty: Read Priority over Write on Miss. Improving Cache Performance. Non-blocking Caches to reduce stalls on misses
Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the. Reducing Miss Penalty: Read Priority over Write on Miss Write buffers may offer RAW
More information1. Memory technology & Hierarchy
1 Memory technology & Hierarchy Caching and Virtual Memory Parallel System Architectures Andy D Pimentel Caches and their design cf Henessy & Patterson, Chap 5 Caching - summary Caches are small fast memories
More informationFile Systems: Consistency Issues
File Systems: Consistency Issues File systems maintain many data structures Free list/bit vector Directories File headers and inode structures res Data blocks File Systems: Consistency Issues All data
More informationIntroducing the Cray XMT. Petr Konecny May 4 th 2007
Introducing the Cray XMT Petr Konecny May 4 th 2007 Agenda Origins of the Cray XMT Cray XMT system architecture Cray XT infrastructure Cray Threadstorm processor Shared memory programming model Benefits/drawbacks/solutions
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationECE 571 Advanced Microprocessor-Based Design Lecture 13
ECE 571 Advanced Microprocessor-Based Design Lecture 13 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 21 March 2017 Announcements More on HW#6 When ask for reasons why cache
More informationDRAM Main Memory. Dual Inline Memory Module (DIMM)
DRAM Main Memory Dual Inline Memory Module (DIMM) Memory Technology Main memory serves as input and output to I/O interfaces and the processor. DRAMs for main memory, SRAM for caches Metrics: Latency,
More informationCSCI 4061: Virtual Memory
1 CSCI 4061: Virtual Memory Chris Kauffman Last Updated: Thu Dec 7 12:52:03 CST 2017 2 Logistics: End Game Date Lecture Outside Mon 12/04 Lab 13: Sockets Tue 12/05 Sockets Thu 12/07 Virtual Memory Mon
More informationThe Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)
The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache
More informationProcesses and Virtual Memory Concepts
Processes and Virtual Memory Concepts Brad Karp UCL Computer Science CS 37 8 th February 28 (lecture notes derived from material from Phil Gibbons, Dave O Hallaron, and Randy Bryant) Today Processes Virtual
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual
More informationOutline. Low-Level Optimizations in the PowerPC/Linux Kernels. PowerPC Architecture. PowerPC Architecture
Low-Level Optimizations in the PowerPC/Linux Kernels Dr. Paul Mackerras Senior Technical Staff Member IBM Linux Technology Center OzLabs Canberra, Australia paulus@samba.org paulus@au1.ibm.com Introduction
More informationMemory management. Single process. Multiple processes. How to: All memory assigned to the process Addresses defined at compile time
Memory management Single process All memory assigned to the process Addresses defined at compile time Multiple processes. How to: assign memory manage addresses? manage relocation? manage program grow?
More informationTopics to be covered. EEC 581 Computer Architecture. Virtual Memory. Memory Hierarchy Design (II)
EEC 581 Computer Architecture Memory Hierarchy Design (II) Department of Electrical Engineering and Computer Science Cleveland State University Topics to be covered Cache Penalty Reduction Techniques Victim
More informationCommon Computer-System and OS Structures
Common Computer-System and OS Structures Computer System Operation I/O Structure Storage Structure Storage Hierarchy Hardware Protection General System Architecture Oct-03 1 Computer-System Architecture
More informationLecture: DRAM Main Memory. Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3)
Lecture: DRAM Main Memory Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3) 1 TLB and Cache 2 Virtually Indexed Caches 24-bit virtual address, 4KB page size 12 bits offset and 12 bits
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Static RAM (SRAM) Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 0.5ns 2.5ns, $2000 $5000 per GB 5.1 Introduction Memory Technology 5ms
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationMain Memory (Fig. 7.13) Main Memory
Main Memory (Fig. 7.13) CPU CPU CPU Cache Multiplexor Cache Cache Bus Bus Bus Memory Memory bank 0 Memory bank 1 Memory bank 2 Memory bank 3 Memory b. Wide memory organization c. Interleaved memory organization
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationAdvanced Computer Architecture
18-742 Advanced Computer Architecture Test 2 April 14, 1998 Name (please print): Instructions: DO NOT OPEN TEST UNTIL TOLD TO START YOU HAVE UNTIL 12:20 PM TO COMPLETE THIS TEST The exam is composed of
More informationMemory Hierarchy. Goal: Fast, unlimited storage at a reasonable cost per bit.
Memory Hierarchy Goal: Fast, unlimited storage at a reasonable cost per bit. Recall the von Neumann bottleneck - single, relatively slow path between the CPU and main memory. Fast: When you need something
More informationChapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)
Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,
More informationCSE 153 Design of Operating Systems
CSE 53 Design of Operating Systems Winter 28 Lecture 6: Paging/Virtual Memory () Some slides modified from originals by Dave O hallaron Today Address spaces VM as a tool for caching VM as a tool for memory
More information15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 19: Main Memory Prof. Onur Mutlu Carnegie Mellon University Last Time Multi-core issues in caching OS-based cache partitioning (using page coloring) Handling
More informationComputer Systems Laboratory Sungkyunkwan University
DRAMs Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Main Memory & Caches Use DRAMs for main memory Fixed width (e.g., 1 word) Connected by fixed-width
More informationCS399 New Beginnings. Jonathan Walpole
CS399 New Beginnings Jonathan Walpole Memory Management Memory Management Memory a linear array of bytes - Holds O.S. and programs (processes) - Each cell (byte) is named by a unique memory address Recall,
More informationAn introduction to SDRAM and memory controllers. 5kk73
An introduction to SDRAM and memory controllers 5kk73 Presentation Outline (part 1) Introduction to SDRAM Basic SDRAM operation Memory efficiency SDRAM controller architecture Conclusions Followed by part
More informationI/O Buffering and Streaming
I/O Buffering and Streaming I/O Buffering and Caching I/O accesses are reads or writes (e.g., to files) Application access is arbitary (offset, len) Convert accesses to read/write of fixed-size blocks
More informationEITF20: Computer Architecture Part 5.1.1: Virtual Memory
EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache optimization Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache
More informationOpenMP on the IBM Cell BE
OpenMP on the IBM Cell BE PRACE Barcelona Supercomputing Center (BSC) 21-23 October 2009 Marc Gonzalez Tallada Index OpenMP programming and code transformations Tiling and Software Cache transformations
More informationRecap: Machine Organization
ECE232: Hardware Organization and Design Part 14: Hierarchy Chapter 5 (4 th edition), 7 (3 rd edition) http://www.ecs.umass.edu/ece/ece232/ Adapted from Computer Organization and Design, Patterson & Hennessy,
More information