Memory Access Scheduling
- Aubrey Hampton
Transcription
1 Memory Access Scheduling. ECE 5900 Computer Engineering Seminar (ECE 5900 spring 05). Ying Xu, Mar 4, 2005. Instructor: Dr. Chigan.
2 Outline: Introduction; Modern DRAM architecture; Memory access scheduling; Structure of access scheduler; Scheduling policies; Experimental results; First-ready scheduling; Aggressive reordering; Conclusions
3 Introduction: The bandwidth of memory chips has increased dramatically (DDR2, SDRAM). Media processors: streaming memory reference patterns; memory bandwidth is the bottleneck.
4 Introduction (cont'd): Pipelining memory accesses maximizes memory bandwidth, but sequential accesses to different rows of the same bank cannot be pipelined. Memory access scheduling reorders memory operations (bank precharge, row activation, column access), so memory references may complete out of order.
5 Introduction (cont'd)
6 Characteristics of DRAM architecture: DRAMs are not truly random-access devices. They are three-dimensional memories (bank, row, column) with three operations: bank precharge, row activation, and column access.
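The bank/row/column addressing described on this slide can be illustrated with a minimal sketch. The geometry constants and the bit layout (column in the low bits, then bank, then row) are assumptions chosen only for illustration; they are not taken from the slides or from any particular DRAM part.

```python
from typing import NamedTuple


class DramAddress(NamedTuple):
    bank: int
    row: int
    column: int


# Hypothetical geometry, chosen only for illustration.
NUM_BANKS = 4
NUM_ROWS = 4096
NUM_COLUMNS = 512


def decode(word_address: int) -> DramAddress:
    """Split a flat word address into (bank, row, column).

    Assumed layout: column in the low bits, then bank, then row.
    """
    column = word_address % NUM_COLUMNS
    bank = (word_address // NUM_COLUMNS) % NUM_BANKS
    row = (word_address // (NUM_COLUMNS * NUM_BANKS)) % NUM_ROWS
    return DramAddress(bank, row, column)
```

Under this assumed layout, consecutive blocks of 512 words land in different banks, which is one way a controller might spread a stream across banks.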
7 DRAM organization
8 Resource constraints of DRAMs: DRAM resources are the internal banks, a single set of address lines, and a single set of data lines. Different operations place different demands on these resources.
9 Bank state
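The three DRAM operations named earlier (bank precharge, row activation, column access) suggest a small per-bank state machine. The sketch below is a simplified model with only two states and no timing; the class and method names are hypothetical and do not come from the slides.

```python
from enum import Enum


class BankState(Enum):
    IDLE = "idle"      # precharged, no row open
    ACTIVE = "active"  # one row is open and can serve column accesses


class Bank:
    """Simplified bank model: timing constraints are deliberately ignored."""

    def __init__(self):
        self.state = BankState.IDLE
        self.open_row = None

    def activate(self, row):
        # A row can only be activated in a precharged (idle) bank.
        if self.state is not BankState.IDLE:
            raise RuntimeError("bank must be precharged before activation")
        self.state = BankState.ACTIVE
        self.open_row = row

    def column_access(self, row):
        # Column accesses are only legal on the currently active row.
        if self.state is not BankState.ACTIVE or self.open_row != row:
            raise RuntimeError("row is not active")
        return True  # stands in for a data transfer

    def precharge(self):
        # Precharging closes the active row and returns the bank to idle.
        self.state = BankState.IDLE
        self.open_row = None
```

A reference to a row other than the active one therefore costs a precharge and an activation before its column access can proceed.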
10 Memory access scheduling: The process of ordering DRAM operations subject to resource constraints. Simplest policy: oldest pending reference first. This is inefficient: when the DRAM is not ready for the oldest reference, available resources are left idle. A more sophisticated scheduling algorithm is needed.
11 Memory access scheduler structure
12 Memory access scheduling policies
13 Memory access scheduling algorithm: A combination of the policies used by the precharge manager, row arbiter, column arbiter, and address arbiter. The address arbiter decides which of the selected precharge, row, and column operations to perform. Choices: in-order, priority, precharge operation first, row operation first, column operation first.
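The address arbiter's choice among the selected operations might be sketched as follows. The `Candidate` type, the policy strings, and the age-based tie-breaking are assumptions for illustration; the "priority" choice listed on the slide is not modeled here.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    kind: str  # "precharge", "row", or "column"
    age: int   # larger means the underlying reference has waited longer


# Hypothetical policy names mapped to the operation kind they favor.
PREFERRED_KIND = {
    "precharge-first": "precharge",
    "row-first": "row",
    "column-first": "column",
}


def address_arbiter(candidates, policy="in-order"):
    """Pick one of the operations proposed by the other units, or None."""
    if not candidates:
        return None
    if policy == "in-order":
        # Oldest reference wins regardless of operation kind.
        return max(candidates, key=lambda c: c.age)
    preferred = PREFERRED_KIND[policy]
    # Favor the preferred kind; fall back to the oldest candidate otherwise.
    return max(candidates, key=lambda c: (c.kind == preferred, c.age))
```

For example, with one candidate of each kind, `"row-first"` returns the row operation even when the precharge candidate is older.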
14 Experimental setup: Streaming media processors are preferred: streams lack temporal locality, and stream transfer bandwidth drives processor performance. The Imagine stream processor is simulated. Processor frequency: 500 MHz. DRAM frequency: 125 MHz. Peak system bandwidth: 2 GB/s.
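As a quick check of what these figures imply, the arithmetic below derives the peak transfer per DRAM cycle. It assumes 1 GB = 10^9 bytes and that the 500 MHz figure is the processor clock; the variable names are illustrative only.

```python
PROCESSOR_HZ = 500_000_000            # 500 MHz (assumed to be the processor clock)
DRAM_HZ = 125_000_000                 # 125 MHz DRAM clock
PEAK_BYTES_PER_SECOND = 2_000_000_000  # 2 GB/s, assuming 1 GB = 10**9 bytes

# Bytes the memory system must move every DRAM cycle to reach peak bandwidth.
bytes_per_dram_cycle = PEAK_BYTES_PER_SECOND // DRAM_HZ

# Processor cycles that elapse during one DRAM cycle.
processor_cycles_per_dram_cycle = PROCESSOR_HZ // DRAM_HZ

print(bytes_per_dram_cycle, processor_cycles_per_dram_cycle)  # prints "16 4"
```

Under these assumptions, sustaining peak bandwidth means moving 16 bytes on every DRAM cycle, so any cycle lost to a precharge or activation directly reduces sustained bandwidth.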
15 Experimental setup (cont'd): Benchmarks and media processing applications
16 In-order scheduling: The in-order access scheduler performs no access reordering. A column access is performed only for the oldest pending reference; the same holds for bank precharge and row activation. This is the baseline.
17 First-ready scheduling: Uses the ordered priority scheme for all units. Subject to resource and timing constraints, it schedules an operation for the oldest pending reference that is ready. Benefits: accesses targeting other banks can be performed while waiting for a precharge or row activation; parallelism, with multiple references in progress.
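The difference between the in-order baseline and first-ready scheduling can be seen in a toy cycle-level model. This is a sketch under strong assumptions: one operation may issue per cycle, precharge and activation each occupy a bank for a hypothetical three cycles, and a column access takes one cycle. It is not the simulator used for the reported results.

```python
from collections import namedtuple

Ref = namedtuple("Ref", "bank row")

# Hypothetical latencies in cycles, chosen only for illustration.
T_PRECHARGE = 3
T_ACTIVATE = 3


def simulate(refs, first_ready):
    """Return the number of cycles needed to complete all references."""
    pending = list(refs)  # oldest reference first
    open_row = {}         # bank -> active row (absent means precharged)
    busy_until = {}       # bank -> first cycle at which it accepts a new op
    cycle = 0
    while pending:
        # In-order considers only the oldest reference; first-ready scans all.
        candidates = pending if first_ready else pending[:1]
        for ref in candidates:
            if busy_until.get(ref.bank, 0) > cycle:
                continue  # bank is still precharging or activating
            if open_row.get(ref.bank) == ref.row:
                pending.remove(ref)  # column access completes the reference
            elif ref.bank in open_row:
                del open_row[ref.bank]  # precharge the wrong row
                busy_until[ref.bank] = cycle + T_PRECHARGE
            else:
                open_row[ref.bank] = ref.row  # activate the needed row
                busy_until[ref.bank] = cycle + T_ACTIVATE
            break  # only one operation issues per cycle
        cycle += 1
    return cycle


refs = [Ref(0, 0), Ref(0, 1), Ref(1, 0), Ref(1, 1)]
in_order_cycles = simulate(refs, first_ready=False)
first_ready_cycles = simulate(refs, first_ready=True)
```

In this toy model the first-ready run finishes sooner because operations for bank 1 issue while bank 0 is busy, which is the overlap the slide describes.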
18 Experimental results: Sustained memory bandwidth increased by about 79%
19 Experimental results: Sustained bandwidth increased by about 17%
20 Experimental results: Sustained memory bandwidth increased by about 79%
21 Aggressive reordering: Drawback of first-ready scheduling: it precharges a bank when the oldest pending reference targets a row other than the bank's active row, even if there are still multiple pending references to the active row. Aggressive reordering further increases sustained memory bandwidth.
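The cost of that drawback can be illustrated by counting row activations for references to a single bank. The sketch compares servicing references strictly in arrival order with servicing all pending references to a row before switching; the function names are hypothetical, and the grouped count is an idealized lower bound rather than the behavior of any specific scheduler.

```python
def activations_in_order(rows):
    """Row activations when references are serviced in arrival order."""
    count = 0
    active = None
    for row in rows:
        if row != active:
            count += 1  # a different row must be activated
            active = row
    return count


def activations_grouped(rows):
    """Idealized count when all references to a row are serviced together."""
    return len(set(rows))


rows = [0, 1, 0, 1, 0, 1]  # references alternate between two rows of one bank
print(activations_in_order(rows), activations_grouped(rows))  # prints "6 2"
```

Fewer activations mean more column accesses per row access, which is the goal of the more aggressive reordering.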
22 Possible reordering scheduling algorithm policies: There is a large range of possible memory access schedulers. Four representative algorithms are considered.
23 Experimental results: Improve bandwidth by %
24 Experimental results: Improve bandwidth by 27-30%
25 Experimental results: Improve bandwidth by 85-93%
26 Row-first policy vs. column-first policy (address arbiter): Row-first always selects a row operation first; column-first always selects a column operation first. There is little difference across all benchmarks, with the exception of FFT. The difference has less to do with the scheduling algorithm than with the characteristics of the benchmark itself: FFT is the most sensitive to stream load latency, and the col/open policy allows a store stream to delay load streams.
27 Open or closed precharge policy? Closed precharge policy: banks are precharged as soon as there are no pending references to the active row. Open precharge policy: banks are precharged only when there are no pending references to the active row and there are pending references to other rows of the same bank. The difference between the two policies is slight. Benchmarks with random access patterns prefer the closed policy: with little reference locality, there is no benefit to keeping a row open. FFT prefers the open policy because it makes numerous accesses to each row.
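The two precharge policies can be written as a single decision function. This is a minimal sketch of the rules as stated on the slide; the function name, the policy strings, and the representation of pending references as a list of row numbers are assumptions.

```python
def should_precharge(policy, active_row, pending_rows):
    """Decide whether a bank with an open row should be precharged.

    pending_rows lists the rows targeted by pending references to this bank.
    """
    if active_row in pending_rows:
        return False  # both policies keep a row open while it has pending work
    if policy == "closed":
        return True   # precharge as soon as the active row has no references
    if policy == "open":
        # Precharge only if some other row of this bank is waiting.
        return len(pending_rows) > 0
    raise ValueError(f"unknown policy: {policy}")
```

The policies differ only when the bank has no pending references at all: the closed policy precharges anyway, while the open policy leaves the row open in case a later reference targets it.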
28 Effect of bank buffer size: Row/closed scheduling algorithm
29 Conclusions: Memory access scheduling greatly increases bandwidth utilization by buffering memory references, accessing internal banks in parallel, and maximizing the number of column accesses per row access. First-ready scheduling algorithm: 79% bandwidth improvement on microbenchmarks, 40% on application traces. Aggressive reordering algorithm: 144% bandwidth improvement on benchmarks, 30% on media processing applications, 93% on application traces.
30 Conclusions: The closed precharge policy is preferred by most benchmarks. There is little difference in performance between row-first and column-first policies. For latency-sensitive applications, scheduling loads ahead of stores is preferred. Banks are precharged as soon as the last column reference to an active row is completed.
31 Paper reference: Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, John D. Owens. Memory access scheduling. ACM SIGARCH Computer Architecture News, Proceedings of the 27th Annual International Symposium on Computer Architecture, Volume 28, Issue 2, May
32 Thank you!