Worst Case Analysis of DRAM Latency in Multi-Requestor Systems
Zheng Pei Wu, Yogen Krish, Rodolfo Pellizzoni
Multi-Requestor Systems
Several CPUs, a DMA engine and I/O devices access DRAM through a shared interconnect, so they interfere with one another. Yet hard real-time systems must be predictable!
Multi-Requestor Systems
Schedulability analysis needs the WCET as input, and the WCET depends on the hardware platform: it needs the latency to access shared resources (e.g., cache, DRAM). Existing approaches can bound the interference, but they assume that the latency of a DRAM access is constant.
Problem: DRAM latency is variable and changes depending on the state of the DRAM.
Contribution
For the requestor under analysis: a timing analysis that bounds the worst-case latency of a DRAM access.
For the interfering requestors (the other CPUs, DMA and I/O): we assume we do not know what they are doing, so we assume they cause the worst-case interference.
Outline
1. Background & Related Work
2. Memory Controller Model
3. Worst Case Latency Analysis
4. Results & Conclusion
Background
The storage array holds the data, but reads and writes can only be performed on the row buffer.
Close request: a READ targets a row while the row buffer contains data from a different row. The front end generates the needed commands, P (Pre-charge), A (ACT) and R (Read), and the back end issues them on the command bus. Pre-charge stores the row-buffer data back into the array, and ACT loads the target row from the array into the buffer. Consecutive commands must respect timing constraints, so A can only be issued a fixed delay after P, and R a fixed delay after A.
Open request: a READ targeting a row that is already in the row buffer needs only a Read command, which can be issued immediately.
The latency of a close request is therefore much longer than the latency of an open request: memory access latency is variable!
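The following is a minimal sketch of the open/close difference for a single read with no interference. It uses the usual DDR parameter names tRP (PRE to ACT), tRCD (ACT to READ) and tCL (READ to first data), plus a hypothetical tBUS for the burst transfer. The cycle counts are illustrative only and are not taken from the paper or from a specific device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    tRP: int   # PRE -> ACT (pre-charge time)
    tRCD: int  # ACT -> READ (row-to-column delay)
    tCL: int   # READ -> first data beat (CAS latency)
    tBUS: int  # cycles to transfer one burst on the data bus (hypothetical)


def read_latency(t: Timing, row_hit: bool) -> int:
    """Latency of one read with no interference, in memory-clock cycles."""
    latency = t.tCL + t.tBUS        # READ command, then the data transfer
    if not row_hit:
        latency += t.tRP + t.tRCD   # close request: PRE and ACT come first
    return latency


t = Timing(tRP=9, tRCD=9, tCL=9, tBUS=4)  # hypothetical cycle counts
print(read_latency(t, row_hit=True), read_latency(t, row_hit=False))  # 13 31
```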
Predictable Memory Controllers: Close Row Policy
After each access, the row buffer is automatically pre-charged (implicit pre-charge), so the next request targeting the same bank always needs A and then R. Memory latency is therefore the same for all requests. However, the controller cannot take advantage of row-buffer locality (row hits), and every request has a latency much longer than an open request.
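The trade-off between the two policies can be sketched over a short sequence of rows accessed in one bank. This is a simplified model under stated assumptions:

- the timing values are hypothetical;
- there is no interference from other requestors;
- under the close row policy, the implicit pre-charge has already completed when the next request arrives.

Real devices add further constraints, such as tRAS, that are ignored here.

```python
def trace_latencies(rows, policy, tRP=9, tRCD=9, tCL=9, tBUS=4):
    """Per-request read latency (cycles) for a sequence of rows in one bank."""
    open_row = None                 # bank starts with no row open
    out = []
    for row in rows:
        if policy == "close":
            # Row buffer was auto-precharged after the previous access;
            # assume that pre-charge is done, so only ACT + READ remain.
            out.append(tRCD + tCL + tBUS)
        elif row == open_row:
            out.append(tCL + tBUS)                      # open policy, row hit
        else:
            pre = tRP if open_row is not None else 0    # close the old row
            out.append(pre + tRCD + tCL + tBUS)         # open policy, row miss
            open_row = row
    return out


rows = [5, 5, 5, 7]
print(trace_latencies(rows, "open"))   # [22, 13, 13, 31]
print(trace_latencies(rows, "close"))  # [22, 22, 22, 22]
```

The close row policy gives every request the same latency, which is what makes it predictable. The open row policy is faster on row hits and slower on row misses.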
Predictable Memory Controllers: Interleaving Banks
Each request accesses data in multiple banks (e.g., Banks 1 to 4), so the data transfers from the different banks can be pipelined. Problem: since every requestor can access all banks, requestors can close each other's row buffers. The close row policy is therefore used to make latency predictable, and the long latency of the close row policy remains.
Interleaving suits systems with a small DRAM data bus (e.g., 16 bits). A larger data bus can transfer the same amount of data without interleaving so many banks. For a wider data bus (e.g., 32 bits), only two banks are interleaved, and time is wasted.
Interleaving problems:
1. Requestors can close each other's rows (interference).
2. It must be used with the close row policy to make latency predictable.
3. For a wider data bus, the effectiveness of interleaving is diminished.
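The link between bus width and the number of interleaved banks can be made concrete with a little arithmetic. The sketch assumes a 64-byte request (e.g., a cache line) and a burst length of 8, as in DDR3. Both are assumptions for illustration. With them, the 16-bit and 32-bit buses need four and two banks respectively, matching the slide examples.

```python
def banks_to_interleave(request_bytes=64, bus_bits=16, burst_length=8):
    """Bursts (one per bank) needed to move one request over the data bus."""
    bytes_per_burst = bus_bits // 8 * burst_length
    return max(1, -(-request_bytes // bytes_per_burst))  # ceiling division


for width in (16, 32, 64):
    print(width, banks_to_interleave(bus_bits=width))
# 16 4
# 32 2
# 64 1
```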
Predictable Memory Controllers: Private Banks
Banks can be partitioned among requestors or tasks (e.g., among Core 1, Core 2 and DMA across Banks 1 to 4). This can be done in hardware, if the memory controller supports it; by the compiler; or in the OS, using virtual memory.
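One way the OS can build private banks through virtual memory is to give each requestor only page frames whose physical addresses fall in its bank (a form of page coloring). The sketch assumes a hypothetical address mapping in which the bank index comes from physical address bits 13 to 15 of an 8-bank device. Real memory controllers use device-specific, sometimes hashed, mappings. BANK_SHIFT, NUM_BANKS and the core-to-bank assignment are therefore placeholders.

```python
PAGE_SIZE = 4096
BANK_SHIFT = 13   # hypothetical: bank index taken from address bits 13..15
NUM_BANKS = 8     # hypothetical device with 8 banks


def bank_of(addr: int) -> int:
    """Bank index of a physical address under the assumed mapping."""
    return (addr >> BANK_SHIFT) % NUM_BANKS


def frames_for_bank(bank: int, free_frames, n: int) -> list:
    """Pick n free page frames whose addresses all map to `bank`."""
    picked = [f for f in free_frames if bank_of(f * PAGE_SIZE) == bank]
    if len(picked) < n:
        raise MemoryError(f"bank {bank} has only {len(picked)} free frames")
    return picked[:n]


assignment = {"core1": 0, "core2": 1, "dma": 2}   # hypothetical partition
print(frames_for_bank(assignment["core2"], range(32), 4))  # [2, 3, 18, 19]
```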
Related Work
AMC [1] and Predator [2]: close row policy; interleaved banks.
Conservative Open-Page [3]: interleaved banks; leaves the row open for a small window of time.
PRET DRAM Controller [4]: close row policy; private banks.
Our Approach
Private banks eliminate row-buffer interference from other requestors. An open row policy reduces latency and takes advantage of the row hit ratio (locality).
Challenge: interference from other requestors.
1. The analysis is more complex.
2. There are more than 20 timing constraints.
3. Latency depends on the dynamic state of the DRAM.
Memory Controller Model
We focus on the back-end latency and ignore the CONSTANT front-end delay. In the front end, a command generator translates the requests of Core 1, Core 2 and DMA into commands. In the back end, each requestor has a private buffer for its memory commands, and a global FIFO queue is used for arbitration of the command bus.
1. The command at the head of each private buffer is inserted into the global FIFO.
2. The controller scans the global FIFO from front to end for a command that can be issued, and issues it.
3. The requestor's next command must wait until its timing constraints are satisfied before it can be inserted into the FIFO.
Intuitively, the arbitration is fair and similar to a round-robin policy.
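The arbitration steps can be sketched as a small cycle-by-cycle simulation. This is a simplified model, not the controller's actual implementation. Real DRAM timing constraints are abstracted into two hypothetical inputs:

- `delay_after`: how long a requestor waits after issuing a command before inserting its next one;
- `can_issue`: constraints shared between requestors, such as the data bus.

The command names PRE/ACT/RD/WR are illustrative.

```python
from collections import deque


def simulate(buffers, delay_after, can_issue, max_cycles=1000):
    """Per-requestor buffers feeding a global FIFO; one command issued per cycle.

    buffers     : {requestor: [command, ...]} in program order
    delay_after : {command: cycles} hypothetical wait before the requestor's
                  next command may enter the global FIFO
    can_issue   : callable(cmd, cycle, history) -> bool (hypothetical check)
    Returns [(cycle, requestor, command), ...] in issue order.
    """
    private = {r: deque(cmds) for r, cmds in buffers.items()}
    next_insert = {r: 0 for r in buffers}
    fifo = deque()                       # global FIFO of (requestor, command)
    history = []
    for cycle in range(max_cycles):
        # 1. Head of each private buffer enters the FIFO once its wait is over;
        #    each requestor has at most one command in the FIFO at a time.
        for r, q in private.items():
            if q and cycle >= next_insert[r] and all(o != r for o, _ in fifo):
                fifo.append((r, q.popleft()))
        # 2. Scan front to end and issue the first command that can go.
        pick = next((i for i, (_, cmd) in enumerate(fifo)
                     if can_issue(cmd, cycle, history)), None)
        if pick is not None:
            r, cmd = fifo[pick]
            del fifo[pick]
            history.append((cycle, r, cmd))
            next_insert[r] = cycle + delay_after[cmd]   # 3. wait for constraints
        if not fifo and not any(private.values()):
            break
    return history


def data_bus_free(cmd, cycle, history):
    """Hypothetical shared constraint: RD/WR at least 4 cycles apart."""
    if cmd not in ("RD", "WR"):
        return True
    return all(cycle - c >= 4 for c, _, k in history if k in ("RD", "WR"))


trace = simulate({"core1": ["ACT", "RD"], "core2": ["RD"]},
                 delay_after={"PRE": 3, "ACT": 3, "RD": 0, "WR": 0},
                 can_issue=data_bus_free)
print(trace)  # [(0, 'core1', 'ACT'), (1, 'core2', 'RD'), (5, 'core1', 'RD')]
```

In this run, core2's read overtakes core1's read because core1 must first wait out its ACT delay. Once core1's read reaches the FIFO, only the shared data-bus constraint holds it back.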
Worst Case Analysis
Part 1 (main contribution; works for any type of core): worst-case single-request latency analysis. Given the total number of requestors and the memory device parameters, it bounds the latency of each type of request: open read, close read, open write and close write.
Part 2 (only provided for in-order cores): given the number of open reads, close reads, open writes and close writes of the task under analysis, it computes the cumulative worst-case execution time (WCET).
Assumption: we do not know the activity of the other, interfering requestors, so we assume that they produce the worst-case pattern to cause maximum interference.
Single Request Latency
The latency is decomposed into two parts:
1. Arrival to Read/Write: from request arrival until the Read/Write command is inserted into the global FIFO. This part may include Pre-charge and ACT commands, so its latency depends on the previous request (i.e., the state of the DRAM). For details on this part, refer to the paper.
2. Read/Write to end of data: from the insertion of the Read/Write command into the FIFO until the data has finished transmitting. This part does not depend on the state of the DRAM; we focus on it here.
Both parts depend on the number of interfering requestors as well as on the DRAM timing constraints.
Latency from Read/Write to End of Data
Read to Read has no timing constraints, only contention on the data bus; the same holds for Write to Write. Switching direction, however, is subject to the Write-to-Read and Read-to-Write timing constraints. Therefore, an alternation of read and write commands produces a longer latency.
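A toy model shows why alternation is the worst case. The model makes these assumptions:

- consecutive column commands of the same direction are spaced by tBUS;
- a direction change instead costs a lumped turnaround, tRTW (read to write) or tWTR (write to read).

The numeric values are hypothetical, chosen only so that a switch costs more than a plain burst.

```python
from itertools import permutations


def cas_gap(prev, nxt, tBUS, tRTW, tWTR):
    """Minimum spacing between two consecutive column commands (simplified)."""
    if prev == nxt:
        return tBUS                        # R->R or W->W: data-bus contention only
    return tRTW if prev == "R" else tWTR   # R->W or W->R turnaround


def sequence_delay(seq, tBUS=4, tRTW=7, tWTR=14):
    """Total spacing from the first to the last command of `seq`."""
    return sum(cas_gap(a, b, tBUS, tRTW, tWTR) for a, b in zip(seq, seq[1:]))


print(sequence_delay("RRRWWW"))  # 23: grouped, a single switch
print(sequence_delay("WRWRWR"))  # 56: alternating, five switches

# Brute force over all orders of three reads and three writes.
worst = max(set(permutations("RRRWWW")), key=sequence_delay)
print("".join(worst), sequence_delay(worst))  # WRWRWR 56
```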
Latency from Read/Write to End of Data: Interference on a Write Command
All other requestors insert R/W commands ahead of the write in the FIFO so as to create maximum interference. Moreover, a write command could have finished immediately before t0. The resulting Write-to-Read constraint further delays the first Read command.
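The worst-case pattern for a write under analysis can be found by brute force in the same kind of toy model. The model assumes:

- hypothetical lumped turnaround values;
- one command from each interfering requestor ahead of the write;
- one command that completed just before t0.

This is an illustration under stated assumptions, not the paper's derivation.

```python
from itertools import product


def cas_gap(prev, nxt, tBUS=4, tRTW=7, tWTR=14):
    """Simplified spacing between consecutive column commands (hypothetical)."""
    if prev == nxt:
        return tBUS
    return tRTW if prev == "R" else tWTR


def worst_pattern(n_interferers, target="W"):
    """Return (delay, sequence) maximising the delay of `target`.

    Sequence = command completed just before t0, one command per interfering
    requestor, then the command under analysis. For simplicity the delay is
    measured from the first command in the sequence.
    """
    best_delay, best_seq = -1, ""
    for seq in product("RW", repeat=n_interferers + 1):
        full = (*seq, target)
        delay = sum(cas_gap(a, b) for a, b in zip(full, full[1:]))
        if delay > best_delay:
            best_delay, best_seq = delay, "".join(full)
    return best_delay, best_seq


print(worst_pattern(3))  # (42, 'WRWRW')
```

With three interferers, the search chooses a write as the command that completed before t0 and alternates after it. This matches the intuition that a just-finished write delays the first read.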
Worst Case Analysis (revisited)
Part 2 (only provided for in-order cores) combines the single-request latencies for open reads, close reads, open writes and close writes with the number of requests of each type issued by the task under analysis, to obtain the cumulative WCET.
Cumulative Latency
Over time, the task under analysis issues a sequence of open reads, close reads, open writes and close writes. If the worst-case request order were known, we could simply sum the latency of each request. However, the worst-case request order depends on the input values, the code path, the cache state, etc. Instead, static analysis tools can be used to obtain a safe bound on the number of requests of each type. The question then becomes: which pattern of these requests leads to the worst-case latency? This problem can be solved in constant time; see the paper for details.
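As a baseline, a coarse but safe bound charges every request the worst-case latency of its type, regardless of order. This sketch is not the constant-time, pattern-aware method referred to above. It assumes that each per-type latency already bounds that request for any preceding request. The latencies and counts are hypothetical.

```python
def cumulative_bound(counts, latency):
    """Coarse bound: sum over request types of count x worst-case latency."""
    return sum(n * latency[kind] for kind, n in counts.items())


# Hypothetical per-type worst-case latencies (ns) and request counts.
latency = {"open_read": 40, "close_read": 90, "open_write": 45, "close_write": 95}
counts = {"open_read": 100, "close_read": 20, "open_write": 30, "close_write": 10}
print(cumulative_bound(counts, latency))  # 8100
```

Because this ignores how request order affects latency, it can be more pessimistic than an analysis that finds the actual worst-case pattern.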
Outline
1. Background & Related Work
2. Memory Controller Model
3. Worst Case Latency Analysis
   Single Request Latency
   Cumulative Latency
4. Results & Conclusion
Results
Comparison against the Analyzable Memory Controller (AMC) [1], since it uses fair (round-robin) arbitration, which is similar to our approach.
Synthetic benchmarks: used to show how the worst-case latency varies as parameters are changed.
CHStone benchmarks: memory traces are obtained from the gem5 simulator and used as input to the worst-case analysis.
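Turning a memory trace into the inputs of the cumulative analysis amounts to classifying each request as an open or close read or write. The sketch makes these assumptions:

- a hypothetical trace format of (op, row) pairs for one requestor with a private bank;
- an open row policy;
- the bank starts with no open row.

Parsing an actual gem5 trace is not shown.

```python
from collections import Counter


def classify(trace):
    """Count open/close reads and writes for one requestor (private bank).

    trace: iterable of (op, row) with op in {"R", "W"} (hypothetical format).
    """
    open_row = None            # assumption: no row open at the start
    counts = Counter()
    for op, row in trace:
        state = "open" if row == open_row else "close"
        counts[f"{state}_{'read' if op == 'R' else 'write'}"] += 1
        open_row = row         # open row policy: the accessed row stays open
    return counts


trace = [("R", 1), ("R", 1), ("W", 1), ("W", 2), ("R", 2), ("R", 3)]
print(dict(classify(trace)))
```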
Results: Synthetic Benchmarks
[Figures: synthetic benchmark results]
Results
As memory devices become faster, the difference between open and close accesses grows, so the close row policy becomes too pessimistic.
[Table: worst-case latency (ns) of AMC (64 bits) and of our approach (64 bits), with the percentage improvement (% better), for devices 800D, 1066F, 1333H, 1600K, 1866L and 2133N; 50% row hit ratio, 4 requestors, 20% writes]
Results: CHStone Benchmarks for a 64-bit Bus
[Figure: CHStone benchmark results]
Conclusion
A novel worst-case analysis that takes the dynamic state of the DRAM into account.
An open row policy can reduce memory latency as devices become faster.
A private bank scheme is used to eliminate row-buffer interference from other requestors.
Future Work
Discussion of shared data.
Bus utilization is still poor due to read/write switching: read/write optimization to reduce the latency bound.
Handle multiple ranks.
Implementation in hardware.
References
[1] M. Paolieri, E. Quiñones, F. Cazorla, and M. Valero, "An Analyzable Memory Controller for Hard Real-Time CMPs," IEEE Embedded Systems Letters, vol. 1, no. 4.
[2] B. Akesson, K. Goossens, and M. Ringhofer, "Predator: a predictable SDRAM memory controller," in CODES+ISSS, 2007.
[3] S. Goossens, B. Akesson, and K. Goossens, "Conservative Open-Page Policy for Mixed Time-Criticality Memory Controllers," in DATE.
[4] J. Reineke, I. Liu, H. D. Patel, S. Kim, and E. A. Lee, "PRET DRAM controller: Bank privatization for predictability and temporal isolation," in CODES+ISSS, 2011.
More informationData Bus Slicing for Contention-Free Multicore Real-Time Memory Systems
Data Bus Slicing for Contention-Free Multicore Real-Time Memory Systems Javier Jalle,, Eduardo Quiñones, Jaume Abella, Luca Fossati, Marco Zulianello, Francisco J. Cazorla, Barcelona Supercomputing Center
More informationUnit In a time - sharing operating system, when the time slot given to a process is completed, the process goes from the RUNNING state to the
Unit - 5 1. In a time - sharing operating system, when the time slot given to a process is completed, the process goes from the RUNNING state to the (A) BLOCKED state (B) READY state (C) SUSPENDED state
More informationLecture 23: Storage Systems. Topics: disk access, bus design, evaluation metrics, RAID (Sections )
Lecture 23: Storage Systems Topics: disk access, bus design, evaluation metrics, RAID (Sections 7.1-7.9) 1 Role of I/O Activities external to the CPU are typically orders of magnitude slower Example: while
More informationA Comprehensive Analytical Performance Model of DRAM Caches
A Comprehensive Analytical Performance Model of DRAM Caches Authors: Nagendra Gulur *, Mahesh Mehendale *, and R Govindarajan + Presented by: Sreepathi Pai * Texas Instruments, + Indian Institute of Science
More informationOutline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA
CS 258 Parallel Computer Architecture Data Speculation Support for a Chip Multiprocessor (Hydra CMP) Lance Hammond, Mark Willey and Kunle Olukotun Presented: May 7 th, 2008 Ankit Jain Outline The Hydra
More informationThe Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)
The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache
More informationChapter 7-1. Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授. V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor)
Chapter 7-1 Large and Fast: Exploiting Memory Hierarchy (part I: cache) 臺大電機系吳安宇教授 V1 11/24/2004 V2 12/01/2004 V3 12/08/2004 (minor) 臺大電機吳安宇教授 - 計算機結構 1 Outline 7.1 Introduction 7.2 The Basics of Caches
More informationComposable Resource Sharing Based on Latency-Rate Servers
Composable Resource Sharing Based on Latency-Rate Servers Benny Akesson 1, Andreas Hansson 1, Kees Goossens 2,3 1 Eindhoven University of Technology 2 NXP Semiconductors Research 3 Delft University of
More informationBounding SDRAM Interference: Detailed Analysis vs. Latency-Rate Analysis
Bounding SDRAM Interference: Detailed Analysis vs. Latency-Rate Analysis Hardik Shah 1, Alois Knoll 2, and Benny Akesson 3 1 fortiss GmbH, Germany, 2 Technical University Munich, Germany, 3 CISTER-ISEP
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationImpact of Resource Sharing on Performance and Performance Prediction: A Survey
Impact of Resource Sharing on Performance and Performance Prediction: A Survey Andreas Abel, Florian Benz, Johannes Doerfert, Barbara Dörr, Sebastian Hahn, Florian Haupenthal, Michael Jacobs, Amir H. Moin,
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationMEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS
MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS INSTRUCTOR: Dr. MUHAMMAD SHAABAN PRESENTED BY: MOHIT SATHAWANE AKSHAY YEMBARWAR WHAT IS MULTICORE SYSTEMS? Multi-core processor architecture means placing
More informationBlueVisor: A Scalable Real-time Hardware Hypervisor for Many-core Embedded System
BlueVisor: A Scalable eal-time Hardware Hypervisor for Many-core Embedded System Zhe Jiang, Neil C Audsley, Pan Dong eal-time Systems Group Department of Computer Science University of York, United Kingdom
More informationMemory. From Chapter 3 of High Performance Computing. c R. Leduc
Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor
More informationCS698Y: Modern Memory Systems Lecture-16 (DRAM Timing Constraints) Biswabandan Panda
CS698Y: Modern Memory Systems Lecture-16 (DRAM Timing Constraints) Biswabandan Panda biswap@cse.iitk.ac.in https://www.cse.iitk.ac.in/users/biswap/cs698y.html Row decoder Accessing a Row Access Address
More informationEfficient Latency Guarantees for Mixed-criticality Networks-on-Chip
Platzhalter für Bild, Bild auf Titelfolie hinter das Logo einsetzen Efficient Latency Guarantees for Mixed-criticality Networks-on-Chip Sebastian Tobuschat, Rolf Ernst IDA, TU Braunschweig, Germany 18.
More information(b) External fragmentation can happen in a virtual memory paging system.
Alexandria University Faculty of Engineering Electrical Engineering - Communications Spring 2015 Final Exam CS333: Operating Systems Wednesday, June 17, 2015 Allowed Time: 3 Hours Maximum: 75 points Note:
More informationLecture: DRAM Main Memory. Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3)
Lecture: DRAM Main Memory Topics: virtual memory wrap-up, DRAM intro and basics (Section 2.3) 1 TLB and Cache 2 Virtually Indexed Caches 24-bit virtual address, 4KB page size 12 bits offset and 12 bits
More informationImproving Real-Time Performance on Multicore Platforms Using MemGuard
Improving Real-Time Performance on Multicore Platforms Using MemGuard Heechul Yun University of Kansas 2335 Irving hill Rd, Lawrence, KS heechul@ittc.ku.edu Abstract In this paper, we present a case-study
More informationLecture 18: Memory Hierarchy Main Memory and Enhancing its Performance Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 18: Memory Hierarchy Main Memory and Enhancing its Performance Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review: Reducing Miss Penalty Summary Five techniques Read priority
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity
More information