Memory Controllers for Real-Time Embedded Systems. Benny Akesson Czech Technical University in Prague
2 Trends in Embedded Systems Embedded systems get increasingly complex Increasingly complex applications (more functionality) Growing number of applications integrated in a device More applications execute concurrently Requires increased system performance without increasing power The resulting complex contemporary platforms are heterogeneous multi-processor systems with distributed memory hierarchy to improve performance/power ratio Resources in the system are shared to reduce cost 2
3 Consumer Electronics Examples Source: International Technology Roadmap for Semiconductors (ITRS) Philips Set-Top Box Architecture (approximately one decade old) 3
4 Mixed Time-Criticality Applications have mixed time-criticality Firm real-time requirements (FRT) E.g. software-defined radio application Failure to satisfy requirement may violate correctness No deadline misses tolerable Soft real-time requirements (SRT) E.g. media decoder application Failure to satisfy requirement reduces quality of output Occasional deadline misses tolerable No real-time requirements (NRT) E.g. graphical user interface No timing requirements, but must be responsive
5 Simulation-Based Verification Simulation is typically used to verify SRT and NRT applications System simulated with a large set of inputs Resource sharing results in interference between applications Timing behaviors of applications in use-case inter-dependent All use-cases must be verified instead of all applications Verification must be repeated if applications are added or modified Verification by simulation is a slow process with poor coverage Verification is costly and effort is expected to increase in future! 5
6 Formal Verification Formal verification is an alternative to simulation Provides analytical bounds on response time or throughput Considers all application inputs Covers all combinations of concurrently running applications Approach requires models of both applications and hardware Application models are not always available Behavior of dynamic applications is not captured accurately Most hardware is not designed with formal analysis in mind
7 Problem with SDRAM This presentation covers the CompSoC SDRAM controller SDRAM memories are particularly challenging resources The execution time of a request in an SDRAM is variable Worst-Case Execution Time (WCET) is pessimistic and guaranteed bandwidth is very low SDRAM bandwidth is scarce and must be efficiently utilized Additional interfaces cannot be added due to cost constraints 7
8 Problem Statement Complex systems have mixed time-criticality Firm, soft, and no real-time requirements in one system We refer to this as mixed real-time (MRT) requirements There are suitable SDRAM controllers for either FRT or SRT/NRT No good solutions for mixes between these types We develop a controller that reduces verification effort by enabling formal verification of FRT requirements and independent verification by simulation for SRT/NRT, while using the SDRAM in an efficient and power-conscious manner
9 Presentation Outline Introduction Memory Overview Memory Controller Architectures Predator SDRAM Controller Conclusions 9
10 Memory Hierarchy [Figure: hierarchy from the processor (CPU registers) through L1 cache, L2 cache, and off-chip memory to secondary memory (hard disk). Axes: distance from processor, access speed, size (capacity).]
11 Memory Hierarchy
Module            | Memory type | Access time  | Capacity | Managed by
Registers         | SRAM        | 1 cycle      | ~500B    | Software/compiler
L1 Cache          | SRAM        | 1-3 cycles   | ~64KB    | Hardware
L2 Cache          | SRAM        | 5-10 cycles  | 1-10MB   | Hardware
Off-chip memory   | DRAM        | ~100 cycles  | ~10GB    | Software/OS
Secondary memory  | Disk drive  | ~1000 cycles | ~1TB     | Software/OS
Credits: J.Leverich, Stanford
12 Comparing SRAM and DRAM
SRAM: Fast (10x) Bitlines driven by transistors Large cell size (6-10x) 6 transistors
DRAM: Small cell size 1 capacitor and 1 transistor Bit stored as charge on capacitor Capacitor loses charge over time Periodic refresh required Hence the name dynamic RAM
Credits: J.Leverich, Stanford
13 Summary of Characteristics SRAM is preferable for register files and L1/L2 caches Simpler manufacturing (compatible with logic process) Fast random access (on-chip memory) No refreshes Lower density (6 transistors per cell) Higher cost DRAM is preferable for stand-alone memory chips Much higher capacity Higher density Lower cost Slower random access (off-chip memory) 13 Credits: J.Leverich, Stanford
14 SDRAM Architecture An SDRAM is organized in banks, rows and columns A row buffer in each bank stores a currently active (open) row Interface has a command bus, address bus, and a data bus Buses shared between banks to reduce the number of off-chip pins 14
15 Basic SDRAM Operation Memory map decodes address to bank, row, and column Row is activated and copied into the row buffer of the bank Read bursts and/or write bursts are issued to the active row Programmed burst length (BL) of 4 or 8 words Row is precharged and stored back into the memory array 15
16 Timing Constraints Timing constraints determine schedulability of commands More than 20 constraints on minimum time between commands E.g. activate-to-activate, activate-to-read/write, read/write-to-precharge, read-to-write, write-to-read, etc. Timing constraints for a DDR2-400 device
17 Memory Efficiency Execution times of requests are variable and traffic dependent Can vary by an order of magnitude Three reasons for overhead cycles: Activating and precharging (opening and closing) rows Switching direction of data bus from read to write Refreshing the memory Memory efficiency The fraction of clock cycles when requested data is transferred The exchange rate between peak bandwidth and net bandwidth High efficiency is required since bandwidth is a scarce resource 17
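The definition of memory efficiency on this slide can be made concrete with a small worked calculation. The sketch below is illustrative only: the function names and the cycle counts are assumptions, not figures from the presentation. The peak bandwidth uses a 16-bit DDR2-400 device (400 million transfers per second times 2 bytes).

```python
def memory_efficiency(data_cycles: int, total_cycles: int) -> float:
    """Fraction of clock cycles in which requested data is transferred."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not 0 <= data_cycles <= total_cycles:
        raise ValueError("data_cycles must lie between 0 and total_cycles")
    return data_cycles / total_cycles


def net_bandwidth(peak_bandwidth: float, efficiency: float) -> float:
    """Efficiency acts as the exchange rate between peak and net bandwidth."""
    return peak_bandwidth * efficiency


# Peak bandwidth of a 16-bit DDR2-400 device in MB/s: 400 MT/s * 2 bytes.
PEAK_MB_PER_S = 400 * 2

# Hypothetical trace: 16 cycles carry data out of 40 cycles in total;
# the remaining cycles are overhead (activate/precharge, bus turnaround,
# refresh).
efficiency = memory_efficiency(data_cycles=16, total_cycles=40)
net = net_bandwidth(PEAK_MB_PER_S, efficiency)
print(f"efficiency = {efficiency:.0%}, net bandwidth = {net:.0f} MB/s")
```

With these assumed numbers, less than half of the peak bandwidth is available to applications, which is why overhead cycles matter so much for a scarce resource.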
18 Bank-level Parallelism Enable DRAM access from different banks in parallel Reduces memory access latency and improves efficiency! 18 Credits: J.Leverich, Stanford
19 Presentation Outline Introduction Memory Overview Memory Controller Architectures Predator SDRAM Controller Conclusions 19
20 A General Memory Controller Architecture A general memory controller architecture consists of two parts The front-end buffers requests and responses per memory client schedules requests for memory access is independent of the memory type The back-end translates scheduled requests into SDRAM command sequence is dependent on the memory type 20
21 Front-End Front-end provides buffering and arbitration Arbiter can schedule requests in many different ways Round-Robin, Time-Division Multiplexing (TDM), priority scheduling Scheduled requests are sent to the back-end 21
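As an illustration of front-end arbitration, here is a minimal sketch of a Time-Division Multiplexing (TDM) arbiter. The class name, the slot table, and the choice to leave a slot idle when its owner has nothing pending (non-work-conserving behavior) are assumptions made for this example, not details taken from the presentation.

```python
class TdmArbiter:
    """Grants each time slot to the client that owns it in a fixed table."""

    def __init__(self, slot_table):
        if not slot_table:
            raise ValueError("slot_table must contain at least one slot")
        self.slot_table = list(slot_table)
        self.slot = 0

    def schedule(self, pending):
        """Return (client, request) for the current slot, or None if idle.

        pending maps each client name to a FIFO list of its requests.
        """
        owner = self.slot_table[self.slot]
        self.slot = (self.slot + 1) % len(self.slot_table)
        queue = pending.get(owner)
        if queue:
            return owner, queue.pop(0)
        return None  # the slot stays idle; it is not given to other clients


# Hypothetical setup: client A owns two of three slots, client B one.
arbiter = TdmArbiter(["A", "B", "A"])
pending = {"A": ["a0", "a1"], "B": ["b0"]}
for _ in range(4):
    print(arbiter.schedule(pending))
```

Because the slot table is fixed, the number of slots a client waits before being served does not depend on what other clients do, which is what makes this kind of arbiter predictable.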
22 Memory Map Back-end contains a memory map and a command generator Memory map decodes logical address to physical address Physical address is (bank, row, column) Decoding is done by slicing the bits in the logical address Many possibilities; the choice affects efficiency and power Example: logical address 0x10FF00 -> memory map -> physical address (2, 510, 128)
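Decoding by bit slicing can be sketched as follows. The field widths (13 row bits, 2 bank bits, 10 column bits) and the two layouts are hypothetical and chosen only for illustration; the sketch does not attempt to reproduce the example values shown on the slide, whose layout is not given.

```python
from typing import NamedTuple


class PhysicalAddress(NamedTuple):
    bank: int
    row: int
    column: int


def decode(logical_addr: int, layout) -> PhysicalAddress:
    """Slice a logical address into (bank, row, column).

    layout lists (field name, bit width) pairs from the most significant
    to the least significant bits of the address.
    """
    shift = sum(width for _, width in layout)
    if logical_addr < 0 or logical_addr >> shift:
        raise ValueError("address does not fit in the layout")
    fields = {}
    for name, width in layout:
        shift -= width
        fields[name] = (logical_addr >> shift) & ((1 << width) - 1)
    return PhysicalAddress(**fields)


# Two hypothetical memory maps over the same 25 address bits.
ROW_BANK_COLUMN = (("row", 13), ("bank", 2), ("column", 10))
BANK_ROW_COLUMN = (("bank", 2), ("row", 13), ("column", 10))

print(decode(0x10FF00, ROW_BANK_COLUMN))
print(decode(0x10FF00, BANK_ROW_COLUMN))
```

The same logical address lands in a different bank and row under each layout. Placing the bank bits below the row bits spreads consecutive address ranges over banks, whereas placing them on top keeps a large contiguous range inside one bank, which is one way the choice of map affects efficiency and power.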
23 Command Generator Command generator generates and schedules commands Customized for a particular memory generation Programmable to handle different timing constraints 23
24 Controller Types Firm Real-Time Controllers Soft/No Real-Time Controllers Mixed Real-Time Controllers 24
25 Firm Real-Time Controllers FRT requirements must be satisfied even in worst-case scenario Typical goals of firm real-time controllers: Maximize the worst-case net bandwidth Minimize the worst-case response time A trade-off between the two, since they are contradictory 25
26 Locality in FRT Controllers SDRAM performance is highly dependent on locality Request served quickly if it targets an open row No overhead of opening and closing rows FRT controllers are typically unable to exploit locality Locality has to be guaranteed also in worst case Difficult for a single executing application Requires intimate knowledge of memory accesses More or less impossible for multiple concurrent applications Memory accesses mixed by memory arbiter 26
27 Close-Page Policy FRT controllers use close-page policies [Paolieri, Reineke] Precharge banks immediately after each request Assumes that every request targets closed rows Benefits of policy Reduces worst-case overhead of opening/closing rows Increases guaranteed net bandwidth Drawbacks of policy Sacrifices best and average-case performance and power Limits best-case efficiency of DDR3-800 with 64B requests to 80% 27
28 Statically Scheduled Controllers Controllers are classified as statically or dynamically scheduled Depends on SDRAM command scheduling mechanism Statically scheduled controllers [Bayliss] Pre-compute SDRAM schedule at design time Bandwidth and execution time bounded by inspecting schedule Suitable for FRT requirements Restricted to applications with well-specified memory behavior 28
29 Dynamically Scheduled Controllers Dynamically scheduled FRT controllers [Lee, Paolieri, Shao] Schedule commands at run-time based on incoming requests Challenge is to analyze command scheduler Required to bound net bandwidth and execution times Analysis often assumes large fixed-size requests [Paolieri, Reineke] Large enough to exploit maximum bank-level parallelism by interleaving Requires B requests depending on memory device 29
30 Predictable Arbitration These FRT controllers all have bounded execution time Bounding response times requires predictable arbitration Bounds number of interfering requests from other memory clients Different controllers use different arbiters Statically scheduled controllers use a static schedule [Paolieri] employs Round-Robin arbitration Targeting homogeneous chip multi-processors [Akesson] supports a variety of predictable arbiters E.g. (Weighted) Round-Robin, Credit-Controlled Static-Priority, and Frame-Based Static-Priority Targets heterogeneous MPSoCs
31 Controller Types Firm Real-Time Controllers Soft/No Real-Time Controllers Mixed Real-Time Controllers 31
32 Soft/No Real-Time Controllers Same controllers normally used for SRT/NRT requirements Dynamically scheduled high-performance controllers SRT applications are verified by simulation rather than formally Firm transaction-level guarantees are not necessary Sufficient to satisfy application-level deadlines with high probability May correspond to thousands of memory requests Typical goals of soft/no real-time controllers: Maximize the average net bandwidth Minimize the average response time A trade-off between the two, since they are contradictory 32
33 Locality in SRT Controllers SRT controllers do not have to guarantee locality Requires locality to offset miss penalties with high probability Open-page policies are common in SRT controllers Rows are speculatively kept open to exploit locality Average efficiency is hence typically higher than for FRT controllers Best-case memory efficiency is hence around 98% All requests are either reads or writes to the same row Efficiency losses only due to mandatory refresh activities 33
34 Flexibility SRT controllers are flexible and support most memory traffic SRT controllers are dynamically scheduled Do not require formal analysis of supported memory traffic Enables support of e.g. variable request sizes Fine-grained scheduling at level of single SDRAM bursts Reduces wasted data when serving small requests Reduces response times of sensitive clients Low worst-case memory efficiency Cannot guarantee locality or bank-level parallelism Worst-case efficiency about 16% for DDR3-800 with BL=8 words
35 Improving Memory Efficiency Memory efficiency is optimized using sophisticated mechanisms Preference for requests that target open rows Reduces overhead of opening and closing rows Increases response times for clients targeting closed rows Read/write grouping Reduces read/write switching overhead Increases response times for requests in wrong direction 35
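The preference for requests that target open rows can be sketched as a simple selection rule. This is an illustrative simplification, not the scheduler of any particular controller: the function name and the queue representation (oldest request first, each request a (bank, row) pair) are assumptions.

```python
def pick_next(queue, open_rows):
    """Return the index of the request to serve next, or None if idle.

    queue holds (bank, row) pairs, oldest first. open_rows maps a bank to
    the row currently held in its row buffer. The oldest request that hits
    an open row is preferred; otherwise the oldest request is served.
    """
    for index, (bank, row) in enumerate(queue):
        if open_rows.get(bank) == row:
            return index
    return 0 if queue else None


# Hypothetical state: row 9 is open in bank 0, row 3 in bank 1.
open_rows = {0: 9, 1: 3}
queue = [(0, 5), (1, 7), (0, 9)]
print(pick_next(queue, open_rows))
```

In this example the youngest request is served first because it hits the open row, while the two older requests wait. That is the trade-off named on the slide: less activate and precharge overhead, at the price of longer response times for clients targeting closed rows.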
36 Reducing Response Times Preference for reads over writes [Shao] Reads are often blocking while writes are posted Reduces stall cycles on processor Preemption of low-priority requests in service [Lee] Reduces response times of high-priority clients Increases response times of low-priority clients Reduces memory efficiency due to preemption overhead Interactions between mechanisms are complex Difficult to derive useful bounds on bandwidth and response times May even be difficult to guarantee the default 16% net bandwidth 36
37 Controller Types Firm Real-Time Controllers Soft/No Real-Time Controllers Mixed Real-Time Controllers 37
38 Mixed Real-Time Controllers MRT controllers must efficiently support FRT, SRT and NRT Current FRT controllers treat SRT/NRT clients like FRT clients Expensive both in terms of bandwidth and power Current SRT/NRT controllers treat FRT like SRT/NRT clients Guarantees are either not formally proven or very pessimistic Worst-case may be maximum observed case plus a safety margin Deadlines may be missed in corner cases MRT controllers are likely to evolve from current controllers Either from FRT controllers or SRT/NRT controllers 38
39 Presentation Outline Introduction Memory Overview Memory Controller Architectures Predator SDRAM Controller Conclusions 39
40 Positioning Predator is a mixed real-time hybrid controller Evolved from a firm real-time design Combines static and dynamic scheduling Statically computed memory patterns (sub-schedules) that are dynamically scheduled Two key ideas to predictability 1. Predictable WCET by using memory patterns 2. Predictable WCRT by using predictable arbitration 40
41 Predictable SDRAM Predictability through precomputed memory patterns Patterns are precomputed sequences of SDRAM commands There are six types of memory patterns Read, write, r/w switch, w/r switch, refresh, and power down Dynamic request to pattern mapping: Read request -> read pattern (potentially first w/r switch) Write request -> write pattern (potentially first r/w switch) Refresh pattern issued when required Power down when idle
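The dynamic request-to-pattern mapping can be sketched as a small function. This is a minimal illustration under stated assumptions: the function name and the string labels are hypothetical, the initial bus direction is a parameter, and refresh and power-down patterns are left out for brevity.

```python
def map_requests_to_patterns(requests, initial_direction="read"):
    """Translate a sequence of 'read'/'write' requests into patterns.

    A switch pattern is inserted whenever the direction of the data bus
    changes: 'r/w switch' before a write that follows a read, and
    'w/r switch' before a read that follows a write.
    """
    patterns = []
    direction = initial_direction
    for request in requests:
        if request not in ("read", "write"):
            raise ValueError(f"unknown request type: {request!r}")
        if request != direction:
            patterns.append("w/r switch" if request == "read" else "r/w switch")
            direction = request
        patterns.append(request)
    return patterns


print(map_requests_to_patterns(["read", "write", "write", "read"]))
```

Since every request maps to at most one switch pattern plus one access pattern, the worst-case sequence of patterns is easy to enumerate, which is what makes analysis at the pattern level simpler than at the command level.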
42 Memory Patterns Patterns enable scheduling at higher level than commands Less state and fewer constraints, making them easier to analyze Bounding memory efficiency Worst sequence of patterns is known (mapping & pattern lengths) Data transferred by patterns is known (by definition) Sequence of patterns for a DDR2-400 memory 42
43 Predictable Front-End Arbitration Controller works with any predictable front-end arbiter Example: Round-Robin, TDM, or our own CCSP arbiter Bounding response times Number of interfering requests is known (arbiter analysis) Worst-case request to pattern mapping is known (mapping) Pattern to cycle mapping is known (pattern lengths) Design provides bounds on net bandwidth and response times For any combination of supported memory and arbiter 43
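The three steps listed on this slide (interfering requests, request-to-pattern mapping, pattern lengths) can be combined into a rough bound. The sketch below is a deliberately simplified and conservative illustration, not the analysis used by the controller: it assumes every request fits in one access pattern, charges every request a switch pattern, and assumes at most one refresh falls within the response interval. All pattern lengths are hypothetical.

```python
def wcrt_bound_cycles(interfering_requests, t_read, t_write,
                      t_rtw, t_wtr, t_refresh):
    """Conservative worst-case response time bound in clock cycles.

    interfering_requests comes from the arbiter analysis, for example
    N - 1 for Round-Robin arbitration among N clients. t_read and
    t_write are access pattern lengths, t_rtw and t_wtr are the lengths
    of the r/w and w/r switch patterns, and t_refresh is the refresh
    pattern length.
    """
    # Worst case per request: a switch pattern followed by an access pattern.
    worst_per_request = max(t_read + t_wtr, t_write + t_rtw)
    # The interfering requests are served first, then the request itself.
    return t_refresh + (interfering_requests + 1) * worst_per_request


# Hypothetical pattern lengths in cycles, Round-Robin with 4 clients.
bound = wcrt_bound_cycles(interfering_requests=3, t_read=16, t_write=16,
                          t_rtw=2, t_wtr=4, t_refresh=32)
print(bound)
```

The point of the sketch is the structure of the bound: each factor is known at design time, so the result holds for any combination of memory and arbiter for which those factors can be derived.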
44 Controller Architecture Front-end components Atomizer splits large requests into smaller pieces Delay Block contains request and response buffers Resource Bus schedules client according to arbiter policy Configuration Bus makes components run-time programmable Back-end Different components for SRAM and DRAM Real-Time Memory Controller 44
45 Pattern Structure Structure of basic access patterns (read and write patterns) At least one activate and precharge command per bank Access patterns of same type must be independent Banks are precharged immediately after access (close-page policy) Example read pattern for a DDR2-400 memory 45
46 Access Pattern Parameters There are three key access pattern parameters Determined at design time Number of banks to interleave (BI) request over Determines bank parallelism Memory efficiency and power consumption increase with BI Number of bursts to each bank, called burst count (BC) Number of words in a burst, the burst length (BL) BC & BL determine amount of data transferred per bank Increasing BC & BL amortizes overhead and increases efficiency 46
47 Access Granularity Parameters determine access granularity (AG) of the memory Amount of data accessed by a pattern AG = BI x BC x BL x word size Execution time of a request is proportional to AG Large requests increase WCET and WCRT Parameters are trade-off between efficiency, WCRT and power Automatically configured based on application requirements Read pattern for 16-bit DDR2-400 BI=4, BC=1, BL=8 AG=64 bytes 47
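The access granularity formula can be checked against the example on this slide. The sketch below computes AG for the 16-bit DDR2-400 configuration (a 16-bit word is 2 bytes) and, as an added illustration, how many patterns a request needs and how much transferred data is discarded. The helper names and the 32-byte and 100-byte request sizes are assumptions for the example.

```python
def access_granularity_bytes(bi: int, bc: int, bl: int, word_bytes: int) -> int:
    """AG = BI x BC x BL x word size, in bytes."""
    return bi * bc * bl * word_bytes


def patterns_needed(request_bytes: int, ag_bytes: int) -> int:
    """Number of access patterns needed to serve a request (rounded up)."""
    return -(-request_bytes // ag_bytes)


def wasted_bytes(request_bytes: int, ag_bytes: int) -> int:
    """Data transferred by the patterns that the request did not ask for."""
    return patterns_needed(request_bytes, ag_bytes) * ag_bytes - request_bytes


# Configuration from the slide: 16-bit DDR2-400, BI=4, BC=1, BL=8.
ag = access_granularity_bytes(bi=4, bc=1, bl=8, word_bytes=2)
print(ag)                     # 64
print(wasted_bytes(32, ag))   # 32: half of the pattern is discarded
print(patterns_needed(100, ag), wasted_bytes(100, ag))
```

A larger AG amortizes overhead over more data, but requests smaller than AG waste part of every pattern, and each pattern takes longer, which increases WCET and WCRT.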
48 FRT Bandwidth / Power Trade-Off Bandwidth/ power trade-off for FRT applications For different values of BI and BC (BL=8) Labels denote BI 48
49 Conservative Open-Page Policy Controller uses a conservative open-page policy Page kept open as long as possible without sacrificing WCET Enables limited locality to be exploited Reduces execution time and power consumption Policy introduces three new types of read and write patterns With / without activates and precharges, respectively Pattern is chosen when it is known if following request hits or misses Worst-case analysis is only based on miss patterns
Request arrivals: Request 1, Request 2, Request 3, Request 4, Request 5 (requests of the same color use the same page)
Close-page policy: ACT Read 1 PRE ACT Read 2 PRE ACT Read 3 PRE ACT Read 4 PRE ACT Read 5 PRE
Open-page policy: ACT Read 1 Read 2 Read 3 PRE ACT Read 4 PRE ACT Read 5
Conservative open-page policy: ACT Read 1 Read 2 PRE ACT Read 3 PRE ACT Read 4 PRE ACT Read 5 PRE
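The three policies can be compared with a small command-sequence generator. This is a sketch under several assumptions: it models a single bank and read requests only, the page labels (requests 1 to 3 share a page, requests 4 and 5 each use a different one) are inferred from the command sequences on the slide because the colors are lost in the transcript, and the `next_known` flags are a hypothetical way to model whether the following request has arrived in time for the controller to decide.

```python
def close_page(pages):
    """Precharge immediately after every request."""
    commands = []
    for number, _ in enumerate(pages, start=1):
        commands += ["ACT", f"Read {number}", "PRE"]
    return commands


def open_page(pages):
    """Keep the page open until a request targets a different page."""
    commands, open_now = [], None
    for number, page in enumerate(pages, start=1):
        if page != open_now:
            if open_now is not None:
                commands.append("PRE")
            commands.append("ACT")
            open_now = page
        commands.append(f"Read {number}")
    return commands  # the last page is left open


def conservative_open_page(pages, next_known):
    """Keep the page open only if the next request is known to hit it.

    next_known[i] is True when request i+1 has already arrived at the
    time request i is scheduled.
    """
    commands, open_now = [], None
    for i, page in enumerate(pages):
        if open_now != page:
            commands.append("ACT")
        commands.append(f"Read {i + 1}")
        hit_follows = (i + 1 < len(pages) and next_known[i]
                       and pages[i + 1] == page)
        if hit_follows:
            open_now = page
        else:
            commands.append("PRE")
            open_now = None
    return commands


pages = ["A", "A", "A", "B", "C"]
next_known = [True, False, True, True, False]  # request 3 arrives late
print(" ".join(close_page(pages)))
print(" ".join(open_page(pages)))
print(" ".join(conservative_open_page(pages, next_known)))
```

With these inputs the three printed sequences match the ones on the slide. The conservative policy never leaves a page open speculatively, so a miss never costs more than it would under the close-page policy, which is why the worst-case analysis can be based on miss patterns alone.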
50 SRT Execution Time / Power Trade-Off SRT application is verified by SystemC system simulation Traffic generator running SRT H.263 trace with 32 B cache lines 2 FRT applications modeled with synthetic traffic (64, 128 B) Results normalized to best configuration Conservative OP policy reduces power and execution time Legend: Labels: (BI, BC) Colored points = close-page policy Thin symbol = violates power budget 50
51 Presentation Outline Introduction Memory Overview Memory Controller Architectures Predator SDRAM Controller Conclusions 51
52 Conclusions Complex SoCs have mixed real-time (MRT) requirements Mix of firm (FRT), soft (SRT), and no real-time (NRT) requirements There are suitable controllers for FRT and SRT/NRT, but not MRT Firm real-time controllers Maximize bandwidth bound and minimize response time bound Static, dynamic, or hybrid SDRAM command scheduling Close-page policies to reduce miss penalty Predictable arbitration Soft/no real-time controllers Maximize average bandwidth and minimize average response time Dynamically scheduled with sophisticated mechanisms Open-page policies to exploit locality 52
53 Conclusions We propose a mixed real-time SDRAM controller Predictability enables formal verification of FRT requirements Achieved by memory patterns and predictable arbitration Configurable trade-off between efficiency, response time, and power Uses a conservative open-page policy to exploit locality for SRT/NRT 53
54 Acknowledgements This work would not have been possible without the memory team Kees Goossens (TU Eindhoven) Karthik Chandrasekar (TU Delft) Sven Goossens (TU Eindhoven) Manil Dev Gomony (TU Eindhoven) Yonghui Li (TU Eindhoven) Anna Minaeva (CTU Prague) Tim Kouters (TU Eindhoven) Williston Hayes jr. (TU Eindhoven) Jasper Kuijsten (TU Eindhoven) Getachew Teshome (TU Delft) Eelke Strooisma (TU Delft) Markus Ringhofer (TU Graz) Anand Khot (TU Delft) Sahar Foroutan (TIMA Laboratories) Gervin Thomas (TU Berlin) Winston Siauw (TU Delft) This research is supported by EU grants T-CREST, FlexTiles, Cobra, & NL grant NEST. CA104 COBRA 54
55 References
[Akesson] B. Akesson and K. Goossens. Architectures and Modeling of Predictable Memory Controllers for Improved System Integration. In Proc. DATE, 2011
[Bayliss] S. Bayliss and G. A. Constantinides. Methodology for designing statically scheduled application-specific SDRAM controllers using constrained local search. In Proc. FPT, 2009
[Lee] K. Lee, T. Lin, and C. Jen. An efficient quality-aware memory controller for multimedia platform SoC. In IEEE Trans. on Circuits and Systems for Video Technology, 15(5), 2005
[Paolieri] M. Paolieri, E. Quinones, F. Cazorla, and M. Valero. An Analyzable Memory Controller for Hard Real-Time CMPs. In IEEE Embedded Systems Letters, 1(4),
[Reineke] J. Reineke, I. Liu, H. Patel, S. Kim, and E. Lee. PRET DRAM Controller: Bank Privatization for Predictability and Temporal Isolation. In Proc. CODES+ISSS, 2011
[Shao] J. Shao and B. Davis. A burst scheduling access reordering mechanism. In Proc. HPCA,
56 Questions Thank you for your attention! Any questions? For the full story, please consult our book Memory Controllers for Real-Time Embedded Systems, available from Springer. 56
57
An introduction to SDRAM and memory controllers. 5kk73
An introduction to SDRAM and memory controllers 5kk73 Presentation Outline (part 1) Introduction to SDRAM Basic SDRAM operation Memory efficiency SDRAM controller architecture Conclusions Followed by part
More informationEfficient real-time SDRAM performance
1 Efficient real-time SDRAM performance Kees Goossens with Benny Akesson, Sven Goossens, Karthik Chandrasekar, Manil Dev Gomony, Tim Kouters, and others Kees Goossens
More informationTrends in Embedded System Design
Trends in Embedded System Design MPSoC design gets increasingly complex Moore s law enables increased component integration Digital convergence creates a market for highly integrated devices The resulting
More informationWorst Case Analysis of DRAM Latency in Multi-Requestor Systems. Zheng Pei Wu Yogen Krish Rodolfo Pellizzoni
orst Case Analysis of DAM Latency in Multi-equestor Systems Zheng Pei u Yogen Krish odolfo Pellizzoni Multi-equestor Systems CPU CPU CPU Inter-connect DAM DMA I/O 1/26 Multi-equestor Systems CPU CPU CPU
More informationEmbedded Systems. Series Editors
Embedded Systems Series Editors Nikil D. Dutt, Department of Computer Science, Zot Code 3435, Donald Bren School of Information and Computer Sciences, University of California, Irvine, CA 92697-3435, USA
More informationtrend: embedded systems Composable Timing and Energy in CompSOC trend: multiple applications on one device problem: design time 3 composability
Eindhoven University of Technology This research is supported by EU grants T-CTEST, Cobra and NL grant NEST. Parts of the platform were developed in COMCAS, Scalopes, TSA, NEVA,
More informationA Reconfigurable Real-Time SDRAM Controller for Mixed Time-Criticality Systems
A Reconfigurable Real-Time SDRAM Controller for Mixed Time-Criticality Systems Sven Goossens, Jasper Kuijsten, Benny Akesson, Kees Goossens Eindhoven University of Technology {s.l.m.goossens,k.b.akesson,k.g.w.goossens}@tue.nl
More information15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 19: Main Memory Prof. Onur Mutlu Carnegie Mellon University Last Time Multi-core issues in caching OS-based cache partitioning (using page coloring) Handling
More informationExploiting Expendable Process-Margins in DRAMs for Run-Time Performance Optimization
Exploiting Expendable Process-Margins in DRAMs for Run-Time Performance Optimization Karthik Chandrasekar, Sven Goossens 2, Christian Weis 3, Martijn Koedam 2, Benny Akesson 4, Norbert Wehn 3, and Kees
More informationChapter 5B. Large and Fast: Exploiting Memory Hierarchy
Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationIntroduction to memory system :from device to system
Introduction to memory system :from device to system Jianhui Yue Electrical and Computer Engineering University of Maine The Position of DRAM in the Computer 2 The Complexity of Memory 3 Question Assume
More informationLecture 14: Cache Innovations and DRAM. Today: cache access basics and innovations, DRAM (Sections )
Lecture 14: Cache Innovations and DRAM Today: cache access basics and innovations, DRAM (Sections 5.1-5.3) 1 Reducing Miss Rate Large block size reduces compulsory misses, reduces miss penalty in case
More informationWorst Case Analysis of DRAM Latency in Hard Real Time Systems
Worst Case Analysis of DRAM Latency in Hard Real Time Systems by Zheng Pei Wu A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationEEM 486: Computer Architecture. Lecture 9. Memory
EEM 486: Computer Architecture Lecture 9 Memory The Big Picture Designing a Multiple Clock Cycle Datapath Processor Control Memory Input Datapath Output The following slides belong to Prof. Onur Mutlu
More informationELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II
ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,
More informationLecture 15: DRAM Main Memory Systems. Today: DRAM basics and innovations (Section 2.3)
Lecture 15: DRAM Main Memory Systems Today: DRAM basics and innovations (Section 2.3) 1 Memory Architecture Processor Memory Controller Address/Cmd Bank Row Buffer DIMM Data DIMM: a PCB with DRAM chips
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationMemory Systems IRAM. Principle of IRAM
Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several
More informationArchitecture and Optimal Configuration of a Real-Time Multi-Channel Memory Controller
Architecture and Optimal Configuration of a Real-Time Multi-Channel Controller Manil Dev Gomony, Benny Akesson, and Kees Goossens Eindhoven University of Technology, The Netherlands CISTER-ISEP Research
More information15-740/ Computer Architecture Lecture 20: Main Memory II. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 20: Main Memory II Prof. Onur Mutlu Carnegie Mellon University Today SRAM vs. DRAM Interleaving/Banking DRAM Microarchitecture Memory controller Memory buses
More informationVariability Windows for Predictable DDR Controllers, A Technical Report
Variability Windows for Predictable DDR Controllers, A Technical Report MOHAMED HASSAN 1 INTRODUCTION In this technical report, we detail the derivation of the variability window for the eight predictable
More informationEmbedded Systems. Series Editors
Embedded Systems Series Editors Nikil D. Dutt, Department of Computer Science, Zot Code 3435, Donald Bren School of Information and Computer Sciences, University of California, Irvine, CA 92697-3435, USA
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to
More informationAdvanced Memory Organizations
CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU
More informationCOMPUTER ARCHITECTURES
COMPUTER ARCHITECTURES Random Access Memory Technologies Gábor Horváth BUTE Department of Networked Systems and Services ghorvath@hit.bme.hu Budapest, 2019. 02. 24. Department of Networked Systems and
More informationA Comparative Study of Predictable DRAM Controllers
0:1 0 A Comparative Study of Predictable DRAM Controllers Real-time embedded systems require hard guarantees on task Worst-Case Execution Time (WCET). For this reason, architectural components employed
More informationLECTURE 5: MEMORY HIERARCHY DESIGN
LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive
More information18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013
18-447: Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013 Reminder: Homework 5 (Today) Due April 3 (Wednesday!) Topics: Vector processing,
More informationCaches. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Caches Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns
More informationChapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)
Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design Edited by Mansour Al Zuair 1 Introduction Programmers want unlimited amounts of memory with low latency Fast
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!