Managing Memory for Timing Predictability. Rodolfo Pellizzoni
1 Managing Memory for Timing Predictability Rodolfo Pellizzoni
2 Thanks This work would not have been possible without the following students and collaborators: Zheng Pei Wu*, Yogen Krish, Heechul Yun*, Renato Mancuso, Marco Caccamo, Lui Sha. (* Additional thanks for contributing slides!) 2 / 67
3 Outline 1. Introduction and Problem Statement 2. DRAM Background 3. COTS Solution 3.1 PALLOC 3.2 MemGuard 4. Custom Solution (ROC) 5. Call for Action and Conclusions
5 Multicore: Server, Desktop, Mobile, RT/Embedded
6 Challenges: Shared Memory Hierarchy [Figure: a unicore CPU running tasks T1 and T2 over its own memory hierarchy, next to a multicore with Core 1 to Core 4 running tasks T1 to T8 over a shared memory hierarchy.] Performance impact.
7 DO-178C Certification Certification Authorities Software Team (CAST) Position Paper 32, May _software/cast/cast_papers/media/cast-32.pdf All effects due to shared resources in multi-core processors must be: identified, analyzed, mitigated.
8 DO-178C Certification CAST identifies: shared cache (L3), main memory (DDR2/3), interconnection; but other resources might create interference (more on this later).
9 How Bad Can It Be? [Figure: bar chart of slowdown ratio (0 to 7) per benchmark.] Setup: Core0: benchmark on the x-axis; Core1-3: 470.lbm x 3 (interference). Slowdown ratio = solo IPC / co-run IPC. Measured on Intel Xeon 3530 (4 cores), 8GB 1ch DDR3 DRAM.
10 This Talk [Figure: Core1 to Core4 sharing a memory controller and DRAM.] Focus on main memory implemented as Double Data Rate Synchronous Dynamic RAM (DDR SDRAM, DRAM for short). We show how to mitigate and bound the effects of memory contention.
11 Single Core Equivalence (SCE) End goal: Single Core Equivalent (SCE) execution [6]. Prove that each core in an M-core platform behaves as a (possibly slower) single-processor system. Certify the ARINC-653 partition on the (slower) single-processor system. [Diagram: Hardware Selection, Resource Partitioning, Software Partition to Core Allocation, Schedulability Analysis, I/O Channel Partitioning.]
13 DRAM Background The storage array contains the data. Reads and writes can only target the row buffer.
14 DRAM Background The Front End generates the needed commands; the Back End issues commands on the command bus.
15 DRAM Background READ targeting data in this row; the row buffer contains data from a different row.
16 DRAM Background READ: P, A, R.
17 DRAM Background P, A, R. Pre-Charge (PRE): store the data back into the array. ACT: load the data from the array into the buffer. [Figure: P and A commands issued on the command bus, separated by a timing constraint.]
18 DRAM Background P, A, R. CAS (R/W): reads/writes using the buffer. [Figure: command sequence P, A, R followed by Data.]
19 DRAM Background READ targeting data already in the row buffer: only the Read command is needed, and it can be issued immediately. [Figure: P, A, R, Data for the first request; R, Data for the second.]
20 DRAM Background The latency of a close request is much longer than the latency of an open request. [Figure: latency of a close request (P, A, R, Data) vs. latency of an open request (R, Data).]
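The gap between a close and an open request can be sketched with a few timing parameters. The snippet below is a minimal illustration, assuming the PRE command can be issued immediately and ignoring all other constraints; the class name, function names, and cycle counts are illustrative assumptions, not values taken from the slides or from a datasheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DramTiming:
    """Timing parameters in memory-clock cycles (illustrative values only)."""
    t_rp: int = 9    # PRE -> ACT: write the row buffer back to the array
    t_rcd: int = 9   # ACT -> CAS: load the target row into the row buffer
    t_rl: int = 9    # read CAS -> first data on the data bus
    t_bus: int = 4   # duration of one data transfer on the bus


def open_read_latency(t: DramTiming) -> int:
    """Row already in the buffer: only the R command is needed."""
    return t.t_rl + t.t_bus


def close_read_latency(t: DramTiming) -> int:
    """Different row in the buffer: P, then A, then R."""
    return t.t_rp + t.t_rcd + t.t_rl + t.t_bus


if __name__ == "__main__":
    timing = DramTiming()
    print("open read :", open_read_latency(timing), "cycles")
    print("close read:", close_read_latency(timing), "cycles")
```

With these assumed values the close request takes more than twice as long as the open one, which is the effect the slide illustrates.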
21 Bank Parallelism Accessing data in multiple banks (Bank 1 to Bank 4): multiple data transfers can be pipelined. [Figure: overlapped A, R, Data sequences, one per bank.]
22 Bank Parallelism Requests should be spread over multiple banks. Accessing data in the same bank but different rows: no pipelining, much larger delay. [Figure: A, R, Data followed by P, A, R, Data on one bank.]
23 Write-to-Read Penalty Transactions of the same type can be pipelined, but a read after a write cannot: the Write-to-Read (WtR) timing constraint adds a huge latency penalty. [Figure: back-to-back R Data and W Data transfers vs. W Data, WtR gap, R Data.]
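The penalty can be made concrete with a small calculation. This is a sketch under assumed parameters: same-type commands are taken to pipeline one bus transfer apart, while a read after a write must wait for the write latency, the data transfer, and the write-to-read turnaround. The function names and the default cycle counts (7, 4, 5) are illustrative assumptions.

```python
def next_same_type_cas(prev_cas: int, t_bus: int = 4) -> int:
    """Earliest issue time of a CAS of the same type as the previous one."""
    return prev_cas + t_bus


def earliest_read_after_write(write_cas: int, t_wl: int = 7,
                              t_bus: int = 4, t_wtr: int = 5) -> int:
    """Earliest issue time of a read CAS following a write CAS."""
    return write_cas + t_wl + t_bus + t_wtr


if __name__ == "__main__":
    first_write = 0
    second_write = next_same_type_cas(first_write)
    read = earliest_read_after_write(second_write)
    print("second write at cycle", second_write)
    print("read at cycle", read)
    print("write-to-read gap:", read - second_write, "cycles vs.",
          second_write - first_write, "cycles between the two writes")
```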
24 Summary To extract maximum bandwidth: successive requests to the same bank should hit the same row (spatial locality); successive requests to the same bank should be of the same type (reads or writes); concurrent requests should hit different banks. Problem: the way banks are accessed depends on the memory mapping (how are physical memory locations mapped to banks?) and on the memory controller schedule.
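To see why the memory mapping matters, consider how a controller might derive the bank index from a physical address. The sketch below assumes the bank index is a contiguous bit field; the bit position (13) and width (3 bits, i.e. 8 banks) are hypothetical, since real mappings are platform specific and often undocumented.

```python
PAGE_SIZE = 4096  # bytes, assumed OS page size


def bank_of(phys_addr: int, bank_shift: int = 13, bank_bits: int = 3) -> int:
    """Extract a bank index from a physical address (hypothetical bit layout)."""
    return (phys_addr >> bank_shift) & ((1 << bank_bits) - 1)


def banks_of_pages(first_page: int, count: int) -> list:
    """Bank index of the first byte of each page in a contiguous range."""
    return [bank_of(p * PAGE_SIZE) for p in range(first_page, first_page + count)]


if __name__ == "__main__":
    # Consecutive pages of one task end up on many different banks.
    print(banks_of_pages(0, 16))
```

Under this assumed layout a task that touches 16 consecutive pages reaches all 8 banks, so tasks on different cores will collide on banks unless the allocator controls which pages they receive.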
25 Memory Arbitration COTS memory controllers use First-Ready First-Come-First-Serve (FR-FCFS) arbitration. Each bank has a separate request queue. Requests targeting an open row are prioritized over requests targeting a closed row. Reads are prioritized over writes. FCFS arbitration among banks. Maximum throughput, but very bad worst-case delay. [Figure: timing diagram of requests R1:W, R2:R and R3:W with W Data transfers, intervals tWL, tBUS and tWTR, and marked times t = 0, t = 4, t = 16, t = 20.]
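The bad worst-case behavior follows directly from the priority rules. The sketch below models a single bank queue; the `Request` type and both function names are hypothetical, and the relative order of the two priority rules (row hits first, then reads over writes, then arrival order) is an assumption, since the slide does not state how they are combined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    arrival: int      # arrival time, used for the FCFS tie-break
    row: int          # DRAM row targeted by the request
    is_write: bool


def pick_next(queue: List[Request], open_row: Optional[int]) -> Request:
    """Row hits first, then reads before writes, then oldest first."""
    return min(queue, key=lambda r: (r.row != open_row, r.is_write, r.arrival))


def service_order(queue: List[Request], open_row: Optional[int]) -> List[Request]:
    """Serve every request in the queue, tracking the currently open row."""
    pending = list(queue)
    order = []
    while pending:
        nxt = pick_next(pending, open_row)
        pending.remove(nxt)
        open_row = nxt.row
        order.append(nxt)
    return order


if __name__ == "__main__":
    queue = [Request(0, row=5, is_write=False),   # oldest, but a row miss
             Request(1, row=2, is_write=False),   # row hit
             Request(2, row=2, is_write=False)]   # row hit
    print([r.arrival for r in service_order(queue, open_row=2)])
```

The oldest request is served last because later row hits keep bypassing it; with a steady stream of row hits from other cores, its delay has no useful bound.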
26 Our Solutions COTS solution: PALLOC + MemGuard. OS-level solution; no hardware modifications; solves bank parallelism and bandwidth allocation. Custom solution: ROC. Controller redesign; as before, plus write-read optimization; supports differentiated service (predictable latency vs. optimized throughput).
28 Problem The SMP OS does NOT know about DRAM banks; OS memory pages are spread over multiple banks. Result: unpredictable memory performance. [Figure: Core1 to Core4, L3, memory controller (MC), DRAM DIMM with Bank 1 to Bank 4.]
29 Best-case Fast: peak = 10.6 GB/s (DDR3 1333 MHz), with out-of-order processors. [Figure: Core1 to Core4, L3, memory controller (MC), DRAM DIMM with Bank 1 to Bank 4.]
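The quoted peak can be checked with one multiplication. The sketch below assumes a single channel with a 64-bit data bus, which is typical for a DDR3 DIMM but is not stated on the slide; the function name is hypothetical.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: float, bus_width_bits: int = 64) -> float:
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    peak = peak_bandwidth_gbs(1333)
    print(f"peak: {peak:.2f} GB/s")
    # Roughly one tenth of the peak, e.g. one bank with only row misses.
    print(f"one tenth of peak: {peak / 10:.2f} GB/s")
```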
31 Most-cases Mess: performance = ?? [Figure: Core1 to Core4, L3, memory controller (MC), DRAM DIMM with Bank 1 to Bank 4.]
32 Worst-case 1 bank and always row misses: ~1/10 of peak bandwidth. [Figure: request order over time from Core 1 to Core 4, all targeting Bank 1 and cycling through Row 1, Row 2, Row 3, Row 4, Row 1, Row 2.]
33 PALLOC The OS is aware of the DRAM mapping. Each page can be allocated to a desired DRAM bank. Flexible allocation policy. [Figure: SMP OS over Core1 to Core4, memory controller (MC), DRAM DIMM with Bank 1 to Bank 4.]
34 PALLOC Private banking: allocate pages on certain exclusively assigned banks. Eliminates inter-core bank conflicts. [Figure: Core1 to Core4, L3, memory controller (MC), DRAM DIMM with Bank 1 to Bank 4.]
35 PALLOC Implementation Modified Linux kernel's buddy allocator: DRAM bank-aware page frame allocation at each page fault. Example: per-core private banking (PB)
# cd /sys/fs/cgroup
# mkdir core0 core1 core2 core3    (create 4 cgroup partitions)
# echo 0-3 > core0/palloc.dram_bank    (assign banks 0 ~ 3 to the core0 partition)
# echo 4-7 > core1/palloc.dram_bank
# echo 8-11 > core2/palloc.dram_bank
# echo > core3/palloc.dram_bank
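The allocation policy can be sketched in a few lines. This is not the kernel's buddy allocator: it is a simplified model that scans a flat free list for the first page frame whose bank belongs to the partition. The function names, the 16-bank layout, and the bit positions are hypothetical.

```python
from typing import List, Optional, Set

PAGE_SHIFT = 12  # 4 KiB pages (assumed)


def bank_of_frame(pfn: int, bank_shift: int = 13, bank_bits: int = 4) -> int:
    """Bank index of a page frame number under a hypothetical bit layout."""
    return ((pfn << PAGE_SHIFT) >> bank_shift) & ((1 << bank_bits) - 1)


def alloc_page(free_frames: List[int], allowed_banks: Set[int]) -> Optional[int]:
    """Remove and return the first free frame located on an allowed bank."""
    for i, pfn in enumerate(free_frames):
        if bank_of_frame(pfn) in allowed_banks:
            return free_frames.pop(i)
    return None  # no free frame on the partition's banks


if __name__ == "__main__":
    free = list(range(64))
    core1_banks = {4, 5, 6, 7}          # mirrors "echo 4-7 > core1/palloc.dram_bank"
    frame = alloc_page(free, core1_banks)
    print("allocated frame", frame, "on bank", bank_of_frame(frame))
```

Because each partition draws only from its own banks, a page fault on one core cannot place data on a bank assigned to another core, at the cost of a smaller pool of usable frames per partition.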
36 Performance Impact on Unicore [Figure: normalized IPC (0 to 1.2) per benchmark for 4 banks, 8 banks, 16 banks, and the default buddy allocator.] Number of banks vs. performance on a single core. Finding: bank partitioning negatively affects performance due to reduced memory-level parallelism (MLP), but not significantly for most benchmarks.
37 Performance Isolation on 4 Cores [Figure: slowdown ratio (0 to 7) per benchmark for buddy, PB, and PB+PC.] Setup: Core0: benchmark on the x-axis; Core1-3: 470.lbm x 3 (interference). PB: DRAM bank partitioning only; PB+PC: DRAM bank and cache partitioning. Finding: bank (and cache) partitioning improves isolation, but is far from ideal.
38 Bus Bandwidth Management Why is PB+PC not ideal? Cores still contend for access to the memory data bus. Bandwidth allocation can be highly unfair, which leads to unpredictable behavior. FR-FCFS scheduling gives higher priority to earlier requests. An out-of-order core performing multiple concurrent requests can receive an inordinate amount of bandwidth. Solution: bandwidth reservation, similar to a real-time server, but guaranteeing memory bandwidth rather than CPU time.
39 MemGuard: Managing Memory [Figure: MemGuard inside the operating system, with a bandwidth reclaim manager and four per-core BW regulators (0.9 GB/s, 0.1 GB/s, 0.1 GB/s, 0.1 GB/s), each fed by a PMC; Core1 to Core4 of the multicore processor share the memory controller and the DRAM DIMM.] Reservation: minimum memory bandwidth guarantee. Reclaiming/sharing: additional memory bandwidth.
40 Reservation Idea Reserve per-core memory bandwidth via the OS scheduler. Use the hardware PMC to monitor the memory request rate. [Figure: per-core budget and core activity (computation, memory fetch) over 1 ms periods, with the annotations "Schedule a RT idle task" and "Suspend the RT idle task".] Guarantee: as long as the sum of the budgets <= guaranteed memory bandwidth, queuing delay in the memory controller is small (more details in [3]).
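The reservation mechanism can be sketched as a per-core counter that is reset every period. The model below is an illustration rather than the MemGuard implementation: the class and function names are hypothetical, the 64-byte access size is an assumption, and the 1200 MB/s guaranteed bandwidth used in the example is a placeholder value.

```python
from typing import Iterable


def budget_in_accesses(bandwidth_bytes_per_s: int, period_us: int = 1000,
                       access_bytes: int = 64) -> int:
    """Memory accesses allowed per period for a given bandwidth reservation."""
    return bandwidth_bytes_per_s * period_us // (1_000_000 * access_bytes)


def admissible(budgets_mbs: Iterable[int], guaranteed_mbs: int) -> bool:
    """The guarantee holds only if the reservations fit the guaranteed bandwidth."""
    return sum(budgets_mbs) <= guaranteed_mbs


class BandwidthRegulator:
    """Per-core regulator: throttle the core once its budget is used up."""

    def __init__(self, budget: int):
        self.budget = budget
        self.used = 0
        self.throttled = False

    def on_period_start(self) -> None:
        # Replenish the budget and let the core run again.
        self.used = 0
        self.throttled = False

    def on_accesses(self, count: int) -> None:
        # In a real system this is driven by a performance-counter overflow.
        if self.throttled:
            return
        self.used += count
        if self.used >= self.budget:
            self.throttled = True  # stands in for scheduling the RT idle task


if __name__ == "__main__":
    reg = BandwidthRegulator(budget_in_accesses(900_000_000))
    reg.on_accesses(14_000)
    print("throttled after 14000 accesses:", reg.throttled)
    reg.on_accesses(100)
    print("throttled after 14100 accesses:", reg.throttled)
    print("admissible:", admissible([900, 100, 100, 100], guaranteed_mbs=1200))
```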
41 Reclaiming/Sharing Idea Redistribute unused bandwidth to demanding cores. Divide cores into protected and unprotected cores. Protected cores have guaranteed bandwidth and do not require additional bandwidth. Unprotected cores use budget prediction. Unneeded budget is donated to the system. Budget is reclaimed on an on-demand basis.
42 Reclaiming/Sharing 4 units of budget are donated at time 10, 2 at time 20. The global budget is reclaimed when a core's budget is depleted.
43 Effects of MemGuard [Figure: T1's execution time (ms), solo vs. co-run, for three configurations with tasks T1 and T2 on Core1 and Core2 (private L2, shared DRAM): Original; MemGuard (Reserve only), with 900M/s for Core1 and 200M/s for Core2; MemGuard (Reclaim + Share), with the same reservations. Annotations: performance guarantee, no deadline violation; percentages shown: 60%, 94%, 14%, 69%.]
45 ROC Key idea: design new architectural optimizations targeted at reducing worst-case, not average-case, latency. ROC: Rank-switching, Open-row Controller. We discuss: design, latency analysis, implementation, results.
46 Multi-Rank Device Memory device with up to 4 ranks, each comprising a separate set of banks.
47 Multi-Rank Device The long Write-to-Read constraint is replaced by the much shorter Rank-to-Rank (RtR) one. [Figure: W Data, RtR gap, R Data.]
48 Example: Write-Read-Write-Read 1 rank: latency 52 cycles. 2 ranks: latency 35 cycles. 4 ranks: latency 29 cycles. Challenge: design a command scheduling algorithm that minimizes worst-case latency by forcing alternation among ranks.
49 Worst Case Analysis Inputs: total number of requestors and ranks; memory device parameters. The worst-case single-request latency analysis yields the latency for each type of request: open read, close read, open write, close write. For the task under analysis, the numbers of open reads, close reads, open writes and close writes are combined with these latencies into a cumulative bound, which gives the worst-case execution time (WCET).
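The cumulative step of this analysis is a weighted sum. The sketch below assumes the per-type worst-case latencies have already been derived; the function names, the request counts, the latencies in nanoseconds, and the way the compute time is added are all hypothetical placeholders.

```python
from typing import Dict

REQUEST_TYPES = ("open_read", "close_read", "open_write", "close_write")


def cumulative_memory_latency(counts: Dict[str, int],
                              worst_latency_ns: Dict[str, int]) -> int:
    """Sum over request types of (number of requests) x (worst-case latency)."""
    return sum(counts[t] * worst_latency_ns[t] for t in REQUEST_TYPES)


def wcet_bound_ns(compute_ns: int, counts: Dict[str, int],
                  worst_latency_ns: Dict[str, int]) -> int:
    """Execution time without memory stalls plus the cumulative memory latency."""
    return compute_ns + cumulative_memory_latency(counts, worst_latency_ns)


if __name__ == "__main__":
    counts = {"open_read": 100, "close_read": 20,
              "open_write": 50, "close_write": 10}
    latency = {"open_read": 40, "close_read": 90,
               "open_write": 45, "close_write": 95}
    print("memory latency bound:", cumulative_memory_latency(counts, latency), "ns")
    print("WCET bound:", wcet_bound_ns(50_000, counts, latency), "ns")
```

Because close requests carry a much larger per-request bound, a task with good row locality gets a visibly tighter WCET under an open-row controller.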
50 Back End Model The Front End adds a constant delay, so we focus on the Back End. Three levels of arbitration. [Figure: controller Back End with command queues for Rank 1 to Rank R, P/A arbiters and CAS arbiters, and arbitration Levels 3, 2 and 1 (command arbitration to the device).]
51 Back End Model L1: CAS (R/W) commands have higher priority than PRE/ACT commands (priority to data bus contention).
52 Back End Model L2 alternates among ranks. L3 alternates among requestors within a rank.
53 Back End Model Let R be the number of ranks, and consider the number of requestors assigned to each rank. Given the alternation in L2/L3, the latency of each command is a function of the number of ranks and of the number of requestors in the same rank. Isolation property: the latency of a requestor does not depend on the number of requestors or the scheduling policy used in other ranks. We can dedicate some ranks to hard real-time requestors, and others to soft real-time requestors. Optimize hard requestors for latency, soft requestors for bandwidth.
54 CAS Rank Switching Rule Key idea: schedule ranks in round-robin order if transfers on the data bus are separated by at most the Rank-to-Rank (RtR) constraint. When this is not possible (due to the Write-to-Read constraint), reorder ranks: Rank 2 can read twice while Rank 1 is waiting for WtR. We prove: 1. Reordering does not increase worst-case latency if R = 4. 2. If the system is backlogged, there is no more than RtR between data transfers => WtR does not degrade memory throughput. [Figure: data bus timeline with Rank 1 and Rank 2 transfers (W Data, R Data) separated by RtR gaps.]
55 Results Comparison against the Analyzable Memory Controller (AMC) [1]: a predictable memory controller with WCET guarantees for hard real-time tasks; fair arbitration (round robin), similar to our approach. Synthetic benchmarks: used to show how the worst-case latency bound varies as parameters are changed. CHStone benchmarks: memory traces are obtained from the gem5 simulator and used as inputs to both analysis and simulators. The core under analysis is in-order; interfering requestors are out-of-order, running lbm.
56 Results DDR3-1333H memory device, 64-bit and 32-bit data bus widths. Simulations: Python simulators for the base controller [5] and AMC [1]. ROC implementation: three-stage pipelined implementation in Verilog RTL; synthesizes to a Xilinx FPGA at 340 MHz (original soft memory controller: 400 MHz); an ASIC implementation could likely be significantly faster. Code available at
57 Results Synthetic benchmarks, 20% writes, 8 requestors, 64-bit data bus. ROC-4 has > 33% lower latency than [5] for all cases. [Figure: average worst-case latency (ns) vs. row hit ratio (0% to 100%) for AMC, Analysis [5], ROC-2Rank and ROC-4Rank; annotated improvements are due to private bank parallelism, to the open-row policy, and to rank-switching.]
58 Results CHStone, 16 requestors, 64-bit data bus. ROC-4 has between 5% and 35% lower WCET than [5]. [Figure: normalized execution time for AMC, Analysis [5], ROC-2 Rank and ROC-4 Rank on adpcm, aes, bf, gsm, jpeg, mips, motion, sha, dfadd, dfdiv, dfmul, dfsin; T-bars: worst-case bound; solid bars: simulation.]
59 Summary [Table comparing ROC, other predictable memory controllers, PALLOC, MemGuard, and PALLOC + MemGuard on: bank contention, bandwidth partitioning, open row, write-to-read optimization, differentiated service.]
61 Other Sources Of Interference Help is needed in mitigating and analyzing other sources of interference. Cache: allocation interference (multiple tasks/cores contending for cache space) can be solved with cache partitioning/locking; timing interference (simultaneous accesses by multiple cores) is generally low in modern architectures. Interconnects: how does this work? We rarely have detailed information. What about cache coherency?
62 Other Sources Of Interference Other sources of interference might not be well understood. Example: Miss Status Holding Registers (MSHRs) in Intel platforms. Caches are non-blocking: they allow multiple concurrent requests by the same or different cores. Pending requests are enqueued in a global MSHR register. A core can be stalled due to contention for MSHR access. Based on preliminary measurements, we believe this can account for a 2x delay factor.
63 Hardware Profiles The described OS techniques rely on basic hardware support. How do we know if a platform can support them? I strongly believe the community should join to define a set of SCE hardware profiles. Techniques can then be specified based on the required profile. Profile Mem-Predict: a hardware platform satisfies this profile if it supports a DRAM controller with per-bank queues, virtual memory, and per-core performance counters with interrupt functionality that count the number of memory accesses. This is sufficient for PALLOC + MemGuard.
64 Academic Involvement in Multi-Core Certification The community needs to come together to: identify and review the inter-core interference channels; share and review solution approaches; develop metrics, analysis and testing procedures for the quality of solution approaches; foster the development of a shared process for the certification of multicore avionics. Basic timeline agreed upon with FAA and IEEE TCRTS leaderships. Academic workshops to be followed by workshops with FAA and industry. First workshop at next CPSWeek in April.
65 Conclusions DO-178C certification on multi-core: a huge challenge. Main memory is a major interference channel: intra-bank interference and inter-bank interference. We discussed solutions to mitigate the effect of the memory channel. The COTS solution relies on available hardware support and mitigates interference, but at the cost of performance. Custom hardware requires controller redesign. A lot more remains to be done: consider getting involved!
66 Bibliography
[1] M. Paolieri, E. Quinones, F. Cazorla, and M. Valero, An Analyzable Memory Controller for Hard Real-Time CMPs, Embedded Systems Letters, IEEE, vol. 1, no. 4, pp ,
[2] Y. Krish, Z. P. Wu, and R. Pellizzoni, ROC: A Rank-switching, Open-row DRAM Controller for Time-predictable Systems, IEEE ECRTS, July 2014
[3] H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha, MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isolation in Multi-core Platforms, 19th IEEE RTAS, April 2013
[4] H. Yun, R. Mancuso, Z. P. Wu, and R. Pellizzoni, PALLOC: DRAM Bank-Aware Memory Allocator for Performance Isolation on Multicore Platforms, 20th IEEE RTAS, April 2014
[5] Z. P. Wu, Y. Krish, and R. Pellizzoni, Worst Case Analysis of DRAM Latency in Multi-Requestor Systems, 34th IEEE RTSS, December 2013
[6] Lui Sha et al., Single Core Equivalent Virtual Machines for Hard Real-Time Computing on Multicore Processors, Technical Report,
67 Questions?
More informationReducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip
Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per