Administrivia. Mini project is graded. 1 st place: Justin (75.45) 2 nd place: Liia (74.67) 3 rd place: Michael (74.49)

Size: px
Start display at page:

Download "Administrivia. Mini project is graded. 1 st place: Justin (75.45) 2 nd place: Liia (74.67) 3 rd place: Michael (74.49)"

Transcription

1 Administrivia Mini project is graded 1 st place: Justin (75.45) 2 nd place: Liia (74.67) 3 rd place: Michael (74.49) 1

2 Administrivia Project proposal due: 2/27 Original research Related to real-time embedded systems/cps Building a cyber-physical system (robot) Must include real-time performance evaluation on a selected hardware platform Repeating the evaluation of a chosen paper Any one of the suggested papers. 2

3 Administrivia Addition presentation schedule 2 papers/day on Week 15 (a week before final) eliminate individual meeting Or 2 papers/day on Week 11,12,13 Keep individual meeting 3

4 Real-Time DRAM Controller Heechul Yun 4

5 Memory Performance Isolation Part 1 Part 2 Part 3 Part 4 Core1 Core2 Core3 Core4 LLC LLC LLC LLC Memory Controller DRAM Q. How to guarantee predictable memory performance? 5

6 RQUST #1 ARRIVS How Page Works close the previous page and load new one RQUST #1 COMPLTS, RQUST #2 ARRIVS PR ACT RAD Latency of Request #1 DATA RQUST #2 COMPLTS * Latency First Access page is already open, just issue read command Latency Further Accesses Single Core Data Cycles for each core RAD DATA Latency of Request #2 (with open page) in clock cycles on a JDC-compliant DDR3 module

7 ffects of Contention ALL RQUSTS ARRIV AT TH SAM TIM, TARGTD AT SAM BANK AND RANK P A R D P A R D P A R D * Latency First Access Latency Further Accesses Single Core Multiple Cores same bank/rank 35*N 35*N 4 Data Cycles for each core

8 ALL RQUSTS ARRIV AT TH SAM TIM, TARGTD AT DIFFRNT RANKS ffects of Contention PR ACT R DATA PR ACT R DATA PR ACT R DATA * Latency First Access Latency Further Accesses Single Core N Cores same bank/rank N Cores different ranks *(N-1) *(N-1) *(N-1) 9 + 4*(N-1) 4 Data Cycles used by each access

9 Real-Time Memory Controllers Provided guaranteed performance in accessing DRAM. 9

10 Real-Time Memory Controllers Common techniques Command grouping Force to use ALL banks for each memory access Private banking Assign private DRAM banks to cores Scheduling Use analysis friendly scheduling (e.g., round-robin) over difficult ones (e.g., FR-FCFS) 10

11 Predator 11

12 Worst-case Core1 Core2 Core3 Core4 L3 Memory Controller (MC) DRAM DIMM Slow 1bank b/w Bank 1 Bank 2 Bank 3 Bank 4 Less than peak b/w How much?

13 Worst-Case For Single-Bank: Horrible 13

14 Bank Interleaving and Groups 14

15 Arbitration: CCSP 15

16 Controller Architecture 16

17 Real-Time Memory Controllers (RTMC) Predator Command grouping, CCSP arbitration AMC Command grouping, round-robin arbitration PRT-MC Private bank, TDMA arbitration DcMc, MDUSA RR + FR-FCFS hybrid, bank partitioning Read/Write Bundling Reduce bus turn-around overhead.. 17

18 RTMC References Predator: a predictable sdram memory controller. CODS+ISSS An analyzable memory controller for hard real-time CMPs, I mbedded Systems Letters, 2009 PRT DRAM controller: Bank privatization for predictability and temporal isolation, CODS+ISSS, 2011 A dual-criticality memory controller (dcmc): Proposal and evaluation of a space case study, RTAS, 2015 Improved DRAM Timing Bounds for Real-Time DRAM Controllers with Read/Write Bundling, 2016 A Comprehensive Study of DRAM Controllers in Real-Time Systems. Danlu Guo, MS Thesis, University of Waterloo,

19 Real-Time Multi/Many-Core Architecture Why is it difficult to analyze WCT? Projects on Real-Time CPU Architectures 19

20 Worst-Case xecution Time (WCT) Image source: [Wilhelm et al., 2008] Real-time scheduling theory is based on the assumption of known WCTs of real-time tasks 20

21 Computing WCT Static analysis Input: program code, architecture model output: WCT Problem: architecture model is hard and pessimistic (recall Parallelism-aware paper) Measurement No guarantee on true worst-case But, widely used in practice 21

22 Memory Hierarchies, Pipelines, and Buses for Future Architectures in Time-Critical mbedded Systems 22

23 Problematic CPU Features Architectures are optimized to reduce average performance WCT estimation is hard because of Pipelining TLBs/Caches Super-scalar Out-of-order scheduling Branch predictors Hardware prefetchers Basically anything that affect processor state 23

24 Static Timing Analysis 968 I TRANSACTIONS ON COMPUTR-AIDD DSIGN OF INTGRATD Fig. 1. Main components of a timing-analysis framework and their interaction. number of exec of value-analys the obtained tim 5) Microarchitectu bounds on the forming an abs into account th ulation concep approximations point. Pipeline through the pip resources like q average-case-en cise bounds. 6) Global bound bounds on exe formation abou combined to co through the pro formation prov 24 analyses.

25 Control Flow Graph (CFG) Analyze code Split basic blocks Compute per-block WCT use abstract CPU model 25

26 Timing Anomalies Locally faster!= globally faster Image source: [Wilhelm et al., 2008] 26

27 Timing Anomalies Locally faster!= globally faster Image source: [Wilhelm et al., 2008] 27

28 Real-Time CPU Architectures PRT UC Berkeley. MRASA/parMRASA project U ACROSS U ARAMIS Germany MC2 U 28

29 29

30 PRT Pipeline Thread 1, Instruction 1 Thread 1, Instruction 2 THRAD#1 FTCH DCOD RGACC MM XCUT XCPT FTCH DCOD RGACC MM XCUT XCPT THRAD#2 FTCH DCOD RGACC MM XCUT XCPT FTCH DCOD RGACC MM XCUT THRAD#3 FTCH DCOD RGACC MM XCUT XCPT FTCH DCOD RGACC MM THRAD#4 FTCH DCOD RGACC MM XCUT XCPT FTCH DCOD RGACC THRAD#5 FTCH DCOD RGACC MM XCUT XCPT FTCH DCOD THRAD#6 FTCH DCOD RGACC MM XCUT XCPT FTCH 1 clock t 30

31 FlexPRT Pipeline 31

32 MRASA Multicore 32

33 33

34 Acknowledgement Some slides are from: Prof. Rodolfo Pellizzoni, University of Waterloo Prof. dward A. Lee, University of Berkeley 34

35 Summary Timing anomalies Locally fast!= globally fast on non-timing compositional architectures (i.e., most architectures) Timing compositional architecture Free of timing anomalies 35

Worst Case Analysis of DRAM Latency in Multi-Requestor Systems. Zheng Pei Wu Yogen Krish Rodolfo Pellizzoni

Worst Case Analysis of DRAM Latency in Multi-Requestor Systems. Zheng Pei Wu Yogen Krish Rodolfo Pellizzoni orst Case Analysis of DAM Latency in Multi-equestor Systems Zheng Pei u Yogen Krish odolfo Pellizzoni Multi-equestor Systems CPU CPU CPU Inter-connect DAM DMA I/O 1/26 Multi-equestor Systems CPU CPU CPU

More information

Managing Memory for Timing Predictability. Rodolfo Pellizzoni

Managing Memory for Timing Predictability. Rodolfo Pellizzoni Managing Memory for Timing Predictability Rodolfo Pellizzoni Thanks This work would not have been possible without the following students and collaborators Zheng Pei Wu*, Yogen Krish Heechul Yun* Renato

More information

Variability Windows for Predictable DDR Controllers, A Technical Report

Variability Windows for Predictable DDR Controllers, A Technical Report Variability Windows for Predictable DDR Controllers, A Technical Report MOHAMED HASSAN 1 INTRODUCTION In this technical report, we detail the derivation of the variability window for the eight predictable

More information

Deterministic Memory Abstraction and Supporting Multicore System Architecture

Deterministic Memory Abstraction and Supporting Multicore System Architecture Deterministic Memory Abstraction and Supporting Multicore System Architecture Farzad Farshchi $, Prathap Kumar Valsan^, Renato Mancuso *, Heechul Yun $ $ University of Kansas, ^ Intel, * Boston University

More information

A Comparative Study of Predictable DRAM Controllers

A Comparative Study of Predictable DRAM Controllers 1 A Comparative Study of Predictable DRAM Controllers DALU GUO, MOHAMED HASSA, RODOLFO PELLIZZOI, and HIRE PATEL, University of Waterloo, CADA Recently, the research community has introduced several predictable

More information

Worst Case Analysis of DRAM Latency in Hard Real Time Systems

Worst Case Analysis of DRAM Latency in Hard Real Time Systems Worst Case Analysis of DRAM Latency in Hard Real Time Systems by Zheng Pei Wu A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied

More information

EECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun

EECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun EECS750: Advanced Operating Systems 2/24/2014 Heechul Yun 1 Administrative Project Feedback of your proposal will be sent by Wednesday Midterm report due on Apr. 2 3 pages: include intro, related work,

More information

A Comparative Study of Predictable DRAM Controllers

A Comparative Study of Predictable DRAM Controllers A A Comparative Study of Predictable DRAM Controllers Danlu Guo,Mohamed Hassan,Rodolfo Pellizzoni and Hiren Patel, {dlguo,mohamed.hassan,rpellizz,hiren.patel}@uwaterloo.ca, University of Waterloo Recently,

More information

Administrivia. HW0 scores, HW1 peer-review assignments out. If you re having Cython trouble with HW2, let us know.

Administrivia. HW0 scores, HW1 peer-review assignments out. If you re having Cython trouble with HW2, let us know. Administrivia HW0 scores, HW1 peer-review assignments out. HW2 out, due Nov. 2. If you re having Cython trouble with HW2, let us know. Review on Wednesday: Post questions on Piazza Introduction to GPUs

More information

Design and Analysis of Time-Critical Systems Timing Predictability and Analyzability + Case Studies: PTARM and Kalray MPPA-256

Design and Analysis of Time-Critical Systems Timing Predictability and Analyzability + Case Studies: PTARM and Kalray MPPA-256 Design and Analysis of Time-Critical Systems Timing Predictability and Analyzability + Case Studies: PTARM and Kalray MPPA-256 Jan Reineke @ saarland university computer science ACACES Summer School 2017

More information

Timing analysis and timing predictability

Timing analysis and timing predictability Timing analysis and timing predictability Architectural Dependences Reinhard Wilhelm Saarland University, Saarbrücken, Germany ArtistDesign Summer School in China 2010 What does the execution time depends

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory / DRAM SRAM = Static RAM SRAM vs. DRAM As long as power is present, data is retained DRAM = Dynamic RAM If you don t do anything, you lose the data SRAM: 6T per bit

More information

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5

More information

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi University of Kansas 1 Why? High-Performance Multicores for Real-Time Systems

More information

CSE 490/590 Computer Architecture Homework 2

CSE 490/590 Computer Architecture Homework 2 CSE 490/590 Computer Architecture Homework 2 1. Suppose that you have the following out-of-order datapath with 1-cycle ALU, 2-cycle Mem, 3-cycle Fadd, 5-cycle Fmul, no branch prediction, and in-order fetch

More information

Impact of Resource Sharing on Performance and Performance Prediction: A Survey

Impact of Resource Sharing on Performance and Performance Prediction: A Survey Impact of Resource Sharing on Performance and Performance Prediction: A Survey Andreas Abel, Florian Benz, Johannes Doerfert, Barbara Dörr, Sebastian Hahn, Florian Haupenthal, Michael Jacobs, Amir H. Moin,

More information

An introduction to SDRAM and memory controllers. 5kk73

An introduction to SDRAM and memory controllers. 5kk73 An introduction to SDRAM and memory controllers 5kk73 Presentation Outline (part 1) Introduction to SDRAM Basic SDRAM operation Memory efficiency SDRAM controller architecture Conclusions Followed by part

More information

Resource Sharing and Partitioning in Multicore

Resource Sharing and Partitioning in Multicore www.bsc.es Resource Sharing and Partitioning in Multicore Francisco J. Cazorla Mixed Criticality/Reliability Workshop HiPEAC CSW Barcelona May 2014 Transition to Multicore and Manycores Wanted or imposed

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

A Comparative Study of Predictable DRAM Controllers

A Comparative Study of Predictable DRAM Controllers 0:1 0 A Comparative Study of Predictable DRAM Controllers Real-time embedded systems require hard guarantees on task Worst-Case Execution Time (WCET). For this reason, architectural components employed

More information

Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems

Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Min Kyu Jeong, Doe Hyun Yoon^, Dam Sunwoo*, Michael Sullivan, Ikhwan Lee, and Mattan Erez The University of Texas at Austin Hewlett-Packard

More information

Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim

Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim Farzad Farshchi, Qijing Huang, Heechul Yun University of Kansas, University of California, Berkeley SiFive Internship Rocket

More information

-- the Timing Problem & Possible Solutions

-- the Timing Problem & Possible Solutions ARTIST Summer School in Europe 2010 Autrans (near Grenoble), France September 5-10, 2010 Towards Real-Time Applications on Multicore -- the Timing Problem & Possible Solutions Wang Yi Uppsala University,

More information

Lecture 14: Cache Innovations and DRAM. Today: cache access basics and innovations, DRAM (Sections )

Lecture 14: Cache Innovations and DRAM. Today: cache access basics and innovations, DRAM (Sections ) Lecture 14: Cache Innovations and DRAM Today: cache access basics and innovations, DRAM (Sections 5.1-5.3) 1 Reducing Miss Rate Large block size reduces compulsory misses, reduces miss penalty in case

More information

A Dual-Criticality Memory Controller (DCmc): Proposal and Evaluation of a Space Case Study

A Dual-Criticality Memory Controller (DCmc): Proposal and Evaluation of a Space Case Study A Dual-Criticality Memory Controller (DCmc): Proposal and Evaluation of a Space Case Study Javier Jalle,, Eduardo Quiñones, Jaume Abella, Luca Fossati, Marco Zulianello, Francisco J. Cazorla, Barcelona

More information

Spring 2018 :: CSE 502. Main Memory & DRAM. Nima Honarmand

Spring 2018 :: CSE 502. Main Memory & DRAM. Nima Honarmand Main Memory & DRAM Nima Honarmand Main Memory Big Picture 1) Last-level cache sends its memory requests to a Memory Controller Over a system bus of other types of interconnect 2) Memory controller translates

More information

This is the published version of a paper presented at MCC14, Seventh Swedish Workshop on Multicore Computing, Lund, Nov , 2014.

This is the published version of a paper presented at MCC14, Seventh Swedish Workshop on Multicore Computing, Lund, Nov , 2014. http://www.diva-portal.org This is the published version of a paper presented at MCC14, Seventh Swedish Workshop on Multicore Computing, Lund, Nov. 27-28, 2014. Citation for the original published paper:

More information

Staged Memory Scheduling

Staged Memory Scheduling Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory / DRAM SRAM = Static RAM SRAM vs. DRAM As long as power is present, data is retained DRAM = Dynamic RAM If you don t do anything, you lose the data SRAM: 6T per bit

More information

Designing Predictable Real-Time and Embedded Systems

Designing Predictable Real-Time and Embedded Systems Designing Predictable Real-Time and Embedded Systems Juniorprofessor Dr. Jian-Jia Chen Karlsruhe Institute of Technology (KIT), Germany 0 KIT Feb. University 27-29, 2012 of at thetu-berlin, State of Baden-Wuerttemberg

More information

18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013

18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013 18-447: Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013 Reminder: Homework 5 (Today) Due April 3 (Wednesday!) Topics: Vector processing,

More information

Final Lecture. A few minutes to wrap up and add some perspective

Final Lecture. A few minutes to wrap up and add some perspective Final Lecture A few minutes to wrap up and add some perspective 1 2 Instant replay The quarter was split into roughly three parts and a coda. The 1st part covered instruction set architectures the connection

More information

NEW REAL-TIME MEMORY CONTROLLER DESIGN FOR EMBEDDED MULTI-CORE SYSTEM By Ahmed Shafik Shafie Mohamed

NEW REAL-TIME MEMORY CONTROLLER DESIGN FOR EMBEDDED MULTI-CORE SYSTEM By Ahmed Shafik Shafie Mohamed NEW REAL-TIME MEMORY CONTROLLER DESIGN FOR EMBEDDED MULTI-CORE SYSTEM By Ahmed Shafik Shafie Mohamed A Thesis Submitted to the Faculty of Engineering at Cairo University in Partial Fulfillment of the Requirements

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

Multiprocessor scheduling, part 1 -ChallengesPontus Ekberg

Multiprocessor scheduling, part 1 -ChallengesPontus Ekberg Multiprocessor scheduling, part 1 -ChallengesPontus Ekberg 2017-10-03 What is a multiprocessor? Simplest answer: A machine with >1 processors! In scheduling theory, we include multicores in this defnition

More information

Design and Analysis of Real-Time Systems Predictability and Predictable Microarchitectures

Design and Analysis of Real-Time Systems Predictability and Predictable Microarchitectures Design and Analysis of Real-Time Systems Predictability and Predictable Microarcectures Jan Reineke Advanced Lecture, Summer 2013 Notion of Predictability Oxford Dictionary: predictable = adjective, able

More information

LECTURE 5: MEMORY HIERARCHY DESIGN

LECTURE 5: MEMORY HIERARCHY DESIGN LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive

More information

15-740/ Computer Architecture, Fall 2011 Midterm Exam II

15-740/ Computer Architecture, Fall 2011 Midterm Exam II 15-740/18-740 Computer Architecture, Fall 2011 Midterm Exam II Instructor: Onur Mutlu Teaching Assistants: Justin Meza, Yoongu Kim Date: December 2, 2011 Name: Instructions: Problem I (69 points) : Problem

More information

Advanced Caching Techniques

Advanced Caching Techniques Advanced Caching Approaches to improving memory system performance eliminate memory operations decrease the number of misses decrease the miss penalty decrease the cache/memory access times hide memory

More information

Design and Analysis of Real-Time Systems Microarchitectural Analysis

Design and Analysis of Real-Time Systems Microarchitectural Analysis Design and Analysis of Real-Time Systems Microarchitectural Analysis Jan Reineke Advanced Lecture, Summer 2013 Structure of WCET Analyzers Reconstructs a control-flow graph from the binary. Determines

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology

More information

Recap. Run to completion in order of arrival Pros: simple, low overhead, good for batch jobs Cons: short jobs can stuck behind the long ones

Recap. Run to completion in order of arrival Pros: simple, low overhead, good for batch jobs Cons: short jobs can stuck behind the long ones Recap First-Come, First-Served (FCFS) Run to completion in order of arrival Pros: simple, low overhead, good for batch jobs Cons: short jobs can stuck behind the long ones Round-Robin (RR) FCFS with preemption.

More information

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

Lecture 25: Busses. A Typical Computer Organization

Lecture 25: Busses. A Typical Computer Organization S 09 L25-1 18-447 Lecture 25: Busses James C. Hoe Dept of ECE, CMU April 27, 2009 Announcements: Project 4 due this week (no late check off) HW 4 due today Handouts: Practice Final Solutions A Typical

More information

Memory Efficient Scheduling for Multicore Real-time Systems

Memory Efficient Scheduling for Multicore Real-time Systems Memory Efficient Scheduling for Multicore Real-time Systems by Ahmed Alhammad A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy

More information

arxiv: v1 [cs.dc] 25 Jul 2014

arxiv: v1 [cs.dc] 25 Jul 2014 Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems arxiv:1407.7448v1 [cs.dc] 25 Jul 2014 Heechul Yun University of Kansas, USA. heechul.yun@ku.edu July 29, 2014 Abstract In

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

15-740/ Computer Architecture Lecture 20: Main Memory II. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 20: Main Memory II. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 20: Main Memory II Prof. Onur Mutlu Carnegie Mellon University Today SRAM vs. DRAM Interleaving/Banking DRAM Microarchitecture Memory controller Memory buses

More information

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1) Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized for most of the time by doubling

More information

4.1 Introduction 4.3 Datapath 4.4 Control 4.5 Pipeline overview 4.6 Pipeline control * 4.7 Data hazard & forwarding * 4.

4.1 Introduction 4.3 Datapath 4.4 Control 4.5 Pipeline overview 4.6 Pipeline control * 4.7 Data hazard & forwarding * 4. Chapter 4: CPU 4.1 Introduction 4.3 Datapath 4.4 Control 4.5 Pipeline overview 4.6 Pipeline control * 4.7 Data hazard & forwarding * 4.8 Control hazard 4.14 Concluding Rem marks Hazards Situations that

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

Properties of Processes

Properties of Processes CPU Scheduling Properties of Processes CPU I/O Burst Cycle Process execution consists of a cycle of CPU execution and I/O wait. CPU burst distribution: CPU Scheduler Selects from among the processes that

More information

COSC 6385 Computer Architecture - Memory Hierarchy Design (III)

COSC 6385 Computer Architecture - Memory Hierarchy Design (III) COSC 6385 Computer Architecture - Memory Hierarchy Design (III) Fall 2006 Reducing cache miss penalty Five techniques Multilevel caches Critical word first and early restart Giving priority to read misses

More information

TDT 4260 lecture 7 spring semester 2015

TDT 4260 lecture 7 spring semester 2015 1 TDT 4260 lecture 7 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Repetition Superscalar processor (out-of-order) Dependencies/forwarding

More information

ECE/CS 757: Homework 1

ECE/CS 757: Homework 1 ECE/CS 757: Homework 1 Cores and Multithreading 1. A CPU designer has to decide whether or not to add a new micoarchitecture enhancement to improve performance (ignoring power costs) of a block (coarse-grain)

More information

COSC 6385 Computer Architecture - Memory Hierarchies (II)

COSC 6385 Computer Architecture - Memory Hierarchies (II) COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

Improving Real-Time Performance on Multicore Platforms Using MemGuard

Improving Real-Time Performance on Multicore Platforms Using MemGuard Improving Real-Time Performance on Multicore Platforms Using MemGuard Heechul Yun University of Kansas 2335 Irving hill Rd, Lawrence, KS heechul@ittc.ku.edu Abstract In this paper, we present a case-study

More information

CISC 7310X. C05: CPU Scheduling. Hui Chen Department of Computer & Information Science CUNY Brooklyn College. 3/1/2018 CUNY Brooklyn College

CISC 7310X. C05: CPU Scheduling. Hui Chen Department of Computer & Information Science CUNY Brooklyn College. 3/1/2018 CUNY Brooklyn College CISC 7310X C05: CPU Scheduling Hui Chen Department of Computer & Information Science CUNY Brooklyn College 3/1/2018 CUNY Brooklyn College 1 Outline Recap & issues CPU Scheduling Concepts Goals and criteria

More information

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK]

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK] Lecture 17 Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] SRAM / / Flash / RRAM / HDD SRAM / / Flash / RRAM/ HDD SRAM

More information

arxiv: v1 [cs.ar] 5 Jul 2012

arxiv: v1 [cs.ar] 5 Jul 2012 Dynamic Priority Queue: An SDRAM Arbiter With Bounded Access Latencies for Tight WCET Calculation Hardik Shah 1, Andreas Raabe 1 and Alois Knoll 2 1 ForTISS GmbH, Guerickestr. 25, 80805 Munich 2 Department

More information

Basics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS

Basics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS Basics DRAM ORGANIZATION DRAM Word Line Bit Line Storage element (capacitor) In/Out Buffers Decoder Sense Amps... Bit Lines... Switching element Decoder... Word Lines... Memory Array Page 1 Basics BUS

More information

CMU Introduction to Computer Architecture, Spring Midterm Exam 2. Date: Wed., 4/17. Legibility & Name (5 Points): Problem 1 (90 Points):

CMU Introduction to Computer Architecture, Spring Midterm Exam 2. Date: Wed., 4/17. Legibility & Name (5 Points): Problem 1 (90 Points): Name: CMU 18-447 Introduction to Computer Architecture, Spring 2013 Midterm Exam 2 Instructions: Date: Wed., 4/17 Legibility & Name (5 Points): Problem 1 (90 Points): Problem 2 (35 Points): Problem 3 (35

More information

Lecture: SMT, Cache Hierarchies. Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1)

Lecture: SMT, Cache Hierarchies. Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1) Lecture: SMT, Cache Hierarchies Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1) 1 Thread-Level Parallelism Motivation: a single thread leaves a processor under-utilized

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

arxiv: v4 [cs.ar] 19 Apr 2018

arxiv: v4 [cs.ar] 19 Apr 2018 Deterministic Memory Abstraction and Supporting Multicore System Architecture Farzad Farshchi, Prathap Kumar Valsan, Renato Mancuso, Heechul Yun University of Kansas, {farshchi, heechul.yun}@ku.edu Intel,

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual

More information

ECE 551 System on Chip Design

ECE 551 System on Chip Design ECE 551 System on Chip Design Introducing Bus Communications Garrett S. Rose Fall 2018 Emerging Applications Requirements Data Flow vs. Processing µp µp Mem Bus DRAMC Core 2 Core N Main Bus µp Core 1 SoCs

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

GPUs have enormous power that is enormously difficult to use

GPUs have enormous power that is enormously difficult to use 524 GPUs GPUs have enormous power that is enormously difficult to use Nvidia GP100-5.3TFlops of double precision This is equivalent to the fastest super computer in the world in 2001; put a single rack

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

Single Core Equivalent Virtual Machines. for. Hard Real- Time Computing on Multicore Processors

Single Core Equivalent Virtual Machines. for. Hard Real- Time Computing on Multicore Processors Single Core Equivalent Virtual Machines for Hard Real- Time Computing on Multicore Processors Lui Sha, Marco Caccamo, Renato Mancuso, Jung- Eun Kim, and Man- Ki Yoon University of Illinois at Urbana- Champaign

More information

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 19: Main Memory Prof. Onur Mutlu Carnegie Mellon University Last Time Multi-core issues in caching OS-based cache partitioning (using page coloring) Handling

More information

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence 1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations

More information

Lecture 15: DRAM Main Memory Systems. Today: DRAM basics and innovations (Section 2.3)

Lecture 15: DRAM Main Memory Systems. Today: DRAM basics and innovations (Section 2.3) Lecture 15: DRAM Main Memory Systems Today: DRAM basics and innovations (Section 2.3) 1 Memory Architecture Processor Memory Controller Address/Cmd Bank Row Buffer DIMM Data DIMM: a PCB with DRAM chips

More information

SE300 SWE Practices. Lecture 10 Introduction to Event- Driven Architectures. Tuesday, March 17, Sam Siewert

SE300 SWE Practices. Lecture 10 Introduction to Event- Driven Architectures. Tuesday, March 17, Sam Siewert SE300 SWE Practices Lecture 10 Introduction to Event- Driven Architectures Tuesday, March 17, 2015 Sam Siewert Copyright {c} 2014 by the McGraw-Hill Companies, Inc. All rights Reserved. Four Common Types

More information

EEM 486: Computer Architecture. Lecture 9. Memory

EEM 486: Computer Architecture. Lecture 9. Memory EEM 486: Computer Architecture Lecture 9 Memory The Big Picture Designing a Multiple Clock Cycle Datapath Processor Control Memory Input Datapath Output The following slides belong to Prof. Onur Mutlu

More information

A Timing Effects of DDR Memory Systems in Hard Real-Time Multicore Architectures: Issues and Solutions

A Timing Effects of DDR Memory Systems in Hard Real-Time Multicore Architectures: Issues and Solutions A Timing Effects of DDR Memory Systems in Hard Real-Time Multicore Architectures: Issues and Solutions MARCO PAOLIERI, Barcelona Supercomputing Center (BSC) EDUARDO QUIÑONES, Barcelona Supercomputing Center

More information

Memory Controllers for Real-Time Embedded Systems. Benny Akesson Czech Technical University in Prague

Memory Controllers for Real-Time Embedded Systems. Benny Akesson Czech Technical University in Prague Memory Controllers for Real-Time Embedded Systems Benny Akesson Czech Technical University in Prague Trends in Embedded Systems Embedded systems get increasingly complex Increasingly complex applications

More information

ECE 5730 Memory Systems

ECE 5730 Memory Systems ECE 5730 Memory Systems Spring 2009 More on Memory Scheduling Lecture 16: 1 Exam I average =? Announcements Course project proposal What you propose to investigate What resources you plan to use (tools,

More information

Bounding and Reducing Memory Interference in COTS-based Multi-Core Systems

Bounding and Reducing Memory Interference in COTS-based Multi-Core Systems Bounding and Reducing Memory Interference in COTS-based Multi-Core Systems Hyoseung Kim hyoseung@cmu.edu Dionisio de Niz dionisio@sei.cmu.edu Björn Andersson baandersson@sei.cmu.edu Mark Klein mk@sei.cmu.edu

More information

CPU Scheduling. Daniel Mosse. (Most slides are from Sherif Khattab and Silberschatz, Galvin and Gagne 2013)

CPU Scheduling. Daniel Mosse. (Most slides are from Sherif Khattab and Silberschatz, Galvin and Gagne 2013) CPU Scheduling Daniel Mosse (Most slides are from Sherif Khattab and Silberschatz, Galvin and Gagne 2013) Basic Concepts Maximum CPU utilization obtained with multiprogramming CPU I/O Burst Cycle Process

More information

Single Core Equivalence Framework (SCE) For Certifiable Multicore Avionics

Single Core Equivalence Framework (SCE) For Certifiable Multicore Avionics Single Core Equivalence Framework (SCE) For Certifiable Multicore Avionics http://rtsl-edge.cs.illinois.edu/sce/ a collaboration of: Presenters: Lui Sha, Marco Caccamo, Heechul Yun, Renato Mancuso, Jung-Eun

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design Edited by Mansour Al Zuair 1 Introduction Programmers want unlimited amounts of memory with low latency Fast

More information

Precision Timed (PRET) Machines

Precision Timed (PRET) Machines Precision Timed (PRET) Machines Edward A. Lee Robert S. Pepper Distinguished Professor UC Berkeley BWRC Open House, Berkeley, CA February, 2012 Key Collaborators on work shown here: Steven Edwards Jeff

More information

Computer Architecture Lecture 24: Memory Scheduling

Computer Architecture Lecture 24: Memory Scheduling 18-447 Computer Architecture Lecture 24: Memory Scheduling Prof. Onur Mutlu Presented by Justin Meza Carnegie Mellon University Spring 2014, 3/31/2014 Last Two Lectures Main Memory Organization and DRAM

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers System Performance Scaling of IBM POWER6 TM Based Servers Jeff Stuecheli Hot Chips 19 August 2007 Agenda Historical background POWER6 TM chip components Interconnect topology Cache Coherence strategies

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information

CS 152 Computer Architecture and Engineering. Lecture 6 - Memory

CS 152 Computer Architecture and Engineering. Lecture 6 - Memory CS 152 Computer Architecture and Engineering Lecture 6 - Memory Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste! http://inst.eecs.berkeley.edu/~cs152!

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD Leveraging OpenSPARC ESA Round Table 2006 on Next Generation Microprocessors for Space Applications G.Furano, L.Messina TEC- OpenSPARC T1 The T1 is a new-from-the-ground-up SPARC microprocessor implementation

More information

Initial Results on the Performance Implications of Thread Migration on a Chip Multi-Core

Initial Results on the Performance Implications of Thread Migration on a Chip Multi-Core 3 rd HiPEAC Workshop IBM, Haifa 17-4-2007 Initial Results on the Performance Implications of Thread Migration on a Chip Multi-Core, P. Michaud, L. He, D. Fetis, C. Ioannou, P. Charalambous and A. Seznec

More information

Hardware-Software Codesign. 9. Worst Case Execution Time Analysis

Hardware-Software Codesign. 9. Worst Case Execution Time Analysis Hardware-Software Codesign 9. Worst Case Execution Time Analysis Lothar Thiele 9-1 System Design Specification System Synthesis Estimation SW-Compilation Intellectual Prop. Code Instruction Set HW-Synthesis

More information

Stride- and Global History-based DRAM Page Management

Stride- and Global History-based DRAM Page Management 1 Stride- and Global History-based DRAM Page Management Mushfique Junayed Khurshid, Mohit Chainani, Alekhya Perugupalli and Rahul Srikumar University of Wisconsin-Madison Abstract To improve memory system

More information

Lecture 18: DRAM Technologies

Lecture 18: DRAM Technologies Lecture 18: DRAM Technologies Last Time: Cache and Virtual Memory Review Today DRAM organization or, why is DRAM so slow??? Lecture 18 1 Main Memory = DRAM Lecture 18 2 Basic DRAM Architecture Lecture

More information

Design and Analysis of Time-Critical Systems Introduction

Design and Analysis of Time-Critical Systems Introduction Design and Analysis of Time-Critical Systems Introduction Jan Reineke @ saarland university ACACES Summer School 2017 Fiuggi, Italy computer science Structure of this Course 2. How are they implemented?

More information

CS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III

CS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III CS 152 Computer Architecture and Engineering Lecture 8 - Memory Hierarchy-III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

Replacement policies for shared caches on symmetric multicores : a programmer-centric point of view

Replacement policies for shared caches on symmetric multicores : a programmer-centric point of view 1 Replacement policies for shared caches on symmetric multicores : a programmer-centric point of view Pierre Michaud INRIA HiPEAC 11, January 26, 2011 2 Outline Self-performance contract Proposition for

More information