Predictable Cache Coherence for Multi-Core Real-Time Systems

1 Predictable Cache Coherence for Multi-Core Real-Time Systems. Mohamed Hassan, Anirudh M. Kaushik and Hiren Patel. RTAS 2017.

2 Motivation: Data sharing in multi-core real-time systems. Figure: each core runs RT tasks and has a private L1 D/I cache; private data (P1) and shared data (S1) reside in the LLC/main memory, reached through a shared interconnect.

3 Motivation: Data sharing in multi-core real-time systems. Approach 1: no caching of shared data [RTSS 09, RTNS 10] (simpler timing analysis; hardware support; long execution time).

4 Motivation: Data sharing in multi-core real-time systems. Approach 1: no caching of shared data [RTSS 09, RTNS 10] (simpler timing analysis; hardware support; long execution time). Approach 2: task scheduling on shared data [ECRTS 10, RTSS 16] (private cache hits on shared data; no hardware support; limited multi-core parallelism; changes to OS scheduler).

5 Motivation: Data sharing in multi-core real-time systems. Approach 1: no caching of shared data [RTSS 09, RTNS 10] (simpler timing analysis; hardware support; long execution time). Approach 2: task scheduling on shared data [ECRTS 10, RTSS 16] (private cache hits on shared data; no hardware support; limited multi-core parallelism; changes to OS scheduler). Approach 3: software cache coherence [SAMOS 14], illustrated with task code that distinguishes accesses to private data from accesses to shared data (private cache hits on shared data; hardware support; source code changes).

6 Motivation: Data sharing in multi-core real-time systems. Takeaway: the prior approaches (no caching of shared data [RTSS 09, RTNS 10], task scheduling on shared data [ECRTS 10, RTSS 16], software cache coherence [SAMOS 14]) disable simultaneously cached shared data to enable timing analysis, at the cost of lower performance and of OS and application changes.

7 Motivation: Data sharing in multi-core real-time systems. This work allows predictable accesses to simultaneously cached shared data through hardware cache coherence, and provides significant performance improvements with no changes to the OS or applications.

8 Outline: overview of hardware cache coherence; challenges of applying conventional hardware cache coherence to RT multi-core systems; our solution, the Predictable Modified-Shared-Invalid (PMSI) cache coherence protocol; latency analysis of PMSI; results; conclusion.

9 Overview of hardware cache coherence: the Modified-Shared-Invalid (MSI) protocol, shown as a per-cache-line state diagram in each core's private cache, with transitions labeled by bus events (OwnGetS, OwnGetM, OwnUpg, OtherGetS). Legend: GetS = memory load; GetM/Upg = memory store; PutM = replacement.
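As a rough illustration of how such a per-line state machine behaves, the sketch below models the three stable MSI states and the bus events named in the slide's legend. It is a textbook MSI simplification written for this summary, not code from the talk; transient states and data movement are omitted.

```python
from enum import Enum

class State(Enum):
    I = "Invalid"
    S = "Shared"
    M = "Modified"

def next_state(state: State, event: str) -> State:
    """Simplified per-line MSI transition function (stable states only).

    Events follow the slide's legend: GetS = load, GetM/Upg = store,
    PutM = replacement; 'Own*' are this core's bus events, 'Other*'
    are snooped events from other cores.
    """
    table = {
        (State.I, "OwnGetS"): State.S,    # load miss: fetch a shared copy
        (State.I, "OwnGetM"): State.M,    # store miss: fetch an exclusive copy
        (State.S, "OwnUpg"):  State.M,    # store hit on a shared copy: upgrade
        (State.S, "OtherGetM"): State.I,  # another core writes: invalidate
        (State.S, "OtherUpg"):  State.I,
        (State.M, "OtherGetS"): State.S,  # another core reads: downgrade
        (State.M, "OtherGetM"): State.I,  # another core writes: invalidate
        (State.M, "OwnPutM"):   State.I,  # replacement writes the line back
    }
    return table.get((state, event), state)  # other events leave the state unchanged

# Example: a modified line observed by another core's load becomes Shared.
assert next_state(State.M, "OtherGetS") is State.S
```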

10 Applying hardware cache coherence to multi-core RT systems. Figure: the MSI state diagram attached to each core's private L1 D/I cache, with the cores connected through a real-time interconnect to the LLC/main memory.

11 Applying hardware cache coherence to multi-core RT systems. Multi-core RT system with predictable arbitration to shared components + well-studied cache coherence protocol = predictable latencies for shared data accesses?

12 Applying hardware cache coherence to multi-core RT systems (repeats the question from slide 11).

13 Sources of unpredictable memory access latencies due to hardware cache coherence: (1) inter-core coherence interference on the same cache line; (2) inter-core coherence interference on different cache lines; (3) inter-core coherence interference due to write hits; (4) intra-core coherence interference.

14 System model. Figure: cores with private caches connected by a real-time interconnect that uses TDM arbitration (slots alternate between cores, e.g. Core 3, Core 1, Core 3, ...). One core holds line A with value 100 in Modified; the other cores hold A invalid. The LLC/main memory holds A:50, B:42, and C:31. Each core maintains pending requests (PR) and pending write-backs (PWB).
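To make the arbitration concrete, here is a minimal sketch of fixed TDM slot ownership, assuming equal-width slots assigned round-robin to the cores; the round-robin assignment is an assumption for illustration, while the 50-cycle slot width comes from the evaluation setup later in the deck.

```python
def slot_owner(cycle: int, num_cores: int, slot_width: int) -> int:
    """Return which core owns the shared interconnect at a given cycle under fixed TDM.

    Slots of `slot_width` cycles are assigned round-robin to cores
    0..num_cores-1; a core may use the shared interconnect/memory only
    during its own slot, which is what makes the arbitration predictable.
    """
    return (cycle // slot_width) % num_cores

# With 4 cores and 50-cycle slots, a request arriving just after core 0's
# slot ends waits at most 3 full slots before core 0 owns the bus again.
assert slot_owner(0, 4, 50) == 0
assert slot_owner(199, 4, 50) == 3
assert slot_owner(200, 4, 50) == 0
```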

15 Intra-core coherence interference: another core issues Read A (GetS A). Core 1 holds A:100 in Modified, so it now has a pending write-back of A, while the requester's copy of A is invalid and waits for the data (*). Memory holds A:50, B:42, and C:31. (PR: pending requests; PWB: pending write-backs; * denotes waiting for data.)

16 Intra-core coherence interference: in its own TDM slot, Core 1 issues its own Read B (GetS B) and now caches B:42 in Shared instead of performing the write-back of A; the requester of A keeps waiting.

17 Intra-core coherence interference: the requester of A is still waiting for data (*) while Core 1 holds A:100 in Modified and B:42 in Shared.

18 Intra-core coherence interference: Core 1 issues another of its own requests, Read C, in its next TDM slot; the write-back of A remains pending.

19 Intra-core coherence interference: because Core 1 keeps servicing its own requests instead of the pending write-back, the latency of the waiting Read A is unbounded.

20 Intra-core coherence interference: Core 1 keeps issuing GetS B / GetS C in its slots, so under conventional behavior the pending write-back of A is never performed.

21 Intra-core coherence interference, solution 1: an architectural modification that adds predictable arbitration inside each core between its pending requests (PR) and its pending write-backs (PWB), so that Core 1 performs WB A before issuing its next read (Read B).

22 Intra-core coherence interference, solution 1 (same example continued from slide 21).

23 Intra-core coherence interference, solution 2: coherence protocol modifications. In addition to the predictable arbitration of solution 1, the protocol must maintain state to indicate Core 1's pending write-back of A.

24 Sources of unpredictable memory access latencies due to cache coherence, and their solutions: (1) inter-core coherence interference on the same cache line: architectural modification to the shared LLC/memory; (2) inter-core coherence interference on different cache lines: architectural modification to the core + coherence protocol modifications; (3) inter-core coherence interference due to write hits: architectural modification to the core + coherence protocol modifications; (4) intra-core coherence interference: architectural modification to the core + coherence protocol modifications. Architectural modifications + coherence modifications = the Predictable Modified-Shared-Invalid (PMSI) cache coherence protocol, which provides predictable latencies for simultaneous shared data accesses, requires no application/OS changes, and delivers significant performance benefits.

25 PMSI cache coherence protocol: protocol modifications. Transition table over core events (Load, Store, Replacement) and bus events (OwnData, OwnUpg, OwnPutM, OtherGetS, OtherGetM, OtherUpg, OtherPutM); entries use the slide's action/next-state notation:

State  | Load            | Store           | Replacement      | OwnData     | OwnUpg  | OwnPutM     | OtherGetS        | OtherGetM        | OtherUpg | OtherPutM
I      | Issue GetS/IS^d | Issue GetM/IM^d | X                | X           | X       | X           | N/A              | N/A              | N/A      | N/A
S      | Hit             | Issue Upg/SM^w  | X                | N/A         | X       | X           | N/A              | I                | I        | X
M      | Hit             | Hit             | Issue PutM/MI^wb | X           | X       | X           | Issue PutM/MS^wb | Issue PutM/MI^wb | X        | X
IS^d   | X               | X               | X                | Read/S      | X       | X           | N/A              | IS^d I           | IS^d I   | N/A
IM^d   | X               | X               | X                | Write/S     | X       | X           | IM^d S           | IM^d I           | X        | N/A
IM^d I | X               | X               | X                | Write/MI^wb | X       | X           | N/A              | X                | X        | N/A
IS^d I | X               | X               | X                | Read/I      | X       | X           | N/A              | X                | X        | N/A
IM^d S | X               | X               | X                | Write/MS^wb | X       | X           | N/A              | X                | X        | N/A
SM^w   | X               | X               | X                | N/A         | Store/M | X           | N/A              | I                | I        | X
MI^wb  | Hit             | Hit             | N/A              | X           | X       | Send data/I | N/A              | N/A              | X        | X
MS^wb  | Hit             | Hit             | MI^wb            | X           | X       | Send data/S | N/A              | MI^wb            | X        | X
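To show how the table reads, the sketch below encodes the Modified row as transcribed above, plus the write-back response transitions of the MS^wb and MI^wb transient states (written MS_wb/MI_wb in code). This is only a partial, illustrative encoding of the table, not a full protocol implementation, and the "stall/ignore" default is an assumption for the omitted entries.

```python
# Partial encoding of the PMSI table above: (state, event) -> (action, next_state).
# Only the Modified row and the write-back transient states are included.
PMSI_PARTIAL = {
    ("M", "Load"):        ("hit",        "M"),
    ("M", "Store"):       ("hit",        "M"),
    ("M", "Replacement"): ("issue PutM", "MI_wb"),
    ("M", "OtherGetS"):   ("issue PutM", "MS_wb"),
    ("M", "OtherGetM"):   ("issue PutM", "MI_wb"),
    ("MS_wb", "OwnPutM"): ("send data",  "S"),
    ("MI_wb", "OwnPutM"): ("send data",  "I"),
}

def step(state: str, event: str):
    """Look up the action and next state; pairs not encoded here are treated as stalls."""
    return PMSI_PARTIAL.get((state, event), ("stall/ignore", state))

# A modified line snooped by another core's load issues PutM, waits in MS_wb,
# and settles in Shared once its own write-back is performed.
assert step("M", "OtherGetS") == ("issue PutM", "MS_wb")
assert step("MS_wb", "OwnPutM") == ("send data", "S")
```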

26 PMSI cache coherence protocol: protocol modifications (same transition table as slide 25).

27 Intra-core coherence interference under PMSI: when the other core issues Read A (GetS A), Core 1, holding A in Modified, issues PutM and moves A to the MS^wb transient state, recording the pending write-back in its PWB; Core 1's own Read B stays in its pending requests (PR). At the shared memory, the PR LUT records the waiting request (address A, core ID 2, request GetS).

28 Intra-core coherence interference under PMSI: in Core 1's write-back slot, the OwnPutM completes, Core 1 sends the data for A, and the line moves from MS^wb to Shared (Send data/S); the requester's line is in IS^d awaiting the data. The write-back WB A is thus performed before Core 1's own Read B.

29 Intra-core coherence interference under PMSI: the data response (A:100) completes the waiting Read A; the requester now holds A:100 in Shared, and memory is updated to A:100.

30 Intra-core coherence interference under PMSI: in its next request slot, Core 1 issues its own Read B (GetS B), with Read C still pending; the core's own requests are serviced in its request slots after the write-back, so both the write-back and the core's own requests make progress.
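The PR LUT in this walkthrough records, at the shared memory side, which core is waiting for which line and with what request, so the waiting access can be completed once the owner's write-back arrives. The sketch below is an illustrative model of such a lookup table; the field names follow the slide (Address, Core ID, Request), and everything else is an assumption made for this summary.

```python
from collections import defaultdict

class PendingRequestLUT:
    """Toy model of the shared-memory-side pending-request lookup table.

    Each entry mirrors the slide's columns: address -> list of
    (core_id, request) pairs waiting on that line.
    """
    def __init__(self):
        self.entries = defaultdict(list)

    def record(self, address: str, core_id: int, request: str) -> None:
        # e.g. record("A", 2, "GetS") as shown on slide 27
        self.entries[address].append((core_id, request))

    def complete(self, address: str):
        """Called when data for `address` becomes available (e.g. after the
        owner's write-back); returns the waiters that can now be serviced."""
        return self.entries.pop(address, [])

lut = PendingRequestLUT()
lut.record("A", 2, "GetS")   # core 2's read of A is pending
print(lut.complete("A"))     # -> [(2, 'GetS')] once A has been written back
```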

31 Latency analysis of PMSI. N: number of cores. Figure: a worst-case scenario under TDM arbitration on the real-time interconnect in which cores 0, 1, and 2 each have a pending GetM to the same line A (transient states IM^d, IM^d I, and MI^wb appear in the figure); the per-request worst-case latency is broken down into the access latency L_acc at the LLC/main memory and the components WCL_arb, WCL_intracoh, and WCL_intercoh.

32 Latency analysis of PMSI: WCL of a memory request = L_acc + WCL_arb(N) + WCL_intracoh(N) + WCL_intercoh(N^2).
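The sketch below simply restates this composition in executable form. The per-term closed-form expressions are derived in the paper and are not on the slide, so the coefficients used here are hypothetical placeholders; only the stated N and N^2 scaling of the terms (echoed in the conclusion) is taken from the deck.

```python
def wcl_request(n_cores: int, l_acc: float, slot_width: float,
                c_arb: float = 1.0, c_intra: float = 1.0, c_inter: float = 1.0) -> float:
    """Illustrative composition of the slide's bound:
        WCL = L_acc + WCL_arb(N) + WCL_intracoh(N) + WCL_intercoh(N^2)
    The c_* coefficients are stand-ins for the closed-form terms derived in
    the paper; only the linear and quadratic scaling in N is meaningful here.
    """
    wcl_arb = c_arb * n_cores * slot_width               # assumed O(N) arbitration term
    wcl_intracoh = c_intra * n_cores * slot_width        # assumed O(N) intra-core coherence term
    wcl_intercoh = c_inter * (n_cores ** 2) * slot_width # O(N^2) inter-core coherence term
    return l_acc + wcl_arb + wcl_intracoh + wcl_intercoh

# Doubling the core count roughly quadruples the dominant inter-core term.
print(wcl_request(4, l_acc=50, slot_width=50))
print(wcl_request(8, l_acc=50, slot_width=50))
```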

33 Results: evaluation setup. Cycle-accurate gem5 simulator; 2, 4, and 8 cores; 16 KB direct-mapped private L1 D/I caches; in-order cores at a 2 GHz operating frequency; TDM bus arbitration with a slot width of 50 cycles. Benchmarks: SPLASH-2, plus synthetic benchmarks that stress worst-case scenarios. Simulation framework available at

34 Results: observed worst-case latency per request. Chart series: PMSI, Unpredictable 2, Unpredictable 3, Unpredictable 4.

35 Results: observed worst-case latency per request, with the analytical latency bound overlaid. Chart series: latency bound, PMSI, Unpredictable 2, Unpredictable 3, Unpredictable 4.

36 Results: observed worst-case latency per request (same chart as slide 35).

37 Results: slowdown relative to MESI cache coherence. Chart comparing: uncache all, single core, uncache shared, PMSI, MSI, and MESI.

38 Results: slowdown relative to MESI cache coherence (same chart as slide 37).

39 Conclusion. We enumerate the sources of unpredictability that arise when applying conventional hardware cache coherence to multi-core real-time systems. We propose PMSI, a cache coherence protocol that allows predictable simultaneous shared data accesses with no changes to the application or OS. The hardware overhead of PMSI is about 128 bytes. The WCL of a memory access with PMSI scales with the square of the number of cores (N^2). PMSI outperforms prior techniques for handling shared data accesses (up to 4x improvement over uncache-shared) and suffers on average a 46% slowdown compared to the conventional MESI cache coherence protocol.

40 THANK YOU & QUESTIONS?
