Verification of Cache Coherency Formal Test Generation

Dr. Monica Farkash, NXP Semiconductors, Inc.
EE 382M-11, Department of Electrical and Computer Engineering, The University of Texas at Austin
ECE department, University of Texas at Austin

Cache Coherency
- Caches and their coherency
- Challenge
- Verification of cache coherency: formal, test generation

Cache
- Data storage that services future requests for that data.
- Hardware implements a cache as a block of memory for temporary storage of data likely to be used again.
- CPUs and hard drives frequently use a cache, as do web browsers and web servers.
- For our purposes, we assume a cache is a hardware block.

Cache coherence (coherency)
- Refers to the consistency of data stored in the local caches of a shared resource.
- Clients' memory contents are maintained in caches.
- The same memory locations are stored in multiple local caches.
- Coherency protocols give a way to maintain data correctness between accesses to these shared locations.
- Cache coherence is intended to manage such conflicts and maintain consistency between cache and memory.

Coherence models and protocols
- Various models and protocols have been devised for maintaining cache coherence.
- The choice of consistency model is crucial when designing a cache-coherent system.
- Coherence models differ in performance and scalability; each must be evaluated for every system design.

Protocol examples: MSI, MESI, MOSI, MOESI, MERSI, MESIF, write-once, Synapse, Berkeley, Firefly, Dragon.

Protocol representation: state meaning
The letters of the protocol name identify the possible states of a cache line. For MSI, each block in a cache can be in one of three states:
- M (Modified): The block has been modified in the cache. The data in the cache is then inconsistent with the backing store (e.g. memory). A cache with a block in the M state must write the block to the backing store when that block is evicted.
- S (Shared): The block is unmodified and exists in at least one cache. The cache can evict the data without writing it to the backing store.
- I (Invalid): The block is invalid. It must be fetched from memory or another cache if it is to be stored in this cache.

[Figure: protocol representation for "Cache" and "Another Cache", showing the states Invalid, Shared and Modified and a legend entry "Event".]

Protocol representation, behavior explained: READ REQUEST
- If the block is in the M or S state, the cache supplies the data.
- If the block is not in the cache (it is in the I state), the cache must verify that the line is not in the M state in any other cache. Different caching architectures handle this differently:
  - Bus architectures often perform snooping, where the read request is broadcast to all of the caches.
  - Other architectures use cache directories: agents that know which caches last had copies of a particular cache block.
- If another cache has the block in the M state, that cache must write the data back to the backing store and go to the S or I state.
- Once any M line is written back, the cache obtains the block from either the backing store or another cache that holds the data in the S state. The cache can then supply the data to the requester.

Protocol representation, behavior explained: WRITE REQUEST
- Block in M: the cache writes, and thus modifies, the data locally.
- Block in S: the cache must notify all other caches that might contain the block in the S state to evict the block. The data may then be modified locally. The notification may be via bus snooping or a directory.
- Block in I: the cache must notify any other caches that might contain the block in the S or M state that they must evict the block. If the block is in another cache in the M state, that cache must either write the data to the backing store or supply it to the requesting cache. If at this point the cache does not yet have the block locally, the block is read from the backing store before being modified in the cache.
- After the data is modified, the cache block is in the M state.
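The read and write rules above can be sketched as a toy model. This is a minimal illustration for a single block shared by several caches, not the protocol of any particular product: the class name `MsiSystem`, its methods, and the choice to downgrade a Modified owner to S on a remote read (the text allows S or I) are all assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


class MsiSystem:
    """Toy MSI model for ONE block shared by several caches (hypothetical names)."""

    def __init__(self, num_caches, initial_value=0):
        self.memory = initial_value           # backing store copy of the block
        self.state = [State.I] * num_caches   # per-cache MSI state
        self.data = [None] * num_caches       # per-cache copy of the block

    def _write_back_owner(self, requester, new_state):
        # If another cache holds the block in M, it writes back and changes state.
        for c, st in enumerate(self.state):
            if c != requester and st is State.M:
                self.memory = self.data[c]
                self.state[c] = new_state
                if new_state is State.I:
                    self.data[c] = None

    def read(self, c):
        if self.state[c] is State.I:
            # Assumption: a Modified owner downgrades to S rather than I.
            self._write_back_owner(c, State.S)
            self.data[c] = self.memory
            self.state[c] = State.S
        return self.data[c]                   # M or S: supply the data locally

    def write(self, c, value):
        if self.state[c] is not State.M:
            # Any Modified owner writes back; all other copies are evicted.
            self._write_back_owner(c, State.I)
            for other in range(len(self.state)):
                if other != c:
                    self.state[other] = State.I
                    self.data[other] = None
            if self.state[c] is State.I:
                self.data[c] = self.memory    # fetch the block before modifying it
        self.data[c] = value
        self.state[c] = State.M

    def evict(self, c):
        if self.state[c] is State.M:
            self.memory = self.data[c]        # only M needs a write-back
        self.state[c] = State.I
        self.data[c] = None
```

For example, after `write(0, 7)` on a two-cache system, cache 0 is in M and cache 1 in I; a following `read(1)` forces the write-back and leaves both caches in S.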

A state machine for the read and write operations
If a pair of cache blocks is found in one of the state combinations marked X, it is a bug.
Usage:
- Formal methods: can I ever get into one of the X states?
- Checkers (assertions): while running tests, do I ever get into one of the X combinations?
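The formal-methods question ("can I ever get into one of the X states?") can be illustrated with an exhaustive reachability search over a small abstract model. The sketch below assumes that the X combinations for MSI are Modified/Modified and Modified/Shared, and it uses a simplified transition function; the names `step`, `reachable` and `illegal_pairs` are hypothetical. The same `illegal_pairs` function could serve as a run-time checker on observed states.

```python
from collections import deque
from itertools import product

# Assumed forbidden combinations for a pair of caches holding the same block.
ILLEGAL = {("M", "M"), ("M", "S"), ("S", "M")}


def step(states, cache, op):
    """Return the per-cache MSI states after `cache` performs `op`."""
    nxt = list(states)
    if op == "read":
        if nxt[cache] == "I":
            # A Modified copy elsewhere is written back and downgraded to Shared.
            nxt = ["S" if s == "M" else s for s in nxt]
            nxt[cache] = "S"
    elif op == "write":
        # Every other copy is evicted; the writer ends in Modified.
        nxt = ["I"] * len(nxt)
        nxt[cache] = "M"
    elif op == "evict":
        nxt[cache] = "I"
    return tuple(nxt)


def reachable(num_caches=2):
    """Breadth-first search over all state combinations reachable from all-Invalid."""
    start = ("I",) * num_caches
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for cache, op in product(range(num_caches), ("read", "write", "evict")):
            nxt = step(cur, cache, op)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def illegal_pairs(states):
    """Pairs of caches whose combined state is one of the forbidden combinations."""
    return [(i, j)
            for i in range(len(states))
            for j in range(i + 1, len(states))
            if (states[i], states[j]) in ILLEGAL]


# Formal-style question: is any forbidden combination reachable in the model?
bugs = [s for s in reachable(2) if illegal_pairs(s)]
```

In this model `bugs` stays empty; changing `step` so that a write no longer invalidates the other copies makes the search report a forbidden combination.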

Cache coherency vs. memory consistency
Cache coherency: ONE ADDRESS!
- Data in the system is coherent.
- Everybody sees the latest version of the data for that address in memory.
Memory consistency: TWO ADDRESSES!
- The order in which changes are seen by different cores.
- Different CPUs can see changes happen in different orders.

Sequential memory consistency model
Initial condition: A = 0, B = 0
Operations (1) and (2) run on one core, (3) and (4) on another:
  (1) A = 1          (3) B = 1
  (2) print(B)       (4) print(A)
Potential interleavings of operations:
  (1) (2) (3) (4) => 01 is printed
  (3) (4) (1) (2) => 01 is printed
  (1) (3) (2) (4) => 11 is printed
  (3) (1) (2) (4) => 11 is printed
00 is NEVER printed.
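The interleaving argument above can be checked by brute force. The sketch below enumerates every interleaving that preserves program order within each core and records what is printed, in print order as on the slide; the function names are assumptions made for the example.

```python
from itertools import permutations


def run(order):
    """Execute operations (1)..(4) in the given order on shared memory."""
    mem = {"A": 0, "B": 0}
    printed = []
    for op in order:
        if op == 1:
            mem["A"] = 1
        elif op == 2:
            printed.append(mem["B"])
        elif op == 3:
            mem["B"] = 1
        elif op == 4:
            printed.append(mem["A"])
    return "".join(str(v) for v in printed)


def sc_outcomes():
    """Map each sequentially consistent interleaving to the digits it prints."""
    results = {}
    for order in permutations((1, 2, 3, 4)):
        # Program order: (1) before (2) on one core, (3) before (4) on the other.
        if order.index(1) < order.index(2) and order.index(3) < order.index(4):
            results[order] = run(order)
    return results


outcomes = sc_outcomes()
```

There are six legal interleavings, and the set of printed results contains only 01 and 11, which matches the claim that 00 never appears under sequential consistency.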

Weak order
Initial condition: A = 0, B = 0
  (1) A = 1          (3) B = 1
  (2) print(B)       (4) print(A)
There is no reason to delay the print until the write A = 1 reaches memory => you can see 00.
Weak order: ARM, Collier tests.

Formal methods
Protocol formal validation
- Proprietary protocols
- Develop a model of the protocol! High level.
- Deadlock / livelock
Protocol implementation validation
- Extract the model from the implemented version
- Run rules
- Equivalence checking: protocol <-> implementation
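The 00 outcome under weak ordering can be reproduced with a deliberately simple store-buffer model. This is a sketch under stated assumptions, not a model of the ARM architecture: each core's write first sits in a private buffer and reaches memory at a later "drain" event, while reads go straight to memory. The event names (`w0`, `r0`, `d0`, ...) are hypothetical.

```python
from itertools import permutations

# w = write issued into the store buffer, r = read (print), d = buffer drains to memory.
EVENTS = ("w0", "r0", "d0", "w1", "r1", "d1")


def valid(order):
    """Each core issues its write before its read, and a write drains after it is issued."""
    pos = {e: i for i, e in enumerate(order)}
    return (pos["w0"] < pos["r0"] and pos["w0"] < pos["d0"]
            and pos["w1"] < pos["r1"] and pos["w1"] < pos["d1"])


def run(order):
    """Return (B as printed by core 0, A as printed by core 1)."""
    mem = {"A": 0, "B": 0}
    seen_b = seen_a = None
    for e in order:
        if e == "d0":
            mem["A"] = 1          # core 0's write A = 1 becomes visible
        elif e == "d1":
            mem["B"] = 1          # core 1's write B = 1 becomes visible
        elif e == "r0":
            seen_b = mem["B"]     # core 0 reads B; its own buffer holds only A
        elif e == "r1":
            seen_a = mem["A"]     # core 1 reads A; its own buffer holds only B
    return (seen_b, seen_a)


def weak_outcomes():
    return {run(o) for o in permutations(EVENTS) if valid(o)}
```

With both drains delayed past both reads, each core prints 0, so `(0, 0)` is in the result set. Forcing each drain to happen immediately after its write removes that outcome again.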

Testing
- Generate tests to cover cache-dedicated scenarios
- Run tests (simulation/emulation)
- Check correctness
- Metric/coverage

Concurrency
- One test for each client; the clients act independently
- Strict ordering of (some) events to generate the desired scenarios
- Offline generation: need to control the order of events during run time
[Figure: Cache and Another Cache]

Known state
- Take advantage of known state to generate interesting scenarios
- Multi-core => stop other activity
- Delays
[Figure: TCG, DUT, testbench]

Runtime controls
- Control over what is being run
- Complex scenarios (proactive, not reactive)
- Coverage of scenarios with a small suite
- Accountability
- Runtime controls to partially order events
[Figure: DUT, testbench]

Test contents
- Random generation: efficiency issue
- Scenario = sequence of events
- Cache-dedicated scenario = cover the protocol
- PSWG: (maybe) future language
[Figure: Client 1, Client 2]

Complexity of the model used in test/stimuli generation
[Figure: generators ("gen") placed along an axis from local knowledge / low level up to SoC; tools shown: graph-based, Obsidian, GenesysPro, RAPTOR; top of the scale: specification + executable model.]
- One single model
- Contains ALL relevant info
  - Choices/decisions
  - Test contents/consequences

Complexity of the model used in test/stimuli generation (tools)
[Figure: the same scale, from testbench valid conditions through graph-based up to executable + spec model; tools shown: TREK, RAPTOR, Perspec System Verifier, Obsidian, Questa inFact, GenesysPro.]

Different methods of generation
- Graph based (Cadence, Mentor Graphics, Breker)
- Path through the graph = test decisions
- Adapting the graph generation to protocol needs
  - Need long tests
  - Many passages

TCG - Depth
- A company might have several types of TCG
  - Depends on the level (unit, subsystem, system)
  - E.g. major types of TCG on the same product
  - All are constraint solvers at some level
- Behavioral model based (C++ model)
  - Obsidian (ARM) / GenesysPro (IBM) / Raptor (Freescale)
  - Requires specialization
  - CPU processors
  - Very complex; all are instruction-set specific
- Graph based
  - Can be extended to cover the CPU
  - Easier to use
- Reactive (runtime) testbench

Testing
- Generate tests to cover cache-dedicated scenarios
  - Challenges: concurrency, scenarios, depth
  - Tool types (graph), companies
- Run tests (simulation/emulation)
- Check correctness
- Metric/coverage

TCG <-> checking needs
- TCG: self-checking tests
- Runtime controls: an online TCG can immediately validate the result
- Random contents (?) => test-independent checking

Test-independent checking
- Online (runtime): runs with the HW and the test
- Passive observer: reads values while the HW executes
- Decides whether what it sees is correct
  - What does "sees" mean? What does "correct" mean?
- Different names/implementations
- Everybody has it in one shape or form!

A Trace-Driven Validation Methodology for Multi-Processor SoCs (Jay Bhadra et al.)
[Figure: flow. Test (pseudo-random sequences, user preferences); run the behavioral model (reference MP system); run the implementation (MP model: RTL/gates/C model, full-chip/block level); comparison and analysis; result: pass, fail, reason.]

Problem
Cannot fully check pseudo-random MP programs with data races
> static vs. dynamic

Static scenario:
  core0              core1
  #1: a = 11;        #3: c = 22;
  #2: b = 33;        #4: d = 44;

Dynamic scenario:
  core0              core1
  #1: a = 11;        #3: a = 22;
  #2: b = 33;        #4: b = 44;

tdvm uses abstraction
- Takes a global view of the MP SoC
- Specification employs MP abstraction
  > Abstract C++ reference model, not cycle-accurate
- Implementation employs abstraction via traces
  > MP models are simulated and abstract traces are obtained
  > Uses abstract information on coherency events: CPU instructions, bus transactions
- Performs assume-guarantee
  > Aims at detecting MP-related system-level bugs

tdvm environment
[Figure: block diagram. Simulation environment: stimuli, MP SoC model (RTL), tracers, abstract simulation traces (impl). tdvm environment: global order calculator, abstract MP SoC in C++ (spec), expected result generator, checker. Outputs: PASS, or FAIL with an abstract error trace.]

tdvm example
  core0              core1
  #1: a = 11;        #3: a = 22;
  #2: b = 33;        #4: b = 44;
[Figure: the tdvm environment block diagram, with this program as the stimuli.]
Abstract simulation traces (impl): a = 22, b = 44

tdvm example (continued)
  core0              core1
  #1: a = 11;        #3: a = 22;
  #2: b = 33;        #4: b = 44;
Abstract simulation traces (impl): a = 22, b = 44
Global order calculator: order = 1; 3; 4; 2
Expected result generator: a = 22, b = 33
[Figure: the tdvm environment block diagram annotated with these values; the checker reports PASS, or FAIL with an abstract error trace.]
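The last two steps of this example can be sketched in a few lines: replay the stores in the computed global order to obtain the expected values, then compare them with what the implementation trace reports. The function names and the dictionary-based trace format are assumptions made for illustration; they are not the tdvm data structures.

```python
def expected_result(stores, order):
    """Replay stores in global order; the last store to each variable wins."""
    mem = {}
    for store_id in order:
        var, value = stores[store_id]
        mem[var] = value
    return mem


def check(stores, order, observed):
    """Compare the implementation's observed values against the expected ones."""
    expected = expected_result(stores, order)
    mismatches = {var: (expected[var], observed.get(var))
                  for var in expected
                  if observed.get(var) != expected[var]}
    verdict = "PASS" if not mismatches else "FAIL"
    return verdict, mismatches


# Values taken from the example above.
stores = {1: ("a", 11), 2: ("b", 33), 3: ("a", 22), 4: ("b", 44)}
order = [1, 3, 4, 2]
observed = {"a": 22, "b": 44}

verdict, mismatches = check(stores, order, observed)
```

With the values from the example, replaying order 1; 3; 4; 2 gives a = 22 and b = 33, so this sketch flags `b` as a mismatch (expected 33, observed 44).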

Tracers
[Figure: MP SoC platform. core0 and core1 each contain an LSU, L1$ and L2$/BIU, each observed by a CPU tracer; a bus tracer observes the system memory bus and the coherency bus. The tracers deliver traces to the tdvm abstract simulation traces manager.]

Instruction time stamps
issue time (it), internal perform time (ipt), perform time (pt), completion time (ct)

LOAD instruction (timeline t1 to t8):
  {issued}                        it = t1
  {L1 hit}                        pt = t2
  {L1 miss, L2 hit}               pt = t3
  {L2 miss, bus req issued}
  {trans visible}                 pt = t5
  {trans done}
  {data access}
  {instr complete}                ct = t8

STORE instruction (timeline t1 to t9):
  {issued}                        it = t1
  {allocate store Q}              ipt = t2
  {instr complete}                ct = t3
  {L1 hit}                        pt = t4
  {L1 miss, L2 hit}               pt = t5
  {bus req issued}
  {trans visible}                 pt = t7
  {bus trans done}
  {data updated}

Bus transaction time stamps
bus issue time (bit), bus presentation time (prt), bus perform time (bpt), bus completion time (bct)

BUS transaction (timeline t1 to t5):
  {issued}                              bit = t1
  {trans presented to cores/devices}    prt = t2
  {response obtained}
  {trans visible}                       bpt = t4
  {data obtained or no data}            bct = t5

The global order of events can be computed by sorting instruction and bus transaction time stamps.

C++ spec
[Figure: the tdvm environment block diagram, highlighting the abstract MP SoC in C++ (spec).]
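Computing the global order by sorting time stamps can be sketched as a merge of per-source traces. The example assumes that the perform time (pt for instructions, bpt for bus transactions) is the sort key, which the slides do not state explicitly; the `TraceEvent` type, its fields and the time values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    ident: str          # e.g. "#1" for an instruction
    source: str         # e.g. "core0", "core1" or "bus"
    perform_time: int   # assumed sort key: pt for instructions, bpt for bus transactions


def global_order(*traces):
    """Merge per-source traces and sort them into one global order of event ids."""
    merged = [event for trace in traces for event in trace]
    # Ties are broken by source and id so that the result is deterministic.
    merged.sort(key=lambda e: (e.perform_time, e.source, e.ident))
    return [e.ident for e in merged]


# Hypothetical perform times chosen to reproduce the order 1; 3; 4; 2.
core0_trace = [TraceEvent("#1", "core0", 10), TraceEvent("#2", "core0", 40)]
core1_trace = [TraceEvent("#3", "core1", 20), TraceEvent("#4", "core1", 30)]

order = global_order(core0_trace, core1_trace)
```

A real flow would have to decide which of the several time stamps defines visibility for each event type; the single key used here is only the simplest choice.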

C++ spec: platforms
[Figure: inheritance diagram. Platform (mp-soc) with plat1, plat2 and plat3 below it. Legend: arrows represent inheritance.]

C++ spec: devices
[Figure: inheritance diagram with the labels device, coherency, processor, p1 bus, p2 bus, c1, c2, c3. Legend: arrows represent inheritance.]

C++ spec: platform composition
[Figure: the device hierarchy (device, coherency, processor, p1 bus, p2 bus, c1, c2, c3) combined with the platform hierarchy (platform, plat1, plat2, plat3). Legend: inheritance and composition. Composition multiplicities shown: {1} and {2+}.]

C++ spec: platform composition (continued)
[Figure: the same combined hierarchy with composition multiplicities {1}, {1+} and {1+}.]

C++ spec: coherency events
[Figure: hierarchy with the labels coherency event, bus op, instruction, p1 bus op, p2 bus op, presentation, PowerPC core, other core, read, flush, pre_ld, syn, clean, write, load, add, originator, target, snooper.]

Coherency checks
- Static or dynamic validation: checks for both true- and false-sharing platforms
- Barrier order check: i1; i2; b; i3; i4
- Collision order check
- Producer/consumer order check
- Completion order check: checks for in-order execution
- Matching transaction check: matches load/store/touch instructions with bus transactions
- Mutex check: checks reservation granules for each thread
- Synchronization check: pairs up all sync instructions with sync bus transactions

Testing
- Generate tests to cover cache-dedicated scenarios
  - Challenges: concurrency, scenarios, depth
  - Tool types (graph), companies
- Run tests (simulation/emulation)
- Check correctness
  - Challenges: independent, trace-based checker; internal model
- Metric/coverage
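As an illustration of one of the checks listed above, the barrier order check for a sequence such as i1; i2; b; i3; i4 can be read as: every instruction before the barrier must perform before every instruction after it. That reading, the use of perform times, and the function name are assumptions for this sketch; the slides do not define the check in detail.

```python
def barrier_order_violations(program, perform_time):
    """Return (earlier, later) pairs that cross a barrier in the wrong order.

    program: instruction ids in program order, with "b" marking a barrier.
    perform_time: maps each instruction id to the time it performed.
    """
    violations = []
    for idx, ins in enumerate(program):
        if ins != "b":
            continue
        before = [i for i in program[:idx] if i != "b"]
        after = [i for i in program[idx + 1:] if i != "b"]
        for x in before:
            for y in after:
                # An instruction after the barrier must not perform first.
                if perform_time[x] >= perform_time[y]:
                    violations.append((x, y))
    return violations


program = ["i1", "i2", "b", "i3", "i4"]
good_times = {"i1": 5, "i2": 9, "i3": 12, "i4": 10}
bad_times = {"i1": 5, "i2": 9, "i3": 7, "i4": 10}
```

Note that `i3` and `i4` may perform in either order relative to each other in this sketch; only pairs separated by the barrier are constrained.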

Coverage
- Test generator coverage
  - Accountability
  - Coverage on the generation model (graph), e.g. Test x {graph transitions}
  - Coverage per transition
  - Overall suite coverage
- Run-time monitoring
  - Run-time events coverage (assertions)
- Implementation coverage
  - Source code coverage
  - RTL functional / toggle

Summary
- Caches
- Protocols
- Protocol validation
  - Formal
  - Testing: generation, run, check correctness, metric/coverage
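Coverage on the generation model can be sketched as a cross of tests against graph transitions. The example below assumes each test is recorded as the sequence of graph nodes it walked, and it counts hits per transition plus an overall suite ratio; the graph, the test names and the function name are hypothetical.

```python
from collections import Counter


def transition_coverage(graph_transitions, tests):
    """Count hits per graph transition and compute the overall suite coverage.

    graph_transitions: set of (from_node, to_node) edges in the generation graph.
    tests: maps a test name to the list of nodes its path visited, in order.
    """
    hits = Counter()
    for path in tests.values():
        for transition in zip(path, path[1:]):
            if transition in graph_transitions:
                hits[transition] += 1
    covered = [t for t in graph_transitions if hits[t] > 0]
    return hits, len(covered) / len(graph_transitions)


# Hypothetical generation graph over MSI states, and two recorded test paths.
graph = {("I", "S"), ("S", "M"), ("M", "I"), ("I", "M")}
tests = {"t1": ["I", "S", "M"], "t2": ["I", "S"]}

hits, suite_coverage = transition_coverage(graph, tests)
```

Here the transitions I to S and S to M are hit, M to I and I to M are not, so the suite covers half of the graph and the missing transitions show where new tests are needed.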

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon

More information

Portland State University ECE 588/688. Cache Coherence Protocols

Portland State University ECE 588/688. Cache Coherence Protocols Portland State University ECE 588/688 Cache Coherence Protocols Copyright by Alaa Alameldeen 2018 Conditions for Cache Coherence Program Order. A read by processor P to location A that follows a write

More information

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012) Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604

More information

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and

More information

Overview: Shared Memory Hardware

Overview: Shared Memory Hardware Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing

More information

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization Lec-11 Multi-Threading Concepts: Coherency, Consistency, and Synchronization Coherency vs Consistency Memory coherency and consistency are major concerns in the design of shared-memory systems. Consistency

More information

Multiprocessors & Thread Level Parallelism

Multiprocessors & Thread Level Parallelism Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction

More information

Cache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri

Cache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri Cache Coherence (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri mainakc@cse.iitk.ac.in 1 Setting Agenda Software: shared address space Hardware: shared memory multiprocessors Cache

More information

Lecture 11: Snooping Cache Coherence: Part II. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Lecture 11: Snooping Cache Coherence: Part II. CMU : Parallel Computer Architecture and Programming (Spring 2012) Lecture 11: Snooping Cache Coherence: Part II CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Assignment 2 due tonight 11:59 PM - Recall 3-late day policy Assignment

More information

Processor Architecture

Processor Architecture Processor Architecture Shared Memory Multiprocessors M. Schölzel The Coherence Problem s may contain local copies of the same memory address without proper coordination they work independently on their

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Marble House The Knife (Silent Shout) Before starting The Knife, we were working

More information

1. Memory technology & Hierarchy

1. Memory technology & Hierarchy 1. Memory technology & Hierarchy Back to caching... Advances in Computer Architecture Andy D. Pimentel Caches in a multi-processor context Dealing with concurrent updates Multiprocessor architecture In

More information

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,

More information

Cache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem

Cache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem Cache Coherence Bryan Mills, PhD Slides provided by Rami Melhem Cache coherence Programmers have no control over caches and when they get updated. x = 2; /* initially */ y0 eventually ends up = 2 y1 eventually

More information

Lecture 11: Cache Coherence: Part II. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Lecture 11: Cache Coherence: Part II. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015 Lecture 11: Cache Coherence: Part II Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Bang Bang (My Baby Shot Me Down) Nancy Sinatra (Kill Bill Volume 1 Soundtrack) It

More information

Interconnect Routing

Interconnect Routing Interconnect Routing store-and-forward routing switch buffers entire message before passing it on latency = [(message length / bandwidth) + fixed overhead] * # hops wormhole routing pipeline message through

More information

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( )

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 24: Multiprocessing Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Most of the rest of this

More information

Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Shared-Memory Multi-Processors Shared-Memory Multiprocessors Multiple threads use shared memory (address space) SysV Shared Memory or Threads in software Communication implicit

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Cache Coherence - Directory Cache Coherence Prof. Michel A. Kinsy Shared Memory Multiprocessor Processor Cores Local Memories Memory Bus P 1 Snoopy Cache Physical Memory P

More information

Shared Memory Multiprocessors

Shared Memory Multiprocessors Parallel Computing Shared Memory Multiprocessors Hwansoo Han Cache Coherence Problem P 0 P 1 P 2 cache load r1 (100) load r1 (100) r1 =? r1 =? 4 cache 5 cache store b (100) 3 100: a 100: a 1 Memory 2 I/O

More information

Staffan Berg. European Applications Engineer Digital Functional Verification. September 2017

Staffan Berg. European Applications Engineer Digital Functional Verification. September 2017 Portable Stimulus Specification The Next Big Wave in Functional Verification Staffan Berg European Applications Engineer Digital Functional Verification September 2017 AGENDA Why Portable Stimulus? What

More information

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,

More information

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence 1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations

More information

Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

Suggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!

Suggested Readings! What makes a memory system coherent?! Lecture 27 Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality! 1! 2! Suggested Readings! Readings!! H&P: Chapter 5.8! Could also look at material on CD referenced on p. 538 of your text! Lecture 27" Cache Coherency! 3! Processor components! Multicore processors and

More information

Advanced OpenMP. Lecture 3: Cache Coherency

Advanced OpenMP. Lecture 3: Cache Coherency Advanced OpenMP Lecture 3: Cache Coherency Cache coherency Main difficulty in building multiprocessor systems is the cache coherency problem. The shared memory programming model assumes that a shared variable

More information

Incoherent each cache copy behaves as an individual copy, instead of as the same memory location.

Incoherent each cache copy behaves as an individual copy, instead of as the same memory location. Cache Coherence This lesson discusses the problems and solutions for coherence. Different coherence protocols are discussed, including: MSI, MOSI, MOESI, and Directory. Each has advantages and disadvantages

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence CS252 Spring 2017 Graduate Computer Architecture Lecture 12: Cache Coherence Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture 11 Memory Systems DRAM

More information

Computer Architecture

Computer Architecture 18-447 Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 29: Consistency & Coherence Lecture 20: Consistency and Coherence Bo Wu Prof. Onur Mutlu Colorado Carnegie School Mellon University

More information

Atomic Coherence: Leveraging Nanophotonics to Build Race-Free Cache Coherence Protocols. Dana Vantrease, Mikko Lipasti, Nathan Binkert

Atomic Coherence: Leveraging Nanophotonics to Build Race-Free Cache Coherence Protocols. Dana Vantrease, Mikko Lipasti, Nathan Binkert Atomic Coherence: Leveraging Nanophotonics to Build Race-Free Cache Coherence Protocols Dana Vantrease, Mikko Lipasti, Nathan Binkert 1 Executive Summary Problem: Cache coherence races make protocols complicated

More information

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

Multicore Workshop. Cache Coherency. Mark Bull David Henty. EPCC, University of Edinburgh

Multicore Workshop. Cache Coherency. Mark Bull David Henty. EPCC, University of Edinburgh Multicore Workshop Cache Coherency Mark Bull David Henty EPCC, University of Edinburgh Symmetric MultiProcessing 2 Each processor in an SMP has equal access to all parts of memory same latency and bandwidth

More information

Snooping-Based Cache Coherence

Snooping-Based Cache Coherence Lecture 10: Snooping-Based Cache Coherence Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2017 Tunes Elle King Ex s & Oh s (Love Stuff) Once word about my code profiling skills

More information

Parallel Computer Architecture Lecture 5: Cache Coherence. Chris Craik (TA) Carnegie Mellon University

Parallel Computer Architecture Lecture 5: Cache Coherence. Chris Craik (TA) Carnegie Mellon University 18-742 Parallel Computer Architecture Lecture 5: Cache Coherence Chris Craik (TA) Carnegie Mellon University Readings: Coherence Required for Review Papamarcos and Patel, A low-overhead coherence solution

More information

Module 10: "Design of Shared Memory Multiprocessors" Lecture 20: "Performance of Coherence Protocols" MOESI protocol.

Module 10: Design of Shared Memory Multiprocessors Lecture 20: Performance of Coherence Protocols MOESI protocol. MOESI protocol Dragon protocol State transition Dragon example Design issues General issues Evaluating protocols Protocol optimizations Cache size Cache line size Impact on bus traffic Large cache line

More information
