A Tuneable Software Cache Coherence Protocol for Heterogeneous MPSoCs. Marco Bekooij & Frank Ophelders
|
|
- Regina Richard
- 5 years ago
- Views:
Transcription
1 A Tuneable Software Cache Coherence Protocol for Heterogeneous MPSoCs Marco Bekooij & Frank Ophelders Outline Context What is cache coherence Addressed challenge Short overview of related work Related issue: memory consistency Proposed software cache coherence protocol Performance evaluation results Concluding remarks 1
2 Multi-stream car-entertainment system Car-radio IC of NXP Accelerators Cordic Cordic FIR Peripherals IF IN 1x SRC Keyed AGC 1x Radio 8*fs In + out PCM I/f Audio ADC 4x Audio DAC 4x Ext SPDIF-in Host Host/ext IIS-in 2x IIS-out 2x IIS-out 1x Ext IIS-in 3x Digital In Out (DIO) Switch Tile 0 Tile 1 DSP EPICS AHB if Tile 2 DSP EPICS ITC Tile 3 DSP EPICS AHB if ITC DSP EPICS AHB if ITC AHB if ITC Inter Tile Communication (ITC) Multi-layer AHB bus (3 layer) Controller DMA ARM SPI CD Block Dec. ARM based subsystem AHB2VPB AHB2VPB AHB2VPB VPB VPB VPB Domain 2 Domain 1 Domain 0 Unsuitable for general purpose applications (e.g. Pthread) 2
3 Developed experimental embedded multiprocessor system Instruction Memory PE1 ARM926EJ-S PE1 ARM926EJ-S PE2 $ I D D I $ Instruction Memory PE2 $ $ Peripherals RS232 Display Touchscreen Audio in/out Video in/out Timers Æthereal network-on-chip TDM TDM RR SDRAM 256 MB Processors communicate through shared memory Processors have private caches Cache coherence problem! Shared Memory 1 8 MB Shared Memory 2 8 MB Virtex 4 Cache coherence problem X: 10 3 P1 P2 Read returns 3!!! $ $ X: 3 Shared memory X: 10 3 A cache coherency protocol ensures that eventually s become visible to all processors 3
4 Addressed challenge Define a cache coherence protocol that is suitable for real-time embedded systems with a NoC and with off-the-shelf processors Related work on cache coherency Hardware cache coherency protocols Snooping based protocols: Requires processors to observe all memory accesses Does not match well with a NoC: preferably point-2-point communication instead of broadcasting Directory based protocols Significant overhead as a result of accessing the directory Transactional memory Relies on speculation: suitable for real-time systems? > Remark: most embedded processors do not support a hardware cache coherency protocol Software cache coherency protocols Require a specific programming style: explicit coupling between each synchronization operation and data-structure it protects Prevent cache coherency issues: put shared data in uncached address range Low efficiency 4
5 Issues in sharing cache lines Cache operations often operate on lines P1 A B P2 A B A B A B A B Related issue: memory consistency A = 1 flag = 1 P1 P2 while ( flag!= 1 ); print A Memory accesses reordering by Memory system Processor Compiler We need a memory consistency model Defines constraints on the order in which memory operations become visible to other processors Enables programmers to reason about outcome 5
6 Sequential Memory Consistency P1 P2 P3 A=1 while (A!=1); B = 1 Network-on-chip while (B!=1); Print A All s must be seen in one single order by all processors ( atomicity) Likely to be inefficient in combination with a NoC P1 P2 P3 read lock read unlock 6
7 Proposed software cache coherence protocol Tuneable software cache coherence protocol Proposed software cache coherence protocol Minimal hardware requirements Suitable for heterogeneous MPSoCs with a NoC Off-the-shelf processors and caches are supported Should support cache maintenance operations (clean, invalidate) Sufficient for POSIX threads (Pthreads) explicit synchronization operations Tuneable Separate shared and private data Shared in -through and private in -back cache region Minimize unnecessary invalidations Putting shared data in a specific cache way Suitable for real-time systems Bounded protocol overhead, WCET is independent of accesses other processors 7
8 Release Consistency Ensuring sequential consistency efficiently is (too) costly apple support release consistency Acquire Guarantees reading most recent data from memory Release Makes s visible to other processors Cache coherence operations only required on acquire and release read acquire(s) read release(s) SWCC protocol in POSIX threads POSIX threads No two threads can access data at the same memory location simultaneously while at least one of the threads is modifying the location... Pthread_mutex_lock (acquire) Obtain lock Clean & invalidate Dcache Pthread_mutex_unlock (release) Clean Dcache Release lock Pthread_mutex_lock(S) (exclusive access to shared data) Pthread_mutex_unlock(S) 8
9 Tuning the protocol Place shared and private data in different address ranges Private data does not need to become visible to other processors Private data in -back region of the cache Shared data solve the sharing problem Shared data in -through region of the cache Execution time FFT Memory accesses FFT Experiments Embedded the software cache coherence operations in POSIX threads calls Clean and invalidate entire shared address range on each synchronization Entire cache Way with shared data Address range (MVA) Executed Splash2 applications ARM926EJ-S PE1 I D ARM926EJ-S PE2 D I Low latency 4 cycles / word Each processor gets equal budget TDM arbitration on memory port $ $ $ $ Instruction Memory PE1 TDM Instruction Memory PE2 Peripherals RS232 Display Touchscreen Audio in/out Video in/out Timers Shared Memory 16 MB 9
10 Cost of cache coherence operations Two cost types: cost of the cache maintenance operation cost of unnecessary invalidations Speedup Splash2 applications Speedup between 1.89 and
11 Increase of memory accesses Protocol does not increase number of memory accesses significantly Conclusion Presented a cache coherence protocol that is suitable for real-time systems with a NoC and with off-the-shelf processors Most important optimization is separation shared and private data Experimental results Speedup between 1.89 and 2.01 Higher synchronization/computation ratio (e.g. hardware floating point support) lower speed-up? Protocol does not significantly increase memory bandwidth requirements Suitable for real-time systems because software cache coherency protocol overhead is predictable 11
12 Questions? Backup slides 12
13 SWCC protocol in POSIX threads P1 Pthread_mutex_lock(S) P2 Pthread_mutex_lock (acquire) Obtain lock Clean and invalidate Pthread_mutex_unlock (release) Clean Release lock (exclusive access to shared data) Pthread_mutex_unlock(S) Pthread_mutex_lock(S) (exclusive access to shared data) P1 NoC P2 Pthread_mutex_unlock(S) Memory Existing cache coherence protocols Transactional Memory multiprocessor systems are based on speculation Suitable for real-time systems? Hardware protocols Snooping in a NoC Requires processors to observe all memory accesses Writes to one location are serialized P 1... P n $ $ Bus snoop Shared Memory Cache to Memory transaction 13
14 Existing cache coherence protocols Hardware protocols Directories in a NoC A directory is consulted on memory accesses Increase in memory access latency P 1... P n $ $ Interconnect Shared Memory Directory Hardware protocols require support from processors Supported by off-the-shelf processors? Existing cache coherence protocols Software protocols Explicitly coupling between synchronization and data structure Conditional invalidation [Tartalja, HICSS 1992] Shared regions [Sandhu, ACM SIGPLAN 1993] Enter critical region (D) 1) Check administration 2) Invalidate? Access D Exit critical region (D) Clean if -back Private data cached, shared data not cached In [Petrot, DSD 2006] 14
Chapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationLecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections
Lecture 18: Coherence and Synchronization Topics: directory-based coherence protocols, synchronization primitives (Sections 5.1-5.5) 1 Cache Coherence Protocols Directory-based: A single location (directory)
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationChapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST
Chapter 8. Multiprocessors In-Cheol Park Dept. of EE, KAIST Can the rapid rate of uniprocessor performance growth be sustained indefinitely? If the pace does slow down, multiprocessor architectures will
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationMultiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types
Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon
More informationCOSC 6385 Computer Architecture - Multi Processor Systems
COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:
More informationCloud Computing CS
Cloud Computing CS 15-319 Programming Models- Part I Lecture 4, Jan 25, 2012 Majd F. Sakr and Mohammad Hammoud Today Last 3 sessions Administrivia and Introduction to Cloud Computing Introduction to Cloud
More informationChapter 5 Thread-Level Parallelism. Abdullah Muzahid
Chapter 5 Thread-Level Parallelism Abdullah Muzahid 1 Progress Towards Multiprocessors + Rate of speed growth in uniprocessors is saturating + Modern multiple issue processors are becoming very complex
More informationChapter 5. Thread-Level Parallelism
Chapter 5 Thread-Level Parallelism Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Progress Towards Multiprocessors + Rate of speed growth in uniprocessors saturated
More informationLecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization
Lecture 25: Multiprocessors Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 Snooping-Based Protocols Three states for a block: invalid,
More informationAutomatic Parallelization of NLPs with Non-Affine Index- Expressions. Marco Bekooij (NXP research) Tjerk Bijlsma (University of Twente)
Automatic Parallelization of NLPs with Non-Affine Index- Expressions Marco Bekooij (NXP research) Tjerk Bijlsma (University of Twente) Outline Context: car-entertainment applications Mapping Flow Motivation
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Cache Coherence - Directory Cache Coherence Prof. Michel A. Kinsy Shared Memory Multiprocessor Processor Cores Local Memories Memory Bus P 1 Snoopy Cache Physical Memory P
More informationLecture 25: Multiprocessors
Lecture 25: Multiprocessors Today s topics: Virtual memory wrap-up Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 TLB and Cache Is the cache indexed
More informationCOMP Parallel Computing. CC-NUMA (1) CC-NUMA implementation
COP 633 - Parallel Computing Lecture 10 September 27, 2018 CC-NUA (1) CC-NUA implementation Reading for next time emory consistency models tutorial (sections 1-6, pp 1-17) COP 633 - Prins CC-NUA (1) Topics
More informationLecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 13: Memory Consistency + a Course-So-Far Review Parallel Computer Architecture and Programming Today: what you should know Understand the motivation for relaxed consistency models Understand the
More informationLimitations of parallel processing
Your professor du jour: Steve Gribble gribble@cs.washington.edu 323B Sieg Hall all material in this lecture in Henessey and Patterson, Chapter 8 635-640 645, 646 654-665 11/8/00 CSE 471 Multiprocessors
More informationCMSC 611: Advanced. Distributed & Shared Memory
CMSC 611: Advanced Computer Architecture Distributed & Shared Memory Centralized Shared Memory MIMD Processors share a single centralized memory through a bus interconnect Feasible for small processor
More informationMULTIPROCESSORS AND THREAD LEVEL PARALLELISM
UNIT III MULTIPROCESSORS AND THREAD LEVEL PARALLELISM 1. Symmetric Shared Memory Architectures: The Symmetric Shared Memory Architecture consists of several processors with a single physical memory shared
More informationCS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017
CS433 Homework 6 Assigned on 11/28/2017 Due in class on 12/12/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.
More informationMULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11
More informationChapter-4 Multiprocessors and Thread-Level Parallelism
Chapter-4 Multiprocessors and Thread-Level Parallelism We have seen the renewed interest in developing multiprocessors in early 2000: - The slowdown in uniprocessor performance due to the diminishing returns
More informationArquitecturas y Modelos de. Multicore
Arquitecturas y Modelos de rogramacion para Multicore 17 Septiembre 2008 Castellón Eduard Ayguadé Alex Ramírez Opening statements * Some visionaries already predicted multicores 30 years ago And they have
More informationChap. 4 Multiprocessors and Thread-Level Parallelism
Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,
More informationGLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs
GLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs Authors: Jos e L. Abell an, Juan Fern andez and Manuel E. Acacio Presenter: Guoliang Liu Outline Introduction Motivation Background
More informationEE382 Processor Design. Processor Issues for MP
EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors, Part I EE 382 Processor Design Winter 98/99 Michael Flynn 1 Processor Issues for MP Initialization Interrupts Virtual Memory TLB Coherency
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Marble House The Knife (Silent Shout) Before starting The Knife, we were working
More informationMultiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism
Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,
More informationPortland State University ECE 588/688. Cray-1 and Cray T3E
Portland State University ECE 588/688 Cray-1 and Cray T3E Copyright by Alaa Alameldeen 2014 Cray-1 A successful Vector processor from the 1970s Vector instructions are examples of SIMD Contains vector
More informationHandout 3 Multiprocessor and thread level parallelism
Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed
More informationOverview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware
Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and
More informationOverview: Shared Memory Hardware
Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing
More informationOverview: Memory Consistency
Overview: Memory Consistency the ordering of memory operations basic definitions; sequential consistency comparison with cache coherency relaxing memory consistency write buffers the total store ordering
More informationCMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 15: Memory Consistency and Synchronization Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 5 (multi-core) " Basic requirements: out later today
More informationParallel Computer Architecture Spring Memory Consistency. Nikos Bellas
Parallel Computer Architecture Spring 2018 Memory Consistency Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture 1 Coherence vs Consistency
More informationMultiprocessors and Locking
Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access
More informationShared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network
Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache
More informationIntroduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed
More informationParallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?
Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing
More informationMultiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems
Multiprocessors II: CC-NUMA DSM DSM cache coherence the hardware stuff Today s topics: what happens when we lose snooping new issues: global vs. local cache line state enter the directory issues of increasing
More informationThe Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006
The Use Of Virtual Platforms In MP-SoC Design Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 1 MPSoC Is MP SoC design happening? Why? Consumer Electronics Complexity Cost of ASIC Increased SW Content
More informationCS510 Advanced Topics in Concurrency. Jonathan Walpole
CS510 Advanced Topics in Concurrency Jonathan Walpole Threads Cannot Be Implemented as a Library Reasoning About Programs What are the valid outcomes for this program? Is it valid for both r1 and r2 to
More information«Real Time Embedded systems» Multi Masters Systems
«Real Time Embedded systems» Multi Masters Systems rene.beuchat@epfl.ch LAP/ISIM/IC/EPFL Chargé de cours rene.beuchat@hesge.ch LSN/hepia Prof. HES 1 Multi Master on Chip On a System On Chip, Master can
More informationCache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem
Cache Coherence Bryan Mills, PhD Slides provided by Rami Melhem Cache coherence Programmers have no control over caches and when they get updated. x = 2; /* initially */ y0 eventually ends up = 2 y1 eventually
More informationMultiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering
Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to
More informationCS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017
CS433 Homework 6 Assigned on 11/28/2017 Due in class on 12/12/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.
More information10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems
1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase
More informationMultiprocessors 1. Outline
Multiprocessors 1 Outline Multiprocessing Coherence Write Consistency Snooping Building Blocks Snooping protocols and examples Coherence traffic and performance on MP Directory-based protocols and examples
More informationCMSC 611: Advanced Computer Architecture
CMSC 611: Advanced Computer Architecture Shared Memory Most slides adapted from David Patterson. Some from Mohomed Younis Interconnection Networks Massively processor networks (MPP) Thousands of nodes
More informationEmbedded Systems. 7. System Components
Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic
More informationARM Cortex core microcontrollers 3. Cortex-M0, M4, M7
ARM Cortex core microcontrollers 3. Cortex-M0, M4, M7 Scherer Balázs Budapest University of Technology and Economics Department of Measurement and Information Systems BME-MIT 2018 Trends of 32-bit microcontrollers
More informationToken Coherence. Milo M. K. Martin Dissertation Defense
Token Coherence Milo M. K. Martin Dissertation Defense Wisconsin Multifacet Project http://www.cs.wisc.edu/multifacet/ University of Wisconsin Madison (C) 2003 Milo Martin Overview Technology and software
More informationEITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor
EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration I/O MultiProcessor Summary 2 Virtual memory benifits Using physical memory efficiently
More informationCOEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence
1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations
More informationSpeculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution
Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Ravi Rajwar and Jim Goodman University of Wisconsin-Madison International Symposium on Microarchitecture, Dec. 2001 Funding
More informationMultiprocessor Cache Coherency. What is Cache Coherence?
Multiprocessor Cache Coherency CS448 1 What is Cache Coherence? Two processors can have two different values for the same memory location 2 1 Terminology Coherence Defines what values can be returned by
More informationPage 1. Cache Coherence
Page 1 Cache Coherence 1 Page 2 Memory Consistency in SMPs CPU-1 CPU-2 A 100 cache-1 A 100 cache-2 CPU-Memory bus A 100 memory Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale
More informationIntroduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano Outline The problem of cache coherence Snooping protocols Directory-based protocols Prof. Cristina Silvano, Politecnico
More informationShared Memory SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB
Shared SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB 1 Review: Snoopy Cache Protocol Write Invalidate Protocol: Multiple readers, single writer Write to shared data: an
More informationIntro to Multiprocessors
The Big Picture: Where are We Now? Intro to Multiprocessors Output Output Datapath Input Input Datapath [dapted from Computer Organization and Design, Patterson & Hennessy, 2005] Multiprocessor multiple
More informationLecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections
Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 4.2-4.4) 1 SMP/UMA/Centralized Memory Multiprocessor Main Memory I/O System
More informationLecture 5: Directory Protocols. Topics: directory-based cache coherence implementations
Lecture 5: Directory Protocols Topics: directory-based cache coherence implementations 1 Flat Memory-Based Directories Block size = 128 B Memory in each node = 1 GB Cache in each node = 1 MB For 64 nodes
More informationCS/COE1541: Intro. to Computer Architecture
CS/COE1541: Intro. to Computer Architecture Multiprocessors Sangyeun Cho Computer Science Department Tilera TILE64 IBM BlueGene/L nvidia GPGPU Intel Core 2 Duo 2 Why multiprocessors? For improved latency
More informationEN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy
EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,
More informationEN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University
EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,
More informationComputer Architecture
18-447 Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 29: Consistency & Coherence Lecture 20: Consistency and Coherence Bo Wu Prof. Onur Mutlu Colorado Carnegie School Mellon University
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationCOSC4201 Multiprocessors
COSC4201 Multiprocessors Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Multiprocessing We are dedicating all of our future product development to multicore
More informationCache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri
Cache Coherence (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri mainakc@cse.iitk.ac.in 1 Setting Agenda Software: shared address space Hardware: shared memory multiprocessors Cache
More informationAleksandar Milenkovich 1
Parallel Computers Lecture 8: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection
More informationShared Symmetric Memory Systems
Shared Symmetric Memory Systems Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University
More informationMultiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.
Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network
More informationIntroduction to Parallel Computing
Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen
More informationSri Vidya College of Engineering and Technology. EC6703 Embedded and Real Time Systems Unit IV Page 1.
Sri Vidya College of Engineering and Technology ERTS Course Material EC6703 Embedded and Real Time Systems Page 1 Sri Vidya College of Engineering and Technology ERTS Course Material EC6703 Embedded and
More informationROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 16 MARKS CS 2354 ADVANCE COMPUTER ARCHITECTURE 1. Explain the concepts and challenges of Instruction-Level Parallelism. Define
More informationECE/CS 757: Homework 1
ECE/CS 757: Homework 1 Cores and Multithreading 1. A CPU designer has to decide whether or not to add a new micoarchitecture enhancement to improve performance (ignoring power costs) of a block (coarse-grain)
More informationThreads Cannot Be Implemented As a Library
Threads Cannot Be Implemented As a Library Authored by Hans J. Boehm Presented by Sarah Sharp February 18, 2008 Outline POSIX Thread Library Operation Vocab Problems with pthreads POSIX Thread Library
More informationLecture 13. Shared memory: Architecture and programming
Lecture 13 Shared memory: Architecture and programming Announcements Special guest lecture on Parallel Programming Language Uniform Parallel C Thursday 11/2, 2:00 to 3:20 PM EBU3B 1202 See www.cse.ucsd.edu/classes/fa06/cse260/lectures/lec13
More informationAleksandar Milenkovic, Electrical and Computer Engineering University of Alabama in Huntsville
Lecture 18: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Parallel Computers Definition: A parallel computer is a collection
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationLecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Flynn Categories SISD (Single Instruction Single
More informationThe Cache-Coherence Problem
The -Coherence Problem Lecture 12 (Chapter 6) 1 Outline Bus-based multiprocessors The cache-coherence problem Peterson s algorithm Coherence vs. consistency Shared vs. Distributed Memory What is the difference
More informationESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols CA SMP and cache coherence
Computer Architecture ESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols 1 Shared Memory Multiprocessor Memory Bus P 1 Snoopy Cache Physical Memory P 2 Snoopy
More informationContents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11
Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed
More informationComputer Architecture Area Fall 2009 PhD Qualifier Exam October 20 th 2008
Computer Architecture Area Fall 2009 PhD Qualifier Exam October 20 th 2008 This exam has nine (9) problems. You should submit your answers to six (6) of these nine problems. You should not submit answers
More informationCache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More informationComputer Architecture
Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 83 Part III Multi-Core
More informationLecture 26: Multiprocessors. Today s topics: Directory-based coherence Synchronization Consistency Shared memory vs message-passing
Lecture 26: Multiprocessors Today s topics: Directory-based coherence Synchronization Consistency Shared memory vs message-passing 1 Cache Coherence Protocols Directory-based: A single location (directory)
More informationChapter 9 Multiprocessors
ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University
More informationCS 5220: Shared memory programming. David Bindel
CS 5220: Shared memory programming David Bindel 2017-09-26 1 Message passing pain Common message passing pattern Logical global structure Local representation per processor Local data may have redundancy
More informationOutline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA
CS 258 Parallel Computer Architecture Data Speculation Support for a Chip Multiprocessor (Hydra CMP) Lance Hammond, Mark Willey and Kunle Olukotun Presented: May 7 th, 2008 Ankit Jain Outline The Hydra
More informationCMPE 511 TERM PAPER. Distributed Shared Memory Architecture. Seda Demirağ
CMPE 511 TERM PAPER Distributed Shared Memory Architecture by Seda Demirağ 2005701688 1. INTRODUCTION: Despite the advances in processor design, users still demand more and more performance. Eventually,
More informationMassively Parallel Processor Breadboarding (MPPB)
Massively Parallel Processor Breadboarding (MPPB) 28 August 2012 Final Presentation TRP study 21986 Gerard Rauwerda CTO, Recore Systems Gerard.Rauwerda@RecoreSystems.com Recore Systems BV P.O. Box 77,
More informationWCET Analysis of Parallel Benchmarks using On-Demand Coherent Cache
WCET Analysis of Parallel Benchmarks using On-Demand Coherent Arthur Pyka, Lillian Tadros, Sascha Uhrig University of Dortmund Hugues Cassé, Haluk Ozaktas and Christine Rochange Université Paul Sabatier,
More informationData-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.
Data-Centric Consistency Models The general organization of a logical data store, physically distributed and replicated across multiple processes. Consistency models The scenario we will be studying: Some
More information