Introducing the SCSD "Shared Cache for Shared Data" Multiprocessor Architecture. Nagi N. Mekhiel
Department of Electrical and Computer Engineering, Ryerson Polytechnic University, Toronto, Ontario M5B 2K3
Yarc Systems, Newbury Park, CA 91320

Abstract

The SCSD model improves the performance of shared memory multiprocessor systems by separating shared data from private data. Private data migrate to the local cache of each processor and shared data to a shared cache. We present the architecture and protocols for the SCSD model. The protocols need not perform consistency checks, which reduces the demand for the shared bus. Results show that the SCSD model reduces the cost of an access and that, with a dual bus, the cost can become independent of the degree of data sharing.

1 Introduction

Shared memory multiprocessor systems provide the programmer with a simple and easy programming environment and use a single bus for all processors to access memory. Single bus shared memory systems suffer from bus saturation. An effective solution to the bus saturation problem is to use a local cache for each processor. A cache coherency protocol is then needed to keep the same data items consistent across different caches [8],[6]. Coherency protocols use the shared bus to snoop on data, which increases the demand for the bus and limits the scalability of the system [4],[6]. The cache coherency problem can be eliminated or reduced by using a single shared cache [1],[3]. The problem with sharing a single cache is that more than one processor accesses it at the same time, so the cache becomes the system bottleneck (access conflicts). Different research works have discussed and evaluated the shared cache architecture [1],[2],[3]. In all of these works, private and shared data existed in the same cache, competing with each other, which makes the shared cache less efficient and causes access conflicts.
In this paper, we introduce the SCSD model and present a suitable architecture, protocols and a cost model to evaluate its performance.

2 SCSD Architecture and Concept

Figure 1 shows the architecture of SCSD with a single and a dual bus. The processors use local caches for private data and share a single cache for the shared data. The local and shared caches can use write-through or write-back policies. The cache tag does not use private/shared or valid/invalid bits: shared data exists only in the shared cache and private data exists only in the local caches, so there is no need for valid or shared bits. The single bus model uses one bus for the SM (shared memory), the local caches and the SC (shared cache). The dual bus model uses one bus for the shared memory and another bus for the SC; both buses can be used simultaneously. Bus snooping is only needed to identify items and convert them from not shared to shared. All processors are of RISC, Harvard-type architecture running one instruction per clock with pipelining, and they share a single address space and the same memory.

2.1 SCSD Concept

Figure 2 shows the concept of the SCSD model. Private items such as x1 and xn map to the local caches of their processors. A shared item x is transferred to one of the local caches when it is requested for the first time; if it is later requested by another processor, the system transfers it to the shared cache using a swap operation: it moves x to the SC and makes it shared, and moves x' (the item that x replaces in the SC) to the location x vacated in the local cache.
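The swap operation just described can be sketched as a toy simulation. This is a hypothetical illustration, not the paper's implementation: the class and method names are invented, and the reverse half of the swap (returning the displaced shared-cache item to the freed local-cache slot) is omitted for brevity.

```python
# Toy sketch of the SCSD placement policy: an item is private on first use,
# and moves to the shared cache once a second processor requests it.

class SCSDSystem:
    def __init__(self, n_procs):
        self.local = [dict() for _ in range(n_procs)]  # per-processor local caches
        self.shared = {}                               # single shared cache

    def access(self, proc, addr):
        if addr in self.shared:                  # already shared: served by the SC
            return "shared-hit"
        if addr in self.local[proc]:             # private hit in own local cache
            return "local-hit"
        for other, cache in enumerate(self.local):
            if other != proc and addr in cache:  # private in another local cache:
                cache.pop(addr)                  #   swap it into the shared cache
                self.shared[addr] = True
                return "swapped-to-shared"
        self.local[proc][addr] = True            # first reference: stays private
        return "miss-to-local"

system = SCSDSystem(2)
print(system.access(0, 0x100))  # miss-to-local: x becomes private to P0
print(system.access(1, 0x100))  # swapped-to-shared: a second processor wants x
print(system.access(0, 0x100))  # shared-hit: both processors now find x in the SC
```

Note that after the swap the item exists in exactly one place (the shared cache), which is what lets the protocols below skip consistency checks entirely.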
[Figure 3: SCSD no-coherency write-through protocol. Legend: Pr, Pw = processor read and write; br, bw = bus read and write; S = shared; NS = not shared.]

[Figure 1: SCSD architecture (single bus SCSD model and dual bus SCSD model).]

3 The SCSD No-Coherency Protocols

The protocols for the SCSD model need not perform any consistency check, because shared data and private data exist in separate caches. Only one copy of private data and one copy of shared data exist in the caches. Snooping is needed only when a processor requires a shared data item that exists in another processor's local cache as a private item. The main purpose of the protocol is to separate shared from private data and to forbid multiple copies of the same data item from existing in the caches.

3.1 SCSD No-Coherency Write-Through Protocol

[Figure 2: SCSD concept (swap of a shared item between a local cache and the shared cache).]

Figure 3 shows the SCSD write-through no-coherency protocol. An item enters the NS (not shared) state when a processor requests the data from main memory for the first time. The item enters the S (shared) state when a processor requests an NS item that is in the local cache of another processor. A processor read or write of an item in state NS does not change the state of the item. A read or write by another processor (over the bus) of an item in state NS causes the item to become shared (it goes to state S). A read or write of an item in state S, by the local processor or any other processor, does not change the state of the item. When an item in state S is replaced by another item in the cache, it becomes not shared (goes to state NS). The protocol does not use invalidate or update policies (no coherency check); it only snoops the bus when a requested shared item is not found in the SC.

3.2 SCSD No-Coherency Write-Back Protocol

Figure 4 shows the SCSD write-back no-coherency protocol.
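The write-back state transitions of Figure 4, as described in this section, can be encoded as a small transition table. This is a sketch built from the prose description; the event names follow the figure legend (Pr/Pw = local processor read/write, br/bw = bus read/write by another processor), and "replace" stands for eviction of the item from the shared cache.

```python
# Transition table for the SCSD write-back no-coherency protocol:
# NS = not shared, S = shared, &D = dirty.

WB_TRANSITIONS = {
    "NS":   {"Pr": "NS",   "Pw": "NS&D", "br": "S",   "bw": "S&D"},
    "NS&D": {"Pr": "NS&D", "Pw": "NS&D", "br": "S&D", "bw": "S&D"},
    "S":    {"Pr": "S",    "Pw": "S&D",  "br": "S",   "bw": "S&D",
             "replace": "NS"},
    "S&D":  {"Pr": "S&D",  "Pw": "S&D",  "br": "S&D", "bw": "S&D",
             "replace": "NS&D"},
}

def wb_next(state, event):
    return WB_TRANSITIONS[state][event]

# A private item is written (becomes dirty), then another processor reads it:
state = "NS"
state = wb_next(state, "Pw")   # NS&D: not shared, dirty
state = wb_next(state, "br")   # S&D: now shared, still dirty
print(state)
```

Note that no transition ever invalidates another cache's copy: because an item lives in exactly one cache, the table only tracks sharing and dirtiness.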
A data item enters the NS (not shared) state when a processor requests it from main memory for the first time. The item enters the S (shared) state when a processor requests an NS item that is in the local cache of another processor. An item in state S or NS becomes dirty after a write operation. A read by the local processor of an item in state NS does not change the state of the item; a write changes its state to NS&D (not shared and dirty). A read by another processor (over the bus) of an item in state NS changes its state to S (shared), and a write to a not-shared item (state NS) changes it to shared and dirty (state S&D). An item that is not shared and dirty (state NS&D) does not change its state if the local processor reads or writes it, and goes to shared and dirty (S&D) if another processor reads or writes it. A read of a shared item (state S) in the shared cache, by the local processor or any other processor, does not change the state of the item; a write changes its state to shared and dirty (S&D). An item that is shared and dirty (state S&D) does not change state when the local processor or another processor reads or writes it. When a shared item (state S) is replaced by another item in the cache, it becomes not shared (goes to state NS). When a shared and dirty item (state S&D) is replaced by another item in the cache, it becomes not shared and dirty (goes to state NS&D). The protocol does not use invalidate or update policies (no coherency check); it only snoops the bus when a requested shared item is not found in the SC.

[Figure 4: SCSD no-coherency write-back protocol. Legend: Pr, Pw = processor read and write; br, bw = bus read and write; S = shared; NS = not shared; D = dirty.]

4 The Cost Model

To evaluate the SCSD model, we construct approximate cost models for it. The total cost of a model is obtained by multiplying the cost of each operation by its probability and then adding the resulting latencies. We define the following parameters: T1 = access time of a local cache or the shared cache; Tm = access time of main memory (not including bus time); Tb = mean bus waiting time for the shared memory model; Tb1 = mean bus waiting time for the shared cache in the dual bus architecture; Pr = probability that a memory access is a read; (1-Pr) = probability that it is a write; Ps = probability that a memory access is to shared data; (1-Ps) = probability that it is to not-shared data; Pv = probability that a memory access is valid in the shared memory (SM) model; (1-Pv) = probability that it is not valid; Pd = probability that a memory access is dirty; (1-Pd) = probability that it is clean; M1 = miss rate of the local caches; Ms = miss rate of the shared cache.

We find the cost model for the SCSD model and compare it with the cost model of the known single bus shared memory "SM" architecture. The table of Figure 5 shows the cost models for the SM, as in [8], and for the SCSD architecture using the write-through protocol. The table of Figure 6 shows the cost models for the SM, as in [8], and for the SCSD architecture using the write-back protocol.

Figure 5: Cost models for SM and SCSD, write-through.

Operation  | SM cost                        | SCSD cost
Read hit   | Pv.T1 + (1-Pv).(2T1 + Tm + Tb) | (1-Ps).T1 + Ps.(T1 + Tb1)
Read miss  | 2T1 + Tb + Tm                  | (1-Ps).(T1 + Tb + Tm) + Ps.(2T1 + Tb1)
Write hit  | Tb + Tm + 2T1                  | (1-Ps).(T1 + Tb + Tm) + Ps.(T1 + Tb + Tm)
Write miss | Tb + Tm + 2T1                  | (1-Ps).(T1 + Tb + Tm) + Ps.(2T1 + Tb1)

5 The Results

We use the following values for the model parameters: T1 = 1 cycle, Tm = 20 cycles, Tb = 100 cycles, Tb1 = 5 cycles for the dual bus architecture, Tb1 = 100 cycles for the single bus architecture, Pr = 0.7, Pv = 0.4, Pd = 0.4, Ps = 0.05 to 0.5, and M1 = Ms = 0.05.
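The write-through cost models of Figure 5 can be checked numerically with these parameter values. This is a sketch under stated assumptions: the function names are ours, the cost of a read hit to a not-shared item is taken to be T1, and we assume the per-operation costs are combined into a mean by weighting reads versus writes with Pr and hits versus misses with the 0.05 miss rate.

```python
# Mean access cost (cycles) for the single bus shared memory (SM) model and
# for the SCSD model under the write-through protocol, per the Figure 5
# formulas, with T1=1, Tm=20, Tb=100, Pr=0.7, Pv=0.4 and a 0.05 miss rate.

def wt_cost_sm(T1=1, Tm=20, Tb=100, Pr=0.7, Pv=0.4, m=0.05):
    read_hit  = Pv*T1 + (1 - Pv)*(2*T1 + Tm + Tb)
    read_miss = 2*T1 + Tb + Tm
    write     = Tb + Tm + 2*T1            # write hit and write miss cost the same
    return Pr*((1 - m)*read_hit + m*read_miss) + (1 - Pr)*write

def wt_cost_scsd(Ps, Tb1, T1=1, Tm=20, Tb=100, Pr=0.7, m=0.05):
    read_hit   = (1 - Ps)*T1 + Ps*(T1 + Tb1)
    read_miss  = (1 - Ps)*(T1 + Tb + Tm) + Ps*(2*T1 + Tb1)
    write_hit  = (1 - Ps)*(T1 + Tb + Tm) + Ps*(T1 + Tb + Tm)
    write_miss = (1 - Ps)*(T1 + Tb + Tm) + Ps*(2*T1 + Tb1)
    return (Pr*((1 - m)*read_hit + m*read_miss)
            + (1 - Pr)*((1 - m)*write_hit + m*write_miss))

print(round(wt_cost_sm(), 1))                                   # SM baseline cost
print([round(wt_cost_scsd(Ps, Tb1=5), 1) for Ps in (0.05, 0.25, 0.5)])
# With Tb1 = 5 (dual bus) the SCSD cost is less than half the SM cost and
# stays nearly constant as the sharing ratio Ps grows.
```

Re-running the same functions with Tb1 = 100 models the single bus case, where the savings shrink as the sharing ratio grows, in line with the results discussed below.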
Figure 6: Cost models for SM and SCSD, write-back.

Operation  | SM cost                                           | SCSD cost
Read hit   | (1-Ps).T1 + Ps.Pv.T1 + Ps.(1-Pv).(Tb + Tm + 2T1)  | (1-Ps).T1 + Ps.(T1 + Tb1)
Read miss  | Pd.(3T1 + 2Tm + Tb) + (1-Pd).(3T1 + Tm + Tb)      | (1-Ps).Pd.(T1 + 2Tm + Tb) + (1-Ps).(1-Pd).(T1 + Tm + Tb) + Ps.(2T1 + Tb1)
Write hit  | (1-Ps).T1 + Ps.(2T1 + Tb)                         | (1-Ps).T1 + Ps.(T1 + Tb1)
Write miss | T1 + Tm + Tb                                      | (1-Ps).(T1 + Tm + Tb) + Ps.(2T1 + Tb1)

The values of the above parameters are selected to match the values of similar shared memory multiprocessor systems, as in [5].

[Figure 7: Results of the single bus SM and SCSD models. Cost in cycles versus sharing ratio for WT(SM), WT(SCSD), WB(SM) and WB(SCSD); WT = write-through, WB = write-back.]

Figure 7 shows the total cost of the shared memory model compared to the total cost of the SCSD model for write-through and write-back. In this case we assume that SM and SCSD use the single bus architecture, with Tb1 = 100 cycles (the same as the memory bus). The results show that the SCSD model reduces the cost of an access for the write-through policy by 50% for a low sharing ratio and by 25% for a high sharing ratio. The cost of the SCSD model is similar to the cost of the SM model for the write-back policy. In the above results we did not account for the effect of invalidation on bus delay (which should be much smaller in SCSD); furthermore, the miss rates of the shared cache and local caches in SCSD are assumed to be the same as in the SM model (the separation of shared data from private data should reduce the miss rates of SCSD).

[Figure 8: Results of the dual bus SM and SCSD models. Cost in cycles versus sharing ratio for WT(SM), WT(SCSD), WB(SM) and WB(SCSD).]

Figure 8 shows the total cost of shared memory compared to the SCSD no-coherency model for write-through and write-back, assuming that SCSD uses the dual bus architecture with Tb1 = 5 cycles. The results show that the SCSD model reduces the cost of an access for the write-through policy by more than 50%, and the cost of the SCSD model is much smaller than the cost of the SM model for the write-back policy. The cost of an access in the SCSD model, for either write-through or write-back, does not depend on the sharing ratio, which indicates that this model could scale to a large number of processors. In the above results we did not account for the effect of invalidation or for miss rate differences between the SM and SCSD models.

6 Conclusions and Future Work

We have introduced the new SCSD model, which uses separate local caches for private data and one single shared cache for shared data, and presented two different architectures to implement it: a cost-effective single bus system and a high performance dual bus system. Two no-coherency protocols, write-through and write-back, are given; the protocols implement the SCSD concept without any coherency check. The results of approximate cost models show that the SCSD architecture gives better performance than the shared memory architecture for a write-through protocol, and that with the dual bus architecture the performance of the SCSD system is greatly improved and can become independent of the ratio of shared data. Our future plans include studying other architectures for the SCSD model, such as a multi-bank shared cache with a fast network, and evaluating this model accurately using trace simulation.
References

[1] Basem A. Nayfeh and Kunle Olukotun, "Exploring the Design Space for a Shared-Cache Multiprocessor", 21st Intl. Symp. on Computer Architecture, 1994.

[2] Erik Hagersten, Anders Landin, and Seif Haridi, "DDM - A Cache-Only Memory Architecture", IEEE Computer, vol. 25, no. 9, September 1992.

[3] Phil C.C. Yeh, Janak H. Patel, and Edward S. Davidson, "Shared Cache for Multiple-Stream Computer Systems", IEEE Transactions on Computers, vol. C-32, no. 1, January 1983.

[4] K. Uchiyama and H. Aoki, "Design of a second-level cache chip for shared-bus multimicroprocessor systems", IEEE Journal of Solid-State Circuits, vol. 26, no. 4, April 1991.

[5] M. Vernon and E. D. Lazowska, "An accurate and efficient performance analysis technique for multiprocessor snooping cache-consistency protocols", Proc. 15th Annu. Symp. on Computer Architecture, Honolulu, HI, June 1988.

[6] M. C. Chiang and G. S. Sohi, "Evaluating design choices for shared bus multiprocessors in a throughput-oriented environment", IEEE Transactions on Computers, vol. 41, no. 3, March 1992.

[7] John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, San Mateo, California, 1990.

[8] Faye A. Briggs, "Synchronization, Coherence, and Event Ordering in Multiprocessors", IEEE Computer, pp. 9-21, February 1988.
More informationECE 30 Introduction to Computer Engineering
ECE 0 Introduction to Computer Engineering Study Problems, Set #9 Spring 01 1. Given the following series of address references given as word addresses:,,, 1, 1, 1,, 8, 19,,,,, 7,, and. Assuming a direct-mapped
More informationCS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols
CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste
More informationShared Symmetric Memory Systems
Shared Symmetric Memory Systems Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University
More informationCS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence
CS252 Spring 2017 Graduate Computer Architecture Lecture 12: Cache Coherence Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture 11 Memory Systems DRAM
More informationLecture 24: Virtual Memory, Multiprocessors
Lecture 24: Virtual Memory, Multiprocessors Today s topics: Virtual memory Multiprocessors, cache coherence 1 Virtual Memory Processes deal with virtual memory they have the illusion that a very large
More informationA Multiprocessor system generally means that more than one instruction stream is being executed in parallel.
Multiprocessor Systems A Multiprocessor system generally means that more than one instruction stream is being executed in parallel. However, Flynn s SIMD machine classification, also called an array processor,
More informationMemory Hierarchy Motivation, Definitions, Four Questions about Memory Hierarchy
Memory Hierarchy Motivation, Definitions, Four Questions about Memory Hierarchy Soner Onder Michigan Technological University Randy Katz & David A. Patterson University of California, Berkeley Levels in
More informationPredicting the Worst-Case Execution Time of the Concurrent Execution. of Instructions and Cycle-Stealing DMA I/O Operations
ACM SIGPLAN Workshop on Languages, Compilers and Tools for Real-Time Systems, La Jolla, California, June 1995. Predicting the Worst-Case Execution Time of the Concurrent Execution of Instructions and Cycle-Stealing
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1)
1 MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1) Chapter 5 Appendix F Appendix I OUTLINE Introduction (5.1) Multiprocessor Architecture Challenges in Parallel Processing Centralized Shared Memory
More informationDesign of A Memory Latency Tolerant. *Faculty of Eng.,Tokai Univ **Graduate School of Eng.,Tokai Univ. *
Design of A Memory Latency Tolerant Processor() Naohiko SHIMIZU* Kazuyuki MIYASAKA** Hiroaki HARAMIISHI** *Faculty of Eng.,Tokai Univ **Graduate School of Eng.,Tokai Univ. 1117 Kitakaname Hiratuka-shi
More informationHigh Performance Multiprocessor System
High Performance Multiprocessor System Requirements : - Large Number of Processors ( 4) - Large WriteBack Caches for Each Processor. Less Bus Traffic => Higher Performance - Large Shared Main Memories
More informationMultiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering
Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to
More informationOutline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA
CS 258 Parallel Computer Architecture Data Speculation Support for a Chip Multiprocessor (Hydra CMP) Lance Hammond, Mark Willey and Kunle Olukotun Presented: May 7 th, 2008 Ankit Jain Outline The Hydra
More informationThe Performance of Cache-Coherent Ring-based Multiprocessors
Appeared in the Proceedings of the 2th Intl. Symp. on Computer Architecture, May 1993 The Performance of Cache-Coherent Ring-based Multiprocessors Luiz André Barroso and Michel Dubois barroso@paris.usc.edu;
More informationPerformance metrics for caches
Performance metrics for caches Basic performance metric: hit ratio h h = Number of memory references that hit in the cache / total number of memory references Typically h = 0.90 to 0.97 Equivalent metric:
More informationLondon SW7 2BZ. in the number of processors due to unfortunate allocation of the. home and ownership of cache lines. We present a modied coherency
Using Proxies to Reduce Controller Contention in Large Shared-Memory Multiprocessors Andrew J. Bennett, Paul H. J. Kelly, Jacob G. Refstrup, Sarah A. M. Talbot Department of Computing Imperial College
More informationChapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs
Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple
More informationIntroduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano Outline The problem of cache coherence Snooping protocols Directory-based protocols Prof. Cristina Silvano, Politecnico
More informationCache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O
6.823, L21--1 Cache Coherence Protocols: Implementation Issues on SMP s Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Coherence Issue in I/O 6.823, L21--2 Processor Processor
More informationEE382 Processor Design. Processor Issues for MP
EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors, Part I EE 382 Processor Design Winter 98/99 Michael Flynn 1 Processor Issues for MP Initialization Interrupts Virtual Memory TLB Coherency
More informationCOSC4201 Multiprocessors
COSC4201 Multiprocessors Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Multiprocessing We are dedicating all of our future product development to multicore
More informationELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II
ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,
More informationCS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols
CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Protocols Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory http://inst.eecs.berkeley.edu/~cs152
More informationScalable Cache Coherence
Scalable Cache Coherence [ 8.1] All of the cache-coherent systems we have talked about until now have had a bus. Not only does the bus guarantee serialization of transactions; it also serves as convenient
More informationMinimizing the Directory Size for Large-scale DSM Multiprocessors. Technical Report
Minimizing the Directory Size for Large-scale DSM Multiprocessors Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 EECS Building 200 Union Street SE Minneapolis,
More informationShared Memory Architectures. Approaches to Building Parallel Machines
Shared Memory Architectures Arvind Krishnamurthy Fall 2004 Approaches to Building Parallel Machines P 1 Switch/Bus P n Scale (Interleaved) First-level $ P 1 P n $ $ (Interleaved) Main memory Shared Cache
More informationThe Sun Fireplane Interconnect in the Mid- Range Sun Fire Servers
TAK IT TO TH NTH Alan Charlesworth icrosystems The Fireplane Interconnect in the id- Range Fire Servers Vertical & Horizontal Scaling any CUs in one box Cache-coherent shared memory (S) Usually proprietary
More informationECE 551 System on Chip Design
ECE 551 System on Chip Design Introducing Bus Communications Garrett S. Rose Fall 2018 Emerging Applications Requirements Data Flow vs. Processing µp µp Mem Bus DRAMC Core 2 Core N Main Bus µp Core 1 SoCs
More informationReal-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo
Real-Time Scalability of Nested Spin Locks Hiroaki Takada and Ken Sakamura Department of Information Science, Faculty of Science, University of Tokyo 7-3-1, Hongo, Bunkyo-ku, Tokyo 113, Japan Abstract
More informationPerformance study example ( 5.3) Performance study example
erformance study example ( 5.3) Coherence misses: - True sharing misses - Write to a shared block - ead an invalid block - False sharing misses - ead an unmodified word in an invalidated block CI for commercial
More information