ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Envionment

Size: px

Start display at page:

Download "ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Envionment"

Timothy Paul
6 years ago
Views:

1 ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Envionment, Wonsun Ahn, Josep Torrellas University of Illinois University of Illinois

2 Motivation Architectures that continuously execute Atomic Blocks or Chunks (e.g.,tcc, BulkSC) Chunk: group of dynamically contiguous instructions executed atomically Provide performance and programmability advantages [Hammond 04], [Ahn 10] An important operation is commit: makes the state of chunk visible atomically Designs with lazy detection of chunk conflicts Commit involves updating cache states and checking for conflicts In lazy directory-based cache coherent systems, commit is very challenging Requires updating the states of the distributed caches in a way that appears that chunks execute in a total order In large systems, it results in an execution bottleneck 2

3 Commit in Directory-Based Machine Proc Dir wr Proc Dir Proc Dir commit Proc commit Dir Dir ack inv ack Proc Dir ack inv ack ack inv Proc ack Proc Dir Proc Dir Proc Dir Proc Dir Conventional Cache Coherence Chunk-based Cache Coherence 3

4 Recent Commit Protocols BulkSC [Ceze 07]: Centralized commit arbiter Scalable TCC [Chafi 07]: First distributed scheme Enforce total order by grabbing a commit token Serialization and global communication SEQ Protocol [Pugsley 08]: Extends Scalable TCC by eliminating global communication Still requires serialization of commits that use the same directory module 4

5 SEQ Protocol P=Processor D=Directory module Chunk 0 Chunk 1 P1 P1 cannot overlap occ D0 D1 D2 occ ack occ ack P1 D0 D1 D2 P1 ack occ ack Commits of chunk 0 and chunk 1 are serialized. D0 D1 D2 D0 D1 D2 Commits that use the same directory module are serialized even when they touch non-overlapped lines 5

6 Outline Motivation ScalableBulk Evaluation 6

7 Goals for Scalable Commit No centralized structure Committing processor communicates only with the relevant directory modules Allow concurrent commits of chunks that use the same directory module, as long as the accessed addresses do not overlap 7

8 Contribution: ScalableBulk ScalableBulk: protocol for bottleneck-free commit of chunks in directorybased system Key properties: Enable multiple concurrent chunk commits that use the same directory module Made possible by integrating signatures into directory design Better tolerance to commits that use many directories Eliminates all centralized structures and global communications Committing processor only communicates with the relevant directories (the homes of the addresses accessed by the chunk) More scalable than previous schemes Results: practically eliminates all commit stall overhead for 64 processors 8

9 ScalableBulk Protocol Primitives Allowing multiple non-overlapping commits to use the same directory module Grouping directory modules Initiating the commit optimistically L2$ Proc. + L1$ Directory Module Many-Core Architecture Considered 9

10 Primitive 1: Allowing Concurrent Non-overlapping Commits Read and write footprint of a chunk is summarized in read and write signatures using Bloom filters On commit: signatures are sent to the relevant directory modules Only the addresses in the signature are locked in the directory during the commit Other chunks can commit using the same directory if their signatures do not conflict with the committing signatures 10

11 Primitive 1: Allowing Concurrent Non-overlapping Commits R2 W2 Incoming Signatures W0 Currently Committing Signatures W1 if not : Nack commit else: Start committing NOR Enables more concurrent commits 11

12 Primitive 2: Grouping Directory Modules On a chunk commit: the relevant directory modules Coordinate their transitions by exchanging messages Form a Directory Group Identify a leader module that sends messages to the caches and the committing processor on behalf of the group Grouping Protocol: Complete distributed operation Few messages are required Leader: lowest-numbered directory module in the group 12

13 Primitive 2: Grouping Directory Modules P=Processor D=Directory module Chunk 0 Leader commit_ack bulk_inv Group is formed Sig{R,W} D0 D1 D2 Sig{R,W} Sig{R,W} succ D0 D1 D2 succ inv_ack Commit finished D0 grab D1 grab D2 grab done D0 D1 D2 done 13

14 Distributed Conflict Detection P=Processor D=Directory module Chunk 0 Chunk 1 P1 P1 D0 D1 D2 D0 D1 D2 grab P1 Conflict is detected in D1 P1 Sig{R,W} Sig{R,W} g_failure commit failure grab D0 D1 grab D2 D0 D1 D2 14

15 Primitive 2: Grouping Directory Modules Deadlock is avoided by following a fixed directory-module traversal order Multiple groups can commit concurrently 15

16 Concurrent Commit P=Processor D=Directory module Chunk 0 Chunk 1 P1 P1 D0 D1 D2 D0 D1 D2 grab Sig{R,W} P1 Sig{R,W} No conflict is detected in D1. grab P1 grab D0 D1 grab D2 Two groups commit concurrently. D0 D1 D2 16

17 Primitive 3: Optimistic Commit Initiation Idea: Committing processor (CP) assumes its commit transaction will succeed CP consumes incoming messages before receiving OK to commit Advantages: it enables more overlapping of commits Details in the paper 17

18 Summary: Scalable Commit Commit has no centralization point Committing processor communicates only with relevant directory modules (no message broadcasting) Multiple committing chunks can use the same directory modules, if the addresses that they access do not overlap Similar to how conventional protocols support concurrent writes Optimistic Commit Initiation removes operations from critical path of the commit Commit is truly scalable 18

19 Outline Motivation ScalableBulk Evaluation 19

20 Evaluation Cycle-accurate execution-driven simulator based on SESC and Pin Number of cores: 32 and SPLASH-2 and 7 PARSEC applications Implemented all existing protocols: ScalableBulk Scalable TCC SEQ BulkSC 20

21 Execution Time Execution Time Vips_ Fluidanimate_32 Blackscholes_64 Blackscholes_32 Swaptions_64 Swaptions_32 Vips_ Fluidanimate_ Canneal_64 Canneal_ AVERAGE_64 AVERAGE_32 Facesim_64 Facesim_32 Dedup_64 Dedup_32 Squash Commit Cache Miss Useful ScalableBulk practically eliminates all commit stall time Other existing protocols suffer commit stall (see paper) 21

22 Directories Used Per Commit Chunk commits use about 6 directories on average ScalableBulk is able to overlap the commits that use the same directories if the signatures do not conflict 22

23 Network Message Characterization Scalable TCC ScalableBulk SEQ BulkSC ScalableBulk sends fewer messages than other distributed protocols 23

24 Also in the paper Mechanism to handle fairness and avoid starvation Many details on the implementation of the ScalableBulk protocol Detailed results characterizing various aspects of the protocol 24

25 Conclusion Proposed ScalableBulk: protocol for bottleneck-free commit of chunks in lazy directory-based system Key properties: Enables multiple concurrent chunk commits that use the same directory module Thanks to use of signatures Eliminates all centralized structures and global communications Results: practically eliminates all commit stall overhead for 64 processors Effectively enables a large-scale chunk-based machine 25

26 ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Envionment, Wonsun Ahn, Josep Torrellas University of Illinois University of Illinois

BulkCommit: Scalable and Fast Commit of Atomic Blocks in a Lazy Multiprocessor Environment

BulkCommit: Scalable and Fast Commit of Atomic Blocks in a Lazy Multiprocessor Environment, Benjamin Sahelices Josep Torrellas, Depei Qian University of Illinois University of Illinois http://iacoma.cs.uiuc.edu/