Efficient Persist Barriers for Multicores

1 Efficient Persist Barriers for Multicores. Arpit Joshi, Vijay Nagarajan, Marcelo Cintra, Stratis Viglas

2 Summary. Efficient persist barrier, used to implement persistency models. Persistency = when stores become durable (consistency = when stores become visible). Evaluated with Buffered Epoch Persistency and Bulk Strict Persistency.

6 Persistent Memory. Figure: access granularity versus access latency for cache, DRAM, NVRAM (PCM, STT-MRAM, 3D XPoint) and secondary storage. NVRAM offers fast, fine-grained persistence.

7 Persistent Memory. Advantages: unifies memory and storage; persistent data is accessed through the processor load/store interface. Challenge: maintaining consistency of data structures in memory.

9 Consistency Challenge. Figure: two hierarchies side by side, Core / Cache / DRAM / Secondary Storage and Core / Cache / NVRAM / Secondary Storage. The boundary between the cache and DRAM or NVRAM is labelled "Hardware Controlled"; the boundary to secondary storage is labelled "Software Controlled".

14 Linked List - Naïve. Pseudo-code: 1. Create 2. Update Pointer 3. Update Head Pointer. Figure: the list (HEAD and its nodes) shown in the cache and in NVRAM, animated step by step; the last build is labelled "System Crash!".

19 Linked List - Failsafe. Pseudo-code: 1. Create 2. Update Pointer 3. Persist Barrier 4. Update Head Pointer. Figure: the list (HEAD and its nodes) shown in the cache and in NVRAM, animated step by step.
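
The difference between the naïve and failsafe sequences can be reproduced in a toy model. This is a minimal sketch, assuming a write-back cache that may evict dirty lines to NVRAM in any order and a barrier that makes all earlier stores durable. The names `crash_states`, `is_consistent` and the `node1.*` locations are illustrative and do not come from the slides.

```python
from itertools import combinations

def crash_states(initial, program):
    """Return every NVRAM image a crash could expose.

    `program` is a list of ("store", location, value) and ("barrier",) steps.
    Assumption: between barriers any subset of dirty lines may already have
    been written back; a barrier makes all earlier stores durable.
    """
    states = []
    durable = dict(initial)   # contents guaranteed to be in NVRAM
    dirty = {}                # stores still held only in the volatile cache
    for step in program + [("end",)]:
        # A crash just before this step may have persisted any subset of dirty lines.
        for r in range(len(dirty) + 1):
            for subset in combinations(dirty, r):
                image = dict(durable)
                image.update({loc: dirty[loc] for loc in subset})
                states.append(image)
        if step[0] == "store":
            _, loc, value = step
            dirty[loc] = value
        elif step[0] == "barrier":
            durable.update(dirty)
            dirty = {}
    return states

def is_consistent(image):
    """HEAD must point at a node whose payload and next pointer are durable."""
    head = image["HEAD"]
    if head == "node0":
        return True
    return image.get(head + ".data") is not None and image.get(head + ".next") == "node0"

initial = {"HEAD": "node0"}
create = ("store", "node1.data", 42)
link = ("store", "node1.next", "node0")
update_head = ("store", "HEAD", "node1")

naive = [create, link, update_head]
failsafe = [create, link, ("barrier",), update_head]

naive_ok = all(is_consistent(s) for s in crash_states(initial, naive))
failsafe_ok = all(is_consistent(s) for s in crash_states(initial, failsafe))
print(naive_ok, failsafe_ok)
```

In this model the naïve sequence admits a crash image in which HEAD already points at the new node while the node itself never reached NVRAM; with the barrier in place no such image exists.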

22 Linked List - Failsafe. Pseudo-code: 1. Create 2. Update Pointer 3. Persist Barrier (programmer introduced) 4. Update Head Pointer. Steps 1 and 2 form Epoch A; step 4 forms Epoch B. Dividing program execution into epochs through programmer-inserted persist barriers = Epoch Persistence.

29 Epoch Persistence*. Program order: Epoch 1: St a, St b, St c, St a; Persist Barrier; Epoch 2: St d, St e, St d; Persist Barrier; Epoch 3: St p, St q, St d. Figure: execution timeline of the three epochs alongside the corresponding persistence timeline. Persist operations happen in the critical path of execution. * Pelley et al., Memory Persistency, in ISCA
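
The ordering rule of epoch persistence can be written down as a small checker. This is a sketch under stated assumptions: persists inside one epoch may reach NVRAM in any order, repeated stores to one location within an epoch coalesce into a single persist, and no persist of a later epoch may precede a persist of an earlier one. The function names and the `(epoch_index, location)` encoding are hypothetical.

```python
def epochs_of(program):
    """Split a program into epochs at each "barrier" marker."""
    epochs, current = [], []
    for op in program:
        if op == "barrier":
            epochs.append(current)
            current = []
        else:
            current.append(op)
    epochs.append(current)
    return epochs

def respects_epoch_order(program, persist_order):
    """True if no persist from a later epoch precedes one from an earlier epoch.

    `persist_order` lists (epoch_index, location) pairs in the order they
    reach NVRAM. Persists inside one epoch may be reordered freely.
    """
    epochs = epochs_of(program)
    last_epoch = 0
    for epoch_index, location in persist_order:
        if location not in epochs[epoch_index]:
            return False          # the epoch never stored to this location
        if epoch_index < last_epoch:
            return False          # an older epoch persisted after a newer one
        last_epoch = epoch_index
    return True

# The three epochs from the slide: a b c a | d e d | p q d
program = ["a", "b", "c", "a", "barrier", "d", "e", "d", "barrier", "p", "q", "d"]
valid = [(0, "b"), (0, "a"), (0, "c"), (1, "d"), (1, "e"), (2, "p"), (2, "q"), (2, "d")]
invalid = [(0, "b"), (1, "d"), (0, "a"), (0, "c")]

print(respects_epoch_order(program, valid))
print(respects_epoch_order(program, invalid))
```

The first order is accepted because reordering stays within each epoch; the second is rejected because d from the second epoch reaches NVRAM before a and c from the first.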

30 Buffered Epoch Persistence* through Lazy Barrier (LB). An implementation of Epoch Persistence in which durability lags visibility, so that persist operations can be performed out of the critical path. * Pelley et al., Memory Persistency, in ISCA. * Condit et al., Better I/O through byte-addressable, persistent memory, in SOSP

36 Buffered Epoch Persistence* through Lazy Barrier (LB). Figure: Epoch 1 (a b c a), Epoch 2 (d e d), Epoch 3 (p q d) on the execution timeline; persistence timeline b, a, c, d, e; persists are triggered by a cache line eviction or by a conflicting request. Conflicts bring persist operations back into the critical path. * Pelley et al., Memory Persistency, in ISCA. * Condit et al., Better I/O through byte-addressable, persistent memory, in SOSP
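
How a conflict pulls persists back onto the critical path can be illustrated with a short replay of the slide's trace. This is a toy model, not the hardware design from the paper. It assumes one location per cache line, no capacity evictions, and that a store to a line still dirty from an older epoch is the conflicting request, which forces every epoch up to that one to be flushed in order. `run_lazy_barrier` is a hypothetical name.

```python
def run_lazy_barrier(program):
    """Replay stores and barriers; return (persist_log, conflicts)."""
    epoch = 0
    dirty = {}          # location -> epoch of its unpersisted store
    persist_log = []    # (epoch, location) in the order lines reach NVRAM
    conflicts = []      # (epoch of the conflicting store, location)
    for op in program:
        if op == "barrier":
            epoch += 1          # lazy: nothing is flushed at the barrier itself
            continue
        older = dirty.get(op)
        if older is not None and older < epoch:
            conflicts.append((epoch, op))
            # Flush every epoch up to and including `older`, oldest first.
            for e in sorted({x for x in dirty.values() if x <= older}):
                for loc in [l for l, x in dirty.items() if x == e]:
                    persist_log.append((e, loc))
                    del dirty[loc]
        dirty[op] = epoch
    return persist_log, conflicts

# The three epochs from the slide: a b c a | d e d | p q d
program = ["a", "b", "c", "a", "barrier", "d", "e", "d", "barrier", "p", "q", "d"]
persist_log, conflicts = run_lazy_barrier(program)
print(persist_log)
print(conflicts)
```

In this replay the only conflict is the store to d in the third epoch, and it forces the first two epochs to persist before the store can proceed. The order within an epoch is arbitrary in the model, so it need not match the b, a, c order drawn on the slide.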

38 Intra-thread Conflict. Figure: Epoch 1 (a b c a), Epoch 2 (d e d), Epoch 3 (p q d); persistence timeline b, a, c, d, e; the conflicting request is marked within the same thread.

40 Proactive Flush (PF). Persist triggers in Lazy Barrier: passive trigger (cache line eviction) and reactive trigger (flush on intra-thread or inter-thread conflicts). Persist trigger with Proactive Flush: proactive trigger (proactively flush on epoch completion).

47 Proactive Flush (PF). Figure: Epoch 1 (a b c a), Epoch 2 (d e d), Epoch 3 (p q d); persistence timeline b, a, c, d, e, with the proactive flushes marked. Reduces the probability of encountering conflicts.
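
The effect of the proactive trigger can be sketched by counting conflicts on the slide's trace with and without it. The model is deliberately crude and its parameters are assumptions: one location per cache line, no capacity evictions, and with `proactive=True` the hardware drains one queued line per executed store. `count_conflicts` is a hypothetical name.

```python
from collections import deque

def count_conflicts(program, proactive):
    """Count stores that hit a line still unpersisted from an older epoch."""
    epoch = 0
    dirty = {}              # location -> epoch of its unpersisted store
    flush_queue = deque()   # lines of completed epochs awaiting background flush
    conflicts = 0
    for op in program:
        if op == "barrier":
            if proactive:
                # Proactive trigger: queue the completed epoch's lines.
                flush_queue.extend(loc for loc, e in dirty.items() if e == epoch)
            epoch += 1
            continue
        if proactive and flush_queue:
            dirty.pop(flush_queue.popleft(), None)   # one background persist
        if op in dirty and dirty[op] < epoch:
            conflicts += 1
            # Reactive trigger: flush everything older than the current epoch.
            for loc in [l for l, e in dirty.items() if e < epoch]:
                del dirty[loc]
            flush_queue.clear()
        dirty[op] = epoch
    return conflicts

# The three epochs from the slide: a b c a | d e d | p q d
program = ["a", "b", "c", "a", "barrier", "d", "e", "d", "barrier", "p", "q", "d"]
print(count_conflicts(program, proactive=False))
print(count_conflicts(program, proactive=True))
```

Under these assumptions the store to d in the third epoch conflicts when only the reactive trigger exists, and does not conflict once completed epochs are flushed in the background. A slower drain rate would leave some conflicts in place, which is why the slide speaks of reducing their probability rather than removing them.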

52 Inter-thread Conflict. Figure: Thread T0 performs W A, W B and W F, among other writes, in Epoch 00; Thread T1 performs R X, R Y, W Z, R P, R B, R Q and a final write across Epoch 10 and Epoch 11; persistence timeline Z, A, B, F. T1's R B reads a value written by T0 in Epoch 00.

54 Inter-thread Dependency Tracking (IDT). Lazy barrier: no tracking of inter-thread dependencies, so dependencies must be enforced online. Inter-thread Dependency Tracking: add inter-thread dependence tracking registers and track dependencies so that they can be enforced offline.

60 Inter-thread Dependency Tracking (IDT). Figure: Thread T0 performs W A, W B and W F, among other writes, in Epoch 00; Thread T1 performs R X, R Y, W Z, R P, R B, R Q and a final write across Epoch 10 and Epoch 11; the IDT table records Source = Epoch 00, Dependent = Epoch 11; persistence timeline Z, A, B, F. Reduces the latency of conflicting requests.
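
A software analogue of the IDT table helps show what enforcing a recorded dependency offline amounts to. This is only a sketch. It assumes enforcement means choosing a flush order in which each source epoch is durable before its dependent epoch, while epochs of one thread persist in program order. The names `persist_schedule`, `E00`, `E10` and `E11` are illustrative, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def persist_schedule(thread_epochs, idt_table):
    """Order epochs so that program order and recorded dependencies both hold.

    `thread_epochs` maps a thread to its epochs in program order;
    `idt_table` holds (source_epoch, dependent_epoch) pairs.
    """
    sorter = TopologicalSorter()
    for epochs in thread_epochs.values():
        if not epochs:
            continue
        sorter.add(epochs[0])
        for earlier, later in zip(epochs, epochs[1:]):
            sorter.add(later, earlier)      # same-thread epochs persist in order
    for source, dependent in idt_table:
        sorter.add(dependent, source)       # source must be durable first
    return list(sorter.static_order())

# The slide's example: Epoch 00 on T0 is the source, Epoch 11 on T1 the dependent.
thread_epochs = {"T0": ["E00"], "T1": ["E10", "E11"]}
idt_table = [("E00", "E11")]
order = persist_schedule(thread_epochs, idt_table)
print(order)
```

Any order the function returns places E11 after both E00 and E10; E00 and E10 are unordered with respect to each other, which matches the slide's persistence timeline where Z from Epoch 10 may persist before A, B and F from Epoch 00.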

61 Evaluation. Persist barrier designs: LB = lazy barrier; LB+IDT = lazy barrier with inter-thread dependence tracking (IDT); LB+PF = lazy barrier with proactive flush (PF); LB++ = lazy barrier with both IDT and PF. Persistency models: Buffered Epoch Persistency (BEP), for maintaining in-memory persistent data structures; Bulk Strict Persistency (BSP) = BEP + atomicity, which provides a stronger persistency model (strict persistency), similar to enforcing sequential consistency in bulk mode*. * Ceze et al., BulkSC: Bulk enforcement of sequential consistency, in ISCA

62 System Configuration. We evaluate the proposed design using gem5 in full-system simulation mode: a 32-core CMP with 32 x 1 MB LLC banks and 4 memory controllers. More details on implementing the persist barrier for such a system are in the paper.

66 BEP Transaction Throughput (higher is better). Figure: throughput chart; successive builds highlight improvements of 3%, 15% and 22%.

69 BSP Execution Time (lower is better). Figure: execution time chart; successive builds highlight 15% and 20%.

70 Conclusion. We propose an efficient implementation of a persist barrier primitive: a buffered implementation that moves persists out of the critical path. We highlight how conflicts bring them back into the critical path. We propose and implement two optimizations: Proactive Flush, which reduces the percentage of conflicting epochs, and Inter-thread Dependence Tracking, which reduces the penalty of inter-thread conflicts. We demonstrate their efficacy by efficiently implementing two persistence models, BEP and BSP.

71 Efficient Persist Barriers for Multicores. Arpit Joshi, Vijay Nagarajan, Marcelo Cintra, Stratis Viglas
