What Is STM and How Well It Performs? Aleksandar Dragojević

Size: px
Start display at page:

Download "What Is STM and How Well It Performs? Aleksandar Dragojević"

Transcription

1 What Is STM and How Well It Performs? Aleksandar Dragojević 1

2 Hardware Trends 2

3 Hardware Trends CPU 2

4 Hardware Trends CPU 2

5 Hardware Trends CPU 2

6 Hardware Trends CPU 2

7 Hardware Trends CPU CPU 2

8 Hardware Trends CPU CPU CPU CPU 2

9 Hardware Trends CPU CPU CPU CPU CPU CPU CPU CPU 2

10 Concurrent Programming Domain of experts Fine-grained locking Lock-free Average programmers New abstractions needed 3

11 Transactional Memory 4

12 Transactional Memory // O1: move 20 a->b 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; // O2: add 10 to a 5: int a = acc_a; 6: acc_a = a + 10; 4

13 Transactional Memory // O1: move 20 a->b atomic { 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; } // O2: add 10 to a atomic { 5: int a = acc_a; 6: acc_a = a + 10; } 4

14 Transactional Memory // O1: move 20 a->b atomic { 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; } // O2: add 10 to a atomic { 5: int a = acc_a; 6: acc_a = a + 10; } 4

15 Transactional Memory Sequential code Add transactions atomic key word System ensures correctness equivalent to sequential 5

16 Is it really simpler? 6

17 Is it really simpler? void Enqueue(int value) { ptrver_t next, tail, tail2, newb; node_t *node = alloc_queue_node(value); while(true) { tail = Tail; next = tail.ptr->next; tail2 = Tail; if(tail == tail2) if(next.ptr == NULL) { newb.ptr = node; newb.ver = next.ver + 1; if(cas(&tail.ptr->next, next, newb)) break; } else { newb.ptr = next.ptr; newb.ver = tail.ver + 1; CAS(&Tail, tail, newb); } } newb.ptr = node; newb.ver = tail.ver + 1; CAS(&Tail, tail, newb); } 6

18 Is it really simpler? void Enqueue(int value) { node_t *node = alloc_queue_node(value); atomic { if(tail == NULL) head = node; else tail->next = node; tail = node; } } 6

19 Disclaimer 7

20 What I might say TM is easy to use by non-experts TM shows great promise I hope we can make TM widely used 8

21 What I am not saying TM 9

22 Outline What is TM? How to implement an STM? SwissTM STM performance predicting the performance 10

23 What is TM? 11

24 What is TM? What does TM guarantee? 11

25 Serializability 12

26 Serializability 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; 12

27 Serializability atomic { // t1 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; } 12

28 Serializability atomic { // t1 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; } atomic { // t2 5: int a = acc_a; 6: acc_a = a + 10; } 12

29 Serializability atomic { // t1 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; } t 1 t 2 atomic { // t2 5: int a = acc_a; 6: acc_a = a + 10; } t correct 12

30 Serializability atomic { // t1 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; } t 2 t 1 atomic { // t2 5: int a = acc_a; 6: acc_a = a + 10; } t correct 12

31 Serializability atomic { // t1 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; } atomic { // t2 5: int a = acc_a; 6: acc_a = a + 10; } t client 12

32 Serializability atomic { // t1 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; } atomic { // t2 5: int a = acc_a; 6: acc_a = a + 10; } t bank 12

33 How is this achieved? T 1 T 2 13

34 How is this achieved? T 1 t 1 T 2 13

35 How is this achieved? T 1 t 1 1 T 2 13

36 How is this achieved? T 1 t T 2 13

37 How is this achieved? T 1 t T 2 13

38 How is this achieved? T 1 t T 2 13

39 How is this achieved? T 1 t commit T 2 13

40 How is this achieved? T 1 t commit T 2 t 2 13

41 How is this achieved? T 1 t commit T 2 t

42 How is this achieved? T 1 t commit T 2 t

43 How is this achieved? T 1 t commit T 2 t commit 13

44 How is this achieved? T 1 T 2 13

45 How is this achieved? T 1 t 1 1 T 2 13

46 How is this achieved? T 1 t 1 1 T 2 t

47 How is this achieved? T 1 t T 2 t

48 How is this achieved? T 1 t T 2 t

49 How is this achieved? T 1 t T 2 t

50 How is this achieved? T 1 t T 2 t abort 13

51 How is this achieved? T 1 t T 2 t 2 13

52 How is this achieved? T 1 t T 2 t 2 13

53 How is this achieved? T 1 t commit T 2 t 2 13

54 How is this achieved? T 1 t commit T 2 t

55 How is this achieved? T 1 t commit T 2 t

56 How is this achieved? T 1 t commit T 2 t commit 13

57 How is this achieved? TM monitors accesses to objects When it detects conflicting access one transaction is aborted its actions are rolled back it is restarted transaction commits When all actions are not conflicting 14

58 Is serializability enough? 15

59 Is serializability enough? Variables: int x=0, y=1; Invariant: x < y 15

60 Is serializability enough? atomic { 1: int xl = x; 2: int yl = y; 3: x = yl; 4: y = yl * 2; } Variables: int x=0, y=1; Invariant: x < y 15

61 Is serializability enough? Variables: int x=0, y=1; Invariant: x < y atomic { 1: int xl = x; 2: int yl = y; 3: x = yl; 4: y = yl * 2; } atomic { 5: int xl = x; 6: int yl = y; 7: int zl = 1/(yl-xl); 8: z = zl; } 15

62 Consistent view All transactions must observe consistent views of memory at all times even the aborted ones 16

63 Opacity Serializability there exists an equivalent serial (one thread) execution Consistent memory view no transaction can e.g. divide by zero because of non-consistent reads 17

64 TM semantics Committed: instantaneous Aborted: never visible All: observe consistent state 18

65 TM semantics 19

66 TM semantics T 1 T 2 T 3 serial 19

67 TM semantics T 1 t 11 :commit T 2 T 3 serial 19

68 TM semantics T 1 t 11 :commit T 2 t 21 :commit T 3 serial 19

69 TM semantics T 1 t 11 :commit T 2 T 3 t 21 :commit t 31 :abort serial 19

70 TM semantics T 1 t 11 :commit T 2 T 3 t 21 :commit t 31 :abort t 32 :commit serial 19

71 TM semantics T 1 t 11 :commit t 12 :abort t 13 :commit T 2 t 21 :commit t 22 :commit T 3 t 31 :abort t 32 :commit t 33 :commit serial 19

72 TM semantics T 1 t 11 :commit t 12 :abort t 13 :commit T 2 t 21 :commit t 22 :commit T 3 t 31 :abort t 32 :commit t 33 :commit serial 19

73 TM semantics T 1 t 11 :commit t 12 :abort t 13 :commit T 2 t 21 :commit t 22 :commit T 3 t 31 :abort t 32 :commit t 33 :commit serial t 11 19

74 TM semantics T 1 t 11 :commit t 12 :abort t 13 :commit T 2 t 21 :commit t 22 :commit T 3 t 31 :abort t 32 :commit t 33 :commit serial t 11 19

75 TM semantics T 1 t 11 :commit t 12 :abort t 13 :commit T 2 t 21 :commit t 22 :commit T 3 t 31 :abort t 32 :commit t 33 :commit serial t 11 t 21 19

76 TM semantics T 1 t 11 :commit t 12 :abort t 13 :commit T 2 t 21 :commit t 22 :commit T 3 t 31 :abort t 32 :commit t 33 :commit serial t 11 t 21 19

77 TM semantics T 1 t 11 :commit t 12 :abort t 13 :commit T 2 t 21 :commit t 22 :commit T 3 t 31 :abort t 32 :commit t 33 :commit serial t 11 t 21 t 22 19

78 TM semantics T 1 t 11 :commit t 12 :abort t 13 :commit T 2 t 21 :commit t 22 :commit T 3 t 31 :abort t 32 :commit t 33 :commit serial t 11 t 21 t 22 19

79 TM semantics T 1 t 11 :commit t 12 :abort t 13 :commit T 2 t 21 :commit t 22 :commit T 3 t 31 :abort t 32 :commit t 33 :commit serial t 11 t 21 t 22 t 32 19

80 TM semantics T 1 t 11 :commit t 12 :abort t 13 :commit T 2 t 21 :commit t 22 :commit T 3 t 31 :abort t 32 :commit t 33 :commit serial t 11 t 21 t 22 t 32 t 13 t 33 19

81 How to implement an STM? 20

82 How to implement an STM? How does it all fit? 20

83 Software TM Available now Component of HyTM Backwards compatibility 21

84 From.cc To.exe 22

85 From.cc To.exe.cc 22

86 From.cc To.exe.cc.exe 22

87 From.cc To.exe.cc?.exe 22

88 From.cc To.exe.cc cc.exe 22

89 From.cc To.exe.cc cc.o.exe 22

90 From.cc To.exe.cc cc.o ld.exe 22

91 From.cc To.exe.cc cc.o ld.exe lib*.a 22

92 From.cc To.exe.cc cc.o ld.exe lib*.a 22

93 From.cc To.exe.cc cc.o ld.exe TM? lib*.a 22

94 From.cc To.exe.cc.exe 22

95 From.cc To.exe.cc Tx pass.exe 22

96 From.cc To.exe.cc Tx pass cc.exe 22

97 From.cc To.exe.cc Tx pass cc.o.exe 22

98 From.cc To.exe.cc Tx pass cc.o ld.exe 22

99 From.cc To.exe.cc Tx pass cc.o ld.exe libtm.a 22

100 From.cc To.exe.cc Tx pass cc.o ld.exe libtm.a 22

101 From.cc To.exe.cc Tx pass cc.o ld.exe libtm.a 22

102 TM library Implements TM Similar (same) API Different algorithms different performance different properties 23

103 TM library API tx_start() tx_read(addr) : val tx_write(addr, val) tx_commit() tx_abort() 24

104 TM libraries SwissTM (EPFL) TinySTM (UNINE) DSTM, TL2, TLRW, SkyTM (Sun) McRT (Intel) SXM, Bartok (Microsoft)... 25

105 Tx pass 26

106 Tx pass atomic { // t1 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; } 26

107 Tx pass atomic { // t1 1: int a = acc_a; 2: acc_a = a - 20; 3: int b = acc_b; 4: acc_b = b + 20; } tx_start(); 1: int a = tx_read(acc_a); 2: tx_write(acc_a, a-20); 3: int b = tx_read(acc_b); 4: tx_write(acc_b, b+20); tx_commit(); 26

108 Implementing Tx pass Manual Compiler Other 27

109 Manual Tx pass Manually insert TM API calls Highly optimized no unnecessary TM calls Error-prone missing TM calls Tedious need to rewrite a lot of code 28

110 Compiler Tx pass Integrated with the compiler Simple to use Lower performance unnecessary TM calls Better support for optimizations lower the overheads 29

111 Other Tx pass Source to source compiler separate, simpler compiler Bytecode instrumentation for managed languages (Java, C#) 30

112 Tx Passes C/C++ Intel DTMC (LLVM) Sun gcc Java Deuce 31

113 SwissTM 32

114 SwissTM How to implement an STM library? 32

115 Setting 33

116 Setting CPU 1 33

117 Setting CPU 1 CPU 2 33

118 Setting CPU 1 CPU 2 CPU 3 33

119 Setting Main memory CPU 1 CPU 2 CPU 3 33

120 Setting Main memory 0x0 CPU 1 CPU 2 CPU 3 33

121 Setting Main memory CPU 1 0x0 0x4 CPU 2 CPU 3 33

122 Setting Main memory CPU 1 0x0 0x4 0x8 0xc CPU 2 CPU 3 0xffffffff 33

123 Setting Main memory CPU 1 op(addr) 0x0 0x4 0x8 0xc CPU 2 CPU 3 0xffffffff 33

124 Setting Main memory CPU 1 op(addr) 0x0 0x4 0x8 0xc CPU 2 CPU 3 0xffffffff 33

125 Setting (2) Memory consists of locations word sized (32 bits) Each location has address CPUs execute operations on locations read, write, c&s, t&s,... 34

126 Setting (3) Operations have different costs Try to avoid expensive operations c&s writing shared data reading shared data 35

127 SwissTM Two phase locking Invisible reads time-based validation Deferred updates support rollback 36

128 SwissTM design TM 37

129 SwissTM design TM conflict detection 37

130 SwissTM design TM conflict detection contention manager 37

131 SwissTM design t 1 TM conflict detection contention manager t 2 37

132 SwissTM design t 1 R(acc_a) tx_read(acc_a) TM conflict detection contention manager t 2 37

133 SwissTM design t 1 R(acc_a) TM conflict detection contention manager t 2 R(acc_a) tx_read(acc_a) 37

134 SwissTM design t 1 R(acc_a) W(acc_a) tx_write(acc_a) TM conflict detection contention manager t 2 R(acc_a) 37

135 SwissTM design t 1 R(acc_a) W(acc_a) TM conflict detection contention manager t 2 R(acc_a) tx_write(acc_a) W(acc_a) 37

136 SwissTM design t 1 R(acc_a) W(acc_a) TM conflict detection contention manager t 2 R(acc_a) abort 37

137 Conflict Detection eager > lazy t 1 commit RW V t 2 RW abort lazy wasted 38

138 Conflict Detection eager > lazy t 1 commit RW V t 2 RW abort lazy wasted 38

139 Conflict Detection eager > lazy t 1 commit RW V t 2 RW abort abort eager wasted 38

140 Conflict Detection 38

141 Conflict Detection lazy > eager t 1 commit W V t 2 R commit lazy 38

142 Conflict Detection lazy > eager t 1 commit W V t 2 R abort eager 38

143 Conflict Detection lazy > eager t 1 commit W V t 2 R commit lazy 38

144 Contention Manager t 1 W? V t 2 W 39

145 Contention Manager t 1 Greedy W abort younger V t 2 W abort 39

146 Contention Manager 39

147 Contention Manager t 1 W? V t 2 W 39

148 Contention Manager t 1 Timid abort W abort later V t 2 W 39

149 SwissTM Design Mixed invalidation lazy for read/write eager for write/write Two-phase contention manager timid for short greedy for long 40

150 Starting point void tx_start() word_t tx_read(word_t *addr) void tx_write(word_t *addr, word_t val) void tx_commit() 41

151 Where is the lock? 42

152 Where is the lock? Global Lock Table 42

153 Where is the lock? Global Lock Table lock[0] lock[1] lock[2] lock[1m lock[4m - 1] 42

154 Where is the lock? Global Lock Table lock[0] lock[1] map(addr) lock[2] lock[1m lock[4m - 1] 42

155 Where is the lock? Global Lock Table lock[0] lock[1] map(0xab000020) lock[2] lock[1m lock[4m - 1] 42

156 Where is the lock? Global Lock Table lock[0] lock[1] lock[2] map(0xab000020) a b lock[1m lock[4m - 1] 42

157 Where is the lock? Global Lock Table lock[0] lock[1] lock[2] map(0xab000020) a b lock[1m lock[4m - 1] 42

158 Where is the lock? Global Lock Table lock[0] lock[1] lock[2] map(0xab000020) a b lock[1m lock[4m - 1] 42

159 Where is the lock? Global Lock Table lock[0] lock[1] lock[2] map(0xab000020) a b lock[1m lock[4m - 1] 42

160 Lock 43

161 Lock write lock read lock 43

162 Lock unlocked write lock 0x0 43

163 Lock unlocked write lock 0x0 locked write lock owner log entry ptr 43

164 Lock unlocked write lock 0x0 locked write lock owner log entry ptr unlocked read lock version << 1 43

165 Lock unlocked write lock 0x0 locked write lock owner log entry ptr unlocked read lock version << 1 locked read lock 0x1 43

166 Lock (2) Two memory words Write lock encounter time Read lock commit time detect write/write conflicts detect read/write conflicts 44

167 Versions Each location has a version Use shared version counter speeds up validation Every transaction that writes, updates the counter on commit 45

168 Versions (2) Transactions read version counter at start If location version lower than counter no updates to the location since start read set is consistent Otherwise, validate remember current version counter 46

169 Shared data Commit_ts // shared version counter Greedy_ts // shared CM counter Lock_table // locks for all locations 47

170 Thread-local data _start // rollback jump target _valid_ts // read set version _read_log // what tx read _write_log // what tx wrote _cm_ts // CM timestamp 48

171 Start 1: tx_start(): 2: _start := create_jump_target() 3: _valid_ts := read(commit_ts) 49

172 Read 1: tx_read(word_t *addr) 2: (r_lock, w_lock) := map(addr) 3: if locked_by_me(w_lock) return get_val(w_lock) 4: version := read(r_lock) 5: while true 6: if version = 0x1 7: version := read(r_lock) 8: continue 9: value := read(addr) 10: version2 := read(r_lock) 11: if version = version2 break 12: version := version2 13: read_log_add(r_lock, version) 14: if version > _valid_ts and not extend() 15: rollback() 16: return value 50

173 Read 1: tx_read(word_t *addr) 2: (r_lock, w_lock) := map(addr) Map to lock 3: if locked_by_me(w_lock) return get_val(w_lock) 4: version := read(r_lock) 5: while true 6: if version = 0x1 7: version := read(r_lock) 8: continue 9: value := read(addr) 10: version2 := read(r_lock) 11: if version = version2 break 12: version := version2 13: read_log_add(r_lock, version) 14: if version > _valid_ts and not extend() 15: rollback() 16: return value 50

174 Read 1: tx_read(word_t *addr) 2: (r_lock, w_lock) := map(addr) Map to lock 3: if locked_by_me(w_lock) return get_val(w_lock) 4: version := read(r_lock) 5: while true 6: if version = 0x1 7: version := read(r_lock) 8: continue 9: value := read(addr) 10: version2 := read(r_lock) 11: if version = version2 break 12: version := version2 13: read_log_add(r_lock, version) 14: if version > _valid_ts and not extend() 15: rollback() 16: return value 50 Read after write

175 Read 1: tx_read(word_t *addr) 2: (r_lock, w_lock) := map(addr) Map to lock 3: if locked_by_me(w_lock) return get_val(w_lock) 4: version := read(r_lock) 5: while true 6: if version = 0x1 7: version := read(r_lock) 8: continue 9: value := read(addr) 10: version2 := read(r_lock) 11: if version = version2 break 12: version := version2 13: read_log_add(r_lock, version) 14: if version > _valid_ts and not extend() 15: rollback() 16: return value 50 Read after write Read consistent version and value

176 Read 1: tx_read(word_t *addr) 2: (r_lock, w_lock) := map(addr) Map to lock 3: if locked_by_me(w_lock) return get_val(w_lock) 4: version := read(r_lock) 5: while true 6: if version = 0x1 7: version := read(r_lock) 8: continue 9: value := read(addr) 10: version2 := read(r_lock) 11: if version = version2 break 12: version := version2 13: read_log_add(r_lock, version) 14: if version > _valid_ts and not extend() 15: rollback() 16: return value 50 Read after write Read consistent version and value Read state consistent?

177 Extend 1: extend() 2: ts := read(commits_ts) 3: if validate() 4: _valid_ts := ts 5: return true 6: return false 51

178 Validate 1: validate() 2: for entry in _read_log 3: if entry.version = read(entry.r_lock) 4: continue 5: if locked_by_me(entry.w_lock) 6: continue 7: return false 8: return true 52

179 Write 1: tx_write(word_t *addr, word_t val) 2: (r_lock, w_lock) := map(addr) 3: if locked_by_me(w_lock) 4: update(w_lock, val) 5: return 6: while true 7: if w_lock!= 0x0 8: if cm_should_abort(w_lock) rollback() 9: else continue 10: entry := write_log_add(w_lock,addr,val) 11: if c&s(w_lock, 0, entry) break 12: write_log_remove(entry) 13: if read(r_lock) > _valid_ts and not extend() 14: rollback() 53

180 Write 1: tx_write(word_t *addr, word_t val) 2: (r_lock, w_lock) := map(addr) 3: if locked_by_me(w_lock) Map to lock 4: update(w_lock, val) 5: return 6: while true 7: if w_lock!= 0x0 8: if cm_should_abort(w_lock) rollback() 9: else continue 10: entry := write_log_add(w_lock,addr,val) 11: if c&s(w_lock, 0, entry) break 12: write_log_remove(entry) 13: if read(r_lock) > _valid_ts and not extend() 14: rollback() 53

181 Write 1: tx_write(word_t *addr, word_t val) 2: (r_lock, w_lock) := map(addr) 3: if locked_by_me(w_lock) Map to lock 4: update(w_lock, val) 5: return 6: while true 7: if w_lock!= 0x0 8: if cm_should_abort(w_lock) rollback() 9: else continue 10: entry := write_log_add(w_lock,addr,val) 11: if c&s(w_lock, 0, entry) break 12: write_log_remove(entry) 13: if read(r_lock) > _valid_ts and not extend() 14: rollback() Write after write 53

182 Write 1: tx_write(word_t *addr, word_t val) 2: (r_lock, w_lock) := map(addr) 3: if locked_by_me(w_lock) Map to lock 4: update(w_lock, val) 5: return 6: while true Acquire 7: if w_lock!= 0x0 write lock 8: if cm_should_abort(w_lock) rollback() 9: else continue 10: entry := write_log_add(w_lock,addr,val) 11: if c&s(w_lock, 0, entry) break 12: write_log_remove(entry) 13: if read(r_lock) > _valid_ts and not extend() 14: rollback() Write after write 53

183 Write 1: tx_write(word_t *addr, word_t val) 2: (r_lock, w_lock) := map(addr) 3: if locked_by_me(w_lock) Map to lock 4: update(w_lock, val) 5: return 6: while true Acquire 7: if w_lock!= 0x0 write lock 8: if cm_should_abort(w_lock) rollback() 9: else continue 10: entry := write_log_add(w_lock,addr,val) 11: if c&s(w_lock, 0, entry) break 12: write_log_remove(entry) 13: if read(r_lock) > _valid_ts and not extend() 14: rollback() 53 Write after write Validate (for WaR)

184 Commit 1: tx_commit() 2: if is_empty(_write_log) return 3: for entry in _write_log 4: write(entry.r_lock,0x1) 5: ts := increment(commits_ts) 6: if ts > _valid_ts + 1 and not validate() 7: for entry in _write_log 8: write(entry.r_lock, entry.version) 9: rollback() 10: for entry in _write_log 11: write(entry.addr, entry.value) 12: write(entry.r_lock, ts << 1) 13: write(entry.w_lock, 0) 54

185 Commit 1: tx_commit() 2: if is_empty(_write_log) return 3: for entry in _write_log 4: write(entry.r_lock,0x1) 5: ts := increment(commits_ts) 6: if ts > _valid_ts + 1 and not validate() 7: for entry in _write_log 8: write(entry.r_lock, entry.version) 9: rollback() 10: for entry in _write_log 11: write(entry.addr, entry.value) 12: write(entry.r_lock, ts << 1) 13: write(entry.w_lock, 0) Read-only 54

186 Commit 1: tx_commit() 2: if is_empty(_write_log) return 3: for entry in _write_log 4: write(entry.r_lock,0x1) 5: ts := increment(commits_ts) 6: if ts > _valid_ts + 1 and not validate() 7: for entry in _write_log 8: write(entry.r_lock, entry.version) 9: rollback() 10: for entry in _write_log 11: write(entry.addr, entry.value) 12: write(entry.r_lock, ts << 1) 13: write(entry.w_lock, 0) Read-only Acquire read locks 54

187 Commit 1: tx_commit() 2: if is_empty(_write_log) return 3: for entry in _write_log 4: write(entry.r_lock,0x1) 5: ts := increment(commits_ts) 6: if ts > _valid_ts + 1 and not validate() 7: for entry in _write_log 8: write(entry.r_lock, entry.version) 9: rollback() 10: for entry in _write_log 11: write(entry.addr, entry.value) 12: write(entry.r_lock, ts << 1) 13: write(entry.w_lock, 0) Read-only Acquire read locks Get next version 54

188 Commit 1: tx_commit() 2: if is_empty(_write_log) return 3: for entry in _write_log 4: write(entry.r_lock,0x1) 5: ts := increment(commits_ts) 6: if ts > _valid_ts + 1 and not validate() 7: for entry in _write_log 8: write(entry.r_lock, entry.version) 9: rollback() 10: for entry in _write_log 11: write(entry.addr, entry.value) 12: write(entry.r_lock, ts << 1) 13: write(entry.w_lock, 0) Read-only Acquire read locks Get next version Validate read set 54

189 Commit 1: tx_commit() 2: if is_empty(_write_log) return 3: for entry in _write_log 4: write(entry.r_lock,0x1) 5: ts := increment(commits_ts) 6: if ts > _valid_ts + 1 and not validate() 7: for entry in _write_log 8: write(entry.r_lock, entry.version) 9: rollback() 10: for entry in _write_log 11: write(entry.addr, entry.value) 12: write(entry.r_lock, ts << 1) 13: write(entry.w_lock, 0) Read-only Acquire read locks Get next version Validate read set Commit values to memory 54

190 Rollback 1: rollback() 2: for entry in _write_log 3: write(entry.w_lock, 0x0) 4: long_jump(_start) 55

191 Contention manager 1: on_start() 2: _cm_ts := 3: on_write() 4: if _cm_ts = and size(_write_log) > 10 5: _cm_ts := increment(greedy_ts) 6: on_rollback() 7: wait_random() 8: cm_should_abort(w_lock) 9: if _cm_ts = return true 10: owner = owner(w_lock) 11: if owner._cm_ts < _cm_ts return true 12: abort(owner) 13: return false 56

192 Contention manager 1: on_start() 2: _cm_ts := Start as Timid 3: on_write() 4: if _cm_ts = and size(_write_log) > 10 5: _cm_ts := increment(greedy_ts) 6: on_rollback() 7: wait_random() 8: cm_should_abort(w_lock) 9: if _cm_ts = return true 10: owner = owner(w_lock) 11: if owner._cm_ts < _cm_ts return true 12: abort(owner) 13: return false 56

193 Contention manager 1: on_start() 2: _cm_ts := Start as Timid 3: on_write() 4: if _cm_ts = and size(_write_log) > 10 5: _cm_ts := increment(greedy_ts) 6: on_rollback() 7: wait_random() 8: cm_should_abort(w_lock) 9: if _cm_ts = return true 10: owner = owner(w_lock) 11: if owner._cm_ts < _cm_ts return true 12: abort(owner) 13: return false 56 Switch to Greedy

194 Contention manager 1: on_start() 2: _cm_ts := Start as Timid 3: on_write() 4: if _cm_ts = and size(_write_log) > 10 5: _cm_ts := increment(greedy_ts) 6: on_rollback() 7: wait_random() 8: cm_should_abort(w_lock) 9: if _cm_ts = return true 10: owner = owner(w_lock) 11: if owner._cm_ts < _cm_ts return true 12: abort(owner) 13: return false 56 Switch to Greedy Random backoff

195 Contention manager 1: on_start() 2: _cm_ts := Start as Timid 3: on_write() 4: if _cm_ts = and size(_write_log) > 10 5: _cm_ts := increment(greedy_ts) 6: on_rollback() 7: wait_random() 8: cm_should_abort(w_lock) 9: if _cm_ts = return true Timid 10: owner = owner(w_lock) 11: if owner._cm_ts < _cm_ts return true 12: abort(owner) 13: return false 56 Switch to Greedy Random backoff

196 Contention manager 1: on_start() 2: _cm_ts := Start as Timid 3: on_write() 4: if _cm_ts = and size(_write_log) > 10 5: _cm_ts := increment(greedy_ts) 6: on_rollback() 7: wait_random() 8: cm_should_abort(w_lock) 9: if _cm_ts = return true Timid 10: owner = owner(w_lock) 11: if owner._cm_ts < _cm_ts return true 12: abort(owner) 13: return false 56 Switch to Greedy Random backoff Greedy

197 STMBench7 Throughput [10 3 tx/s] 7 SwissTM TinySTM RSTM TL read dominated Thread count 57

198 Duration [s] Mixed invalidation TinySTM SwissTM LeeTM Thread count 58

199 Duration [s] 80 Mixed invalidation TinySTM SwissTM LeeTM irr 5% Thread count 58

200 Duration [s] Mixed invalidation TinySTM SwissTM LeeTM irr 20% Thread count 58

201 Two-phase CM Throughput [10 6 tx/s] Two-phase Greedy red-black tree Thread count 59

202 Next steps Translate into code some additional details Compile Run 60

203 Additional details Memory management allocations inside transactions that abort? Actions that cannot be rolled back input / output Same data used by non-tx code privatization / publication 61

204 STM performance 62

205 STM performance How fast is an STM really? 62

206 Measuring performance Compare different approaches different STMs STM vs locking vs lock-free How to express performance? metric Common approaches work 63

207 Approach I 64

208 Approach I 64

209 Approach I (2) Take some (long) task e.g. genome sequencing Run with different STMs Faster STM needs less time 65

210 Approach II 66

211 Approach II (2) Take some repetitive task e.g. insert element into red-black tree Keep executing it for some (fixed) time Run with different STMs Faster STM executes the task more times 67

212 Approach III 68

213 Avoiding pitfalls STM1 sequences genome faster than STM2 does not mean STM1 also solves some optimization problem A faster than STM2 No complete solution run as many workloads as possible Be careful 69

214 STMBench7 70

215 STMBench7 What does a benchmark look like? 70

216 STMBench7 Uses approach II Large data structure Modeled on OO7 CAD / CAM / CASE workloads Two locking, several STM implementations Crash test 71

217 Data structure 72

218 Data structure (2) 73

219 Operations 45 operations Read only and Update Short and long Three different workloads read, read-write, write 74

220 Execution Thread 1 op1 Thread 2 op2 Thread 3 op3 Data structure Latency, throughput 75

221 Execution (2) Threads execute a mix of operations Experiment lasts for 10s for example Measure of performance is the number of executed operations per second 76

222 STMBench7 Throughput [10 3 tx/s] 7 SwissTM TinySTM RSTM TL read dominated Thread count 77

223 Important question 78

224 Important question Can STM outperform sequential code? 78

225 SwissTM vs Sequential (x86 manual)!d&&(5d$$ #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 79

226 SwissTM vs Sequential (x86 manual)!d&&(5d$$ #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 79

227 SwissTM vs Sequential (x86 manual)!d&&(5d$$ #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 79

228 SwissTM vs Sequential (x86 manual)!d&&(5d$$ #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 79

229 SwissTM vs Sequential (x86 manual)!d&&(5d$$ #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 79

230 SwissTM vs Sequential (x86 manual)!d&&(5d$$ #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 79

231 SwissTM vs Sequential (x86 manual)!d&&(5d$$ #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 79

232 SwissTM vs Sequential (x86 manual)!d&&(5d$$ #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 79

233 SwissTM vs Sequential (x86 manual)!d&&(5d$$ #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 79

234 SwissTM vs Sequential (x86 Intel compiler) 8A$$/.A& #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 80

235 SwissTM vs Sequential (x86 Intel compiler) 8A$$/.A& #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 80

236 SwissTM vs Sequential (x86 Intel compiler) 8A$$/.A& #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 80

237 SwissTM vs Sequential (x86 Intel compiler) 8A$$/.A& #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 80

238 SwissTM vs Sequential (x86 Intel compiler) 8A$$/.A& #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 80

239 SwissTM vs Sequential (x86 Intel compiler) 8A$$/.A& #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 80

240 SwissTM vs Sequential (x86 Intel compiler) 8A$$/.A& #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 80

241 SwissTM vs Sequential (x86 Intel compiler) 8A$$/.A& #!" +" *" )" (" '" &" %" $" #" #" $" &" *" #("!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 80

242 Compiler cost 8A$$/.A& $"!#,"!#+"!#*"!#)"!#("!#'"!#&"!#%"!#$" $" %" '" +" $)"!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 81

243 Compiler cost 8A$$/.A& $"!#,"!#+"!#*"!#)"!#("!#'"!#&"!#%"!#$" $" %" '" +" $)"!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 81

244 Compiler cost 8A$$/.A& $"!#,"!#+"!#*"!#)"!#("!#'"!#&"!#%"!#$" $" %" '" +" $)"!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 81

245 Compiler cost 8A$$/.A& $"!#,"!#+"!#*"!#)"!#("!#'"!#&"!#%"!#$" $" %" '" +" $)"!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 81

246 Compiler cost 8A$$/.A& $"!#,"!#+"!#*"!#)"!#("!#'"!#&"!#%"!#$" $" %" '" +" $)"!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 81

247 Compiler cost 8A$$/.A& $"!#,"!#+"!#*"!#)"!#("!#'"!#&"!#%"!#$" $" %" '" +" $)"!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 81

248 Compiler cost 8A$$/.A& $"!#,"!#+"!#*"!#)"!#("!#'"!#&"!#%"!#$" $" %" '" +" $)"!"!"#$%& '$()*$& +(,-./$-& 0*$"(%&1234& 0*$"(%&5)6& 5"7#-2(,4& 8%9":& ;"9"<)(&1234& ;"9"<)(&5)6& ="/"& 1"%4,"7>$& 8?2A>2%,& 81

249 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

250 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

251 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

252 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

253 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

254 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

255 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

256 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

257 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

258 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

259 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

260 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

261 SwissTM vs Sequential (Sparc) '$" E#$>F$ >G$>H$ EI$ '#" '!"!D&&(5D$$ &" %" $" #" '" #" $" &" '%" (#" %$"!"!"#$%&'($!"#$%&'()*+,-&$!"#$*+,-&$ "'.&/$ 0&123&$ 41-+5(&+$ 63&'1/$7,89$ 63&'1/$:2;$ A'('$ 7'/9-'<B&$ :,1C&($:,/-$ %<-+&&$!C,DB,/-$ 82

262 SwissTM vs Sequential x86 (16) SPARC (64) SwissTM (manual) SwissTM (compiler) 16/17 17/17 13/14 - Total 46/48 (95%) 83

263 SwissTM vs Sequential x86 (16) SPARC (64) min max avg min max avg SwissTM (manual) SwissTM (compiler)

264 Predicting STM performance 85

265 Predicting STM performance Can we guess STM performance? 85

266 Good performance? 12 STMBench7 9 Speedup Threads 86

267 Good performance? 12 9 STMBench7 Yes Read Speedup Threads 86

268 Good performance? 12 9 STMBench7 Or maybe no? Read Speedup 6 3 Write Threads 86

269 Open questions What is really going on? significant difference What should we do? given application - use TM or not? 87

270 Answers so far Rules of thumb Theory Existing models 88

271 Rules of Thumb Good performance if mostly read operations low contention Vague difficult to apply 89

272 Theory Worst case performance: Guerraoui, Kapałka 2007: Every progressive, single-version TM that uses invisible reads has the time complexity of Ω(k) where k = Obj. 90

273 Theory Worst case performance: Guerraoui, Kapałka 2007: Every progressive, single-version TM that uses invisible reads has the time complexity of Ω(k) where k = Obj. We want common case 90

274 Existing models Model one aspect: Heindl, Pokam, Adl-Tabatabai 2009: conflict detection algorithmic performance Answer specific question: Usui et al. 2009: use lock or STM for a critical section 91

275 Pragmatic approach 92

276 Pragmatic approach 92

277 Pragmatic approach 92

278 Pragmatic approach 92

279 Pragmatic approach 92

280 Pragmatic approach 92

281 Pragmatic approach speedup = f(n) 92

282 Constructing function 10 Genome 8 Speedup Threads 93

283 Constructing function 10 Genome 8 Speedup Threads 93

284 Constructing function 10 Genome f(n) 8 Speedup Threads 93

285 Desired properties Simple General Low error 94

286 Interpolation 10 Genome 8 Speedup Threads 95

287 Interpolation 10 8? Genome Speedup Threads 95

288 Interpolation 10 Genome 8 Speedup Threads 95

289 Interpolation 10 Genome Polynomial 8 Speedup Threads 95

290 Interpolation 10 Genome Rational 8 Speedup Threads 95

291 Choosing function Low error Simple Rational Polynomial Logarithmic Exponential Power 96

292 Choosing function Low error Simple Rational Polynomial Logarithmic Exponential Power 96

293 Extrapolation 15 Genome 12 Speedup Threads 97

294 Extrapolation 15 Genome Speedup ? Threads 97

295 Extrapolation 15 Genome 12 Speedup Threads 97

296 Extrapolation 15 Polynomial 8 Genome 12 Rational 8 Speedup Threads 97

297 Extrapolation 15 Genome Polynomial Rational 32 Speedup Threads 97

298 Extrapolation error 4 Intruder 3 Speedup Threads 98

299 Extrapolation error 4 Intruder 3 Rational 8 Speedup Threads 98

300 Extrapolation error 4 Intruder Speedup 3 2 6% Rational Threads 98

301 Extrapolation error 4 Intruder 3 Rational 16 Speedup Threads 98

302 Extrapolation error 4 Intruder 3 9% Rational 16 Speedup Threads 98

303 Extrapolation error 4 Intruder Speedup 3 2 Rational Threads 98

304 Extrapolation error 4 Intruder Speedup 3 2 Rational 32 7% Threads 98

305 Extrapolation error 4 Intruder ~10% for 2m Speedup 3 2 Rational 32 7% Threads 98

306 Applications Predict performance use TM or not? assign more CPUs? Predict optimal thread count schedule threads 99

307 Use TM or not? 12 Rational 8 9 Speedup Threads 100

308 Use TM or not? 12 Rational 8 9 Genome Speedup Threads 100

309 Use TM or not? 12 Rational 8 9 Genome Speedup Threads 100

310 Use TM or not? 12 Rational 8 9 Genome Speedup 6 3 SSCA Threads 100

311 Use TM or not? 12 Rational 8 9 Genome Speedup 6 3 SSCA Threads 100

312 Scheduling 1. Schedule ni threads 2. Measure performance 3. Construct f(n) 4. Find ni+1 for which f(n) is max 5. Schedule ni+1 threads 101

313 Scheduling 3 Intruder (max 22) Speedup 2 1 Measured Threads 102

314 Scheduling 3 Intruder (max 22) Speedup 2 1 f(n) Measured (max 24) Threads 102

315 Scheduling 3 Intruder (max 22) f(n) Speedup 2 1 Measured (max 64) Threads 102

316 Scheduling 3 Intruder (max 22) Speedup 2 1 f(n) Measured (max 22) Threads 102

317 Scheduling 3 Intruder (max 22) Speedup 2 1 f(n) Measured (max 22) Threads 102

318 Limitations f(n) strongly depends on workload computer system Not always accurate 103

319 Future work Find better functions Use characteristics of performance cannot be negative Use runtime information STM OS 104

320 Links SwissTM Intel C/C++ DTMC C/C++ (LLVM) 105

321 References Rachid Guerraoui, Michał Kapałka, and Jan Vitek. STMBench7: A Benchmark for Software Transactional Memory. EuroSys Rachid Guerraoui and Michał Kapałka. On the Correctness of Transactional Memory. PPoPP Aleksandar Dragojević, Rachid Guerraoui, and Michał Kapałka. Dividing Transactional Memories by Zero. Transact

322 References (2) Aleksandar Dragojević, Rachid Guerraoui, and Michał Kapałka. Stretching Transactional Memory. PLDI Aleksandar Dragojević, Pascal Felber, Vincent Gramoli, Rachid Guerraoui. Why STM can be more than a Research Toy. CACM April, Aleksandar Dragojević, Rachid Guerraoui. Predicting the Scalability of an STM: A Pragmatic Approach. Transact

323 Thank you 108

324 Questions, comments? 108

Stretching Transactional Memory

Stretching Transactional Memory Stretching Transactional Memory Aleksandar Dragojević Rachid Guerraoui Michał Kapałka Ecole Polytechnique Fédérale de Lausanne, School of Computer and Communication Sciences, I&C, Switzerland {aleksandar.dragojevic,

More information

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based? Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based

More information

Transactional Memory. R. Guerraoui, EPFL

Transactional Memory. R. Guerraoui, EPFL Transactional Memory R. Guerraoui, EPFL Locking is history Lock-freedom is difficult Wanted A concurrency control abstraction that is simple, robust and efficient Transactions Historical perspective Eswaran

More information

Transactional Memory

Transactional Memory Transactional Memory Michał Kapałka EPFL, LPD STiDC 08, 1.XII 2008 Michał Kapałka (EPFL, LPD) Transactional Memory STiDC 08, 1.XII 2008 1 / 25 Introduction How to Deal with Multi-Threading? Locks? Wait-free

More information

False Conflict Reduction in the Swiss Transactional Memory (SwissTM) System

False Conflict Reduction in the Swiss Transactional Memory (SwissTM) System False Conflict Reduction in the Swiss Transactional Memory (SwissTM) System Aravind Natarajan Department of Computer Engineering The University of Texas at Dallas Richardson, TX - 75080, USA Email: aravindn@utdallas.edu

More information

Conflict Detection and Validation Strategies for Software Transactional Memory

Conflict Detection and Validation Strategies for Software Transactional Memory Conflict Detection and Validation Strategies for Software Transactional Memory Michael F. Spear, Virendra J. Marathe, William N. Scherer III, and Michael L. Scott University of Rochester www.cs.rochester.edu/research/synchronization/

More information

Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications

Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications Fernando Rui, Márcio Castro, Dalvan Griebler, Luiz Gustavo Fernandes Email: fernando.rui@acad.pucrs.br,

More information

Chí Cao Minh 28 May 2008

Chí Cao Minh 28 May 2008 Chí Cao Minh 28 May 2008 Uniprocessor systems hitting limits Design complexity overwhelming Power consumption increasing dramatically Instruction-level parallelism exhausted Solution is multiprocessor

More information

6.852: Distributed Algorithms Fall, Class 20

6.852: Distributed Algorithms Fall, Class 20 6.852: Distributed Algorithms Fall, 2009 Class 20 Today s plan z z z Transactional Memory Reading: Herlihy-Shavit, Chapter 18 Guerraoui, Kapalka, Chapters 1-4 Next: z z z Asynchronous networks vs asynchronous

More information

Lowering the Overhead of Nonblocking Software Transactional Memory

Lowering the Overhead of Nonblocking Software Transactional Memory Lowering the Overhead of Nonblocking Software Transactional Memory Virendra J. Marathe Michael F. Spear Christopher Heriot Athul Acharya David Eisenstat William N. Scherer III Michael L. Scott Background

More information

EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics

EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics Pervasive Parallelism Laboratory, Stanford University Sungpack Hong Tayo Oguntebi Jared Casper Nathan Bronson Christos Kozyrakis

More information

Improving the Practicality of Transactional Memory

Improving the Practicality of Transactional Memory Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter

More information

Unifying Thread-Level Speculation and Transactional Memory

Unifying Thread-Level Speculation and Transactional Memory Unifying Thread-Level Speculation and Transactional Memory João Barreto 1, Aleksandar Dragojevic 2, Paulo Ferreira 1, Ricardo Filipe 1, and Rachid Guerraoui 2 1 INESC-ID/Technical University Lisbon, Portugal

More information

Unifying Thread-Level Speculation and Transactional Memory

Unifying Thread-Level Speculation and Transactional Memory Unifying Thread-Level Speculation and Transactional Memory João Barreto 1, Aleksandar Dragojevic 2, Paulo Ferreira 1, Ricardo Filipe 1,, and Rachid Guerraoui 2 1 INESC-ID/Technical University Lisbon, Portugal

More information

Performance Evaluation of Adaptivity in STM. Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich

Performance Evaluation of Adaptivity in STM. Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich Performance Evaluation of Adaptivity in STM Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich Motivation STM systems rely on many assumptions Often contradicting for different

More information

Performance Comparison of Various STM Concurrency Control Protocols Using Synchrobench

Performance Comparison of Various STM Concurrency Control Protocols Using Synchrobench Performance Comparison of Various STM Concurrency Control Protocols Using Synchrobench Ajay Singh Dr. Sathya Peri Anila Kumari Monika G. February 24, 2017 STM vs Synchrobench IIT Hyderabad February 24,

More information

1 Publishable Summary

1 Publishable Summary 1 Publishable Summary 1.1 VELOX Motivation and Goals The current trend in designing processors with multiple cores, where cores operate in parallel and each of them supports multiple threads, makes the

More information

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat Relaxing Concurrency Control in Transactional Memory by Utku Aydonat A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of The Edward S. Rogers

More information

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime B. Saha, A-R. Adl- Tabatabai, R. Hudson, C.C. Minh, B. Hertzberg PPoPP 2006 Introductory TM Sales Pitch Two legs

More information

Preventing versus Curing: Avoiding Conflicts in Transactional Memories

Preventing versus Curing: Avoiding Conflicts in Transactional Memories Preventing versus Curing: Avoiding Conflicts in Transactional Memories Aleksandar Dragojević Anmol V. Singh Rachid Guerraoui Vasu Singh EPFL Abstract Transactional memories are typically speculative and

More information

HydraVM: Mohamed M. Saad Mohamed Mohamedin, and Binoy Ravindran. Hot Topics in Parallelism (HotPar '12), Berkeley, CA

HydraVM: Mohamed M. Saad Mohamed Mohamedin, and Binoy Ravindran. Hot Topics in Parallelism (HotPar '12), Berkeley, CA HydraVM: Mohamed M. Saad Mohamed Mohamedin, and Binoy Ravindran Hot Topics in Parallelism (HotPar '12), Berkeley, CA Motivation & Objectives Background Architecture Program Reconstruction Implementation

More information

Remote Transaction Commit: Centralizing Software Transactional Memory Commits

Remote Transaction Commit: Centralizing Software Transactional Memory Commits IEEE TRANSACTIONS ON COMPUTERS 1 Remote Transaction Commit: Centralizing Software Transactional Memory Commits Ahmed Hassan, Roberto Palmieri, and Binoy Ravindran Abstract Software Transactional Memory

More information

ByteSTM: Java Software Transactional Memory at the Virtual Machine Level

ByteSTM: Java Software Transactional Memory at the Virtual Machine Level ByteSTM: Java Software Transactional Memory at the Virtual Machine Level Mohamed Mohamedin Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment

More information

The Semantics of Progress in Lock-Based Transactional Memory

The Semantics of Progress in Lock-Based Transactional Memory The Semantics of Progress in Lock-Based Transactional Memory Rachid Guerraoui Michał Kapałka EPFL, Switzerland {rachid.guerraoui, michal.kapalka}@epfl.ch Abstract Transactional memory (TM) is a promising

More information

Performance Improvement via Always-Abort HTM

Performance Improvement via Always-Abort HTM 1 Performance Improvement via Always-Abort HTM Joseph Izraelevitz* Lingxiang Xiang Michael L. Scott* *Department of Computer Science University of Rochester {jhi1,scott}@cs.rochester.edu Parallel Computing

More information

STMBench7: A Benchmark for Software Transactional Memory

STMBench7: A Benchmark for Software Transactional Memory STMBench7: A Benchmark for Software Transactional Memory Rachid Guerraoui School of Computer and Communication Sciences, EPFL Michał Kapałka School of Computer and Communication Sciences, EPFL Jan Vitek

More information

1 Introduction. Michał Kapałka EPFL, Switzerland Rachid Guerraoui EPFL, Switzerland

1 Introduction. Michał Kapałka EPFL, Switzerland Rachid Guerraoui EPFL, Switzerland THE THEORY OF TRANSACTIONAL MEMORY Rachid Guerraoui EPFL, Switzerland rachid.guerraoui@epfl.ch Michał Kapałka EPFL, Switzerland michal.kapalka@epfl.ch Abstract Transactional memory (TM) is a promising

More information

Implementing and Evaluating Nested Parallel Transactions in STM. Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University

Implementing and Evaluating Nested Parallel Transactions in STM. Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University Implementing and Evaluating Nested Parallel Transactions in STM Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University Introduction // Parallelize the outer loop for(i=0;i

More information

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28 Transactional Memory or How to do multiple things at once Benjamin Engel Transactional Memory 1 / 28 Transactional Memory: Architectural Support for Lock-Free Data Structures M. Herlihy, J. Eliot, and

More information

Ensuring Irrevocability in Wait-free Transactional Memory

Ensuring Irrevocability in Wait-free Transactional Memory Ensuring Irrevocability in Wait-free Transactional Memory Jan Kończak Institute of Computing Science, Poznań University of Technology Poznań, Poland jan.konczak@cs.put.edu.pl Paweł T. Wojciechowski Institute

More information

Scalable Software Transactional Memory for Chapel High-Productivity Language

Scalable Software Transactional Memory for Chapel High-Productivity Language Scalable Software Transactional Memory for Chapel High-Productivity Language Srinivas Sridharan and Peter Kogge, U. Notre Dame Brad Chamberlain, Cray Inc Jeffrey Vetter, Future Technologies Group, ORNL

More information

Speculative Synchronization

Speculative Synchronization Speculative Synchronization José F. Martínez Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/martinez Problem 1: Conservative Parallelization No parallelization

More information

Lock vs. Lock-free Memory Project proposal

Lock vs. Lock-free Memory Project proposal Lock vs. Lock-free Memory Project proposal Fahad Alduraibi Aws Ahmad Eman Elrifaei Electrical and Computer Engineering Southern Illinois University 1. Introduction The CPU performance development history

More information

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations Lecture 21: Transactional Memory Topics: Hardware TM basics, different implementations 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end locks are blocking,

More information

Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory

Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev MIT amatveev@csail.mit.edu Nir Shavit MIT shanir@csail.mit.edu Abstract Because of hardware TM limitations, software

More information

Scheduling Transactions in Replicated Distributed Transactional Memory

Scheduling Transactions in Replicated Distributed Transactional Memory Scheduling Transactions in Replicated Distributed Transactional Memory Junwhan Kim and Binoy Ravindran Virginia Tech USA {junwhan,binoy}@vt.edu CCGrid 2013 Concurrency control on chip multiprocessors significantly

More information

Open Nesting in Software Transactional Memory. Paper by: Ni et al. Presented by: Matthew McGill

Open Nesting in Software Transactional Memory. Paper by: Ni et al. Presented by: Matthew McGill Open Nesting in Software Transactional Memory Paper by: Ni et al. Presented by: Matthew McGill Transactional Memory Transactional Memory is a concurrent programming abstraction intended to be as scalable

More information

Serializability of Transactions in Software Transactional Memory

Serializability of Transactions in Software Transactional Memory Serializability of Transactions in Software Transactional Memory Utku Aydonat Tarek S. Abdelrahman Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto {uaydonat,tsa}@eecg.toronto.edu

More information

6 Transactional Memory. Robert Mullins

6 Transactional Memory. Robert Mullins 6 Transactional Memory ( MPhil Chip Multiprocessors (ACS Robert Mullins Overview Limitations of lock-based programming Transactional memory Programming with TM ( STM ) Software TM ( HTM ) Hardware TM 2

More information

! Part I: Introduction. ! Part II: Obstruction-free STMs. ! DSTM: an obstruction-free STM design. ! FSTM: a lock-free STM design

! Part I: Introduction. ! Part II: Obstruction-free STMs. ! DSTM: an obstruction-free STM design. ! FSTM: a lock-free STM design genda Designing Transactional Memory ystems Part II: Obstruction-free TMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch! Part I: Introduction! Part II: Obstruction-free TMs! DTM: an obstruction-free

More information

TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions

TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions Yige Hu, Zhiting Zhu, Ian Neal, Youngjin Kwon, Tianyu Chen, Vijay Chidambaram, Emmett Witchel The University of Texas at Austin

More information

EXPLOITING SEMANTIC COMMUTATIVITY IN HARDWARE SPECULATION

EXPLOITING SEMANTIC COMMUTATIVITY IN HARDWARE SPECULATION EXPLOITING SEMANTIC COMMUTATIVITY IN HARDWARE SPECULATION GUOWEI ZHANG, VIRGINIA CHIU, DANIEL SANCHEZ MICRO 2016 Executive summary 2 Exploiting commutativity benefits update-heavy apps Software techniques

More information

Transac'onal Libraries Alexander Spiegelman *, Guy Golan-Gueta, and Idit Keidar * Technion Yahoo Research

Transac'onal Libraries Alexander Spiegelman *, Guy Golan-Gueta, and Idit Keidar * Technion Yahoo Research Transac'onal Libraries Alexander Spiegelman *, Guy Golan-Gueta, and Idit Keidar * * Technion Yahoo Research 1 Mul'-Threading is Everywhere 2 Agenda Mo@va@on Concurrent Data Structure Libraries (CDSLs)

More information

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, lazy implementation 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end

More information

Integrating Transactionally Boosted Data Structures with STM Frameworks: A Case Study on Set

Integrating Transactionally Boosted Data Structures with STM Frameworks: A Case Study on Set Integrating Transactionally Boosted Data Structures with STM Frameworks: A Case Study on Set Ahmed Hassan Roberto Palmieri Binoy Ravindran Virginia Tech hassan84@vt.edu robertop@vt.edu binoy@vt.edu Abstract

More information

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 28 November 2014

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 28 November 2014 NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 28 November 2014 Lecture 8 Problems with locks Atomic blocks and composition Hardware transactional memory Software transactional memory

More information

SELF-TUNING HTM. Paolo Romano

SELF-TUNING HTM. Paolo Romano SELF-TUNING HTM Paolo Romano 2 Based on ICAC 14 paper N. Diegues and Paolo Romano Self-Tuning Intel Transactional Synchronization Extensions 11 th USENIX International Conference on Autonomic Computing

More information

Transactional Memory. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Transactional Memory. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Transactional Memory Companion slides for The by Maurice Herlihy & Nir Shavit Our Vision for the Future In this course, we covered. Best practices New and clever ideas And common-sense observations. 2

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties

More information

ABORTING CONFLICTING TRANSACTIONS IN AN STM

ABORTING CONFLICTING TRANSACTIONS IN AN STM Committing ABORTING CONFLICTING TRANSACTIONS IN AN STM PPOPP 09 2/17/2009 Hany Ramadan, Indrajit Roy, Emmett Witchel University of Texas at Austin Maurice Herlihy Brown University TM AND ITS DISCONTENTS

More information

Transactifying Apache s Cache Module

Transactifying Apache s Cache Module H. Eran O. Lutzky Z. Guz I. Keidar Department of Electrical Engineering Technion Israel Institute of Technology SYSTOR 2009 The Israeli Experimental Systems Conference Outline 1 Why legacy applications

More information

Database Management Systems Concurrency Control

Database Management Systems Concurrency Control atabase Management Systems Concurrency Control B M G 1 BMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHOS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files ata Files

More information

On Improving Transactional Memory: Optimistic Transactional Boosting, Remote Execution, and Hybrid Transactions

On Improving Transactional Memory: Optimistic Transactional Boosting, Remote Execution, and Hybrid Transactions On Improving Transactional Memory: Optimistic Transactional Boosting, Remote Execution, and Hybrid Transactions Ahmed Hassan Preliminary Examination Proposal submitted to the Faculty of the Virginia Polytechnic

More information

Software transactional memory

Software transactional memory Transactional locking II (Dice et. al, DISC'06) Time-based STM (Felber et. al, TPDS'08) Mentor: Johannes Schneider March 16 th, 2011 Motivation Multiprocessor systems Speed up time-sharing applications

More information

Lecture 12 Transactional Memory

Lecture 12 Transactional Memory CSCI-UA.0480-010 Special Topics: Multicore Programming Lecture 12 Transactional Memory Christopher Mitchell, Ph.D. cmitchell@cs.nyu.edu http://z80.me Database Background Databases have successfully exploited

More information

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section

More information

How Live Can a Transactional Memory Be?

How Live Can a Transactional Memory Be? How Live Can a Transactional Memory Be? Rachid Guerraoui Michał Kapałka February 18, 2009 Abstract This paper asks how much liveness a transactional memory (TM) implementation can guarantee. We first devise

More information

Multi-Core Computing with Transactional Memory. Johannes Schneider and Prof. Roger Wattenhofer

Multi-Core Computing with Transactional Memory. Johannes Schneider and Prof. Roger Wattenhofer Multi-Core Computing with Transactional Memory Johannes Schneider and Prof. Roger Wattenhofer 1 Overview Introduction Difficulties with parallel (multi-core) programming A (partial) solution: Transactional

More information

10 Supporting Time-Based QoS Requirements in Software Transactional Memory

10 Supporting Time-Based QoS Requirements in Software Transactional Memory 1 Supporting Time-Based QoS Requirements in Software Transactional Memory WALTHER MALDONADO, University of Neuchâtel, Switzerland PATRICK MARLIER, University of Neuchâtel, Switzerland PASCAL FELBER, University

More information

On Exploring Markov Chains for Transaction Scheduling Optimization in Transactional Memory

On Exploring Markov Chains for Transaction Scheduling Optimization in Transactional Memory WTTM 2015 7 th Workshop on the Theory of Transactional Memory On Exploring Markov Chains for Transaction Scheduling Optimization in Transactional Memory Pierangelo Di Sanzo, Marco Sannicandro, Bruno Ciciani,

More information

LogTM: Log-Based Transactional Memory

LogTM: Log-Based Transactional Memory LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood 12th International Symposium on High Performance Computer Architecture () 26 Mulitfacet

More information

University of Karlsruhe (TH)

University of Karlsruhe (TH) University of Karlsruhe (TH) Research University founded 1825 Technical Briefing Session Multicore Software Engineering @ ICSE 2009 Transactional Memory versus Locks - A Comparative Case Study Victor Pankratius

More information

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence? CPSC-4/6: Operang Systems Atomic Transactions The Transaction Model / Primitives Serializability Implementation Serialization Graphs 2-Phase Locking Optimistic Concurrency Control Transactional Memory

More information

Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois

Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications José éf. Martínez * and Josep Torrellas University of Illinois ASPLOS 2002 * Now at Cornell University Overview Allow

More information

An Update on Haskell H/STM 1

An Update on Haskell H/STM 1 An Update on Haskell H/STM 1 Ryan Yates and Michael L. Scott University of Rochester TRANSACT 10, 6-15-2015 1 This work was funded in part by the National Science Foundation under grants CCR-0963759, CCF-1116055,

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lecture 22: More Transaction Implementations 1 Review: Schedules, schedules, schedules The DBMS scheduler determines the order of operations from txns are executed

More information

Understanding Hardware Transactional Memory

Understanding Hardware Transactional Memory Understanding Hardware Transactional Memory Gil Tene, CTO & co-founder, Azul Systems @giltene 2015 Azul Systems, Inc. Agenda Brief introduction What is Hardware Transactional Memory (HTM)? Cache coherence

More information

arxiv: v1 [cs.dc] 13 Aug 2013

arxiv: v1 [cs.dc] 13 Aug 2013 Opacity of Memory Management in Software Transactional Memory Holger Machens and Volker Turau Institute of Telematics Hamburg University of Technology {machens,turau}@tuhh.de http://www.ti5.tuhh.de arxiv:1308.2881v1

More information

Potential violations of Serializability: Example 1

Potential violations of Serializability: Example 1 CSCE 6610:Advanced Computer Architecture Review New Amdahl s law A possible idea for a term project Explore my idea about changing frequency based on serial fraction to maintain fixed energy or keep same

More information

Lock-based concurrency control. Q: What if access patterns rarely, if ever, conflict? Serializability. Concurrency Control II (OCC, MVCC)

Lock-based concurrency control. Q: What if access patterns rarely, if ever, conflict? Serializability. Concurrency Control II (OCC, MVCC) Concurrency Control II (CC, MVCC) CS 418: Distributed Systems Lecture 18 Serializability Execution of a set of transactions over multiple items is equivalent to some serial execution of s Michael Freedman

More information

Coping With Context Switches in Lock-Based Software Transacional Memory Algorithms

Coping With Context Switches in Lock-Based Software Transacional Memory Algorithms TEL-AVIV UNIVERSITY RAYMOND AND BEVERLY SACKLER FACULTY OF EXACT SCIENCES SCHOOL OF COMPUTER SCIENCE Coping With Context Switches in Lock-Based Software Transacional Memory Algorithms Dissertation submitted

More information

Heckaton. SQL Server's Memory Optimized OLTP Engine

Heckaton. SQL Server's Memory Optimized OLTP Engine Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability

More information

On the Correctness of Transactional Memory

On the Correctness of Transactional Memory On the Correctness of Transactional Memory Rachid Guerraoui Michał Kapałka School of Computer and Communication Sciences, EPFL {rachid.guerraoui, michal.kapalka}@epfl.ch Abstract Transactional memory (TM)

More information

Commit Phase in Timestamp-based STM

Commit Phase in Timestamp-based STM Commit Phase in Timestamp-based STM Rui Zhang Dept. of Computer Science Rice University Houston, TX 77005, USA ruizhang@rice.edu Zoran Budimlić Dept. of Computer Science Rice University Houston, TX 77005,

More information

Concurrency in Transactional Memory

Concurrency in Transactional Memory Concurrency in Transactional Memory Vincent Gramoli To cite this version: Vincent Gramoli. Concurrency in Transactional Memory. Distributed, Parallel, and Cluster Computing [cs.dc]. UPMC - Sorbonne University,

More information

A Causality-Based Runtime Check for (Rollback) Atomicity

A Causality-Based Runtime Check for (Rollback) Atomicity A Causality-Based Runtime Check for (Rollback) Atomicity Serdar Tasiran Koc University Istanbul, Turkey Tayfun Elmas Koc University Istanbul, Turkey RV 2007 March 13, 2007 Outline This paper: Define rollback

More information

A Machine Learning Approach to Adaptive Software Transaction Memory

A Machine Learning Approach to Adaptive Software Transaction Memory Lehigh University Lehigh Preserve Theses and Dissertations 2011 A Machine Learning Approach to Adaptive Software Transaction Memory Qingping Wang Lehigh University Follow this and additional works at:

More information

Transaction Management: Introduction (Chap. 16)

Transaction Management: Introduction (Chap. 16) Transaction Management: Introduction (Chap. 16) CS634 Class 14 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke What are Transactions? So far, we looked at individual queries;

More information

Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory

Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory Mohamed Mohamedin Virginia Tech Blacksburg, Virginia mohamedin@vt.edu Sebastiano Peluso Virginia Tech Blacksburg, Virginia peluso@vt.edu

More information

Concurrent programming: From theory to practice. Concurrent Algorithms 2015 Vasileios Trigonakis

Concurrent programming: From theory to practice. Concurrent Algorithms 2015 Vasileios Trigonakis oncurrent programming: From theory to practice oncurrent Algorithms 2015 Vasileios Trigonakis From theory to practice Theoretical (design) Practical (design) Practical (implementation) 2 From theory to

More information

Schedule. Today: Feb. 21 (TH) Feb. 28 (TH) Feb. 26 (T) Mar. 5 (T) Read Sections , Project Part 6 due.

Schedule. Today: Feb. 21 (TH) Feb. 28 (TH) Feb. 26 (T) Mar. 5 (T) Read Sections , Project Part 6 due. Schedule Today: Feb. 21 (TH) Transactions, Authorization. Read Sections 8.6-8.7. Project Part 5 due. Feb. 26 (T) Datalog. Read Sections 10.1-10.2. Assignment 6 due. Feb. 28 (TH) Datalog and SQL Recursion,

More information

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks Lecture: Transactional Memory, Networks Topics: TM implementations, on-chip networks 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock

More information

A Dynamic Instrumentation Approach to Software Transactional Memory. Marek Olszewski

A Dynamic Instrumentation Approach to Software Transactional Memory. Marek Olszewski A Dynamic Instrumentation Approach to Software Transactional Memory by Marek Olszewski A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Transactions - Definition A transaction is a sequence of data operations with the following properties: * A Atomic All

More information

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Programming

More information

A Relativistic Enhancement to Software Transactional Memory

A Relativistic Enhancement to Software Transactional Memory A Relativistic Enhancement to Software Transactional Memory Philip W. Howard Portland State University Jonathan Walpole Portland State University Abstract Relativistic Programming is a technique that allows

More information

Summary: Open Questions:

Summary: Open Questions: Summary: The paper proposes an new parallelization technique, which provides dynamic runtime parallelization of loops from binary single-thread programs with minimal architectural change. The realization

More information

Enhancing efficiency of Hybrid Transactional Memory via Dynamic Data Partitioning Schemes

Enhancing efficiency of Hybrid Transactional Memory via Dynamic Data Partitioning Schemes Enhancing efficiency of Hybrid Transactional Memory via Dynamic Data Partitioning Schemes Pedro Raminhas Instituto Superior Técnico, Universidade de Lisboa Lisbon, Portugal Email: pedro.raminhas@tecnico.ulisboa.pt

More information

Lecture 12: Hardware/Software Trade-Offs. Topics: COMA, Software Virtual Memory

Lecture 12: Hardware/Software Trade-Offs. Topics: COMA, Software Virtual Memory Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory 1 Capacity Limitations P P P P B1 C C B1 C C Mem Coherence Monitor Mem Coherence Monitor B2 In a Sequent NUMA-Q design above,

More information

Performance Tradeoffs in Software Transactional Memory

Performance Tradeoffs in Software Transactional Memory Master Thesis Computer Science Thesis no: MCS-2010-28 May 2010 Performance Tradeoffs in Software Transactional Memory Gulfam Abbas Naveed Asif School School of Computing of Computing Blekinge Blekinge

More information

Monitors; Software Transactional Memory

Monitors; Software Transactional Memory Monitors; Software Transactional Memory Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico October 18, 2012 CPD (DEI / IST) Parallel and

More information

Concurrency control CS 417. Distributed Systems CS 417

Concurrency control CS 417. Distributed Systems CS 417 Concurrency control CS 417 Distributed Systems CS 417 1 Schedules Transactions must have scheduled so that data is serially equivalent Use mutual exclusion to ensure that only one transaction executes

More information

Problems Caused by Failures

Problems Caused by Failures Problems Caused by Failures Update all account balances at a bank branch. Accounts(Anum, CId, BranchId, Balance) Update Accounts Set Balance = Balance * 1.05 Where BranchId = 12345 Partial Updates - Lack

More information

What are Transactions? Transaction Management: Introduction (Chap. 16) Major Example: the web app. Concurrent Execution. Web app in execution (CS636)

What are Transactions? Transaction Management: Introduction (Chap. 16) Major Example: the web app. Concurrent Execution. Web app in execution (CS636) What are Transactions? Transaction Management: Introduction (Chap. 16) CS634 Class 14, Mar. 23, 2016 So far, we looked at individual queries; in practice, a task consists of a sequence of actions E.g.,

More information

Summary: Issues / Open Questions:

Summary: Issues / Open Questions: Summary: The paper introduces Transitional Locking II (TL2), a Software Transactional Memory (STM) algorithm, which tries to overcomes most of the safety and performance issues of former STM implementations.

More information

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Last Class Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#23: Concurrency Control Part 2 (R&G ch. 17) Serializability Two-Phase Locking Deadlocks

More information

CS377P Programming for Performance Multicore Performance Synchronization

CS377P Programming for Performance Multicore Performance Synchronization CS377P Programming for Performance Multicore Performance Synchronization Sreepathi Pai UTCS October 21, 2015 Outline 1 Synchronization Primitives 2 Blocking, Lock-free and Wait-free Algorithms 3 Transactional

More information

TMUNIT: Testing Software Transactional Memories

TMUNIT: Testing Software Transactional Memories TMUNIT: Testing Software Transactional Memories Derin Harmanci Pascal Felber University of Neuchâtel Switzerland {derin.harmanci,pascal.felber}@unine.ch Vincent Gramoli EPFL & University of Neuchâtel Switzerland

More information

Extracting Parallelism from Legacy Sequential Code Using Software Transactional Memory

Extracting Parallelism from Legacy Sequential Code Using Software Transactional Memory Extracting Parallelism from Legacy Sequential Code Using Software Transactional Memory Mohamed M. Saad Preliminary Examination Proposal submitted to the Faculty of the Virginia Polytechnic Institute and

More information