TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS. Xiangyao Yu, Srini Devadas CSAIL, MIT

Size: px

Start display at page:

Download "TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS. Xiangyao Yu, Srini Devadas CSAIL, MIT"

Brendan McGee
6 years ago
Views:

1 TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS Xiangyao Yu, Srini Devadas CSAIL, MIT

2 FOR FIFTY YEARS, WE HAVE RIDDEN MOORE S LAW Moore s Law and the scaling of clock frequency = printing press for the currency of performance

3 10,000,000 1,000, ,000 10,000 1, TECHNOLOGY SCALING u Transistors x 1000 Clock frequency (MHz) Power (W) Cores Each generation of Moore s law doubles the number of transistors but clock frequency has stopped increasing.

4 10,000,000 1,000, ,000 10,000 1, TECHNOLOGY SCALING u Transistors x 1000 Clock frequency (MHz) Power (W) Cores To increase performance, need to exploit parallelism.

5 DIFFERENT KINDS OF PARALLELISM - 1 Instruction Level Transaction Level a = b + c d = e + f g = d + b Read A Read B Compute C Read A Read D Compute E Read C Read E Compute F 5

6 DIFFERENT KINDS OF PARALLELISM - 2 Thread Level A x B = C Task Level Search( image ) Cloud Different thread computes each entry of product matrix C 6

7 DIFFERENT KINDS OF PARALLELISM - 3 Thread Level A x B = C User Level Search( image ) Lookup( data ) Cloud Different thread computes each entry of product matrix C Query( record ) 7

8 DEPENDENCY DESTROYS PARALLELISM for i = 1 to n a[b[i]] = (a[b[i - 1]] + b[i]) / c[i] Need to compute i th entry after i - 1 th has been computed L 8

9 Read A Read A DIFFERENT KINDS OF DEPENDENCY No dependency! Write A Write A WAW: Semantics decide order Write A Read A RAW: Read needs new value Read A Write A WAR: We have flexibility here! 9

10 DEPENDENCE IS ACROSS TIME, BUT WHAT IS TIME? Time can be physical time Time could correspond to logical timestamps assigned to instructions Time could be a combination of the above à Time is a definition of ordering 10

11 Read A WAR DEPENDENCE Initially A = 10 Thread 0 Thread 1 Logical Order Write A Physical Time Order A=13 Local copy of A = 10 Read happens later than Write in physical time but is before Write in logical time. 11

12 WHAT IS CORRECTNESS? We define correctness of a parallel program based on its outputs in relation to the program run sequentially 12

13 SEQUENTIAL CONSISTENCY A B C D Global Memory Order A B C D A C B D C D A B C B A D Can we exploit this freedom in correct execution to avoid dependency? 13

AVOIDING DEPENDENCY ACROSS THE STACK Circuit

Database Distributed Shared Memory Efficient atomic

concurrency control with Andy Pavlo and Daniel

14 AVOIDING DEPENDENCY ACROSS THE STACK Circuit Multicore Processor Multicore Database Distributed Database Distributed Shared Memory Efficient atomic instructions Tardis coherence protocol TicToc concurrency control with Andy Pavlo and Daniel Sanchez Distributed TicToc Transaction processing with fault tolerance 14

15 SHARED MEMORY SYSTEMS Cache Coherence Concurrency Control Multi-core Processor OLTP Database 15

16 DIRECTORY-BASED COHERENCE P P P P P P P P P Data replicated and cached locally for access Uncached data copied to local cache, writes invalidate data copies 16

17 CACHE COHERENCE SCALABILITY 250% Storage Overhead 200% 150% 100% 50% 0% Today Core Count Read A Read A Write A Invalidation O(N) Sharer List 17

18 LEASE-BASED COHERENCE Ld(A) Core 0 Core 1 Write Timestamp (wts) Ld(A) Read Timestamp (rts) Ld(A) St(A) Program Timestamp (pts) t= A read gets a lease on a cacheline Lease renewal aler lease expires A store can only commit aler leases expire Tardis: logical leases Logical Time Timestamp 18

19 LOGICAL TIMESTAMP Invalidation Tardis (No Invalidation) Physical Time Order Logical Time Order (concept borrowed from database) Old Version New Version logical time 19

20 TIMESTAMP MANAGEMENT Core A state A B pts=5 wts Shared LLC Cache S 0 10 S 0 10 S 0 5 rts Program Timestamp (pts) Timestamp of last memory operation Write Timestamp (wts) Data created at wts Read Timestamp (rts) Data valid from wts to rts wts Lease rts Logical Time 20

21 TWO-CORE EXAMPLE Core 0 pts=0 Cache Core 1 pts=0 Cache Core 0 store A load B load B 3 4 Core 1 store B load A A S 0 0 B S 0 0 physical time 21

22 STORE CORE 0 Core 0 pts=0 pts=1 A M Cache 1 1 ST(A) Req Core 1 pts=0 Cache Core 0 store A load B load B 3 4 Core 1 store B load A A MS 0 0 B S 0 0 M owner:0 Write at pts = 1 22

23 LOAD CORE 0 Core 0 pts=1 Cache A M 1 1 B LD(B) Req Core 1 pts=0 Cache Core 0 store A load B load B 3 4 Core 1 store B load A A M owner:0 B S Reserve rts to pts + lease = 11 23

24 STORE CORE 1 Core 0 pts=1 Cache A M 1 1 B S 0 11 Core 1 pts=0 pts=12 Cache B M ST(B) Req Core 0 store A load B load B 3 4 Core 1 store B load A A M B MS 0 11 owner:0 M owner:1 Exclusive ownership returned No invalidation 24

25 Two VERSIONS COEXIST Core 0 pts=1 Cache A M 1 1 B S 0 11 Core 1 pts=12 Cache B M Core 0 store A load B load B 3 4 Core 1 store B load A A B M owner:0 M owner:1 Core 1 traveled ahead in time Versions ordered in logical time 25

26 LOAD CORE 1 Core 0 pts=1 Cache A M S B S 0 11 WB(A) Req S M owner: A B Core 1 A B M owner:1 pts=12 Cache M LD(A) Req Core 0 store A load B load B Write back request to Core 0 Downgrade from M to S Core 1 store B load A Reserve rts to pts + lease =

27 LOAD CORE 0 Core 0 pts=1 A Core 1 M owner:0 B M owner:1 pts=12 Cache Cache A S 1 22 A S 1 22 B S 0 11 B M Core 0 store A load B physical time 5 load B Core 1 store B load A logical timestamp 5 > physical 4, 5 < logical 4 time 3 4 time global memory order physical time order 27

28 Core 0 store A load B RAW load B SUMMARY OF EXAMPLE Directory RAW Core 1 WAR store B load A physical time Core 0 store A load B load B Tardis RAW WAR Core 1 store B load A physical time order physical + logical time order 28

29 PHYSIOLOGICAL TIME Core 0 store A (1) load B (1) load B (1) Tardis Core 1 store B (12) load A (12) Global Memory Order Physical Time Logical Time Physiological Time X < PL Y := X < L Y or (X = L Y and X < P Y) Thm: Tardis obeys Sequential Consistency 29

30 TARDIS PROS AND CONS Scalability No Invalidation, Multicast or Broadcast Lease Renew Speculative Read Timestamp Size Timestamp Compression Time Stands Still Livelock Avoidance 30

31 EVALUATION Storage overhead per cacheline (N cores) Directory: N bits per cacheline Tardis: Max(Const, log(n)) bits per cacheline 250% 200% 150% 100% 50% 0% Directory Tardis

32 SPEEDUP Normalized Speedup Graphite Multi-core Simulator (64 cores) DIRECTORY TARDIS TARDIS AT 256 CORES 32

33 NETWORK TRAFFIC Normalized Traffic DIRECTORY TARDIS TARDIS AT 256 CORES 33

34 Network Traffic Storage Overhead Snoopy Coherence Directory Coherence Optimized Directory TARDIS High Complexity Performance Degradation 34

35 CONCURRENCY CONTROL BEGIN COMMIT BEGIN BEGIN COMMIT BEGIN COMMIT BEGIN COMMIT Serializability COMMIT BEGIN COMMIT BEGIN COMMIT BEGIN COMMIT Results should correspond to some serial order of atomic execution 35

36 CONCURRENCY CONTROL BEGIN COMMIT BEGIN COMMIT BEGIN COMMIT BEGIN COMMIT Can t Have This BEGIN BEGIN COMMIT COMMIT BEGIN COMMIT BEGIN COMMIT Results should correspond to some serial order of atomic execution 36

37 BOTTLENECK 1: TIMESTAMP ALLOCATION Centralized Allocator Timestamp allocation is a scalability bottleneck Throughput (Million txn/s) T/O 2PL Synchronized Clock Clock skew causes unnecessary aborts Thread Count 37

38 BOTTLENECK 2: STATIC ASSIGNMENT Time Timestamps assigned before a transaction starts T1@ts=1 T2@ts=2 BEGIN BEGIN READ(A) WRITE(A) COMMIT COMMIT Suboptimal assignment leads to unnecessary aborts. T1@ts=2 T2@ts=1 BEGIN BEGIN READ(A) WRITE(A) COMMIT ABORT 38

39 KEY IDEA: DATA DRIVEN TIMESTAMP MANAGEMENT Traditional T/O 1. Acquire timestamp (TS) 2. Determine tuple visibility using TS Timestamp Allocation Static Timestamp Assignment TicToc 1. Access tuples and remember their timestamp info. 2. Compute commit timestamp (CommitTS) No Timestamp Allocation Dynamic Timestamp Assignment 39

40 TICTOC TRANSACTION EXECUTION BEGIN READ PHASE VALIDATION PHASE WRITE PHASE COMMIT Read & Write Tuples Execute Transaction Compute CommitTS Decide Commit/Abort Update Database wts : rts : last data wts last data rts data valid between wts and rts Tuple Format Data wts (Write Timestamp) rts (Read Timestamp) 40

41 TICTOC EXAMPLE Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B Database States 5 commit? 6 commit? T1 Transaction Local States T2 41

42 LOAD A FROM T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B 5 commit? 6 commit? T1 Load a snapshot of tuple A - Data, wts and rts T2 42

43 LOAD A FROM T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B 5 commit? 6 commit? T1 Load a snapshot of tuple A - Data, wts and rts T2 43

44 STORE B FROM T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B 5 commit? 6 commit? T1 Store B to local write set T2 44

45 LOAD B FROM T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B 5 commit? 6 commit? T1 Load a snapshot of tuple B - Data, wts and rts T2 45

46 COMMIT PHASE OF T Logical Time Tuple A Tuple B v1 v1 Transaction load A store B commit? Transaction load A load B commit? T1 T2 Compute CommitTS Write Set: tuple.rts+1 CommitTs Read Set: tuple.wts CommitTs tuple.rts 46

47 COMMIT PHASE OF T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B T1 TS=2 5 commit? 6 commit? T1 rts extension on tuple A T2 47

48 COMMIT PHASE OF T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 v2 1 3 load A store B 2 4 load A load B T1 TS=2 5 commit? 6 commit? T1 Copy tuple B from write set to database T2 48

49 COMMIT PHASE OF T Logical Time Tuple A Tuple B v1 v1 v2 Transaction load A store B commit? Transaction load A load B commit? T1 T2 T1 TS=2 Compute CommitTS for T2 Find consistent read time for T2 (no writes in T2) 49

50 FINAL STATE Logical Time Tuple A Tuple B v1 v1 v2 Transaction load A store B commit? Transaction load A load B commit? T1 T2 T2 TS=0 T1 TS=2 Txn 1 < Txn 2 physical time Txn 2 < logical Txn 1 time Thm: Serializability = All operations valid at CommitTS 50

51 EXPERIMENTAL SETUP DBx1000: Main Memory DBMS No logging No B-tree (hash indexing) Concurrency Control Algorithms MVCC: HEKATON (Microsoft) OCC: SILO (Harvard/MIT) 2PL: DL_DETECT, NO_WAIT 10 GB YCSB Benchmark 51

52 EVALUATION TICTOC HEKATON DL_DETECT NO_WAIT SILO YCSB -- Medium Contention YCSB -- High Contention Throughput (Million txn/s) 2 1 Throughput (Million txn/s) Thread Count Thread Count 52

53 TICTOC DISCUSSION Thm: Serializability = All ops valid at CommitTS Transactions may have same CommitTS Logical timestamp growing rate indicates inherent parallelism Commit Timestamp TS ALLOC High Medium # of CommiCed Txns 53

54 PHYSIOLOGICAL TIME ACROSS THE STACK Circuit Multicore Processor Multicore Database Distributed Database Distributed Shared Memory Efficient atomic instructions Tardis coherence protocol TicToc concurrency control Distributed TicToc Transaction processing with fault tolerance 54

55 ATOMIC INSTRUCTION (LR/SC) ABA Problem Detect ABA using timestamp (wts) Core 0 Core 1 LR(x); # x = A ST(x); # x = B ST(x); # x = A SC(x); # x = A ABA? 55

56 TARDIS CACHE COHERENCE Simple: No Invalidation Scalable: O(log N) storage No Broadcast, No Multicast No Clock Synchronization Support Relaxed Consistency Models 56

57 T1000: PROPOSED 1000-CORE SHARED MEMORY PROCESSOR 57

58 TICTOC CONCURRENCY CONTROL Data Driven Timestamp Management No Central Timestamp Allocation Dynamic Timestamp Assignment 58

59 DISTRIBUTED TICTOC Data Driven Timestamp Management Efficient Two-Phase Commit Protocol Support Local Caching of Remote Data 59

60 FAULT TOLERANT DISTRIBUTED SHARED MEMORY Transactional Programming Model Distributed Command Logging Dynamic Dependency Tracking Among Transactions (WAR dependency can be ignored) 60

61 TIME TRAVELING TO ELIMINATE WAR Xiangyao Yu, Srini Devadas CSAIL, MIT

TicToc: Time Traveling Optimistic Concurrency Control

TicToc: Time Traveling Optimistic Concurrency Control Authors: Xiangyao Yu, Andrew Pavlo, Daniel Sanchez, Srinivas Devadas Presented By: Shreejit Nair 1 Background: Optimistic Concurrency Control ØRead