locker: distributed locking Knut

Size: px

Start display at page:

Download "locker: distributed locking Knut"

David Campbell
6 years ago
Views:

1 locker: distributed locking Knut

2 The need Real-time multiplayer game at Wooga Stateful One process per user One process per world

3 The need Only one process per user & world Cannot reconcile game worlds Strong consistency Fine-grained distributed coordination Lookup table when sending game updates

4 Client Client visit world #123 visit world #123 World #123 World #123 lock ok lock already_locked Lock

5 Client Client World #123

6 Next generation Lock implemented once already Central serialization with atomic ops (Redis) SPoF Next gen want higher availability, because..

7 Living with failure Hardware sometimes fails The network is mostly ok Software is almost bug-free Downtime is often short

8 Living with a SPoF sucks..

9 Development easy Operations suck Change become scary Operator error becomes costly Must fix problems immediately

10 Requirements ~10k conditional writes per second ~3M eventually consistent reads per second ~150k reads on local replica Lock expires if not kept alive Dynamic cluster membership

11 ZooKeeper looks like the best option

12 What would the dream solution look like?

13 LB LB S3 DB

16 N=5, W=3

17 N=5, W=3

18 N=5, W=3

19 N=5, W=3

20 N=5, W=3

21 Run inside our app servers Easy to debug, instrument, monitor, deploy Simple operations

22 How hard can it be?

23 Distributed systems are hard Many books, papers on distributed systems Riak a good example of mindset Idea: pick the algorithms that fits best

24 Simplest thing that could possibly work

25 good enough Problem Consistency, conditional writes Availability Local replica Solution Serialization, 2 Phase Commit Multiple serializers, quorum (CP) Replay transaction log Dynamic cluster Manual configuration Anti-entropy Lock keep alive

26 Implementation Proof of concept in Friday afternoon Looked promising, spent ~3 weeks Turned into production quality Test race conditions PropEr 330 lines (!)

27 locker

28 Beware tradeoffs! Keep consistency, sacrifice availability during failure No persistency, assumes enough masters stays up No group membership, views Manually configure masters, replicas, quorum value Assumes perfect network during reconfiguration Not based on a described algorithm

29 Usage start_link(w) set_nodes(allnodes, Masters, Replicas) lock(key, Value, LeaseLength) extend_lease(key, Value, LeaseLength) release(key, Value) dirty_read(key)

30 lock(key, Value, LeaseLength) Two phases: Prepare: Ask masters for votes Commit: If majority, write on masters Timeout counted as negative vote, CP Asynchronous replication, wait_for/2

31 Client A Client B locker:lock(foo, pid(0,123,0)) Master #1 Master #2 Master #3

32 Client A locker:lock(foo, pid(0,123,0)) Client B Send: {get_write_lock, foo, not_found} Master #1 Master #2 Master #3 Write locks:[] [] []

33 Client A locker:lock(foo, pid(0,123,0)) Client B Send: {get_write_lock, foo, not_found} Master #1 Master #2 Master #3 [{lock, foo}] [{lock, foo}] []

34 Client A locker:lock(foo, pid(0,123,0)) Client B Send: {get_write_lock, foo, not_found} Master #1 Master #2 Master #3 [{lock, foo}] [{lock, foo}] []

35 Client A locker:lock(foo, pid(0,123,0)) Client B Send: {get_write_lock, foo, not_found} Master #1 Master #2 Master #3 [{lock, foo}] [{lock, foo}] [{lock, foo}]

36 Client A locker:lock(foo, pid(0,123,0)) Client B Send: {get_write_lock, foo, not_found} Already locked! Master #1 Master #2 Master #3 [{lock, foo}] [{lock, foo}] [{lock, foo}]

37 Client A locker:lock(foo, pid(0,123,0)) Client B Send: {get_write_lock, foo, not_found} [ok, ok, error] [error,error, ok] Master #1 Master #2 Master #3 [{lock, foo}] [{lock, foo}] [{lock, foo}]

38 Client A locker:lock(foo, pid(0,123,0)) Client B [ok, ok, error] [error, error, ok] Master #1 Master #2 Master #3 [{lock, foo}] [{lock, foo}] [{lock, foo}]

39 Client A locker:lock(foo, pid(0,123,0)) Client B [ok, ok, error] Send: {write, foo, 123} [error, error, ok] Send: release_write_lock Master #1 Master #2 Master #3 [{lock, foo}] [{lock, foo}] [{lock, foo}]

40 Client A locker:lock(foo, pid(0,123,0)) Client B [ok, ok, error] Send: {write, foo, 123} [error, error, ok] Send: release_write_lock Master #1 Master #2 Master #3 [] [] []

41 Use locker when Strong consistency is sometimes needed Protect resources Leader election Service discovery

42 Is it any good?

43 Some experts say yes, some experts say no

44 Conclusion

45 Distributed systems are hard

46 We could maybe have made ZooKeeper work..

47 ..but we now have our dream system

48 Ambitious, naive project led to big operational advantage

49 Q&A

Intuitive distributed algorithms. with F#

Intuitive distributed algorithms with F# Natallia Dzenisenka Alena Hall @nata_dzen @lenadroid A tour of a variety of intuitivedistributed algorithms used in practical distributed systems. and how to prototype