Giza: Erasure Coding Objects across Global Data Centers


1 Giza: Erasure Coding Objects across Global Data Centers. Yu Lin Chen*, Shuai Mu, Jinyang Li, Cheng Huang*, Jin Li*, Aaron Ogus*, and Douglas Phillips*. New York University, *Microsoft Corporation. USENIX ATC, Santa Clara, CA, July 2017

2 Azure data centers span the globe

3 Data center fault tolerance: geo-replication of the blob

4 Data center fault tolerance: erasure coding (Project Giza). The blob is split into data fragments a and b plus a parity fragment p, with one fragment stored per data center.

5 The cost of erasure coding (Project Giza). Reading or reconstructing the blob requires fetching fragments a and b from remote data centers, i.e. cross-DC traffic.

6 When is cross-DC coding a win? 1. Inexpensive cross-DC communication. 2. A coding-friendly workload: (a) stores most of its data in large objects (less coding overhead); (b) triggers reconstruction infrequently, whether for reads or after failures.
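
For intuition, here is a back-of-the-envelope comparison of storage overhead (illustrative arithmetic only, not taken from the slides; the 2+1 layout mirrors the earlier three-DC diagram):

```python
# Illustrative arithmetic only: storage overhead of r-way geo-replication
# versus a (k data + m parity) cross-DC erasure code.

def replication_overhead(r: int) -> float:
    """Bytes stored per byte of user data with r full geo-replicas."""
    return float(r)

def erasure_overhead(k: int, m: int) -> float:
    """Bytes stored per byte of user data with k data + m parity fragments."""
    return (k + m) / k

print(replication_overhead(3))  # 3.0   -> three full copies in three DCs
print(erasure_overhead(2, 1))   # 1.5   -> fragments a, b plus parity p across three DCs
print(erasure_overhead(6, 1))   # ~1.17 -> a wider code amortizes parity further
```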

7 Cross-DC communication is becoming less expensive: the highest-capacity links today offer 160 Tbps.

8 In search of a coding-friendly workload: a 3-month trace, January through March

9 OneDrive is occupied mostly by large objects

10 OneDrive is occupied mostly by large objects: objects larger than 1 GB occupy 53% of total storage capacity.

11 OneDrive is occupied mostly by large objects: objects smaller than 4 MB occupy only 0.9% of total storage capacity.

12 OneDrive reads are infrequent after a month

13 OneDrive reads are infrequent after a month: 47% of total bytes read occur within the first day after object creation.

14 OneDrive reads are infrequent after a month: less than 2% of total bytes read occur more than a month after creation.

15 Outline: 1. The case for erasure coding across data centers. 2. Giza design and challenges. 3. Experimental results.

16 Giza: a strongly consistent, versioned object store. Erasure codes each object individually; optimizes Put and Get latency; leverages the existing single-DC cloud storage API.
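
To make "erasure code each object individually" concrete, here is a minimal sketch that splits one object into k data fragments plus a single XOR parity fragment. This is a simplification: a production system would use a Reed-Solomon style code and handle padding and fragment metadata more carefully.

```python
from functools import reduce

def encode(obj: bytes, k: int = 2) -> list[bytes]:
    """Split obj into k equal-length data fragments and append one XOR parity fragment."""
    frag_len = -(-len(obj) // k)                       # ceil(len(obj) / k)
    padded = obj.ljust(frag_len * k, b"\0")            # pad so all fragments are equal length
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*frags))
    return frags + [parity]

def rebuild(frags: list[bytes], missing: int) -> bytes:
    """Recover the fragment at index `missing` by XOR-ing all surviving fragments."""
    survivors = [f for i, f in enumerate(frags) if i != missing]
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*survivors))

a, b, p = encode(b"hello giza!", k=2)
assert rebuild([a, b, p], missing=0) == a   # lose one fragment, rebuild it from the other two
```

With one fragment per data center, any single data center can be lost and its fragment rebuilt from the remaining ones.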

17 Giza's architecture: three data centers (DC 1, DC 2, DC 3), each running Azure Table and Blob storage.

18 Giza's workflow: the client issues Put(Blob1, Data).

19 Giza's workflow: each coded fragment is written to the blob store of a different data center under a fresh GUID, e.g. Put(GUID1, Fragment A).

20 Giza's workflow: the mapping {DC1: GUID1, DC2: GUID2, DC3: GUID3} records which fragment lives in which data center and is stored as the blob's metadata in the table store.
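
A sketch of the data-path half of this workflow, using an in-memory stand-in for each data center's blob store (real Giza writes to Azure Blob storage); the class and function names here are illustrative, not Giza's actual code:

```python
import uuid

class InMemoryBlobStore:
    """Stand-in for one data center's blob store."""
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self.blobs[name] = data

def put_fragments(stores: dict[str, InMemoryBlobStore],
                  fragments: list[bytes]) -> dict[str, str]:
    """Write one coded fragment per DC under a fresh GUID and return the
    DC -> GUID mapping that the metadata path records in the table store."""
    mapping: dict[str, str] = {}
    for (dc, store), frag in zip(stores.items(), fragments):
        guid = str(uuid.uuid4())
        store.put(guid, frag)            # data path: blob write inside that DC
        mapping[dc] = guid
    return mapping

stores = {"DC1": InMemoryBlobStore(), "DC2": InMemoryBlobStore(), "DC3": InMemoryBlobStore()}
metadata = put_fragments(stores, [b"fragment-a", b"fragment-b", b"parity-p"])
# metadata now has the shape {"DC1": GUID1, "DC2": GUID2, "DC3": GUID3}
```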

21 Giza metadata versioning. Let md_this = {DC1: GUID4, DC2: GUID5, DC3: GUID6}. The metadata row (key: Blob1) holds one entry per version; a new Put extends it from "v1: md_1" to "v1: md_1, v2: md_this".
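
A small sketch of the versioned row (illustrative structure only): each Put appends a new version rather than overwriting the previous one.

```python
class MetadataRow:
    """Illustrative shape of a Giza metadata row: version number -> {DC: fragment GUID}."""
    def __init__(self, key: str) -> None:
        self.key = key
        self.versions: dict[int, dict[str, str]] = {}

    def append_version(self, md: dict[str, str]) -> int:
        v = max(self.versions, default=0) + 1
        self.versions[v] = md
        return v

row = MetadataRow("Blob1")
row.append_version({"DC1": "GUID1", "DC2": "GUID2", "DC3": "GUID3"})   # v1: md_1
row.append_version({"DC1": "GUID4", "DC2": "GUID5", "DC3": "GUID6"})   # v2: md_this
```

The naive append shown here is exactly what the next slides show to be unsafe: under concurrent Puts, different data centers can bind the same version number to different metadata, which is why Giza chooses each version's value through consensus.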

22 Inconsistency during concurrent Put operations: two clients concurrently issue Put(Blob1, Data1) and Put(Blob1, Data2).

23 Inconsistency during concurrent Put operations: md_1 and md_2 arrive in different orders at different data centers, so DC 1 records v1: md_1, DC 2 records v1: md_2, and DC 3 records v1: md_1, leaving the data centers in an inconsistent state.

24 Inconsistency during concurrent Put operations: if DC 3 then fails, the surviving data centers disagree on version 1 (DC 1 has v1: md_1, DC 2 has v1: md_2) and the inconsistency cannot be resolved.

25 Choosing version values via consensus: all data centers agree on the same assignment, either v1: md_1, v2: md_2 everywhere, or v1: md_2, v2: md_1 everywhere; either order is acceptable as long as it is consistent.

26 Paxos and Fast Paxos properties

27 Paxos and Fast Paxos properties: classic Paxos needs two cross-DC round trips per version (prepare, then accept), while Fast Paxos commits in a single round trip when there is no contention and falls back to the classic path on conflict.

28 Implementing Fast Paxos with the Azure table API. Let md_this = ({DC1: GUID1, DC2: GUID2, DC3: GUID3}, Paxos state); the Paxos state is stored alongside the fragment mapping in each data center's table.

29 Implementing Fast Paxos with the Azure table API: md_this = ({DC1: GUID1, DC2: GUID2, DC3: GUID3}, Paxos state).

30 Implementing Fast Paxos with the Azure table API: read the metadata row (etag = 1) from the remote data center's table and decide whether to accept or reject the proposal (v2 = md_this).

31 Implementing Fast Paxos with the Azure table API: if the proposal is accepted, write the metadata row back to the table; here the write fails because the etag has changed from 1 to 2 since the read.

32 Implementing Fast Paxos with the Azure table API: the write-back succeeds because the etag is still 1, i.e. the row has not changed since it was read.
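
The slides describe an optimistic-concurrency loop over Azure Table's etag-conditioned updates. Below is a minimal sketch of that pattern, assuming a hypothetical TableClient whose read_row returns (row, etag) and whose conditional_update raises EtagConflict if the etag has changed; the full Fast Paxos acceptor logic (ballots, fast vs. classic rounds, recovery) is omitted.

```python
class EtagConflict(Exception):
    """Raised by the hypothetical TableClient when the etag check fails."""

def try_accept(table, key: str, version: int, proposal: dict) -> bool:
    """One DC's accept decision for a version slot: read -> decide -> conditional write."""
    while True:
        row, etag = table.read_row(key)                  # e.g. etag = 1
        slot = row.setdefault("versions", {}).get(version)
        if slot is not None and slot != proposal:
            return False                                 # slot already bound to a different value: reject
        row["versions"][version] = proposal              # accept the proposal
        try:
            table.conditional_update(key, row, etag)     # succeeds only if the etag is unchanged
            return True
        except EtagConflict:
            continue                                     # another writer got in first: re-read and retry
```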

33 Parallelizing the metadata and data paths

34 Data path fails, metadata path succeeds: the metadata row (key: Blob1) may read "v1: md_1, v2: md_2" where v2 is ongoing and its data path failed. During a read, Giza reverts to the latest version whose data is available.

35 Giza Put: execute the metadata and data paths in parallel; upon success, return an acknowledgement to the client; in the background, update the highest committed version number in the metadata row. Example row: key Blob1, committed = 1, metadata "v1: md_1, v2: md_2" with v2 ongoing and not yet committed.
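
A sketch of that Put fast path using asyncio; encode_and_name, write_fragments, run_metadata_consensus, and update_committed_version are hypothetical stand-ins for the data path, the Fast Paxos metadata path, and the background commit described above.

```python
import asyncio

async def giza_put(key: str, data: bytes) -> None:
    # Hypothetical helpers: encode_and_name splits the object and picks fragment GUIDs.
    fragments, mapping = encode_and_name(data)
    await asyncio.gather(
        write_fragments(fragments, mapping),        # data path: blob writes in each DC
        run_metadata_consensus(key, mapping),       # metadata path: consensus on the version slot
    )
    # The client is acknowledged once both paths succeed; bumping the committed
    # version number happens off the critical path.
    asyncio.create_task(update_committed_version(key))
```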

36 Giza Get: optimistically read the latest committed version from the local metadata row, then execute the data path and metadata path in parallel. The data path retrieves the fragments indicated by the local latest committed version; the metadata path retrieves the metadata row from all data centers and validates that this version is indeed the latest committed one.
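
A matching sketch of the Get fast path; read_local_row, fetch_fragments, validate_latest_committed, and decode are hypothetical stand-ins:

```python
import asyncio

async def giza_get(key: str) -> bytes:
    row = await read_local_row(key)                  # local metadata row only
    version = row.committed                          # optimistic: trust the local committed version
    fragments, latest = await asyncio.gather(
        fetch_fragments(row.versions[version]),      # data path: fragment reads
        validate_latest_committed(key, version),     # metadata path: check all DCs
    )
    if latest != version:
        return await giza_get(key)                   # local view was stale: retry with fresh metadata
    return decode(fragments)
```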

37 Outline: 1. The case for erasure coding across data centers. 2. Giza design and challenges. 3. Experimental results.

38 Giza deployment configurations US

39 Giza deployment configurations US

40 Giza deployment configurations World

41 Giza deployment configurations World

42 Metadata Path: Fast vs Classic Paxos (latency callouts: 23 ms, 32 ms, 64 ms)

43 Metadata Path: Fast vs Classic Paxos (latency callouts: 102 ms, 240 ms, 139 ms)

44 Giza Put Latency (World-2-1)

45 Giza Get Latency (World-2-1)

46 Summary. Recent trends in data center networking support the economic feasibility of erasure coding across data centers. Giza is a strongly consistent, versioned object store built on top of Azure storage. Giza optimizes for latency and is fast in the common case.
