Apache BookKeeper. A High Performance and Low Latency Storage Service

1 Apache BookKeeper A High Performance and Low Latency Storage Service

2 Hello! I am Sijie Guo: PMC Chair of Apache BookKeeper; Co-creator of Apache DistributedLog; Twitter Messaging/Pub-Sub Team; Yahoo! R&D Beijing

3 Challenges in Distributed Systems

4 Expect Failures: up to 10% annual failure rate for disks and servers
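To make the 10% figure concrete, here is a back-of-the-envelope sketch. The fleet size of 1,000 machines is a hypothetical number chosen for illustration, not one from the talk, and the calculation assumes failures are independent and occur at a constant rate through the year.

```python
ANNUAL_FAILURE_RATE = 0.10   # upper bound quoted on the slide
FLEET_SIZE = 1_000           # hypothetical fleet size, for illustration only


def expected_failures_per_year(fleet_size: int, afr: float) -> float:
    """Expected number of machines lost in one year."""
    return fleet_size * afr


def prob_at_least_one_failure(fleet_size: int, afr: float, days: float) -> float:
    """Probability that at least one machine fails within `days`.

    Assumes independent failures with a constant hazard rate.
    """
    survive_one = (1.0 - afr) ** (days / 365.0)   # one machine survives the window
    return 1.0 - survive_one ** fleet_size        # at least one of them does not


per_year = expected_failures_per_year(FLEET_SIZE, ANNUAL_FAILURE_RATE)
days_between_failures = 365.0 / per_year
p_week = prob_at_least_one_failure(FLEET_SIZE, ANNUAL_FAILURE_RATE, days=7)

print(f"expected failures per year: {per_year:.0f}")
print(f"mean days between failures: {days_between_failures:.2f}")
print(f"P(at least one failure in a week): {p_week:.2f}")
```

Under these assumptions a failure shows up every few days, which is why the rest of the talk treats failure handling as the normal case rather than the exception.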

5 Symptoms

6 Problem 1: Not Available

7 Problem 1: Not Available

8 Problem 2: Inconsistencies

9 CAP

10 More Issues

11 Problem 3: Split Brain (Two Writers) [Diagram: two instances of Writer A, each issuing writes]

12 Problem 4: Failure Detection [Diagram: nodes A, B, C]

13 Problem 5: Recovery: Consistency; Recovery Protocol [Diagram: nodes A, B, C]

14 Solutions

15 Overview: Enter Apache BookKeeper

16 BookKeeper - Durable Storage: a building block for reliable systems. Client Library; Replication; Consistency; Durability; Commodity Hardware; Recovery

17 Ledger Abstraction

18 Ledger: Segment; Block / Object; Append-Only File; ...

19 Guarantees: If an entry has been acknowledged, it must be readable. If an entry is read once, it must always be readable.

20 History: Initial Use Case - Hadoop NameNode HA; 2008: Open Sourced Contrib of ZooKeeper; 2011: Sub-Project of ZooKeeper; 2012: Yahoo! Push Notification; 2012~Now: DistributedLog, Pulsar, Majordodo; 2015~Now: Salesforce Distributed Store

21 Details: Inside of Apache BookKeeper

22 Architecture [Diagram: APP with Client; Metadata Store; a Ledger stored on three Bookies]

23 Reliable Writes: store a digest along with each entry; fsync entries before responding; ack when this entry and all previous entries have been accepted by a quorum of bookies [Diagram: Bookie, Bookie]
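The ack rule on this slide can be sketched as a small client-side tracker. This is an illustrative simulation, not BookKeeper's client code: the class and method names are hypothetical, CRC32 stands in for the digest as an assumption, and a bookie "response" is assumed to mean the entry was already fsynced.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class PendingAdd:
    entry_id: int
    payload: bytes
    digest: int                               # stored alongside the entry
    acked_by: set = field(default_factory=set)


class QuorumAckTracker:
    """Hypothetical helper: confirms entries strictly in order."""

    def __init__(self, ack_quorum: int):
        self.ack_quorum = ack_quorum
        self.pending = {}                     # entry_id -> PendingAdd
        self.next_entry_id = 0
        self.last_add_confirmed = -1

    def add(self, payload: bytes) -> int:
        entry_id = self.next_entry_id
        self.next_entry_id += 1
        self.pending[entry_id] = PendingAdd(entry_id, payload, zlib.crc32(payload))
        return entry_id

    def on_bookie_response(self, entry_id: int, bookie: str) -> list:
        """Record one bookie's successful write; return newly acked entry ids."""
        entry = self.pending.get(entry_id)
        if entry is not None:
            entry.acked_by.add(bookie)
        confirmed = []
        nxt = self.last_add_confirmed + 1
        # An entry is acked only once it has a quorum AND every earlier entry is acked.
        while nxt in self.pending and len(self.pending[nxt].acked_by) >= self.ack_quorum:
            del self.pending[nxt]
            self.last_add_confirmed = nxt
            confirmed.append(nxt)
            nxt += 1
        return confirmed


tracker = QuorumAckTracker(ack_quorum=2)
e0 = tracker.add(b"first")
e1 = tracker.add(b"second")
r1 = tracker.on_bookie_response(e1, "B1")
r2 = tracker.on_bookie_response(e1, "B2")     # e1 has a quorum, but e0 does not yet
r3 = tracker.on_bookie_response(e0, "B1")
r4 = tracker.on_bookie_response(e0, "B2")     # e0 reaches quorum, releasing e0 and e1
print(r1, r2, r3, r4)
```

Holding back the second entry until the first is confirmed keeps the acknowledged prefix of the ledger gap-free.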

24 Consistency - LastAddPushed [Diagram: Writer; Add entries; LastAddPushed]

25 Consistency - LastAddConfirmed [Diagram: Writer; Add entries; Ack Adds; LastAddConfirmed; Readers. Ownership Changed: new Writer; Fencing]

26 Fencing
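A minimal sketch of the fencing idea, under stated assumptions: the `Bookie` class, `try_add`, and `fence_ledger` are hypothetical names for an in-memory simulation, not the BookKeeper API. The sketch covers only the write-blocking part and omits any recovery of entries that were written but not yet confirmed.

```python
class LedgerFencedError(Exception):
    """Raised by a bookie that has been told to stop accepting writes."""


class Bookie:
    def __init__(self, name):
        self.name = name
        self.entries = {}          # entry_id -> payload
        self.fenced = False

    def add_entry(self, entry_id, payload):
        if self.fenced:
            raise LedgerFencedError(self.name)
        self.entries[entry_id] = payload

    def fence(self):
        """Stop accepting writes and report the highest entry stored here."""
        self.fenced = True
        return max(self.entries, default=-1)


def try_add(ensemble, entry_id, payload, ack_quorum):
    """A writer's add succeeds only if ack_quorum bookies accept it."""
    acks = 0
    for bookie in ensemble:
        try:
            bookie.add_entry(entry_id, payload)
            acks += 1
        except LedgerFencedError:
            pass
    return acks >= ack_quorum


def fence_ledger(reachable, ensemble_size, ack_quorum):
    """New owner fences enough bookies that no unfenced ack quorum remains."""
    needed = ensemble_size - ack_quorum + 1
    if len(reachable) < needed:
        raise RuntimeError("cannot fence enough bookies")
    return max(bookie.fence() for bookie in reachable)


ensemble = [Bookie("B1"), Bookie("B2"), Bookie("B3")]
before = try_add(ensemble, 0, b"first", ack_quorum=2)       # old writer, not yet fenced
last_seen = fence_ledger(ensemble[:2], ensemble_size=3, ack_quorum=2)
after = try_add(ensemble, 1, b"late", ack_quorum=2)         # old writer after fencing
print(before, last_seen, after)
```

With three bookies and an ack quorum of two, fencing any two bookies is enough: the old writer can reach at most one unfenced bookie, so its adds can no longer be acknowledged.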

27 Read Entry & Read LAC [Diagram, Read Entry K: Client reads from B1, B2, B3 with speculative reads on timeouts. Diagram, Read LAC: Client performs a quorum read across B1, B2, B3]
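The "speculative reads on timeouts" idea can be sketched with `asyncio`: ask one replica, and if it has not answered within a short timeout, ask the next one without cancelling the first, then take whichever answer arrives first. The function name, the replica callables, and the timeout values are assumptions for illustration, not BookKeeper's client implementation.

```python
import asyncio


async def speculative_read(replicas, entry_id, speculative_timeout):
    """Return the first successful read; `replicas` are async callables."""
    tasks = []
    try:
        for read in replicas:
            tasks.append(asyncio.create_task(read(entry_id)))
            done, _ = await asyncio.wait(
                tasks, timeout=speculative_timeout,
                return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()
            # Drop failed reads; slow ones stay in flight while we try the next replica.
            tasks = [t for t in tasks if not t.done()]
        while tasks:  # every replica has been asked; wait for the stragglers
            done, pending = await asyncio.wait(
                tasks, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()
            tasks = list(pending)
        raise RuntimeError("no replica returned the entry")
    finally:
        for task in tasks:
            task.cancel()      # no-op for tasks that already finished


async def slow_bookie(entry_id):
    await asyncio.sleep(0.5)   # simulated slow or overloaded bookie
    return b"from-slow"


async def fast_bookie(entry_id):
    await asyncio.sleep(0.01)
    return b"from-fast"


result = asyncio.run(
    speculative_read([slow_bookie, fast_bookie], entry_id=7, speculative_timeout=0.05))
print(result)
```

The trade-off is a little extra read traffic in exchange for tail latency that is bounded by the speculative timeout plus the fastest healthy replica.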

28 Long Poll Read [Diagram: Client sends a Long Poll Read to B1, with Speculative Long Poll to B2 and B3]

29 Inside a Bookie

30 Use Cases: Apache BookKeeper as a Building Block

31 Projects built on BookKeeper: Twitter - Apache DistributedLog; Yahoo - Pulsar (Cloud Messaging Service); Salesforce - Distributed Store; Huawei - HDFS NameNode HA; HubSpot - WAL; Majordodo - Distributed Resource Manager

32 Apache DistributedLog (Twitter)

33 Apache DistributedLog [Diagram: Log Segment X ... Log Segment X+2, ordered Oldest to Newest, stored in Apache BookKeeper]

34 Apache DistributedLog [Architecture diagram. Applications: DBs (e.g., Twitter's Manhattan); Deferred RPC (queuing); Self-serve Pub/Sub; Stream Computing; Cross DC Replication. Components: Write Proxy; Read Proxy; Record Cache; Rate Limiting, Quota; Log Streams; Metadata Store; Log Segment Store (BK); Cold Storage (HDFS). Annotations: Ownership Tracking; Batching, Compression; Abstraction & Naming; Data Management; Efficient Write & Read; Intra-cluster & Geo Replication; Different Consumer Models; Applications; Serving; Raw Streams; Segments]

35 DistributedLog at Twitter: Manhattan Key/Value Store - WAL; Durable Deferred RPC - Journal; Real-Time Search Indexing - Change Propagation; Self-serve Pub/Sub - Message Delivery, Ads Pipeline; Stream Computing - Source & Sink; Stateful Processing in Heron (coming soon); Reliable Cross Datacenter Replication

36 Scale: DistributedLog at Twitter. 1.5 trillion records/day, 17.5 petabytes/day; O(10) thousand streams, O(1) million live ledgers; O(10^2) bookies, O(10^3) proxies; record sizes from 100 bytes to 20 KB and larger; data is kept from hours to days, or even up to a year; replication factor is 3 or 5, and 9 or 15 for the global use case.
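For a sense of what the daily totals mean per second, here is a small worked calculation. It assumes decimal units (1 PB = 10^15 bytes) and traffic spread evenly over the day. The slide does not say whether the byte count includes replication or read fan-out, so the per-record average is only a rough ratio of the two quoted totals.

```python
SECONDS_PER_DAY = 24 * 60 * 60

records_per_day = 1.5e12        # 1.5 trillion records/day (from the slide)
bytes_per_day = 17.5e15         # 17.5 PB/day, assuming 1 PB = 10**15 bytes

records_per_second = records_per_day / SECONDS_PER_DAY
gigabytes_per_second = bytes_per_day / SECONDS_PER_DAY / 1e9
avg_bytes_per_record = bytes_per_day / records_per_day

print(f"{records_per_second / 1e6:.1f} million records/s")
print(f"{gigabytes_per_second:.1f} GB/s")
print(f"{avg_bytes_per_record:.0f} bytes per record on average")
```

Averaged over a day, that is roughly 17 million records and 200 GB every second.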

37 DistributedLog Resources: Website - ; Mail List - dev@distributedlog.incubator.apache.org; Project Ideas; Paper - DistributedLog: A high performance replicated log service (ICDE 2017)

38 Yahoo! Pulsar (Cloud Messaging Service)

39 Yahoo! Pulsar: Distributed Pub/Sub Messaging Platform; Flexible Messaging Model - Topic and Queue; Durable, Low Latency; Strong Ordering and Consistency Guarantees; Geo Replication; Apache BookKeeper as Durable Message Store

40 Yahoo! Pulsar

41 Scale: Pulsar at Yahoo! 100 billion messages per day; more than 1.4 million topics; average publish latency across services under 5 ms; 10+ data centers with cross-region replication

42 Pulsar Performance

43 Salesforce Distributed Store

44 Salesforce Application Storage: Store for Persistent WAL, Data and Objects; Low, Constant Write Latencies; Low, Constant Random Read Latencies; Highly Available, Consistent; Distributed and Linearly Scalable; On Commodity Hardware

45 Heterogeneous Stores

46 Community: Roadmap, Releases, Future

47 Community: 7 PMC Members; 10+ Committers; 20+ Active Contributors; 5+ Companies actively using/contributing: Twitter, Yahoo!, Salesforce, Huawei, EMC

48 Release: Netty 4 Upgrade - Performance Improvements; Security (Authentication & Authorization) Support; Explicit LAC; Long Poll Read Support; Auto Re-replication Improvements; ...

49 Future: Scalable Segment Store - Object, Log, File, Stream, Long Term Storage; Disk Scrubber; Better Lifecycle Management; Beyond the limit - 128 bits support, Scalable metadata management

50 Thanks! Any questions? You can find me
