HBase Solutions at Facebook

HBase Solutions at Facebook
Nicolas Spiegelberg, Software Engineer, Facebook
QCon Hangzhou, October 28th, 2012

Outline
- HBase Overview
- Single Tenant: Messages
- Selection Criteria
- Multi-tenant Solutions: Physical, Self-service, Hashout
- Deployment Recommendations
- Recent Work

HBase Overview

HBase in a nutshell
- Apache open source project modeled after Google's BigTable
- A distributed, large-scale data store built on top of the Hadoop Distributed File System (HDFS)
- Efficient at random writes and reads
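
To make the data model concrete, here is a minimal single-row write and read against the 0.94-era Java client; the "test" table, "cf" family, and row/column names are illustrative, not from the talk.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseHello {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "test"); // illustrative table name

            // Random write: one row, one column.
            Put put = new Put(Bytes.toBytes("user123"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);

            // Random read of the same row.
            Result r = table.get(new Get(Bytes.toBytes("user123")));
            System.out.println(Bytes.toString(
                    r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))));
            table.close();
        }
    }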

HBase System Overview
[Architecture diagram] Database layer: HBase (Master, Backup Master, Region Servers). Storage layer: HDFS (AvatarNamenode pair, Datanodes). Coordination service: ZooKeeper (ZK peers).

Horizontal Scalability
[Diagram: a table's regions spread across many region servers]

Automatic Failover
[Diagram: when a region server dies, the HBase client finds the new server from the HMaster]

Write Path Overview
[Diagram: a Region Server hosts Regions; each Region has ColumnFamilies, each with a Memstore that flushes to HFiles]
Data in an HFile is sorted and has a block index for efficient retrieval.

Read Path Overview
[Diagram: a Get is served by merging a ColumnFamily's Memstore with its HFiles]

HBase Use Cases @ Facebook
[Timeline, 2010-2013] Messages, Facebook Insights, Self-service, Hashout, Operational Data Store, more Analytics/Hashout apps, Site Integrity, Social Graph, Search Indexing, Realtime Hive Updates, Cross-system Tracing, and more

Flagship App: Facebook Messages

Monthly data volume prior to launch
- Messages: 15B x 1,024 bytes = 14TB
- Chat messages: 120B x 100 bytes = 11TB

Facebook Messages NOW: Quick Stats
- 11B+ messages/day (Messages, Emails, Chats, SMS)
- 90B+ data accesses; peak of 1.5M ops/sec (~55% read, ~45% write)
- 20PB+ of total data, growing 400TB/month

Facebook Messages: Requirements
- Very high write volume (previously, chat was not persisted to disk)
- Ever-growing data sets (old data rarely gets accessed)
- Elasticity & automatic failover
- Strong consistency within a single data center
- Large scans / MapReduce support for migrations & schema conversions
- Bulk data import

Messaging Data
- Small/medium sized data in HBase: message metadata & indices, search index, small message bodies
- Attachments and large messages in Haystack (Facebook's photo store)
- HBase is currently not optimized for large objects (multiple megabytes and beyond), but that could change in the future

Snapshot Schema Overview
Three CFs: Actions, Snapshot, Keywords
1. Actions: a log of user actions
2. Snapshot: a blob containing all user metadata (everything but the message itself); Mailbox = Snapshot + Actions after Snapshot
3. Keywords: started out as a Lucene index, but we worried about whether HBase was doing proper prefix seeking, so we switched to a keyword-based index before launch
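
One way to picture "Mailbox = Snapshot + Actions after Snapshot" is the hypothetical read sketch below, using the 0.94-era client; the table, family, and qualifier names, plus the Mailbox and Action helpers, are invented for illustration.

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MailboxReader {
        // Mailbox and Action are hypothetical application types: a metadata
        // blob plus a replayable action log, as described on the slide.
        public Mailbox loadMailbox(HTable table, String userId) throws Exception {
            byte[] row = Bytes.toBytes(userId);

            // 1. Read the snapshot blob and note when it was written.
            Get snapGet = new Get(row);
            snapGet.addColumn(Bytes.toBytes("snapshot"), Bytes.toBytes("blob"));
            Result snap = table.get(snapGet);
            long snapshotTs = snap.raw()[0].getTimestamp();
            Mailbox mailbox = Mailbox.deserialize(snap.value());

            // 2. Replay only the actions logged after the snapshot was taken.
            Get actionsGet = new Get(row);
            actionsGet.addFamily(Bytes.toBytes("actions"));
            actionsGet.setTimeRange(snapshotTs + 1, Long.MAX_VALUE);
            actionsGet.setMaxVersions(); // every logged action, not just the latest
            for (KeyValue kv : table.get(actionsGet).raw()) {
                mailbox.apply(Action.deserialize(kv.getValue()));
            }
            return mailbox;
        }
    }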

Messages Schema & Evolution
- Trick: the Actions (data) column family is the source of truth
- Regenerate Snapshot + Keywords by replaying Actions in order
- Custom, low-volume backup solution
- Fast iteration on schema

Messages Schema & Evolution
The metadata portion of the schema underwent 3 changes:
1. Per-User Snapshots (early development; rollout up to 1M users)
2. Per-Thread Snapshots (up to full rollout: 1B+ accounts, 800M+ active)
3. Per-Message Shard Snapshots (after rollout)

How we use MapReduce
- Upgrading schema: old version to new version, then deleting the old version of the schema
- Deletion jobs: messages, search indexes
- Other misc jobs: find all users in a cell
- Importing and exporting HBase tables
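
A full-table job of this kind would look roughly like the sketch below; the table name, families, and upgrade logic are invented, not Facebook's actual jobs.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.Job;

    public class SchemaUpgradeJob {
        // Rewrites each row from the old schema to the new one.
        static class UpgradeMapper extends TableMapper<ImmutableBytesWritable, Put> {
            protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                    throws IOException, InterruptedException {
                byte[] v = value.getValue(Bytes.toBytes("actions"), Bytes.toBytes("data"));
                if (v == null) return;
                // Example translation: copy the old cell into a new column family.
                Put put = new Put(row.get());
                put.add(Bytes.toBytes("actions_v2"), Bytes.toBytes("data"), v);
                ctx.write(row, put);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "messages-schema-upgrade");
            job.setJarByClass(SchemaUpgradeJob.class);

            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("actions"));
            scan.setCaching(500);       // more rows per scanner RPC
            scan.setCacheBlocks(false); // don't churn the block cache from MR

            TableMapReduceUtil.initTableMapperJob("messages", scan,
                    UpgradeMapper.class, ImmutableBytesWritable.class, Put.class, job);
            TableMapReduceUtil.initTableReducerJob("messages", null, job);
            job.waitForCompletion(true);
        }
    }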

Physical Multi-tenancy
- Real-time Ads Insights
- Operational Data Store

1. Real-time Facebook Insights
- Real-time analytics for social plugins, on top of HBase
- Publishers get real-time distribution/engagement metrics: # of impressions, likes; analytics by domain/URL/demographics and time period
- Uses HBase capabilities: efficient counters (single-RPC increments), TTL for purging old data
- Needs massive write throughput & low latencies: billions of URLs, millions of counter increments/second

Facebook Insights: Before
Traditional ETL using Hadoop + Hive
[Pipeline diagram] Web Tier -> Scribe -> HDFS -> Hive (MapReduce) -> MySQL (SQL)
Update time: 15 minutes to 24 hours

Facebook Insights: After (on HBase)
Realtime ETL using HBase
[Pipeline diagram] Web Tier -> Scribe -> HDFS -> PTail -> PUMA -> HBase (HBase API)
Update time: 10-30 seconds!

Real-time Ads Insights: Scaling Lessons Learned
1. Don't tackle more than you can handle
2. GC tuning can hurt
3. Batch for efficient writing: +5 is faster than 5*(+1), and MultiPut is faster than Put
4. Periodic aggregation with MR vs. realtime: +1000 is MUCH faster than 1000*(+1)
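
A minimal sketch of lessons 3 and 4 against the 0.94-era client API; the table, row, and counter names are illustrative.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchingSketch {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "insights");
            byte[] row = Bytes.toBytes("url:example.com");
            byte[] cf = Bytes.toBytes("counters");

            // "+5 is faster than 5*(+1)": accumulate client-side, send one increment.
            table.incrementColumnValue(row, cf, Bytes.toBytes("impressions"), 5);

            // Several counters on one row still cost only a single RPC.
            Increment inc = new Increment(row);
            inc.addColumn(cf, Bytes.toBytes("impressions"), 5);
            inc.addColumn(cf, Bytes.toBytes("likes"), 1);
            table.increment(inc);

            // "MultiPut is faster than Put": ship many Puts in one batched call.
            List<Put> puts = new ArrayList<Put>();
            for (int i = 0; i < 100; i++) {
                Put p = new Put(Bytes.toBytes("url:example.com/page" + i));
                p.add(cf, Bytes.toBytes("raw"), Bytes.toBytes(1L));
                puts.add(p);
            }
            table.put(puts); // one round trip per region server, not one per Put
            table.close();
        }
    }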

2. Operational Data Store
- Collects a variety of metrics from production servers: system-level metrics (CPU, memory, IO, network) and application-level metrics (web, DB, caches)
- Can easily graph historical data, e.g., CPU usage for a machine over the last 6 hours
- Supports complex aggregations, transformations, etc.
- Used by HBase itself!

ODS (contd.)
- Difficult to scale with MySQL: currently 4-minute data and we need higher precision; 10s of millions of unique time series; ~100B+ writes per day; the hot-shard problem requires manual resharding
- HBase offers automatic/dynamic splitting of shards
- Uses HBase's TTL feature to purge old data automatically!
- Reads are mostly for recent data, and HBase's storage model is perfectly optimized for reading recent data

ODS: Schema Design
- Three different column families: Raw, Hour, Day
- MR jobs handle the rollups
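
A hypothetical sketch of creating that three-family layout, with HBase TTLs doing the purging mentioned earlier; the TTL values are invented, since the slides don't state them.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class OdsSchemaSketch {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
            HTableDescriptor ods = new HTableDescriptor("ods");

            HColumnDescriptor raw = new HColumnDescriptor("raw");
            raw.setTimeToLive(7 * 24 * 3600);   // raw points expire after a week
            HColumnDescriptor hour = new HColumnDescriptor("hour");
            hour.setTimeToLive(90 * 24 * 3600); // hourly rollups kept ~3 months
            HColumnDescriptor day = new HColumnDescriptor("day");
            // daily rollups keep the default TTL (forever)

            ods.addFamily(raw);
            ods.addFamily(hour);
            ods.addFamily(day);
            admin.createTable(ods);
            admin.close();
        }
    }

The rollup MR jobs then read "raw" and write aggregates into "hour" and "day", after which TTL expiry reclaims the fine-grained data automatically.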

ODS: Compaction Tricks
- Log-structured merge tree: HFiles are time-ordered data storage, which gives better spatial locality for recent-data reads
- Compaction ratio tuned from 1.4 to 0.25
- Future: use a coprocessor for server-side constraints
[Diagram: memstore flushes produce HFiles that naturally tier into min/hour/day age bands]
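
The ratio on the slide corresponds to the hbase.hstore.compaction.ratio setting, normally placed in hbase-site.xml on the region servers; a minimal sketch (the rest of the file is elided):

    <!-- hbase-site.xml: 0.25 matches the slide, making minor compactions
         far less willing to pull in larger, older HFiles, so time-ordered
         files stay separate longer. -->
    <property>
      <name>hbase.hstore.compaction.ratio</name>
      <value>0.25</value>
    </property>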

ODS: Lesson Learned: Split-tier Architecture
Idea: scale the cache (ODS/HBase) and storage (HDFS) tiers independently
[Diagram: ODS -> HBASE -> HDFS, with flush, compaction, and read+write traffic crossing between tiers]
Problems:
1. The network architecture was insufficient
2. Memcache + app logic was more effective
3. Traffic between nodes exceeded user traffic (flushes, compactions, R+W)

ODS: HA
- Problem: needed high availability
- Solution: dual-cluster architecture (same location, different network)
- Best-effort puts, with MR jobs for eventual consistency
- Biggest negative: 6x replication
- End-game: master-master replication
[Diagram: ODS writing to two HBASE clusters]
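
One plausible reading of "best effort for puts" is sketched below; the class and its logForRepair stand-in are invented, standing in for whatever feeds the eventual-consistency MR jobs.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    public class DualClusterWriter {
        private final HTable primary;
        private final HTable secondary;

        public DualClusterWriter(HTable primary, HTable secondary) {
            this.primary = primary;
            this.secondary = secondary;
        }

        // Never fail the caller; a missed side is queued for the
        // eventual-consistency MR repair jobs.
        public void dualPut(Put put) {
            try {
                primary.put(put);
            } catch (IOException e) {
                logForRepair("primary", put, e);
            }
            try {
                secondary.put(put);
            } catch (IOException e) {
                logForRepair("secondary", put, e);
            }
        }

        private void logForRepair(String side, Put put, IOException e) {
            // Invented stand-in: persist (side, put) somewhere durable so an
            // MR job can replay it and reconverge the two clusters.
        }
    }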

Self-Service Tier

Self-Service Tier: What is it?
Optimistic multi-tenancy:
- Allow users to create a table
- Inform them about the state of multi-tenancy
- Expect users to do the right thing
Monitoring:
- JMX metrics
- RPC monitor
- Slow-query logs
- HLog/HFile pretty printer

Self-Service Tier: Problems!
Except...
- Users have a relative opinion about "not a lot of data"
- Users don't analyze their own data (250MB KVs)
- Users don't always name their tables well
- Resource analysis is non-trivial

Self-service: Lessons Learned
- Physically isolate users (thread:server mapping)
- Block abusive users
- Promote heavy users

Hashout: Controlled Multi-tenancy

What is it?
- A generic key-value store shared by multiple apps
- Simple API: put(appid, key, value) and value = get(appid, key)
- Apps must be declared in code

Architecture
[Diagram] Both put(appid, key, value) and get(appid, key) first look up cluster, table, cf = getconf(appid), then route to the right HBase cluster
- Row key = md5:appid:key
- Reads are served from Memcache when possible, otherwise from HBase
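
A hypothetical sketch of that API over HBase; getconf, the Memcache layer, and the exact row-key encoding are simplified, and all names are invented.

    import java.security.MessageDigest;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HashoutClient {
        private static final byte[] CF = Bytes.toBytes("h"); // invented CF name
        private final HTable table; // in production, chosen via getconf(appid)

        public HashoutClient(HTable table) { this.table = table; }

        // One reading of "row key = md5:appid:key": an MD5 prefix spreads apps
        // and keys evenly across regions, avoiding per-app hot spots.
        static byte[] rowKey(String appid, String key) throws Exception {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest((appid + ":" + key).getBytes("UTF-8"));
            return Bytes.add(md5, Bytes.toBytes(":" + appid + ":" + key));
        }

        public void put(String appid, String key, byte[] value) throws Exception {
            Put p = new Put(rowKey(appid, key));
            p.add(CF, Bytes.toBytes("v"), value);
            table.put(p);
        }

        public byte[] get(String appid, String key) throws Exception {
            // Production checks Memcache first; here we go straight to HBase.
            Result r = table.get(new Get(rowKey(appid, key)));
            return r.getValue(CF, Bytes.toBytes("v"));
        }
    }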

How we started
- Not a self-service model: each app is reviewed
- Global and per-app metrics: number of gets, puts, latencies, errors
- Added client-side metrics in case things went wrong
- Per-app kill switch
- Started with non-critical apps

What we observed
- Miscellaneous improvements: Memcache for read-intensive apps, friendly names for apps, alerts on exceptions
- Capacity estimation is hard: load on HBase (gets/puts/deletes), number of connections to Thrift
- Too many requests from one app? Silently point the user to a separate tier

Recent Work

HBase Development @ Facebook
Reliability/Correctness:
- Durable commit log in HDFS
- Multi-CF ACID semantics
- Many txn log recovery bug fixes
- Pluggable HDFS block placement policy to reduce probability of data loss
- Thrift gateway fixes
Features:
- Index-only queries
- Hot backups
- C++ client
- Intra-row pagination support
- HTableMultiplexer
Availability:
- Durable commit log in HDFS
- Rolling upgrades
- Online alter table
- Interruptible compactions
- Faster region opens
Manageability:
- HBase Health Checker (hbck)
- Slow query logs
- TaskMonitor (like v$session stats in Oracle)
- Lots of metrics (per-CF metrics, master metrics)
Performance/Scaling:
- Bloom filters
- Compaction algorithm improvements
- Multi-threaded compactions
- HFile V2
- Timerange hints/optimization
- Lazy seeks
- Delete bloom filter
- Improved handling of compressed HFiles
- DataBlockEncoding
- Locality on full/rolling cluster restarts
- Per-region data placement
- Compressed RPCs

HBase Future Work
It is still early days!
- Eliminate the HDFS SPOF (BookKeeper)
- Features (secondary indices, query language)
- Incremental processing/indexing framework
- HBase on flash
- Lots more performance/availability improvements

Acknowledgements
- Data Infrastructure Team
- Open source community
- And lots of people across Facebook

We are 1% finished

Thanks! Questions? facebook.com/engineering