BigTable: A Distributed Storage System for Structured Data

Amir H. Payberah, amir@sics.se, Amirkabir University of Technology (Tehran Polytechnic)

Motivation
Lots of (semi-)structured data at Google:
- URLs: contents, crawl metadata, links, anchors
- Per-user data: user preferences, settings, recent queries, search results
- Geographical locations: physical entities (e.g., shops, restaurants), roads
Scale is large:
- Billions of URLs, many versions per page, ~20 KB per page
- Hundreds of millions of users, thousands of queries per second, with latency requirements
- 100+ TB of satellite image data

Goals
Need to support:
- Very high read/write rates (millions of operations per second), e.g., Google Talk
- Efficient scans over all data or interesting subsets of it
- Efficient joins of large one-to-one and one-to-many datasets
- Examining how data changes over time, e.g., the contents of a web page over multiple crawls

BigTable
- Distributed multi-level map
- Fault-tolerant, persistent
- Scalable: thousands of servers, terabytes of in-memory data, petabytes of disk-based data, millions of reads/writes per second, efficient scans
- Self-managing: servers can be added or removed dynamically, and servers adjust to load imbalance
- CAP: strong consistency and partition tolerance

Data Model

Reminder: NoSQL data modeling techniques [http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques]

Column-Oriented Data Model (1/2)
- Similar to a key/value store, but the value can have multiple attributes (columns).
- Column: a set of data values of a particular type.
- Data is stored and processed by column instead of by row.

Column-Oriented Data Model (2/2)
- Many analytical database queries need only a few attributes.
- Column values are stored contiguously on disk, which reduces I/O.
[Lars George, HBase: The Definitive Guide, O'Reilly, 2011]
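
As an illustration not taken from the slides, the Java sketch below (with hypothetical UserRow/UserColumns types) contrasts a row layout with a column layout: a query that needs only one attribute reads a single contiguous array in the columnar case instead of touching every record.

// Illustrative only: row layout vs. column layout for a query touching one attribute.
import java.util.ArrayList;
import java.util.List;

public class ColumnarSketch {
    record UserRow(String id, String name, int age) {}   // row-oriented: all attributes stored together

    static class UserColumns {                            // column-oriented: one contiguous array per attribute
        final List<String> ids = new ArrayList<>();
        final List<String> names = new ArrayList<>();
        final List<Integer> ages = new ArrayList<>();
    }

    // Row storage: every full record must be read to compute the average age.
    static double avgAge(List<UserRow> rows) {
        return rows.stream().mapToInt(UserRow::age).average().orElse(0);
    }

    // Column storage: only the 'ages' column is read.
    static double avgAge(UserColumns cols) {
        return cols.ages.stream().mapToInt(Integer::intValue).average().orElse(0);
    }
}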

BigTable Data Model (1/5)
Table: a distributed, multi-dimensional, sparse map.

BigTable Data Model (2/5)
Rows
- Every read or write of data in a row is atomic.
- Rows are sorted in lexicographic order.

BigTable Data Model (3/5)
Columns
- The basic unit of data access.
- Column families: groups of column keys of the same type.
- Column key naming: family:qualifier

BigTable Data Model (4/5)
Timestamps
- Each column value may contain multiple versions.

BigTable Data Model (5/5)
Tablets
- Tablet: a contiguous range of rows stored together.
- Tables are split into tablets by the system when they grow too large (auto-sharding).
- Each tablet is served by exactly one tablet server.
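
To make the model concrete, here is a minimal in-memory Java sketch of the map just described: (row key, column key, timestamp) maps to a value, rows are kept in lexicographic order, and versions are ordered newest-first. It is an illustration under those simplifying assumptions, not Bigtable's implementation.

import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class DataModelSketch {
    // row key -> column key ("family:qualifier") -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> table =
            new TreeMap<>();   // TreeMap keeps row keys in lexicographic order

    public void put(String row, String column, long timestamp, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
             .put(timestamp, value);
    }

    // Returns the most recent version of one cell, or null if it does not exist.
    public String get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = table.get(row);
        if (columns == null || !columns.containsKey(column)) return null;
        return columns.get(column).firstEntry().getValue();
    }

    public static void main(String[] args) {
        DataModelSketch webtable = new DataModelSketch();
        // Webtable-style example: page contents and anchors under one row key.
        webtable.put("com.cnn.www", "contents:", 3L, "<html>...</html>");
        webtable.put("com.cnn.www", "anchor:cnnsi.com", 9L, "CNN");
        System.out.println(webtable.get("com.cnn.www", "anchor:cnnsi.com")); // prints CNN
    }
}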

Building Blocks

BigTable Cell

Main Components
- Master server
- Tablet servers
- Client library

Master Server
- One master server.
- Assigns tablets to tablet servers.
- Balances tablet server load.
- Garbage-collects unneeded files in GFS.

Tablet Server
- Many tablet servers; can be added or removed dynamically.
- Each manages a set of tablets (typically 10 to 1000 tablets per server).
- Handles read/write requests to its tablets.
- Splits tablets that have grown too large.

Client Library
- A library that is linked into every client.
- Client data does not move through the master.
- Clients communicate directly with tablet servers for reads and writes.

Building Blocks
The building blocks for BigTable are:
- Google File System (GFS): raw storage
- Chubby: distributed lock manager
- Scheduler: schedules jobs onto machines

Google File System (GFS)
- Large-scale distributed file system.
- Master: responsible for metadata.
- Chunk servers: responsible for reading and writing large chunks of data.
- Chunks are replicated on three machines; the master is responsible for ensuring that replicas exist.

Chubby Lock Service (1/2)
- The namespace consists of directories and files that are used as locks.
- Reads and writes to a file are atomic.
- Consists of five active replicas; one is elected master and serves requests.
- Needs a majority of its replicas to be running for the service to be alive.
- Uses Paxos to keep its replicas consistent during failures.

Chubby Lock Service (2/2)
Chubby is used to:
- Ensure there is only one active master.
- Store the bootstrap location of BigTable data.
- Discover tablet servers.
- Store BigTable schema information.
- Store access control lists.

SSTable
- Immutable, sorted file of key/value pairs.
- Chunks of data (blocks) plus an index.
- The index maps block ranges, not individual values.
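
The following Java sketch (illustrative only, not Google's code) shows the lookup idea: an immutable, sorted collection of blocks plus an index from each block's first key to its position, so a read binary-searches the index and then scans a single block.

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SSTableSketch {
    record Entry(String key, String value) {}

    private final List<List<Entry>> blocks;                          // immutable, sorted data blocks
    private final TreeMap<String, Integer> index = new TreeMap<>();  // first key of block -> block number

    public SSTableSketch(List<List<Entry>> sortedBlocks) {
        this.blocks = sortedBlocks;
        for (int i = 0; i < sortedBlocks.size(); i++) {
            index.put(sortedBlocks.get(i).get(0).key(), i);          // index block ranges, not values
        }
    }

    public String get(String key) {
        Map.Entry<String, Integer> e = index.floorEntry(key);        // block whose range may contain the key
        if (e == null) return null;
        for (Entry kv : blocks.get(e.getValue())) {                  // scan only that one block
            if (kv.key().equals(key)) return kv.value();
        }
        return null;
    }
}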

Implementation

Tablet Assignment
- Each tablet is assigned to exactly one tablet server.
- The master uses Chubby to keep track of the set of live tablet servers and of unassigned tablets.
- When a tablet server starts, it creates and acquires an exclusive lock on a file in Chubby.
- The master periodically checks the status of each tablet server's lock.
- The master is responsible for detecting when a tablet server is no longer serving its tablets and for reassigning those tablets as soon as possible.

Finding a Tablet
- Three-level hierarchy.
- The root tablet contains the locations of all tablets of a special METADATA table.
- Each METADATA row contains the location of one user tablet.
- The client library caches tablet locations (see the sketch below).
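
A rough sketch of how a client library might walk this hierarchy. The TabletRowReader interface here is hypothetical, standing in for an RPC that reads one location row from a tablet; real clients cache by tablet key range, while this sketch caches per lookup key for brevity.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TabletLocatorSketch {
    interface TabletRowReader {
        // Reads the row for 'rowKey' from the tablet at 'tabletLocation'
        // and returns the location recorded there.
        String readLocation(String tabletLocation, String rowKey);
    }

    private final String rootTabletLocation;   // level 1: obtained from a file stored in Chubby
    private final TabletRowReader reader;
    private final Map<String, String> cache = new ConcurrentHashMap<>();  // client-side location cache

    public TabletLocatorSketch(String rootTabletLocation, TabletRowReader reader) {
        this.rootTabletLocation = rootTabletLocation;
        this.reader = reader;
    }

    public String locateTabletServer(String table, String rowKey) {
        String key = table + ":" + rowKey;
        return cache.computeIfAbsent(key, k -> {
            // Level 2: the root tablet maps a METADATA row key to a METADATA tablet location.
            String metadataTablet = reader.readLocation(rootTabletLocation, k);
            // Level 3: that METADATA tablet maps the user row key to the serving tablet server.
            return reader.readLocation(metadataTablet, k);
        });
    }
}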

Master Startup
The master executes the following steps at startup:
- Grabs a unique master lock in Chubby, which prevents concurrent master instantiations.
- Scans the servers directory in Chubby to find the live tablet servers.
- Communicates with every live tablet server to discover which tablets are already assigned to each server.
- Scans the METADATA table to learn the full set of tablets.

Tablet Serving (1/2)
- Updates are committed to a commit log.
- Recently committed updates are kept in memory in a memtable.
- Older updates are stored in a sequence of SSTables.

Tablet Serving (2/2)
Strong consistency
- Only one tablet server is responsible for a given piece of data.
- Replication is handled at the GFS layer.
Tradeoff with availability
- If a tablet server fails, its portion of the data is temporarily unavailable until a new server is assigned.
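
The serving path can be sketched as follows (simplified: single versions, no deletes, the commit log only hinted at in a comment): a read consults the memtable first and then the SSTables from newest to oldest.

import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TabletReadSketch {
    private final NavigableMap<String, String> memtable = new TreeMap<>();  // recently committed updates
    private final List<NavigableMap<String, String>> sstables;              // older updates, newest first

    public TabletReadSketch(List<NavigableMap<String, String>> sstablesNewestFirst) {
        this.sstables = sstablesNewestFirst;
    }

    public void write(String key, String value) {
        // In the real system the update is first appended to the commit log (in GFS),
        // then applied to the memtable.
        memtable.put(key, value);
    }

    public String read(String key) {
        String v = memtable.get(key);                  // newest data first
        if (v != null) return v;
        for (NavigableMap<String, String> sst : sstables) {
            v = sst.get(key);
            if (v != null) return v;                   // first (newest) SSTable containing the key wins
        }
        return null;
    }
}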

Compaction
When the in-memory memtable is full:
Minor compaction
- Converts the memtable into an SSTable.
- Reduces memory usage and the amount of log that must be replayed on restart.
Merging compaction
- Reduces the number of SSTables.
- Reads the contents of a few SSTables and the memtable, and writes out a new SSTable.
Major compaction
- A merging compaction that results in a single SSTable.
- The resulting SSTable contains no deletion entries or deleted data, which matters when storing sensitive data.
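
A simplified sketch of a merging compaction (assumptions: plain string keys, latest-writer-wins, tombstones modeled as null values): read a few SSTables plus the memtable, write out one new SSTable. In a major compaction, which merges all SSTables, the tombstones and the data they shadow can be dropped entirely.

import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CompactionSketch {
    // Inputs are ordered oldest -> newest so that newer values overwrite older ones.
    static NavigableMap<String, String> merge(List<NavigableMap<String, String>> oldestToNewest,
                                              boolean major) {
        NavigableMap<String, String> merged = new TreeMap<>();
        for (NavigableMap<String, String> sstable : oldestToNewest) {
            merged.putAll(sstable);                    // later (newer) entries overwrite earlier ones
        }
        if (major) {
            merged.values().removeIf(v -> v == null);  // drop deletion markers in a major compaction
        }
        return merged;                                 // contents of the single output SSTable
    }
}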

The Bigtable API

The Bigtable API
Metadata operations
- Create/delete tables and column families, change metadata.
Writes (single-row, atomic)
- Write/delete cells in a row, delete all cells in a row.
Reads
- Read arbitrary cells in a Bigtable table; each row read is atomic.
- Can restrict returned rows to a particular range.
- Can ask for data from just one row, all rows, etc.
- Can ask for all columns, just certain column families, or specific columns.
- Can ask for certain timestamps only.

Writing Example

// Open the table
Table *T = OpenOrDie("/bigtable/web/webtable");

// Write a new anchor and delete an old anchor
RowMutation r1(T, "com.cnn.www");
r1.Set("anchor:www.c-span.org", "CNN");
r1.Delete("anchor:www.abc.com");
Operation op;
Apply(&op, &r1);

Reading Example

Scanner scanner(T);
ScanStream *stream;
stream = scanner.FetchColumnFamily("anchor");
stream->SetReturnAllVersions();
scanner.Lookup("com.cnn.www");
for (; !stream->Done(); stream->Next()) {
  printf("%s %s %lld %s\n",
         scanner.RowName(),
         stream->ColumnName(),
         stream->MicroTimestamp(),
         stream->Value());
}


HBase
- A NoSQL database, based on Google Bigtable.
- Column-oriented data store, built on top of HDFS.
- CAP: strong consistency and partition tolerance.

Region and Region Server
[Lars George, HBase: The Definitive Guide, O'Reilly, 2011]

HBase Cell
(Table, RowKey, Family, Column, Timestamp) → Value
[Lars George, HBase: The Definitive Guide, O'Reilly, 2011]

HBase Cluster
[Tom White, Hadoop: The Definitive Guide, O'Reilly, 2012]

HBase Components
[Lars George, HBase: The Definitive Guide, O'Reilly, 2011]

HBase Components - Region Server
- Responsible for all read and write requests for the regions it serves.
- Splits regions that have exceeded the configured size threshold.
- Region servers can be added or removed dynamically.

HBase Components - Master
- Responsible for managing regions and their locations:
- Assigns regions to region servers (uses ZooKeeper).
- Handles load balancing of regions across region servers.
- Does not store or read data itself; clients communicate directly with region servers.
- Responsible for schema management and changes (adding/removing tables and column families).

HBase Components - ZooKeeper
- A coordination service, not part of HBase itself.
- The master uses ZooKeeper for region assignment.
- Ensures that there is only one master running.
- Stores the bootstrap location for region discovery: a registry of region servers.

HBase Components - HFile
- Data is stored in HFiles: immutable sequences of blocks saved in HDFS.
- A block index is stored at the end of each HFile.
- Key/value pairs cannot be removed from an HFile; a delete marker (tombstone) indicates removed records and hides them from reading clients.
- Updates are written as new versions; reads pick the value with the latest timestamp.

HBase Components - WAL and Memstore
- When data is added, it is first written to a write-ahead log (WAL) and also stored in memory (the memstore).
- When the in-memory data exceeds a maximum size, it is flushed to an HFile.
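
A minimal sketch of this write path with hypothetical in-memory stand-ins for the WAL, memstore and HFiles (not HBase's internal classes): log first, then update the memstore, and flush to an immutable HFile when a threshold is exceeded.

import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class WritePathSketch {
    private final List<String> wal = new ArrayList<>();                            // stand-in for the HDFS WAL
    private final NavigableMap<String, String> memstore = new TreeMap<>();
    private final List<NavigableMap<String, String>> hfiles = new ArrayList<>();   // flushed, immutable files
    private final int flushThreshold;

    public WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public void put(String key, String value) {
        wal.add(key + "=" + value);          // 1. durably log the edit (WAL)
        memstore.put(key, value);            // 2. apply it in memory (memstore)
        if (memstore.size() >= flushThreshold) {
            flush();                         // 3. spill to an HFile when memory is full
        }
    }

    private void flush() {
        hfiles.add(new TreeMap<>(memstore)); // write out an immutable snapshot
        memstore.clear();
    }
}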

HBase Installation and Shell

HBase Installation
Three options:
- Local (standalone) mode
- Pseudo-distributed mode
- Fully-distributed mode

Installation - Local
- Uses the local filesystem (not HDFS).
- Runs HBase and ZooKeeper in the same JVM.

Installation - Pseudo-Distributed (1/3)
- Requires HDFS.
- Mimics the fully-distributed mode but runs on a single host.
- Good for testing, debugging and prototyping; not for production.
- Configuration files: hbase-env.sh and hbase-site.xml

Installation - Pseudo-Distributed (2/3)
Specify environment variables in hbase-env.sh:

export JAVA_HOME=/opt/jdk1.7.0_51

Installation - Pseudo-Distributed (3/3)
Starts an HBase Master process, a ZooKeeper server, and a RegionServer process.
Configure in hbase-site.xml:

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>

Start HBase and Test
Start the HBase daemon and open the shell:

start-hbase.sh
hbase shell

Web-based management:
- Master host: http://localhost:60010
- Region host: http://localhost:60030

HBase Shell

status
list
create 'Blog', {NAME=>'info'}, {NAME=>'content'}

# put 'table', 'row_id', 'family:column', 'value'
put 'Blog', 'Matt-001', 'info:title', 'Elephant'
put 'Blog', 'Matt-001', 'info:author', 'Matt'
put 'Blog', 'Matt-001', 'info:date', '2009.05.06'
put 'Blog', 'Matt-001', 'content:post', 'Do elephants like monkeys?'

# get 'table', 'row_id'
get 'Blog', 'Matt-001'
get 'Blog', 'Matt-001', {COLUMN=>['info:author', 'content:post']}

scan 'Blog'
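
For comparison, here is a sketch of the same Blog example through the standard HBase Java client API (Connection, Table, Put, Get). It assumes a running HBase instance reachable via hbase-site.xml and that the 'Blog' table has been created as above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BlogClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();           // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table blog = conn.getTable(TableName.valueOf("Blog"))) {

            // Equivalent of: put 'Blog', 'Matt-001', 'info:title', 'Elephant'
            Put put = new Put(Bytes.toBytes("Matt-001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("title"), Bytes.toBytes("Elephant"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("author"), Bytes.toBytes("Matt"));
            blog.put(put);

            // Equivalent of: get 'Blog', 'Matt-001'
            Result result = blog.get(new Get(Bytes.toBytes("Matt-001")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("title"))));
        }
    }
}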

Summary

Summary
BigTable
- Column-oriented data model.
- Main components: master, tablet servers, client library.
- Building blocks: GFS, Chubby, SSTable.
HBase: an open-source take on the Bigtable design, built on top of HDFS.

Questions?