CS November 2017

Similar documents
CS November 2018

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612

MapReduce & BigTable

BigTable: A Distributed Storage System for Structured Data

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Distributed Systems [Fall 2012]

Bigtable. Presenter: Yijun Hou, Yixiao Peng

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

BigTable. CSE-291 (Cloud Computing) Fall 2016

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

Distributed File Systems II

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng

Distributed Database Case Study on Google s Big Tables

CA485 Ray Walshe NoSQL

Big Table. Google s Storage Choice for Structured Data. Presented by Group E - Dawei Yang - Grace Ramamoorthy - Patrick O Sullivan - Rohan Singla

BigTable. Chubby. BigTable. Chubby. Why Chubby? How to do consensus as a service

Bigtable: A Distributed Storage System for Structured Data

7680: Distributed Systems

BigTable: A System for Distributed Structured Storage

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures

BigTable A System for Distributed Structured Storage

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing

Introduction Data Model API Building Blocks SSTable Implementation Tablet Location Tablet Assingment Tablet Serving Compactions Refinements

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable

Structured Big Data 1: Google Bigtable & HBase Shiow-yang Wu ( 吳秀陽 ) CSIE, NDHU, Taiwan, ROC

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Bigtable: A Distributed Storage System for Structured Data

Outline. Spanner Mo/va/on. Tom Anderson

CSE-E5430 Scalable Cloud Computing Lecture 9

Bigtable: A Distributed Storage System for Structured Data

Distributed Systems 16. Distributed File Systems II

18-hdfs-gfs.txt Thu Oct 27 10:05: Notes on Parallel File Systems: HDFS & GFS , Fall 2011 Carnegie Mellon University Randal E.

Lecture: The Google Bigtable

Extreme Computing. NoSQL.

Lessons Learned While Building Infrastructure Software at Google

CS5412: OTHER DATA CENTER SERVICES

HBase: Overview. HBase is a distributed column-oriented data store built on top of HDFS

18-hdfs-gfs.txt Thu Nov 01 09:53: Notes on Parallel File Systems: HDFS & GFS , Fall 2012 Carnegie Mellon University Randal E.

Data Informatics. Seon Ho Kim, Ph.D.

Infrastructure system services

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines

Google big data techniques (2)

Distributed Systems. 05r. Case study: Google Cluster Architecture. Paul Krzyzanowski. Rutgers University. Fall 2016

Recap. CSE 486/586 Distributed Systems Google Chubby Lock Service. Paxos Phase 2. Paxos Phase 1. Google Chubby. Paxos Phase 3 C 1

CS /30/17. Paul Krzyzanowski 1. Google Chubby ( Apache Zookeeper) Distributed Systems. Chubby. Chubby Deployment.

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2016

CS5412: DIVING IN: INSIDE THE DATA CENTER

DIVING IN: INSIDE THE DATA CENTER

CS /29/18. Paul Krzyzanowski 1. Question 1 (Bigtable) Distributed Systems 2018 Pre-exam 3 review Selected questions from past exams

Recap. CSE 486/586 Distributed Systems Google Chubby Lock Service. Recap: First Requirement. Recap: Second Requirement. Recap: Strengthening P2

Distributed Data Management. Christoph Lofi Institut für Informationssysteme Technische Universität Braunschweig

11 Storage at Google Google Google Google Google 7/2/2010. Distributed Data Management

Distributed computing: index building and use

Distributed Systems Pre-exam 3 review Selected questions from past exams. David Domingo Paul Krzyzanowski Rutgers University Fall 2018

Distributed Systems. Fall 2017 Exam 3 Review. Paul Krzyzanowski. Rutgers University. Fall 2017

Google Data Management

Design & Implementation of Cloud Big table

Distributed Systems. 19. Spanner. Paul Krzyzanowski. Rutgers University. Fall 2017

CS 655 Advanced Topics in Distributed Systems

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

CLOUD-SCALE FILE SYSTEMS

Data Storage in the Cloud

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Distributed Filesystem

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

MapReduce. U of Toronto, 2014

CS5412: DIVING IN: INSIDE THE DATA CENTER

Comparing SQL and NOSQL databases

Exam 2 Review. October 29, Paul Krzyzanowski 1

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Distributed computing: index building and use

CA485 Ray Walshe Google File System

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.

Google File System. Arun Sundaram Operating Systems

CSE 124: Networked Services Lecture-16

Scaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics

Google File System 2

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

Ghislain Fourny. Big Data 5. Wide column stores

CmpE 138 Spring 2011 Special Topics L2

BigData and Map Reduce VITMAC03

COSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables

Indexing Large-Scale Data

Chapter 24 NOSQL Databases and Big Data Storage Systems

CSE 124: Networked Services Fall 2009 Lecture-19

The Google File System

A BigData Tour HDFS, Ceph and MapReduce

Ghislain Fourny. Big Data 5. Column stores

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

The Google File System. Alexandru Costan

CISC 7610 Lecture 2b The beginnings of NoSQL

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

Big Data Analytics. Rasoul Karimi

Transcription:

Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account info, recent queries Geography: roads, satellite images, points of interest, annotations Paul Krzyzanowski Rutgers University Fall 2017 Large scale Petabytes of data across thousands of servers Billions of URLs with many versions per page Hundreds of millions of users Thousands of queries per second 100TB+ satellite image data 1 2 Uses A big table At Google, used for: Google Analytics Google Finance Personalized search Blogger.com Google Code hosting YouTube Gmail Google Earth & Google Maps Dozens of others 3 Bigtable is NOT a relational database Bigtable appears as a large table A Bigtable is a sparse, distributed, persistent multidimensional map * rows html PUBLIC HTML PUBLIC HTML> HTML> Web table example columns *Bigtable: OSDI 2006 4 Table Model (row, column, timestamp) cell contents Contents are arbitrary strings (arrays of bytes) Tablets: Pieces of a Table Row operations are atomic Table partitioned dynamically by rows into tablets rows com.aaa com.cnn.www com.cnn.www/tech <!DOCTYPE html <!DOCTYPE html <!DOCTYPE html <!DOCTYPE html columns versions t 2 t 4 Tablet = range of contiguous rows Unit of distribution and load balancing Nearby rows will usually be served by the same server Accessing nearby rows requires communication with a small # of machines You need to select row keys to ensure good locality E.g., reverse domain names: com.cnn.www instead of www.cnn.com com.weather <!DOCTYPE html <!DOCTYPE html t 15 Web table example 5 6 Paul Krzyzanowski 1

Table splitting A table starts as one tablet As it grows, it it split into multiple tablets Approximate size: 100-200 MB per tablet by default html PUBLIC HTML PUBLIC HTML> tablet HTML> Splitting a tablet html PUBLIC HTML PUBLIC HTML> HTML> com.wikipedia <!DOCTYPE HTML> com.zcorp <!DOCTYPE HTML> com.zoom <!DOCTYPE HTML> Split 7 8 Columns and Column Families Column Family Group of column keys Column family is the basic unit of data access Data in a column family is typically of the same type compresses data in the same column family Operations (1) Create column family (2) Store data in any key within the family Column Families: example Three column families language for the web page contents of the web page anchor: contains text of anchors that reference this page. www.cnn.com is referenced by Sports Illustrated (cnnsi.com) and My-Look (mlook.ca) The value of ( com.cnn.www, anchor:cnnsi.com ) is CNN, the reference text from cnnsi.com. Column family anchor anchor:cnnsi.com anchor:mylook.ca Column families will typically be small hundreds of keys; a table may have an unlimited # of column families Identified by family:qualifier html PUBLIC HTML PUBLIC HTML> HTML> CNN CNN.com 9 10 Timestamps Each column family may contain multiple versions Version indexed by a 64-bit timestamp Real time or assigned by client Per-column-family settings for garbage collection Keep only latest n versions Or keep only versions written since time t Retrieve most recent version if no version specified If specified, return version where timestamp requested time API: Operations on Bigtable Create/delete tables & column families Change cluster, table, and column family metadata (e.g., access control rights) Write or delete values in cells Read values from specific rows Iterate over a subset of data in a table All members of a column family Multiple column families E.g., regular expressions, such as anchor:*.cnn.com Multiple timestamps Multiple rows Atomic read-modify-write row operations Allow clients to execute scripts (written in Sawzall) for processing data on the servers 11 12 Paul Krzyzanowski 2

: Supporting Services GFS For storing log and data files Cluster management system For scheduling jobs, monitoring health, dealing with failures Google SSTable (Sorted String Table) Internal file format optimized for streaming I/O and storing <key,value> data Provides a persistent, ordered, immutable map from keys to values Append-only Memory or disk based; indexes are cached in memory If there are additions/deletions/changes to rows New SSTables are written out with the deleted data removed Periodic compaction merges SSTables and removes old retired ones : Supporting Services Chubby Highly-available & persistent distributed lock (lease) service & file system Five active replicas; one elected as master to serve requests Majority must be running Paxos algorithm used to elect master & keep replicas consistent Provides namespace of files & directories Each file or directory can be used as a lock In Bigtable, Chubby is used to: Ensure there is only one active master Store bootstrap location of Bigtable data Discover tablet servers Store Bigtable schema information Store access control lists See http://goo.gl/mcd6ex for a description of SSTable 13 14 1. Many tablet servers coordinate requests to tablets Can be added or removed dynamically Each manages a set of tablets (typically 10-1,000 tablets/server) Handles read/write requests to tablets Splits tablets when too large 2. One master server Assigns tablets to tablet server Balances tablet server load Garbage collection of unneeded files in GFS Schema changes (table & column family creation) 3. Client library Client data does not move though the master Master Clients communicate directly with tablet servers for reads/writes Tablet Servers 15 : METADATA table Three-level hierarchy Balanced structure similar to a B+ tree Root tablet contains location of all tablets in a special METADATA table Row key of METADATA table contains location of each tablet f(table_id, end_row) location of tablet Chubby file Root tablet (1 st METADATA tablet) Other METADATA tablets 16 User Tablet 1 User Tablet N Tablet assigned to one tablet server at a time Chubby keeps track of tablet servers When tablet server starts: It creates & acquires an exclusive lock on a uniquely-named file in a Chubby servers directory Master monitors this directory to discover tablet servers When master starts: Grabs a unique master lock in Chubby (prevent multiple masters) Scans the servers directory in Chubby to find live tablet servers Contacts each tablet server to discover what tablets are assigned to that server Scans the METADATA table to learn the full set of tablets Build a list of tablets not assigned to servers These will be assigned by choosing a tablet server & sending it a tablet load request Tablet assigned to one tablet server at a time When master starts: Grabs a unique master lock in Chubby (prevent multiple masters) Scans the servers directory in Chubby to find live tablet servers Contacts each tablet server to discover what tablets are assigned to that server Scans the METADATA table to learn the full set of tablets Build a list of tablets not assigned to servers These will be assigned by choosing a tablet server & sending it a tablet load request 17 18 Paul Krzyzanowski 3

Fault Tolerance Fault tolerance is provided by GFS & Chubby Dead tablet server Master is responsible for detecting when a tablet server is not working Asks tablet server for status of its lock If the tablet server cannot be reached or has lost its lock Master attempts to grab that server s lock If it succeeds, then the tablet server is dead or cannot reach Chubby Master moves tablets that were assigned to that server into an unassigned state Dead master Master kills itself when its Chubby lease expires Cluster management system detects a non-responding master Chubby: designed for fault tolerance (5-way replication) GFS: stores underlying data designed for n-way replication Bigtable Replication Each table can be configured for replication to multiple Bigtable clusters in different data centers Eventual consistency model 19 20 Google Analytics Raw Click Table (~200 TB) Row for each end-user session Row name: {website name and time of session} Sessions that visit the same web site are & contiguous Summary Table (~20 TB) Contains various summaries for each crawled website Generated from the Raw Click table via periodic MapReduce jobs Personalized Search One Bigtable row per user (unique user ID) Column family per type of action E.g., column family for web queries (your entire search history!) Bigtable timestamp for each element identifies when the event occurred Uses MapReduce over Bigtable to personalize live search results 21 22 Google Maps / Google Earth Preprocessing Table for raw imagery (~70 TB) Each row corresponds to a single geographic segment Rows are named to ensure that adjacent segments are near each other Column family: keep track of sources of data per segment (this is a large # of columns one for each raw data image but sparse) MapReduce used to preprocess data Bigtable outside of Google Apache HBase Built on the Bigtable design Small differences (may disappear) access control not enforced per column family Millisecond vs. microsecond timestamps No client script execution to process stored data Built to use HDFS or any other file system No support for memory mapped tablets Improved fault tolerance with multiple masters on standby Serving Table to index data stored in GFS Small (~500 GB) but serves tens of thousands of queries with low latency 23 24 Paul Krzyzanowski 4

Bigtable vs. Amazon Dynamo Dynamo targets apps that only need key/value access with a primary focus on high availability key-value store versus column-store (column families and columns within them) Bigtable: distributed DB built on GFS Dynamo: distributed hash table Updates are not rejected even during network partitions or server failures The end 25 26 Paul Krzyzanowski 5