Distributed Systems
19. Bigtable
Paul Krzyzanowski, Rutgers University, Fall 2018

Bigtable: highly available distributed storage
Built with semi-structured data in mind:
- URLs: content, metadata, links, anchors, page rank
- User data: preferences, account info, recent queries
- Geography: roads, satellite images, points of interest, annotations

Large scale:
- Petabytes of data across thousands of servers
- Billions of URLs with many versions per page
- Hundreds of millions of users
- Thousands of queries per second
- 100 TB+ of satellite image data

Uses
At Google, Bigtable is used for over sixty products, including:
- Google Analytics
- Google Finance
- Personalized Search
- Blogger.com
- Google Code hosting
- YouTube
- Gmail
- Google Earth & Google Maps

A big table
Bigtable is NOT a relational database. It appears as one large table: "a sparse, distributed, persistent multidimensional map" (Bigtable: OSDI 2006). In the running web table example, rows are keyed by reversed domain names such as com.cnn.www and com.weather, and cells hold page contents such as "<!DOCTYPE HTML PUBLIC ...".

Table model
(row, column, timestamp) -> cell contents
- Cell contents are arbitrary strings (arrays of bytes)
- In the web table example, rows such as com.aaa, com.cnn.www, com.cnn.www/tech, and com.weather each hold page contents, and a single cell may hold versions at timestamps t2, t4, and t15

Columns and column families
- A column family is a group of column keys and the basic unit of data access
- Data in a column family is typically of the same type; the implementation compresses data within a column family
- A column is identified by family:qualifier
Operations:
(1) Create a column family: an admin task, done when the table is created
(2) Store data in any key within the family: this can be done at any time
There will typically be a small number of column families (at most hundreds), but a table may have an unlimited number of columns, which are often sparsely populated.
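The (row, column, timestamp) -> value model above can be sketched as a nested Python map. This is a minimal in-memory illustration only; the class and names are invented for this sketch, not Google's implementation or API.

```python
# A minimal sketch of Bigtable's data model: a sparse map from
# (row key, "family:qualifier" column key) to a versioned cell.
# Illustration only; ToyTable and its methods are invented names.

class ToyTable:
    def __init__(self):
        self.rows = {}  # row -> {column -> [(timestamp, value), ...]}

    def put(self, row, column, timestamp, value):
        cells = self.rows.setdefault(row, {}).setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(key=lambda c: c[0])  # keep versions ordered by timestamp

    def get_latest(self, row, column):
        cells = self.rows.get(row, {}).get(column)
        return cells[-1][1] if cells else None  # newest version

t = ToyTable()
t.put("com.cnn.www", "contents:html", 2, b"<!DOCTYPE html v1>")
t.put("com.cnn.www", "contents:html", 4, b"<!DOCTYPE html v2>")
t.put("com.weather", "contents:html", 1, b"<!DOCTYPE html>")
# The map is sparse: com.weather has no anchor columns at all.
print(t.get_latest("com.cnn.www", "contents:html"))  # b'<!DOCTYPE html v2>'
```

Note how sparseness falls out naturally: absent columns simply have no entry, so an unpopulated column costs nothing.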

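The web table example keys rows by reversed domain names. A tiny helper shows the trick; the function name is made up for illustration:

```python
def row_key(hostname: str) -> str:
    """Reverse a hostname's labels so that pages from the same domain
    sort next to each other as row keys (as in the web table example)."""
    return ".".join(reversed(hostname.split(".")))

print(row_key("www.cnn.com"))  # com.cnn.www
```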
Column families: example
The web table has three column families:
- language: the language of the web page
- contents: the contents of the web page
- anchor: contains the text of anchors that reference this page

www.cnn.com is referenced by Sports Illustrated (cnnsi.com) and My-Look (mlook.ca), so its row has the columns anchor:cnnsi.com and anchor:mylook.ca in the anchor family. The value of (com.cnn.www, anchor:cnnsi.com) is "CNN", the reference text from cnnsi.com.

Tables & tablets
- Row operations are atomic
- A table is partitioned dynamically by rows into tablets
- Tablet = a range of contiguous rows: the unit of distribution and load balancing
- Nearby rows will usually be served by the same server, so accessing nearby rows requires communication with only a small number of machines
- You need to select row keys to ensure good locality, e.g., reversed domain names: com.cnn.www instead of www.cnn.com

Table splitting
- A table starts as one tablet
- As it grows, it is split into multiple tablets
- Approximate size: 100-200 MB per tablet by default
- In the split example, one tablet holding rows com.cnn.www through com.zoom is divided into two tablets at a row boundary

Timestamps
- Each cell may contain multiple versions, indexed by a 64-bit timestamp
- Timestamps are real time or assigned by the client
- Per-column-family settings control garbage collection: keep only the latest n versions, or keep only versions written since time t
- Reads retrieve the most recent version if no version is specified
- If a timestamp is specified, return the latest version where timestamp <= requested time

API: operations on Bigtable
- Create/delete tables and column families
- Change cluster, table, and column family metadata (e.g., access control rights)
- Write or delete values in cells
- Read values from specific rows
- Iterate over a subset of data in a table: all members of a column family; multiple column families (e.g., regular expressions such as anchor:*.cnn.com); multiple timestamps; multiple rows
- Atomic read-modify-write row operations
- Allow clients to execute scripts (written in Sawzall) for processing data on the servers
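The versioned-read rule (newest version with timestamp <= the requested time) and the two garbage-collection policies can be sketched in a few lines. This is an illustration of the semantics described above, not the Bigtable API; `cells` is an invented representation.

```python
# Sketch of Bigtable's versioned reads and per-column-family garbage
# collection. `cells` is a list of (timestamp, value) pairs.

def read(cells, at=None):
    """Newest version if `at` is None, else the newest version whose
    timestamp is <= the requested time."""
    for ts, value in sorted(cells, reverse=True):
        if at is None or ts <= at:
            return value
    return None

def gc_keep_latest_n(cells, n):
    """GC policy 1: keep only the latest n versions."""
    return sorted(cells, reverse=True)[:n]

def gc_keep_since(cells, t):
    """GC policy 2: keep only versions written since time t."""
    return [(ts, v) for ts, v in cells if ts >= t]

cells = [(2, "v1"), (4, "v2"), (15, "v3")]
print(read(cells))        # 'v3' (most recent version)
print(read(cells, at=5))  # 'v2' (newest version at or before t=5)
print(gc_keep_latest_n(cells, 2))  # [(15, 'v3'), (4, 'v2')]
```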

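Tablet splitting, as described above, divides a contiguous row range at a row boundary once it grows past a size threshold. A simplified sketch, with invented structure and toy sizes:

```python
# Sketch of tablet splitting: a tablet is a contiguous, sorted range of
# rows; when its total size passes a threshold it splits at the middle
# row key. Structure and sizes are simplified for illustration.

def maybe_split(rows, threshold):
    """rows: sorted list of (row_key, size_bytes). Returns [rows] if under
    the threshold, else two tablets split at the midpoint row key."""
    if sum(size for _, size in rows) <= threshold:
        return [rows]
    mid = len(rows) // 2
    return [rows[:mid], rows[mid:]]

tablet = [("com.cnn.www", 80), ("com.weather", 70), ("com.wikipedia", 90)]
print(maybe_split(tablet, threshold=300))  # still one tablet
print(maybe_split(tablet, threshold=200))  # splits into two row ranges
```

Because each resulting tablet is still a contiguous row range, nearby rows stay together after a split, preserving locality.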
Implementation: supporting services
GFS:
- Stores log and data files
Cluster management system:
- Schedules jobs, monitors health, deals with failures
Google SSTable (Sorted String Table):
- Internal file format optimized for streaming I/O and storing (key, value) data
- Provides a persistent, ordered, immutable map from keys to values
- Append-only; memory- or disk-based; indexes are cached in memory
- If there are additions, deletions, or changes to rows, new SSTables are written out with the deleted data removed
- Periodic compaction merges SSTables and removes old, retired ones
- See http://goo.gl/mcd6ex for a description of SSTable
Chubby is used to:
- Ensure there is only one active master
- Store the bootstrap location of Bigtable data
- Discover tablet servers
- Store Bigtable schema information
- Store access control lists

Implementation: components
(1) Many tablet servers
- Can be added or removed dynamically
- Each manages a set of tablets (typically 10-1,000 tablets per server)
- Handle read/write requests to their tablets
- Split tablets when they grow too large
(2) One master server
- Assigns tablets to tablet servers
- Balances tablet server load
- Garbage-collects unneeded files in GFS
- Handles schema changes (table & column family creation)
(3) Client library
- Client data does not move through the master: clients communicate directly with tablet servers for reads and writes

Implementation: METADATA table
- Tablet locations are found through a three-level hierarchy, a balanced structure similar to a B+ tree
- A Chubby file stores the location of the root tablet (the first METADATA tablet)
- The root tablet contains the locations of all tablets of a special METADATA table
- A row key of the METADATA table, f(table_id, end_row), maps to the location of a tablet
- Chubby file -> root tablet (1st METADATA tablet) -> other METADATA tablets -> user tablets

Implementation: tablet assignment
A tablet is assigned to one tablet server at a time. When the master starts, it:
- Grabs a unique master lock in Chubby (to prevent multiple masters)
- Scans the servers directory in Chubby to find live tablet servers
- Contacts each tablet server to discover which tablets are assigned to that server
- Scans the METADATA table to learn the full set of tablets
- Builds a list of tablets not assigned to servers; these are assigned by choosing a tablet server and sending it a tablet load request

Fault tolerance
Fault tolerance is provided by GFS and Chubby.
Dead tablet server:
- The master is responsible for detecting when a tablet server is not working
- It asks the tablet server for the status of its lock
- If the tablet server cannot be reached or has lost its lock, the master attempts to grab that server's lock
- If it succeeds, the tablet server is dead or cannot reach Chubby, and the master moves the tablets that were assigned to that server into an unassigned state
Dead master:
- The master kills itself when its Chubby lease expires
- The cluster management system detects a non-responding master
Chubby is designed for fault tolerance (5-way replication); GFS stores the underlying data and is designed for n-way replication.
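The SSTable behavior described above — an immutable, sorted map, with compaction merging tables and dropping deleted data — can be sketched as follows. This is a simplified illustration under invented names, not Google's file format.

```python
# Sketch of an SSTable-style structure: an immutable, sorted key->value
# map, plus a compaction step that merges tables oldest-to-newest so
# newer entries win and deleted keys are removed entirely.

TOMBSTONE = object()  # marks a deleted key until compaction removes it

class SSTable:
    def __init__(self, items):
        self.items = sorted(items.items())  # sorted, immutable once written

    def get(self, key):
        for k, v in self.items:  # real SSTables use a cached in-memory index
            if k == key:
                return v
        return None

def compact(tables):
    """Merge SSTables from oldest to newest: newer entries override
    older ones, and tombstoned (deleted) keys are dropped."""
    merged = {}
    for t in tables:
        merged.update(dict(t.items))
    return SSTable({k: v for k, v in merged.items() if v is not TOMBSTONE})

old = SSTable({"a": 1, "b": 2, "c": 3})
new = SSTable({"b": 20, "c": TOMBSTONE})
merged = compact([old, new])
print([k for k, _ in merged.items])  # ['a', 'b'] -- 'c' was deleted
print(merged.get("b"))               # 20 -- the newest version wins
```

Because each SSTable is append-only and immutable, updates and deletes are expressed by writing new tables; compaction is what eventually reclaims the space.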

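The three-level location hierarchy can be traced with a toy lookup: a Chubby file names the root tablet, the root tablet names METADATA tablets, and a METADATA row keyed by (table_id, end_row) gives a tablet's location. All data and names below are invented for illustration.

```python
# Sketch of Bigtable's three-level tablet location hierarchy.
# Chubby file -> root tablet -> METADATA tablet -> user tablet location.

chubby_file = "root-metadata-tablet"

root_tablet = {  # root tablet: which METADATA tablets exist
    "root-metadata-tablet": {"meta-1": "metadata-tablet-1"},
}

metadata_tablets = {  # METADATA rows: (table_id, end_row) -> location
    "metadata-tablet-1": {
        ("webtable", "com.m"): "tabletserver-7",
        ("webtable", "com.z"): "tabletserver-3",
    },
}

def locate(table_id, row_key):
    """Resolve the tablet server holding `row_key` by walking the
    hierarchy. A tablet covers all rows up to and including its end row."""
    for meta_name in root_tablet[chubby_file].values():
        for (tid, end_row), location in sorted(metadata_tablets[meta_name].items()):
            if tid == table_id and row_key <= end_row:
                return location
    return None

print(locate("webtable", "com.cnn.www"))  # tabletserver-7 (end row com.m)
print(locate("webtable", "com.weather"))  # tabletserver-3 (end row com.z)
```

In practice clients cache these locations, so most requests skip the hierarchy entirely and go straight to the right tablet server.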
Bigtable replication
- Each table can be configured for replication to multiple Bigtable clusters in different data centers
- Eventual consistency model

Google Analytics
- Raw click table (~200 TB): one row per end-user session, with the row name {website name, time of session}, so sessions that visit the same web site are contiguous
- Summary table (~20 TB): contains various summaries for each crawled website, generated from the raw click table via periodic MapReduce jobs

Personalized Search
- One Bigtable row per user (unique user ID)
- A column family per type of action, e.g., a column family for web queries (your entire search history!)
- The Bigtable timestamp of each element identifies when the event occurred
- Uses MapReduce over Bigtable to personalize live search results

Google Maps / Google Earth
- Preprocessing: a table for raw imagery (~70 TB); each row corresponds to a single geographic segment, and rows are named to ensure that adjacent segments are near each other; a column family keeps track of the sources of data per segment (a large number of columns, one per raw data image, but sparse); MapReduce is used to preprocess the data
- Serving: a table indexes data stored in GFS; it is small (~500 GB) but serves tens of thousands of queries with low latency

Bigtable outside of Google: Apache HBase
Built on the Bigtable design, with small differences (that may disappear):
- Access control is not enforced per column family
- Millisecond vs. microsecond timestamps
- No client script execution to process stored data
- Built to use HDFS or any other file system
- No support for memory-mapped tablets
- Improved fault tolerance with multiple masters on standby

Bigtable vs. Amazon Dynamo
- Dynamo targets apps that only need key/value access, with a primary focus on high availability
- Key-value store versus column store (column families and columns within them)
- Bigtable is a distributed database built on GFS; Dynamo is a distributed hash table
- In Dynamo, updates are not rejected even during network partitions or server failures
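The Analytics raw click table keys each session row by {website name, time of session} so that all sessions for one site form a contiguous, time-ordered row range. A sketch of that naming scheme; the separator and zero-padded format are invented for illustration:

```python
# Sketch of the Analytics raw-click row naming: keying rows by
# {website name, session time} keeps a site's sessions contiguous,
# so scanning one site's sessions touches a single row range.

def session_row_key(website: str, session_time: int) -> str:
    # Zero-pad the time so keys sort chronologically within a site.
    return f"{website}#{session_time:010d}"

keys = sorted([
    session_row_key("cnn.com", 1541030400),
    session_row_key("weather.com", 1541030500),
    session_row_key("cnn.com", 1541030450),
])
print(keys)  # both cnn.com sessions sort together, in time order
```

This is the same row-key locality idea as the reversed domain names in the web table: choose keys so that the rows you scan together sort together.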

The end