References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

Similar documents
CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612

BigTable. CSE-291 (Cloud Computing) Fall 2016

Big Table. Google s Storage Choice for Structured Data. Presented by Group E - Dawei Yang - Grace Ramamoorthy - Patrick O Sullivan - Rohan Singla

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng

Bigtable. Presenter: Yijun Hou, Yixiao Peng

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

CS November 2017

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing

BigTable: A Distributed Storage System for Structured Data

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Distributed Systems [Fall 2012]

Distributed File Systems II

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis

CS November 2018

Structured Big Data 1: Google Bigtable & HBase Shiow-yang Wu ( 吳秀陽 ) CSIE, NDHU, Taiwan, ROC

BigTable. Chubby. BigTable. Chubby. Why Chubby? How to do consensus as a service

Extreme Computing. NoSQL.

7680: Distributed Systems

Introduction Data Model API Building Blocks SSTable Implementation Tablet Location Tablet Assingment Tablet Serving Compactions Refinements

CS5412: OTHER DATA CENTER SERVICES

CS5412: DIVING IN: INSIDE THE DATA CENTER

CSE-E5430 Scalable Cloud Computing Lecture 9

Bigtable: A Distributed Storage System for Structured Data

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures

Bigtable: A Distributed Storage System for Structured Data

Outline. Spanner Mo/va/on. Tom Anderson

CA485 Ray Walshe NoSQL

18-hdfs-gfs.txt Thu Oct 27 10:05: Notes on Parallel File Systems: HDFS & GFS , Fall 2011 Carnegie Mellon University Randal E.

DIVING IN: INSIDE THE DATA CENTER

Data Storage in the Cloud

BigTable A System for Distributed Structured Storage

MapReduce & BigTable

18-hdfs-gfs.txt Thu Nov 01 09:53: Notes on Parallel File Systems: HDFS & GFS , Fall 2012 Carnegie Mellon University Randal E.

Bigtable: A Distributed Storage System for Structured Data

Google big data techniques (2)

BigTable: A System for Distributed Structured Storage

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information

NPTEL Course Jan K. Gopinath Indian Institute of Science

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines

CS5412: DIVING IN: INSIDE THE DATA CENTER

Distributed Data Management. Christoph Lofi Institut für Informationssysteme Technische Universität Braunschweig

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

11 Storage at Google Google Google Google Google 7/2/2010. Distributed Data Management

CA485 Ray Walshe Google File System

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.

The Google File System

CLOUD-SCALE FILE SYSTEMS

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

Big Table. Dennis Kafura CS5204 Operating Systems

Lecture: The Google Bigtable

Distributed Systems 16. Distributed File Systems II

Distributed Database Case Study on Google s Big Tables

Infrastructure system services

Recap. CSE 486/586 Distributed Systems Google Chubby Lock Service. Recap: First Requirement. Recap: Second Requirement. Recap: Strengthening P2

Distributed Filesystem

Map-Reduce. Marco Mura 2010 March, 31th

From Google File System to Omega: a Decade of Advancement in Big Data Management at Google

Recap. CSE 486/586 Distributed Systems Google Chubby Lock Service. Paxos Phase 2. Paxos Phase 1. Google Chubby. Paxos Phase 3 C 1

Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong

CS 655 Advanced Topics in Distributed Systems

Data Informatics. Seon Ho Kim, Ph.D.

CSE 124: Networked Services Lecture-16

The Google File System

Lessons Learned While Building Infrastructure Software at Google

CompSci 516 Database Systems

CS /29/18. Paul Krzyzanowski 1. Question 1 (Bigtable) Distributed Systems 2018 Pre-exam 3 review Selected questions from past exams

The Google File System

Google is Really Different.

Distributed Systems Pre-exam 3 review Selected questions from past exams. David Domingo Paul Krzyzanowski Rutgers University Fall 2018

The Google File System

Google File System. Arun Sundaram Operating Systems

Distributed computing: index building and use

Google File System. By Dinesh Amatya

Distributed Programming the Google Way

CSE 124: Networked Services Fall 2009 Lecture-19

Google Data Management

Distributed Systems. Fall 2017 Exam 3 Review. Paul Krzyzanowski. Rutgers University. Fall 2017

Comparing SQL and NOSQL databases

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

Distributed File Systems

The Google File System

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Design & Implementation of Cloud Big table

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9b: Distributed File Systems INTRODUCTION. Transparency: Flexibility: Slide 1. Slide 3.

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

Authors : Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung Presentation by: Vijay Kumar Chalasani

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

Google File System 2

Distributed Systems. 05r. Case study: Google Cluster Architecture. Paul Krzyzanowski. Rutgers University. Fall 2016

Tools for Social Networking Infrastructures

The Google File System

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014

Cluster-Level Google How we use Colossus to improve storage efficiency

Google Disk Farm. Early days

Transcription:

References CSE 444: Database Internals Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol 39, No 4) Lectures 26 NoSQL: Extensible Record Stores Bigtable: A Distributed Storage System for Structured Data Fay Chang et al OSDI 2006 Online documentation: HBase Magda Balazinska - CSE 444, Spring 2013 1 Magda Balazinska - CSE 444, Spring 2013 2 What is Bigtable? Bigtable Data Model Distributed storage system Designed to Hold structured data Scale to thousands of servers Store up to several hundred TB (maybe even PB) Perform backend bulk processing Perform real-time data serving To scale, Bigtable has a limited set of features Sparse, multidimensional sorted map (row:string, column:string, time:int64)è string Notice how everything but time is a string Example from Fig 1: Columns are grouped into families Magda Balazinska - CSE 444, Spring 2013 3 Magda Balazinska - CSE 444, Spring 2013 4 Key Features Read/writes of data under single row key is atomic Only single-row transactions! Data is stored in lexicographical order Improves data access locality Horizontally partitioned into tablets Tablets are unit of distribution and load balancing Column families are unit of access control Data is versioned (old versions garbage collected) Ex: most recent three crawls of each page, with times Outline Magda Balazinska - CSE 444, Spring 2013 5 Magda Balazinska - CSE 444, Spring 2013 6 1

API Data definition Creating/deleting tables or column families Changing access control rights Data manipulation Writing or deleting values Looking up values from individual rows Iterate over subset of data in the table Bigtable can serve as input/output for MapReduce Magda Balazinska - CSE 444, Spring 2013 7 Outline Magda Balazinska - CSE 444, Spring 2013 8 Chubby Lock Service Google File System In a distributed system, agreement is a problem Different failure scenarios are possible Nodes can have inconsistent views of who is up and who is down Messages can arrive out-of-order at different nodes But need agreement to make decisions Chubby Provides black-box agreement service through lock abstraction Uses the well-known Paxos algorithm Magda Balazinska - CSE 444, Spring 2013 9 A file = A series of chunks Size of a chunk 64MB Append & read only Fault-tolerance Chunks are distributed Chunks are replicated File Master node Decides chunk placement Decides replica placement Tells clients where to find data Master 10 A Table in Bigtable: Basics A table consists of a set of tablets: Section 53 Each tablet stores a range of the table Each tablet comprises one or more s Table (example with two tablets) Tablet 1 Tablet 2 Each is stored in a GFS file Initially, a tablet has only one Updates add s Major compaction puts everything back into one 11 Details Persistent map from keys to values Ordered Immutable Keys and values are strings API Look up value associated with a key Iterate over all key/value pairs in given range Implementation Sequence of blocks One block index to locate other blocks Magda Balazinska - CSE 444, Spring 2013 12 2

Details is a sequence of blocks Last block is the index to locate other blocks Index is loaded into memory when is open Optionally, whole can be memory mapped Lookup in Read index block of Binary search on index block to find data block Read data block is a GFS File Data Data Data Data Data Data Index Block Magda Balazinska - CSE 444, Spring 2013 13 Magda Balazinska - CSE 444, Spring 2013 14 A Table in Bigtable: Basics A table consists of a set of tablets: Section 53 Each tablet stores a range of the table Each tablet comprises one or more s Table (example with two tablets) Tablet 1 Tablet 2 Each is stored in a GFS file Initially, a tablet has only one Updates add s Major compaction puts everything back into one 15 BigTable Components A library linked into every client One master server Assigns tablets to tablet servers Ensures load balance between tablet servers Detects when tablet servers come and go Handles schema changes Many tablet servers (can be added/removed) Each server manages a set of tablets (10 to 1K) Loads tablets into memory Handles read and write requests Splits tablets that have grown too large Magda Balazinska - CSE 444, Spring 2013 16 Finding Tablet Servers Read Operation on Table Hierarchy analogous t B+ tree On first request, client needs 3 network round trips to find tablet location Root tablet (1st METADATA tablet) Chubby file Subsequently, clients cache tablet locations Other METADATA tablets UserTable1 UserTableN Assuming simple case of 1 tablet = 1 Find location of appropriate tablet Find appropriate tablet in the table and its location Use tablet location hierarchy from previous slide Metadata for a tablet contains list of s Then read data from the Magda Balazinska - CSE 444, Spring 2013 17 Magda Balazinska - CSE 444, Spring 2013 18 3

Assigning Tablets to Tablet Servers Problem Need to balance load for serving read/write requests Want to avoid Chubby file and root tablet being hot-spots Solution Master Assigns tablets to tablet servers Manages tablet server churn and load imbalances Processes schema changes Tablet server Loads tablets into memory (ie, loads index blocks of s) Handles read/write to tablets that it has loaded Splits large tablets Clients cache tablets locations Magda Balazinska - CSE 444, Spring 2013 19 Writing to Tablets Remember: s are immutable When a write operation arrives at a tablet server: Write mutation to a separate commit log stored in GFS Wait until done Insert the mutation into in-memory buffer: memtable The memtable is sorted lexicographically To serve reads, the tablet server Merges s and memtable into a single view Magda Balazinska - CSE 444, Spring 2013 20 Tablet Representation Loading Tablets memtable Read Op To load a tablet, a tablet server does the following Memory GFS tablet log Finds location of tablet through its METADATA (Fig 4) Metadata for tablet includes list of s and set of redo points Read s index blocks into memory Recall an consists of a set of blocks + 1 index block Write Op Files Read the commit log since redo point and reconstructs the memtable (the METADATA includes the redo point) Magda Balazinska - CSE 444, Spring 2013 21 Magda Balazinska - CSE 444, Spring 2013 22 Compaction Optimizations To keep memtables below a threshold Minor compaction: convert memtable into Merging compaction: Read a few s and the memtable Write out a new Major compaction: Replace all s and memtable with a new Vertical partitioning: locality groups Compression of blocks Caching of data Additional indexing: bloom filters Avoid reading that does not have needed data Commit log optimizations Single commit log per tablet server Tablet migration optimization Magda Balazinska - CSE 444, Spring 2013 23 Magda Balazinska - CSE 444, Spring 2013 24 4

Outline Performance Magda Balazinska - CSE 444, Spring 2013 25 Magda Balazinska - CSE 444, Spring 2013 26 Summary Bigtable is a distributed system for storing structured data Provides high performance and high availability Scales incrementally Restricted functionality Widely used by many applications at Google Try HBase http://hbaseapacheorg/ Next Steps Magda Balazinska - CSE 444, Spring 2013 27 Magda Balazinska - CSE 444, Spring 2013 28 5