Cloud Computing
Lectures 15 and 16: Cloud Storage (2011-2012)

Up until now
Introduction: definition of Cloud Computing
Grid Computing
Content Distribution Networks
Cycle-Sharing
Distributed Scheduling
MapReduce
Cloud Storage: File Systems, Object Storage

Outline
Table Storage Services: Google BigTable, Hadoop HBase, Amazon SimpleDB, Microsoft Azure Tables
Messaging Services: Amazon SQS, Google App Engine Queues, Microsoft Azure Queues
Relational Database Services: Amazon RDS, SQL Azure

Table Storage Services

Motivation: Why Simple Table Storage?
The concept of a record (a row) is essential for most object representations.
Having an indexed data structure is essential for queries.
The data model maps well onto familiar database concepts.
However, ACID transactions on a conventional relational model do not scale.

Google BigTable
Google's table storage service.
Used, for example, to store the results of crawling the web.
Google Datastore is built on top of BigTable.

Data Model
A table has entities (records, rows, ...).
A table has column families: created statically (e.g. family = "anchor"). A column family is like a property.
Each column family has columns: created dynamically (e.g. column = "anchor:cnnsi.com"). A column is an instance of a property.
Each column value is timestamped: an entity can have several versions with different timestamps, stored in reverse chronological order.

Entities
Entities are ordered lexicographically by key.
Entities can be read, written, deleted, scanned (key-prefix selection) and range-scanned (key-range selection).
ACID transactions are supported only on a single entity.
Contiguous sequences of entities are stored in tablets.
The choice of key therefore determines which entities end up near each other in the same tablet!
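To make the data model concrete, the three-dimensional structure above can be viewed as a sparse, sorted, nested map from (row key, "family:qualifier", timestamp) to value. The sketch below is purely conceptual; the class and method names are assumptions made up for illustration and are not part of any BigTable client API.

    import java.util.Collections;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    // Conceptual sketch only: BigTable as a sorted, multi-dimensional map.
    public class ConceptualBigtable {
        // row key -> ("family:qualifier" -> (timestamp -> value))
        private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
                new TreeMap<>();  // rows kept in lexicographic key order

        public void put(String rowKey, String column, long timestamp, String value) {
            rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
                .computeIfAbsent(column, c -> new TreeMap<>(Collections.reverseOrder()))  // newest first
                .put(timestamp, value);
        }

        public String getLatest(String rowKey, String column) {
            NavigableMap<String, NavigableMap<Long, String>> row = rows.get(rowKey);
            if (row == null || !row.containsKey(column)) return null;
            return row.get(column).firstEntry().getValue();  // reverse order => first = most recent
        }
    }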

Example: Writing, Reading and Iterating (C++ client library)

    // Open the table
    Table *T = OpenOrDie("/bigtable/web/webtable");

    // Write a new anchor and delete an old anchor
    RowMutation r1(T, "com.cnn.www");
    r1.Set("anchor:www.c-span.org", "CNN");
    r1.Delete("anchor:www.abc.com");
    Operation op;
    Apply(&op, &r1);

    Scanner scanner(T);
    ScanStream *stream;
    stream = scanner.FetchColumnFamily("anchor");
    stream->SetReturnAllVersions();
    scanner.Lookup("com.cnn.www");
    for (; !stream->Done(); stream->Next()) {
        printf("%s %s %lld %s\n",
               scanner.RowName(),
               stream->ColumnName(),
               stream->MicroTimestamp(),
               stream->Value());
    }

Tablets
Each tablet contains a contiguous range of entities.
Tablets are stored in SSTable files within GFS (or HDFS in Hadoop).
SSTable file format: 64 KB data blocks of <key, value> pairs plus a block index used to locate blocks.
[Figure: an SSTable is a sequence of 64 KB blocks followed by its block index.]
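As a rough illustration of how an SSTable block index is used, the sketch below looks up a key by binary-searching an in-memory index of (first key of block -> block offset) and would then scan one 64 KB block; the types and method names are assumptions for illustration, not the actual on-disk format or implementation.

    import java.util.Map;
    import java.util.TreeMap;

    // Sketch only: lookup in an SSTable-like file using its in-memory block index.
    // Assumes blockIndex maps the first key of each 64 KB block to the block's file offset.
    public class SSTableLookupSketch {
        private final TreeMap<String, Long> blockIndex = new TreeMap<>();

        public Long findBlockOffset(String key) {
            // The block that may contain `key` is the one whose first key is the
            // greatest key <= `key` (a binary search over the sorted index).
            Map.Entry<String, Long> e = blockIndex.floorEntry(key);
            return (e == null) ? null : e.getValue();
        }

        // Reading the block at that offset and scanning its <key, value> pairs
        // would follow; omitted here since the block encoding is not public.
    }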

Components
The client application uses a client library to interact with the service.
The master performs operations on:
Tables: creation.
Column families: creation.
Tablets: assignment, removal, and load balancing of tablets among tablet servers.
Tablet server: performs reads and writes, and splits tablets that grow too large (> 200 MB).

Metadata: where are the tablets?
BigTable uses Chubby, a locking and small-file repository, for:
locating BigTable's metadata;
checking whether tablet servers are alive.
Chubby runs five replicated servers that execute the Paxos election/consensus algorithm.
Each operation on Chubby (insertion, deletion, ...) is agreed upon by the Chubby replicas using Paxos.

Primary Election
This is a distributed consensus problem.
Asynchronous communication causes message loss, delay and reordering.
FLP (Fischer-Lynch-Paterson) impossibility result: in an asynchronous system with no common timing, no deterministic protocol can guarantee consensus if even one process may crash.
Practical solution: the Paxos protocol.

Paxos: Problem
A collection of processes propose values.
Only a proposed value may be chosen.
Only a single value is chosen.
Processes learn of a chosen value only after it has actually been chosen.
Nodes can act as proposers, acceptors and learners.
Asynchronous, non-Byzantine model: arbitrary speeds, processes fail by stopping and may restart, messages may be lost or delayed but are not corrupted.

Paxos: Algorithm
Phase 1
(a) A proposer sends a prepare request with number n.
(b) An acceptor, if n is greater than the number of any other prepare it has already replied to, responds with a promise (and with the highest-numbered proposal it has accepted, if any).
Phase 2
(a) If a majority of acceptors reply, the proposer sends an accept request with number n and value v (the value of the highest-numbered proposal reported in the promises, or any value if none was reported).
(b) An acceptor accepts the proposal unless it has meanwhile responded to a prepare with a number higher than n.

Paxos: Algorithm
[Figure: message exchange between proposer and acceptors in the two phases.]
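The acceptor's side of these rules is small enough to sketch in code. The following is a minimal, illustrative single-node acceptor (no networking, no persistence); all class and method names are assumptions made up for the example, not part of any library.

    // Minimal sketch of a Paxos acceptor's state and rules (illustrative only).
    public class PaxosAcceptorSketch {
        private long promisedN = -1;   // highest prepare number promised
        private long acceptedN = -1;   // number of the highest accepted proposal
        private String acceptedValue;  // value of that proposal, if any

        // Phase 1b: reply to prepare(n) with a promise, or reject.
        public synchronized Promise onPrepare(long n) {
            if (n > promisedN) {
                promisedN = n;
                return new Promise(true, acceptedN, acceptedValue);
            }
            return new Promise(false, acceptedN, acceptedValue);
        }

        // Phase 2b: accept (n, v) unless a higher-numbered prepare was promised meanwhile.
        public synchronized boolean onAccept(long n, String v) {
            if (n >= promisedN) {
                promisedN = n;
                acceptedN = n;
                acceptedValue = v;
                return true;
            }
            return false;
        }

        public static final class Promise {
            public final boolean ok;
            public final long previouslyAcceptedN;
            public final String previouslyAcceptedValue;
            Promise(boolean ok, long n, String v) {
                this.ok = ok;
                this.previouslyAcceptedN = n;
                this.previouslyAcceptedValue = v;
            }
        }
    }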

Paxos: State Machines
Replicated state machine: replicas end up in the same state if they perform the same sequence of operations.
Clients send requests to a server that is replicated with Paxos.
Paxos is used to agree on the order of client operations: the system tolerates failures, and even more than one node believing it is the master.
For each position in the sequence, Paxos guarantees that only one value is chosen and replicated.

Metadata
Each metadata tablet holds up to 128 MB.
Clients maintain a cache of tablet locations.

Writing
A write touches two tablet-server structures:
a redo log (one per tablet server), and
an in-memory table, the memtable.
Compactions of the memtable into SSTables:
Minor: when the memtable reaches a certain size it is converted into an SSTable. This reduces log and memory usage.
Merging: groups several SSTables produced by minor compactions into one.
Major: rewrites the SSTables into a single SSTable, filtering out deleted entities for good.

Fault Tolerance
At startup, the master compares the list of live tablet servers and the tablets they claim to serve with the metadata, and updates the tablet assignment accordingly.
Whenever a tablet server dies, the master is notified (via Chubby) and reassigns its tablets.
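Below is a sketch of the write path and minor compaction just described, in simplified Java; the class names, size threshold and flush format are assumptions for illustration and do not reflect the real implementation.

    import java.util.SortedMap;
    import java.util.TreeMap;

    // Illustrative sketch of a tablet server's write path: append to a redo log,
    // apply to the in-memory memtable, and perform a "minor compaction"
    // (flush to an SSTable) when the memtable grows too large.
    public class WritePathSketch {
        private static final int MEMTABLE_LIMIT = 4 * 1024 * 1024; // assumed threshold (bytes)

        private final StringBuilder redoLog = new StringBuilder();     // stand-in for a GFS log file
        private SortedMap<String, String> memtable = new TreeMap<>();  // sorted in-memory buffer
        private int memtableBytes = 0;

        public void write(String key, String value) {
            redoLog.append(key).append('=').append(value).append('\n'); // 1. log the mutation first
            memtable.put(key, value);                                    // 2. apply it in memory
            memtableBytes += key.length() + value.length();
            if (memtableBytes >= MEMTABLE_LIMIT) {
                minorCompaction();                                       // 3. flush when too large
            }
        }

        private void minorCompaction() {
            SortedMap<String, String> frozen = memtable;                 // freeze current memtable
            memtable = new TreeMap<>();
            memtableBytes = 0;
            flushAsSSTable(frozen);                                      // write sorted data out
            // After a successful flush, the corresponding redo-log prefix could be discarded.
        }

        private void flushAsSSTable(SortedMap<String, String> data) {
            // Placeholder: a real implementation would write 64 KB blocks plus a block index to GFS.
        }
    }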

Use at Google (2006)
[Table of Google applications using BigTable in 2006; not included in the transcription.]

Datastore: BigTable for Programmers
Object storage layered over BigTable:
Object = entity.
Each entity is accessible by a single application.
Each entity has a system-wide unique id that includes the application id and the kind id.
A kind is a namespace for entities.
Each entity has an app-wide id.
Data are stored in the entity's columns.
Columns have a name and a value.

Datastore: Example

    DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
    Entity book = new Entity("Book");
    book.setProperty("title", "The Grapes of Wrath");
    book.setProperty("author", "John Steinbeck");
    ds.put(book);
    ds.delete(book.getKey());

Datastore: DB-like support
An intermediate layer (Megastore) executes queries, builds indexes and provides multi-record transactional support.
Indexes:
Kind index (kind, key): an index of objects, one entry per key of that kind.
Single-property index (kind, property name, property value, key): per-kind/per-property indexes, created on demand (in both ascending and descending versions).
Composite index: defined by the user in datastore-indexes.xml and built before the application runs.
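As an example of a query answered from a single-property index, a low-level GAE Datastore query might look like the sketch below; it reuses the Book example above, and the exact filter API should be treated as indicative only, since it changed across SDK versions.

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.PreparedQuery;
    import com.google.appengine.api.datastore.Query;
    import com.google.appengine.api.datastore.Query.FilterOperator;
    import com.google.appengine.api.datastore.Query.FilterPredicate;

    public class BookQueryExample {
        public static void listSteinbeckBooks() {
            DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

            // Equality filter on a single property: served by the single-property
            // index (kind = "Book", property = "author").
            Query q = new Query("Book")
                    .setFilter(new FilterPredicate("author", FilterOperator.EQUAL, "John Steinbeck"));

            PreparedQuery pq = ds.prepare(q);
            for (Entity book : pq.asIterable()) {
                System.out.println(book.getProperty("title"));
            }
        }
    }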

Transactions
BigTable supports transactions only on a single entity.
Datastore supports transactions on several entities. However, a transaction can only operate within one entity group (on one server): there are no distributed transactions, avoiding the usual relational-DB scalability issues.
Datastore transactions:
use optimistic concurrency control;
cannot be nested (no sub-transactions);
are cancelled at the end of the servlet (GAE is web-application oriented) if they have not been committed.

Transactions: Example

    try {
        Transaction txn = ds.beginTransaction();
        Key boardKey;
        Entity messageBoard;
        try {
            boardKey = KeyFactory.createKey("MessageBoard", boardName);
            messageBoard = ds.get(boardKey);
        } catch (EntityNotFoundException e) {
            messageBoard = new Entity("MessageBoard", boardName);
            messageBoard.setProperty("count", 0);
            boardKey = ds.put(messageBoard);
        }
        txn.commit();  // or txn.rollback()
    } catch (DatastoreFailureException e) {
        // Report an error...
    }
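Because Datastore transactions use optimistic concurrency control, a commit can fail if another request modified the same entity group first. A common pattern is to retry the transaction a few times, as in the sketch below; the retry budget is an arbitrary assumption, and java.util.ConcurrentModificationException is the exception the GAE SDK throws on such conflicts.

    import java.util.ConcurrentModificationException;
    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.EntityNotFoundException;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.KeyFactory;
    import com.google.appengine.api.datastore.Transaction;

    public class CounterIncrement {
        private static final int MAX_RETRIES = 3;  // assumed retry budget

        // Increment the "count" property of a MessageBoard entity, retrying on
        // optimistic-concurrency conflicts.
        public static void increment(String boardName) throws EntityNotFoundException {
            DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
            for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
                Transaction txn = ds.beginTransaction();
                try {
                    Key boardKey = KeyFactory.createKey("MessageBoard", boardName);
                    Entity board = ds.get(txn, boardKey);
                    long count = (Long) board.getProperty("count");
                    board.setProperty("count", count + 1);
                    ds.put(txn, board);
                    txn.commit();
                    return;                               // success
                } catch (ConcurrentModificationException e) {
                    // Another request won the race for this entity group; try again.
                } finally {
                    if (txn.isActive()) {
                        txn.rollback();
                    }
                }
            }
            throw new ConcurrentModificationException("too much contention on " + boardName);
        }
    }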

Google Query Language
A simplified query language for Google Datastore.
All queries are translated into scan operations over the index BigTables.
A small subset of SQL: it only supports SELECT; there is no INSERT, UPDATE or DELETE.

Google Query Language

    SELECT [* | __key__] FROM <kind>
      [WHERE <condition> [AND <condition> ...]]
      [ORDER BY <property> [ASC | DESC] [, <property> [ASC | DESC] ...]]
      [LIMIT [<offset>,]<count>]
      [OFFSET <offset>]

    <condition> := <property> {< | <= | > | >= | = | !=} <value>
    <condition> := <property> IN <list>
    <condition> := ANCESTOR IS <entity or key>

SimpleDB
Hierarchical data storage. Based on S3.
Adds multiple attributes, indexing and queries.
Ad hoc data storage: no system administration cost, no schema, automatically scalable.
Efficient for read-dominated data, due to eventual consistency; if conflicts are too common, the system becomes inefficient.
Examples: forums, metadata, backups.

SimpleDB: Data Model
Domains: equivalent to a table. Identified by a string. 100 per account, 10 GB per domain.
Items: identified by a string. Unlimited number per domain.
Attributes: <key, value> pairs. No types, just strings. Automatically indexed. 256 per item, 250 million per domain, 1 KB per attribute.

SimpleDB: Missing Features
No transactions.
No notifications: may be compensated with a messaging service such as SQS.
No ordering of results: must be done at the client.
No joins.
No types: only string comparisons. Care must be taken so that comparisons behave as expected, e.g. zero-pad numbers ("001", "002", "003" and not "1", "2", "3").
Does not store "bags of bytes": attributes hold only 1 KB. For large objects, S3 must be used directly.

SimpleDB: Queries
Select: allows querying a domain: select <target> from <domain_name> where <query_expression>.
Supported operators: =, !=, <, >, <=, >=, like, not like, between, is null, is not null.
Example: select * from mydomain where keyword = 'Book'.
Also supports ordering (sort) and counting with count().
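The zero-padding advice exists because SimpleDB compares attribute values as strings. The small sketch below (plain Java, no SimpleDB API involved) shows why unpadded numbers compare incorrectly and how a fixed-width encoding fixes it; the chosen width is just an example.

    public class LexicographicNumbers {
        // Encode a non-negative number with a fixed width so that string order
        // matches numeric order (pick the width for your data range).
        static String pad(long n, int width) {
            return String.format("%0" + width + "d", n);
        }

        public static void main(String[] args) {
            // Unpadded: "9" > "10" as strings, which is numerically wrong.
            System.out.println("9".compareTo("10") > 0);              // true
            // Padded to a fixed width, string order equals numeric order.
            System.out.println(pad(9, 3).compareTo(pad(10, 3)) < 0);  // true: "009" < "010"
        }
    }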

Windows Azure Tables
Similar to BigTable.
Storage hierarchy: Table -> Entity -> Property -> <name, type, value>.
Supported types: Binary, Bool, DateTime, Double, GUID, Int, Int64 and String.
URL schema: http://<storageaccount>.table.core.windows.net/<tablename>?$filter=<query>
Optimistic concurrency control.
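To make the URL schema concrete, the sketch below builds such a query URL in Java for a hypothetical storage account and table; note that a real request must also carry the Table service's shared-key Authorization and x-ms-version headers, which are omitted here.

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class AzureTableQueryUrl {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String account = "myaccount";   // hypothetical storage account
            String table = "Customers";     // hypothetical table name
            String filter = "LastName eq 'Smith' and FirstName eq 'John'";

            // Percent-encode the $filter expression; URLEncoder uses '+' for spaces,
            // so convert to %20 as used in the REST examples.
            String encoded = URLEncoder.encode(filter, "UTF-8").replace("+", "%20");
            String url = "http://" + account + ".table.core.windows.net/" + table + "()"
                    + "?$filter=" + encoded;

            System.out.println(url);
            // An actual GET would also need the Authorization (SharedKey/SharedKeyLite)
            // and x-ms-version headers required by the Table service REST API.
        }
    }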

Operations on Tables
On tables: Create, Delete, Query.
Example (GET using REST):
http://myaccount.table.core.windows.net/Customers()?$filter=LastName%20eq%20'Smith'%20and%20FirstName%20eq%20'John'
On entities: Insert, Update, Merge, Delete.
It is possible to perform transactions by grouping entity operations into a batch request, POSTed to:
http://myaccount.table.core.windows.net/$batch

Messaging Services

Messaging Services
Why? Participants can be weakly connected:
no permanent network connection,
no simultaneous execution,
no binary compatibility.
Very useful for:
connecting heterogeneous/legacy systems;
workflow systems, where processes can be manipulated by adding/removing/replicating messages.
Examples: Amazon Simple Queue Service (SQS), Microsoft Azure Queues.

Amazon SQS: Simple Queue Service
A reliable communication service.
Messages are persistent (retained from 1 hour to 2 weeks; default: 4 days).
A message is a block of text of up to 8 KB.
Queues store messages until they are delivered.
A queue stores related messages and can be configured with specific delivery and access-control options.

SQS: Consistency
Message queues are replicated for fault tolerance and scalability:
when a queue is read, only a subset of the replicas is contacted, so a single read may not return all messages.
Message delivery is triggered by the receiver (polling), so no delivery times are guaranteed.
SQS does not enforce or guarantee message ordering.
The delivery semantics is at-least-once.

SQS: Programming Guidelines
Use an idempotent message protocol, i.e. do not design operations that presuppose a particular application state, e.g.:
choose SetValue(v,i) rather than IncrementValue(v,i);
choose NewPosition(x,y) rather than MoveForward().
Do not use SQS for applications with timing restrictions, e.g. transactional systems.
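Because delivery is at-least-once and unordered, a consumer should tolerate duplicates. The sketch below illustrates the guideline: messages carry absolute state (a SetValue-style payload plus an id), and the consumer ignores re-deliveries. All names here are made up for illustration; this is not the SQS API.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative idempotent consumer: applying the same message twice has the
    // same effect as applying it once, so at-least-once delivery is harmless.
    public class IdempotentConsumer {
        // Hypothetical message: absolute state ("set value to v"), never a delta.
        public static final class SetValueMessage {
            public final String messageId;  // used to detect duplicates
            public final String key;
            public final int value;
            public SetValueMessage(String messageId, String key, int value) {
                this.messageId = messageId;
                this.key = key;
                this.value = value;
            }
        }

        private final Map<String, Integer> state = new ConcurrentHashMap<>();
        private final Set<String> seenMessageIds = new HashSet<>();

        public synchronized void handle(SetValueMessage m) {
            if (!seenMessageIds.add(m.messageId)) {
                return;  // duplicate delivery: safe to drop
            }
            // "SetValue" is naturally idempotent; "IncrementValue" would not be.
            state.put(m.key, m.value);
        }
    }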

SQS: Operations
CreateQueue, ListQueues.
DeleteQueue: by default, only deletes empty queues.
SendMessage.
ReceiveMessage: does not remove messages, only makes them invisible for a period.
PeekMessage: reads a message without changing the queue.
DeleteMessage: removes the message from the queue.

SQS: Java Example

    // using the (old) Amazon SQS sample "Queue" Java library; imports omitted
    public class SampleDriver {
        static public String accessKeyId = "";
        static public String secretAccessKey = "";
        static public String queueServiceURL = "http://queue.amazonaws.com/";
        static private String queueName = "SQS-Test-Queue-Java";
        static private String testMessage = "This is a test message.";

        public static void main(String[] args) throws Exception {
            Queue testQueue = Queue.createQueue(queueName);
            List<Queue> queues = Queue.listQueues(queueName);
            for (Queue queue : queues) {
                if (queue.getQueueEndpoint().equals(testQueue.getQueueEndpoint())) {
                    System.out.println("Queue found");
                }
            }
            String msgId = testQueue.sendMessage(testMessage).getId();
            String qCount = testQueue.getApproximateNumberOfMessages();
            List<QMessage> messages = testQueue.receiveMessage(1);
            do {
                Thread.sleep(1000); // wait for a second
                messages = testQueue.receiveMessage(1);
            } while (messages.size() == 0);
            QMessage message = messages.get(0);
            testQueue.deleteMessage(message.getReceiptHandle());
        }
    }

Microsoft Azure Queues
Reliable and persistent messaging service.
Unlimited queues per account and unlimited messages per queue.
Each message can hold up to 8 KB.
Fault-tolerance mechanism similar to Amazon's: when messages are read they become invisible for a period.

Operations on Queues
Queues and messages are referenced by URL, e.g.:
http://<storageaccount>.queue.core.windows.net/<queueName>
Operations on queues (REST):
http://myaccount.queue.core.windows.net?comp=list (list queues)
Create, Delete.
Operations on messages (REST): Put, Get, Peek, Delete.

Queues in C#: Writing

    StorageAccountInfo account = new StorageAccountInfo(baseUri, null, accountName, accountKey);
    QueueStorage service = QueueStorage.Create(account);
    MessageQueue queue = service.GetQueue("messages");
    if (!queue.DoesQueueExist())
    {
        queue.CreateQueue();
    }
    Message msg = new Message(txtMessage.Text);
    queue.PutMessage(msg);

Queues in C#: Reading

    StorageAccountInfo account = new StorageAccountInfo(baseUri, null, accountName, accountKey);
    QueueStorage service = QueueStorage.Create(account);
    MessageQueue queue = service.GetQueue("messages");
    if (queue.DoesQueueExist())
    {
        Message msg = queue.GetMessage();
        if (msg != null)
        {
            RoleManager.WriteToLog("Information",
                string.Format("Message '{0}' processed.", msg.ContentAsString()));
            queue.DeleteMessage(msg);
        }
    }

Relational Database Services

Amazon RDS: Relational Database Service
AWS's relational database service.
Goals: simplify porting existing applications and take advantage of low latency inside the cluster by co-locating applications and the DB.
Based on MySQL.
Supports automatic backups.
Supports passive replication in a different data center (Multi-AZ deployment) for fault tolerance.
Supports read replicas for load balancing.
Administered using command-line tools: creating a DB instance returns a DNS name; from that point on, it is a conventional MySQL server.
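Since an RDS instance behaves like a conventional MySQL server once its DNS endpoint is known, connecting to it is plain JDBC, as in the sketch below; the endpoint, database name and credentials are placeholders, not real values, and the MySQL JDBC driver must be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class RdsConnectExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint returned when the DB instance was created.
            String url = "jdbc:mysql://mydb.abcdefgh1234.us-east-1.rds.amazonaws.com:3306/shop";
            try (Connection conn = DriverManager.getConnection(url, "admin", "secret");
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT VERSION()")) {
                while (rs.next()) {
                    System.out.println("MySQL version: " + rs.getString(1));
                }
            }
        }
    }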

Variants
Small DB instance: 1.7 GB RAM, 1 core.
Large DB instance: 7.5 GB RAM, 2 cores.
Extra-Large DB instance: 15 GB RAM, 4 cores.
Double Extra-Large DB instance: 34 GB RAM, 4 cores.
Quadruple Extra-Large DB instance: 68 GB RAM, 8 cores.
Disk: from 5 GB to 1 TB.
Transitions between sizes happen at scheduled moments, with up to 2 hours of downtime.

SQL Azure
Includes Reporting, Business Analytics and Data Sync services.
Relational database service: high availability, automatic maintenance.
The fabric controller monitors the server load and redistributes the partitions with the highest load.

SQL Azure
Based on SQL Server.
Data is replicated in 3 copies.
Strong consistency: when a write operation returns, all replicas have been updated.
Maximum DB size: 10 GB.
Access via ODBC, JDBC, ADO.NET, LINQ.

SQL Azure vs. Amazon RDS
Size: RDS scales up to 1 TB; SQL Azure up to 10 GB.
Specificity: SQL Azure is designed for the cloud, whereas RDS is essentially a MySQL instance running on EC2.
Configurability: the RDS instance can be configured.
Compatibility: RDS is full-fledged MySQL; SQL Azure supports a subset of T-SQL.
(Price: different models and prices for charging for DB, bandwidth and RAM.)

Storage: Overview

                    AWS                          Microsoft Azure    Google / Hadoop
    SQL             RDS                          SQL Azure          X
    Tables          SimpleDB                     Tables             Datastore [BigTable] / HBase
    Objects/Blocks  S3                           Blobs              (GFS) / HDFS
    Queues          Simple Queue Service (SQS)   Queues             Task Queues

Storage: Comparison
There are two general complaints: performance (latency), and consistency models that do not scale.
The scalability problem is not solved.
There are no reliable benchmarks; the market is still in a very dynamic phase.
Google's storage services are not accessible remotely, although you can build an intermediate service that exposes them.

Next Time...
Execution and Programming Models in Cloud Computing