Cloud Computing. Up until now
Cloud Computing
Lectures 15 and 16: Cloud Storage

Up until now
- Introduction
- Definition of Cloud Computing
- Grid Computing
- Content Distribution Networks
- Cycle-Sharing
- Distributed Scheduling
- Map Reduce
- Cloud Storage
- File Systems
- Object Storage
Outline
- Table Storage Services:
  - Google BigTable
  - Hadoop HBase
  - Amazon SimpleDB
  - Microsoft Azure Tables
- Messaging Services:
  - Amazon SQS
  - Google App Engine Queues
  - Microsoft Azure Queues
- Relational Database Services:
  - Amazon RDS
  - Azure SQL

Table Storage Services
Motivation: Why Simple Table Storage?
- The concept of a record (a row) is essential for most object representations.
- An indexed data structure is essential for queries.
- The data model matches database concepts well.
- ACID transactions on a conventional relational model do not scale.

Google BigTable
- Google's table storage service.
- Used, e.g., for crawling the web.
- Google Datastore is based on BigTable.
Data Model
- A table has entities (records, rows, ...).
- A table has column families:
  - Created statically (e.g. family = "anchor").
  - A column family is like a property.
- Each column family has columns:
  - Created dynamically (e.g. column = "anchor:cnnsi.com").
  - A column is an instance of a property.
- Each column is timestamped:
  - An entity can have several versions with different timestamps.
  - Versions are stored in reverse chronological order.

Entities
- Entities are ordered alphabetically by key.
- Entities can be read, written, deleted, scanned (key prefix selection) and range scanned (key range selection).
- ACID transactions are supported only on a single entity.
- Sequences of entities are stored in tablets.
- The choice of key determines which entities are nearby in the tablet!
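The data model above can be sketched as a sorted map of maps. This is a toy in-memory illustration, not BigTable's API: the class name `ToyTable` and its methods are hypothetical, and a `TreeMap` stands in for the sorted, tablet-partitioned storage.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Toy in-memory model of the table described above (hypothetical, not BigTable's API). */
final class ToyTable {
    // row key -> ("family:qualifier" -> (timestamp -> value)), newest timestamp first.
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    /** Writes one version of a cell. */
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the most recent version of a cell, or null if the cell does not exist. */
    String getLatest(String rowKey, String column) {
        TreeMap<String, TreeMap<Long, String>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(column)) {
            return null;
        }
        // Versions are kept in reverse chronological order, so the first entry is the newest.
        return row.get(column).firstEntry().getValue();
    }

    /** Prefix scan: returns the row keys that start with the prefix, in key order. */
    List<String> scanPrefix(String prefix) {
        List<String> keys = new ArrayList<>();
        for (String key : rows.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            keys.add(key);
        }
        return keys;
    }
}
```

Because rows are kept sorted by key, reversed host names such as "com.cnn.www" and "com.cnn.money" end up next to each other, which is why the choice of key decides what a prefix scan can fetch cheaply.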
Example: Reading and Iterating
    // Open the table
    Table *T = OpenOrDie("/bigtable/web/webtable");

    // Write a new anchor and delete an old anchor
    // (the column qualifiers after "anchor:" were lost in the transcription)
    RowMutation r1(T, "com.cnn.www");
    r1.Set("anchor:...", "CNN");
    r1.Delete("anchor:...");
    Operation op;
    Apply(&op, &r1);

    // Iterate over all versions of all anchors of one row
    Scanner scanner(T);
    ScanStream *stream;
    stream = scanner.FetchColumnFamily("anchor");
    stream->SetReturnAllVersions();
    scanner.Lookup("com.cnn.www");
    for (; !stream->Done(); stream->Next()) {
        printf("%s %s %lld %s\n",
               scanner.RowName(),
               stream->ColumnName(),
               stream->MicroTimestamp(),
               stream->Value());
    }

Tablets
- Each tablet contains a contiguous sequence of entities.
- Tablets are stored in SSTable files within GFS (or HDFS in Hadoop).
- SSTable file format: index of <key, value> pairs + 64 KB data blocks.
[Figure: SSTable layout, a run of 64 KB data blocks followed by the index]
Components
- The client application uses a library to interact with the service.
- Master: performs operations on:
  - Tables: creation.
  - Column families: creation.
  - Tablets: allocation, elimination, load balancing of tablets among tablet servers.
- Tablet server: performs reading, writing and the partitioning of tablets that are too large (> 200 MB).

Metadata: Where are the tablets?
- BigTable uses Chubby, a locking and small-file repository, for:
  - Locating BigTable's metadata.
  - Checking whether tablet servers are alive.
- Chubby uses five replicated servers executing the Paxos election algorithm.
- Each operation on Chubby (insertion, deletion, ...) triggers an election between the Chubby nodes.
Primary Election
- Distributed consensus problem.
- Asynchronous communication causes message loss, delay and reordering.
- FLP (Fischer-Lynch-Paterson) impossibility result: in an asynchronous system with no common timing, no deterministic protocol can guarantee consensus if even one process may fail.
- Solution: the Paxos protocol.

Paxos: Problem
- A collection of processes proposes values.
- Only a value that has been proposed may be chosen.
- Only a single value is chosen.
- A process learns that a value was chosen only after it has actually been voted.
- Nodes can be proposers, acceptors and learners.
- Asynchronous, non-Byzantine model:
  - Nodes run at arbitrary speeds, fail by stopping and may restart.
  - Messages are not corrupted.
Paxos: Algorithm
- Phase 1:
  - (a) The proposer sends a prepare request with number n.
  - (b) Acceptor: if n is greater than the number of any other prepare request it has replied to, it responds with a promise.
- Phase 2:
  - (a) If a majority of acceptors reply, the proposer sends an accept request with value v.
  - (b) An acceptor accepts unless it has already responded to a prepare request with a number higher than n.
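The acceptor side of the two phases can be sketched as follows. This is a minimal single-value illustration with no networking or persistence; the class and method names (`PaxosAcceptor`, `onPrepare`, `onAccept`) are hypothetical. Returning any previously accepted value inside the promise comes from the standard description of Paxos rather than from the slide.

```java
/** Minimal single-value Paxos acceptor (hypothetical sketch, in-memory only). */
final class PaxosAcceptor {
    private long promisedN = -1;          // highest proposal number promised so far
    private long acceptedN = -1;          // number of the proposal accepted so far, -1 if none
    private String acceptedValue = null;  // value of the proposal accepted so far

    /** Reply to a prepare request. */
    static final class Promise {
        final boolean ok;
        final long acceptedN;
        final String acceptedValue;

        Promise(boolean ok, long acceptedN, String acceptedValue) {
            this.ok = ok;
            this.acceptedN = acceptedN;
            this.acceptedValue = acceptedValue;
        }
    }

    /** Phase 1(b): promise only if n is higher than every prepare answered so far. */
    Promise onPrepare(long n) {
        if (n > promisedN) {
            promisedN = n;
            return new Promise(true, acceptedN, acceptedValue);
        }
        return new Promise(false, acceptedN, acceptedValue);
    }

    /** Phase 2(b): accept unless a prepare with a higher number was already answered. */
    boolean onAccept(long n, String value) {
        if (n >= promisedN) {
            promisedN = n;
            acceptedN = n;
            acceptedValue = value;
            return true;
        }
        return false;
    }
}
```

A proposer would send `onPrepare(n)` to all acceptors, wait for a majority of positive promises, and only then send `onAccept(n, v)`; if any promise carries an already accepted value, the proposer must propose that value instead of its own.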
Paxos: State Machines
- Replicated state machine: replicas reach the same state if they perform the same sequence of operations.
- The client sends requests to a server, which is replicated with Paxos.
- Paxos is used to agree on the order of client operations:
  - There can be failures, or more than one master.
  - Paxos guarantees that only one value is chosen and replicated.

Metadata
- Each metadata tablet has up to 128 MB.
- Clients maintain a cache of tablet locations.
Writing
- Writing is done on two tablet server structures:
  - A redo log (one per server), and
  - A temporary in-memory table, the memtable.
- Compactions of the memtable into SSTables:
  - Minor: when the memtable reaches a certain size, it is converted into an SSTable. Reduces log and memory occupation.
  - Merge: groups SSTables from minor compactions.
  - Major: reduces the SSTables to a minimum by filtering out removed entities.

Fault Tolerance
- At startup, the master compares the list of live servers and the tablets they claim to have with the metadata, and updates the tablet distribution.
- Every time a tablet server dies, the master is notified.
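The write path and the compactions described under "Writing" can be sketched with in-memory maps. This is a simplified illustration: `ToyTabletStore`, the deletion marker and the size threshold are all hypothetical, and real SSTables are immutable files rather than `TreeMap` instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy tablet store: redo log + memtable + SSTables (hypothetical sketch). */
final class ToyTabletStore {
    private static final String TOMBSTONE = "__deleted__"; // hypothetical deletion marker
    private final int memtableLimit;
    private final List<String> redoLog = new ArrayList<>();
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<TreeMap<String, String>> sstables = new ArrayList<>(); // oldest first

    ToyTabletStore(int memtableLimit) {
        this.memtableLimit = memtableLimit;
    }

    /** Every write goes to the redo log first and then to the memtable. */
    void write(String key, String value) {
        redoLog.add(key + "=" + value);
        memtable.put(key, value);
        if (memtable.size() >= memtableLimit) {
            minorCompaction();
        }
    }

    /** A removal is written like any other value, as a deletion marker. */
    void delete(String key) {
        write(key, TOMBSTONE);
    }

    /** Reads look at the memtable first, then at the SSTables from newest to oldest. */
    String read(String key) {
        String value = memtable.get(key);
        for (int i = sstables.size() - 1; value == null && i >= 0; i--) {
            value = sstables.get(i).get(key);
        }
        return (value == null || value.equals(TOMBSTONE)) ? null : value;
    }

    /** Minor compaction: freeze the memtable as a new SSTable and truncate the log. */
    void minorCompaction() {
        sstables.add(memtable);
        memtable = new TreeMap<>();
        redoLog.clear();
    }

    /** Major compaction: merge all SSTables into one and drop removed entities. */
    void majorCompaction() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> table : sstables) {
            merged.putAll(table); // oldest first, so newer values overwrite older ones
        }
        merged.values().removeIf(TOMBSTONE::equals);
        sstables.clear();
        sstables.add(merged);
    }

    int sstableCount() {
        return sstables.size();
    }
}
```

The sketch keeps deletion markers until a major compaction because an older SSTable may still hold the removed value; dropping the marker earlier would make that value visible again.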
Use at Google (2006)

Datastore: BigTable for Programmers
- Object storage over BigTable: object = entity.
- Each entity is accessible by a single application.
- Each entity has a system-wide unique id that includes the app id and the kind id.
  - Kind is a namespace for entities.
  - Each entity has an app-wide id.
- Data are stored in the entity's columns.
- Columns have a name and a value.
Datastore: Example
    DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

    // Create an entity of kind "Book" and store it
    Entity book = new Entity("Book");
    book.setProperty("title", "The Grapes of Wrath");
    book.setProperty("author", "John Steinbeck");
    ds.put(book);

    // Entities are deleted by key
    ds.delete(book.getKey());

Datastore: DB-like support
- An intermediate layer (Megastore) executes queries, builds indexes and provides multi-record transactional support.
- Indexes:
  - Kind index (kind, key): object index with one kind per key.
  - Single-property index (kind, name, value, key): kind/column/key indexes that are created on demand (both ascending and descending versions).
  - Composite index: defined by the user inside datastore-indexes.xml, created before running the app.
Transactions
- BigTable supports only transactions on one entity.
- Datastore supports transactions on several entities. However:
  - Transactions can only operate on one server (same entity group).
  - No distributed transactions.
  - The usual relational DB scalability issues apply.
- Datastore transactions:
  - Use optimistic concurrency control.
  - Cannot be nested (no sub-transactions).
  - Are cancelled at the end of the servlet (GAE is web-app oriented) if they are not confirmed.

Transactions: Example
    try {
        Transaction txn = ds.beginTransaction();
        try {
            boardKey = KeyFactory.createKey("MessageBoard", boardName);
            messageBoard = ds.get(boardKey);
        } catch (EntityNotFoundException e) {
            messageBoard = new Entity("MessageBoard", boardName);
            messageBoard.setProperty("count", 0);
            boardKey = ds.put(messageBoard);
        }
        txn.commit(); // or txn.rollback()
    } catch (DatastoreFailureException e) {
        // Report an error...
    }
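Optimistic concurrency control, as used by Datastore transactions, can be illustrated with a version check at commit time. The sketch below is an assumption about the general technique, not Datastore's implementation: `OptimisticStore`, `Snapshot` and the per-key version counter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy store with optimistic concurrency control (hypothetical sketch). */
final class OptimisticStore {
    private static final class Versioned {
        final long version;
        final int value;

        Versioned(long version, int value) {
            this.version = version;
            this.value = value;
        }
    }

    /** What a transaction saw when it read a key. */
    static final class Snapshot {
        final String key;
        final long version;
        final int value;

        Snapshot(String key, long version, int value) {
            this.key = key;
            this.version = version;
            this.value = value;
        }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    /** Reads take no locks; they only remember the version they saw. */
    synchronized Snapshot read(String key) {
        Versioned v = data.get(key);
        return v == null ? new Snapshot(key, 0, 0) : new Snapshot(key, v.version, v.value);
    }

    /** The commit succeeds only if nobody else committed since the snapshot was taken. */
    synchronized boolean commit(Snapshot readAt, int newValue) {
        Versioned current = data.get(readAt.key);
        long currentVersion = current == null ? 0 : current.version;
        if (currentVersion != readAt.version) {
            return false; // conflict: the caller must re-read and retry
        }
        data.put(readAt.key, new Versioned(currentVersion + 1, newValue));
        return true;
    }

    /** Typical client loop: read, compute, try to commit, retry on conflict. */
    int incrementWithRetry(String key, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Snapshot s = read(key);
            if (commit(s, s.value + 1)) {
                return s.value + 1;
            }
        }
        throw new IllegalStateException("too much contention on " + key);
    }
}
```

The cost of a conflict is a retry rather than a blocked lock, which works well when conflicts are rare and poorly when many clients update the same entity group.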
Google Query Language
- Simplified database query language for Google Datastore.
- All queries are transformed into scan operations on index BigTables.
- A small subset of SQL.
- Only supports SELECT; it has no INSERT, UPDATE or DELETE.

Google Query Language
    SELECT [* | __key__] FROM <kind>
      [WHERE <condition> [AND <condition> ...]]
      [ORDER BY <property> [ASC | DESC] [, <property> [ASC | DESC] ...]]
      [LIMIT [<offset>,]<count>]
      [OFFSET <offset>]

    <condition> := <property> {< | <= | > | >= | = | !=} <value>
    <condition> := <property> IN <list>
    <condition> := ANCESTOR IS <entity or key>
SimpleDB
- Hierarchical data storage.
- Based on S3.
- Adds multiple attributes, indexing and queries.
- Ad hoc data storage:
  - No system administration cost.
  - No schema.
  - Automatically scalable.
- Efficient for data where read operations dominate, due to eventual consistency.
  - If conflicts are too common, the system becomes inefficient.
  - E.g. forums, metadata, backups.

SimpleDB: Data Model
- Domains:
  - Equivalent to a table.
  - Identified by a string.
  - 100 per account.
  - 10 GB per domain.
- Items:
  - Identified by a string.
  - Unlimited number per domain.
- Attributes:
  - <key, value> pairs.
  - No types, just strings.
  - Automatically indexed.
  - 256 per item.
  - 250 million per domain.
  - 1 KB per attribute.
SimpleDB: Missing Features
- No transactions.
- No notifications: may be compensated by a messaging service like SQS.
- No ordering: must be done at the client.
- No joins.
- No types: only string comparisons. Care must be taken to ensure that comparisons are accurate, e.g. add leading zeros to numbers ("001", "002", "003" and not "1", "2", "3").
- Does not store "bags of bytes": only 1 KB per attribute. For large objects, S3 must be used directly.

SimpleDB: Queries
- Select: allows querying the domain:
    select target from domain_name where query_expression
- Supported operators: =, !=, <, >, <=, >=, like, not like, between, is null, is not null.
- Example:
    select * from mydomain where keyword = 'Book'
- Supports ordering with the SORT operator.
- Supports counting using count().
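The zero-padding advice under "No types" can be checked with plain string comparisons. The helper below is a hypothetical illustration (`PaddedNumbers.pad` is not part of any SimpleDB library), and it assumes non-negative numbers and a fixed width chosen in advance.

```java
import java.util.Locale;

/** Zero-padding so that string order matches numeric order (hypothetical helper). */
final class PaddedNumbers {
    /** Formats a non-negative number with leading zeros up to the given width. */
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need an offset scheme");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // As plain strings, "10" sorts before "9" because '1' < '9'.
        System.out.println("10".compareTo("9") < 0);
        // With padding, "010" sorts after "009", as the numbers do.
        System.out.println(pad(10, 3).compareTo(pad(9, 3)) > 0);
    }
}
```

The width has to cover the largest value that will ever be stored, since changing it later means rewriting every existing attribute.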
Windows Azure Tables
- Similar to BigTable.
- Storage hierarchy: Table -> Entity -> Property -> <name, type, value>
- Supported types: Binary, Bool, DateTime, Double, GUID, Int, Int64 and String.
- URL schema: e>?$filter=<query>
- Optimistic concurrency control.
Operations on Tables
- On tables: Create, Delete, Query.
  - Example (GET using REST): me%20eq%20'smith'%20and%20firstname%20eq%20'john'
- On entities: Insert, Update, Merge, Delete.
- It is possible to perform transactions by grouping operations on entities.
  - Using SOAP, POST the operations list to:

Messaging Services
Messaging Services
- Why? Participants can be weakly connected:
  - No network connection.
  - No simultaneous execution.
  - No binary compatibility.
- Very useful for:
  - Connecting heterogeneous/legacy systems.
  - Workflow systems.
  - Processes can be manipulated by adding/removing/replicating messages.
- Examples:
  - Amazon Simple Queue Service (SQS).
  - Microsoft Azure Queues.

Amazon SQS: Simple Queue Service
- Communication service:
  - Reliable.
  - Persistent (1 hour to 2 weeks; default: 4 days).
- A message is a block of text of up to 8 KB.
- Queues store messages until they are delivered.
- A queue stores related messages and can be configured with specific delivery and access control options.
SQS: Consistency
- Message queues are replicated for fault tolerance and scalability:
  - When the queue is read, only a quorum of replicas is contacted, and therefore not all messages may be read.
- Message delivery is triggered by the receiver. Therefore no delivery times are guaranteed.
- SQS does not enforce/guarantee message ordering.
- The delivery semantics is at-least-once.

SQS: Programming Guidelines
- Use an idempotent message protocol, i.e. don't design operations that presuppose a particular application state, e.g.:
  - Choose SetValue(v,i) and not IncrementValue(v,i).
  - Choose NewPosition(x,y) and not MoveForward().
- Don't use SQS for applications with timing restrictions, e.g. transactional systems.
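Why at-least-once delivery calls for idempotent messages can be shown by replaying a duplicated delivery against the two handler styles named above. The sketch is self-contained and does not use SQS; `IdempotencyDemo` and its method names are hypothetical.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;

/** Effect of duplicate deliveries on idempotent and non-idempotent handlers (hypothetical sketch). */
final class IdempotencyDemo {
    /** Idempotent handler: the message carries the full new value. */
    static int setValue(int state, int value) {
        return value;
    }

    /** Non-idempotent handler: the result depends on the current state. */
    static int incrementValue(int state, int delta) {
        return state + delta;
    }

    /** Applies every delivery in order, duplicates included, and returns the final state. */
    static int deliverAll(int initial, List<Integer> deliveries, IntBinaryOperator handler) {
        int state = initial;
        for (int payload : deliveries) {
            state = handler.applyAsInt(state, payload);
        }
        return state;
    }
}
```

With the same message delivered twice, for example `deliverAll(0, Arrays.asList(5, 5), IdempotencyDemo::setValue)`, the state ends at 5, whereas the increment handler ends at 10 although the sender intended a single increment of 5.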
SQS: Operations
- CreateQueue
- ListQueues
- DeleteQueue: by default, only deletes empty queues.
- SendMessage
- ReceiveMessage: does not remove messages; makes them invisible.
- PeekMessage: reads a message without changing the queue.
- DeleteMessage: removes the message from the queue.

SQS: Java Example
    // using the Queue Java library
    public class SampleDriver {
        static public String accessKeyId = "";
        static public String secretAccessKey = "";
        static public String QueueServiceURL = "..."; // service URL lost in the transcription
        static private String queueName = "SQS-Test-Queue-Java";
        static private String testMessage = "This is a test message.";

        public static void main(String[] args) throws Exception {
            Queue testQueue = Queue.createQueue(queueName);

            // Check that the new queue is listed
            List<Queue> queues = Queue.listQueues(queueName);
            for (Queue queue : queues) {
                if (queue.getQueueEndpoint().equals(testQueue.getQueueEndpoint())) {
                    System.out.println("Queue found");
                }
            }

            String msgId = testQueue.sendMessage(testMessage).getId();
            String qCount = testQueue.getApproximateNumberOfMessages();

            // Poll until a message arrives; delivery times are not guaranteed
            List<QMessage> messages = testQueue.receiveMessage(1);
            while (messages.size() == 0) {
                Thread.sleep(1000); // wait for a second
                messages = testQueue.receiveMessage(1);
            }

            // A received message is only invisible; it must be deleted explicitly
            QMessage message = messages.get(0);
            testQueue.deleteMessage(message.getReceiptHandle());
        }
    }
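The difference between ReceiveMessage and DeleteMessage can be sketched with a toy queue in which a received message becomes invisible for a fixed period and reappears if it is not deleted. This is an in-memory illustration under assumptions: `ToyVisibilityQueue`, the integer handles and the caller-supplied logical clock are hypothetical and not the SQS API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy queue with a visibility timeout (hypothetical sketch, not the SQS API). */
final class ToyVisibilityQueue {
    private static final class Entry {
        final String body;
        long invisibleUntil = 0; // logical time until which the message is hidden

        Entry(String body) {
            this.body = body;
        }
    }

    private final Map<Integer, Entry> messages = new LinkedHashMap<>();
    private final long visibilityTimeout;
    private int nextHandle = 0;

    ToyVisibilityQueue(long visibilityTimeout) {
        this.visibilityTimeout = visibilityTimeout;
    }

    /** Stores a message and returns its handle. */
    int send(String body) {
        messages.put(nextHandle, new Entry(body));
        return nextHandle++;
    }

    /** Returns the handle of a visible message and hides it, or -1 if none is visible. */
    int receive(long now) {
        for (Map.Entry<Integer, Entry> e : messages.entrySet()) {
            if (e.getValue().invisibleUntil <= now) {
                e.getValue().invisibleUntil = now + visibilityTimeout;
                return e.getKey();
            }
        }
        return -1;
    }

    /** Returns the body of a message that is still stored, or null. */
    String body(int handle) {
        Entry e = messages.get(handle);
        return e == null ? null : e.body;
    }

    /** Removes the message for good; returns false if it was already gone. */
    boolean delete(int handle) {
        return messages.remove(handle) != null;
    }
}
```

If a consumer crashes after `receive` and before `delete`, the message becomes visible again once the timeout passes and another consumer picks it up, which is the source of the at-least-once semantics.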
Microsoft Azure Queues
- Reliable and persistent messaging service.
- Unlimited queues per account and messages per queue.
- Each message can have up to 8 KB.
- Fault tolerance mechanism similar to Amazon's: when messages are read they become invisible for a period.

Operations on Queues
- One can reference queues and messages, e.g.: Name>
- Operations on queues (REST): (list queues), Create, Delete
- Operations on messages (REST): Put, Get, Peek, Delete
Queues in C#: Writing
    StorageAccountInfo account = new StorageAccountInfo(
        baseUri, null, accountName, accountKey);
    QueueStorage service = QueueStorage.Create(account);
    MessageQueue queue = service.GetQueue("messages");
    if (!queue.DoesQueueExist()) {
        queue.CreateQueue();
    }
    Message msg = new Message(txtMessage.Text);
    queue.PutMessage(msg);

Queues in C#: Reading
    StorageAccountInfo account = new StorageAccountInfo(
        baseUri, null, accountName, accountKey);
    QueueStorage service = QueueStorage.Create(account);
    MessageQueue queue = service.GetQueue("messages");
    if (queue.DoesQueueExist()) {
        Message msg = queue.GetMessage();
        if (msg != null) {
            RoleManager.WriteToLog("Information",
                string.Format("Message '{0}' processed.", msg.ContentAsString()));
            // Reading only hides the message; delete it once it has been processed
            queue.DeleteMessage(msg);
        }
    }
Relational Database Services

Amazon RDS: Relational Database Service
- AWS relational database.
- Goal:
  - Simplify porting applications.
  - Take advantage of low latency inside the cluster: co-locate apps and DB.
- Based on MySQL.
- Supports automatic backups.
- Supports passive replication in different data centers (Multi-AZ, i.e. multiple Availability Zones) for fault tolerance.
- Supports read replicas for load balancing.
- Accessed using sysadmin command line tools:
  - DB creation returns a DNS name.
  - From that point on, it is a conventional MySQL server.
Variants
- Small DB: 1.7 GB RAM, 1 core
- Large DB: 7.5 GB RAM, 2 cores
- XL DB: 15 GB RAM, 4 cores
- Double XL DB: 34 GB RAM, 4 cores
- Quadruple XL DB: 68 GB RAM, 8 cores
- Disk: from 5 GB to 1 TB
- Transitions between variants happen at scheduled moments, with up to 2 hours of downtime.

SQL Azure
- Reporting, Business Analytics, Data Sync.
- Relational database service: high availability, automatic maintenance.
- The fabric controller monitors the server load and redistributes the partitions with a higher load.
SQL Azure
- Based on SQL Server.
- Replication on 3 copies.
- Strong consistency: when a write operation returns, all replicas have been updated.
- Maximum DB size: 10 GB.
- Access using: ODBC, JDBC, ADO.NET, LINQ.

SQL Azure vs. Amazon RDS
- Size: RDS up to 1 TB; SQL Azure 10 GB.
- Specificity: Azure is designed for the cloud; RDS is just a MySQL EC2 instance.
- Configurability: the RDS instance can be configured.
- Compatibility: RDS is full-fledged MySQL; SQL Azure is a subset of T-SQL.
- Price: different ways and prices to charge for DB, bandwidth and RAM.
Storage: Overview

                  AWS                          Microsoft Azure   Google / Hadoop
  SQL             RDS                          SQL Azure         X
  Tables          SimpleDB                     Tables            Datastore [BigTable] / HBase
  Objects/Blocks  S3                           Blobs             (GFS) / HDFS
  Queues          Simple Queue Service (SQS)   Queues            Task Queues

Storage: Comparison
- There are two general complaints:
  - Performance (latency).
  - Consistency models do not scale.
- The problem of scalability is not solved.
- There are no reliable benchmarks.
- The market is still in a very dynamic phase.
- Google storage services are not accessible remotely, although you can create an intermediate service.
Next Time...
- Execution and Programming Models in Cloud Computing
More informationCSE 124: Networked Services Lecture-16
Fall 2010 CSE 124: Networked Services Lecture-16 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/23/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments
More informationVoldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation
Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data
More informationAGREEMENT PROTOCOLS. Paxos -a family of protocols for solving consensus
AGREEMENT PROTOCOLS Paxos -a family of protocols for solving consensus OUTLINE History of the Paxos algorithm Paxos Algorithm Family Implementation in existing systems References HISTORY OF THE PAXOS ALGORITHM
More informationPaxos and Distributed Transactions
Paxos and Distributed Transactions INF 5040 autumn 2016 lecturer: Roman Vitenberg Paxos what is it? The most commonly used consensus algorithm A fundamental building block for data centers Distributed
More informationApp Engine: Datastore Introduction
App Engine: Datastore Introduction Part 1 Another very useful course: https://www.udacity.com/course/developing-scalableapps-in-java--ud859 1 Topics cover in this lesson What is Datastore? Datastore and
More informationNoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu
NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related
More informationDistributed Database Case Study on Google s Big Tables
Distributed Database Case Study on Google s Big Tables Anjali diwakar dwivedi 1, Usha sadanand patil 2 and Vinayak D.Shinde 3 1,2,3 Computer Engineering, Shree l.r.tiwari college of engineering Abstract-
More informationCISC 7610 Lecture 2b The beginnings of NoSQL
CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone
More informationCIT 668: System Architecture. Amazon Web Services
CIT 668: System Architecture Amazon Web Services Topics 1. AWS Global Infrastructure 2. Foundation Services 1. Compute 2. Storage 3. Database 4. Network 3. AWS Economics Amazon Services Architecture Regions
More informationToday s Papers. Google Chubby. Distributed Consensus. EECS 262a Advanced Topics in Computer Systems Lecture 24. Paxos/Megastore November 24 th, 2014
EECS 262a Advanced Topics in Computer Systems Lecture 24 Paxos/Megastore November 24 th, 2014 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley Today s Papers
More informationCS5412: DIVING IN: INSIDE THE DATA CENTER
1 CS5412: DIVING IN: INSIDE THE DATA CENTER Lecture V Ken Birman Data centers 2 Once traffic reaches a data center it tunnels in First passes through a filter that blocks attacks Next, a router that directs
More informationCS 655 Advanced Topics in Distributed Systems
Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3
More informationThe Google File System
The Google File System By Ghemawat, Gobioff and Leung Outline Overview Assumption Design of GFS System Interactions Master Operations Fault Tolerance Measurements Overview GFS: Scalable distributed file
More informationData Storage in the Cloud
Data Storage in the Cloud KHALID ELGAZZAR GOODWIN 531 ELGAZZAR@CS.QUEENSU.CA Outline 1. Distributed File Systems 1.1. Google File System (GFS) 2. NoSQL Data Store 2.1. BigTable Elgazzar - CISC 886 - Fall
More informationProseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita
Proseminar Distributed Systems Summer Semester 2016 Paxos algorithm stefan.resmerita@cs.uni-salzburg.at The Paxos algorithm Family of protocols for reaching consensus among distributed agents Agents may
More informationMap-Reduce. Marco Mura 2010 March, 31th
Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of
More informationCloud Computing Platform as a Service
HES-SO Master of Science in Engineering Cloud Computing Platform as a Service Academic year 2015/16 Platform as a Service Professional operation of an IT infrastructure Traditional deployment Server Storage
More informationData Management in the Cloud. Tim Kraska
Data Management in the Cloud Tim Kraska Montag, 22. Februar 2010 Systems Group/ETH Zurich MILK? [Anology from IM 2/09 / Daniel Abadi] 22.02.2010 Systems Group/ETH Zurich 2 Do you want milk? Buy a cow High
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in
More informationGFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures
GFS Overview Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures Interface: non-posix New op: record appends (atomicity matters,
More informationCLOUD-SCALE FILE SYSTEMS
Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients
More informationCSE 124: Networked Services Fall 2009 Lecture-19
CSE 124: Networked Services Fall 2009 Lecture-19 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa09/cse124 Some of these slides are adapted from various sources/individuals including but
More informationDistributed Systems 11. Consensus. Paul Krzyzanowski
Distributed Systems 11. Consensus Paul Krzyzanowski pxk@cs.rutgers.edu 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value must be one
More informationThe Google File System. Alexandru Costan
1 The Google File System Alexandru Costan Actions on Big Data 2 Storage Analysis Acquisition Handling the data stream Data structured unstructured semi-structured Results Transactions Outline File systems
More informationVlad Vinogradsky
Vlad Vinogradsky vladvino@microsoft.com http://twitter.com/vladvino Commercially available cloud platform offering Billing starts on 02/01/2010 A set of cloud computing services Services can be used together
More informationTAPIR. By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton
TAPIR By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton Outline Problem Space Inconsistent Replication TAPIR Evaluation Conclusion Problem
More information18-hdfs-gfs.txt Thu Oct 27 10:05: Notes on Parallel File Systems: HDFS & GFS , Fall 2011 Carnegie Mellon University Randal E.
18-hdfs-gfs.txt Thu Oct 27 10:05:07 2011 1 Notes on Parallel File Systems: HDFS & GFS 15-440, Fall 2011 Carnegie Mellon University Randal E. Bryant References: Ghemawat, Gobioff, Leung, "The Google File
More informationCS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 138: Google CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface
More informationCS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.
CS 138: Google CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface
More informationDistributed Systems. Fall 2017 Exam 3 Review. Paul Krzyzanowski. Rutgers University. Fall 2017
Distributed Systems Fall 2017 Exam 3 Review Paul Krzyzanowski Rutgers University Fall 2017 December 11, 2017 CS 417 2017 Paul Krzyzanowski 1 Question 1 The core task of the user s map function within a
More informationReplication in Distributed Systems
Replication in Distributed Systems Replication Basics Multiple copies of data kept in different nodes A set of replicas holding copies of a data Nodes can be physically very close or distributed all over
More informationMicrosoft Azure Storage
Microsoft Azure Storage Enabling the Digital Enterprise MICROSOFT AZURE STORAGE (BLOB/TABLE/QUEUE) July 2015 The goal of this white paper is to explore Microsoft Azure Storage, understand how it works
More informationW b b 2.0. = = Data Ex E pl p o l s o io i n
Hypertable Doug Judd Zvents, Inc. Background Web 2.0 = Data Explosion Web 2.0 Mt. Web 2.0 Traditional Tools Don t Scale Well Designed for a single machine Typical scaling solutions ad-hoc manual/static
More informationDistributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017
Distributed Systems 15. Distributed File Systems Paul Krzyzanowski Rutgers University Fall 2017 1 Google Chubby ( Apache Zookeeper) 2 Chubby Distributed lock service + simple fault-tolerant file system
More informationComparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2014 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
More informationSaranya Sriram Developer Evangelist Microsoft Corporation India
Saranya Sriram Developer Evangelist Microsoft Corporation India Microsoft s Cloud ReCap Azure Services Platform Agenda Data is King Motivation? Why data outside your premise? Microsoft s Data Storage offerings
More informationLessons Learned While Building Infrastructure Software at Google
Lessons Learned While Building Infrastructure Software at Google Jeff Dean jeff@google.com Google Circa 1997 (google.stanford.edu) Corkboards (1999) Google Data Center (2000) Google Data Center (2000)
More information