Vineet Gupta GM Software Engineering Directi

1 Intelligent People. Uncommon Ideas. Vineet Gupta, GM Software Engineering, Directi. Licensed under Creative Commons Attribution-NonCommercial-ShareAlike.

2 Characteristics; App Tier Scaling; Replication; Partitioning; Consistency; Normalization; Caching; Data Engine Types

3 Offline Processing (Batching / Queuing); Distributed Processing; Map Reduce; Non-blocking IO; Fault Detection, Tolerance and Recovery

4 Section: Characteristics

5 22M+ users; dozens of DB servers; dozens of Web servers; six specialized graph database servers to run the recommendations engine. Source:

6 1 TB / day; 100 M blogs indexed / day; 10 B objects indexed / day; 0.5 B photos and videos; data doubles in 6 months; users double in 6 months. Source:

7 2 PB raw storage; 470 M photos, 4-5 sizes each; 400 k photos added / day; 35 M photos in Squid cache (total); 2 M photos in Squid RAM; 38k reqs / sec to Memcached; 4 B queries / day. Source:

8 Virtualized database spans 600 production instances residing in 100+ server clusters distributed over 8 datacenters; 2 PB of data; 26 B SQL queries / day; 1 B page views / day; 3 B API calls / month; 15,000 app servers. Source:

9 450,000 low-cost commodity servers in 2006; indexed 8 B web pages in GFS clusters (1 cluster = 1,000 to 5,000 machines); read / write throughput = 40 GB / sec across a cluster; MapReduce: 100k jobs / day, 20 PB of data processed / day, 10k MapReduce programs. Source:

10 Data size: ~PB; data growth: ~TB / day; number of servers: 10s to 10,000; number of datacenters: 1 to 10; queries: B+ / day; specialized needs: more than / other than RDBMS

11 Section: App Tier Scaling

12 [Diagram: one Host running both the App Server and the DB Server, with more CPU and RAM added to it]

13 Sunfire E20k: 36x 1.8 GHz processors, $450,000 to $2,500,000. PowerEdge SC1435: dual-core 1.8 GHz processor, around $1,500.
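
The two price points on slide 13 can be turned into a rough cost-per-processor comparison. This is a back-of-the-envelope sketch, not a benchmark. It assumes the PowerEdge counts as one processor and ignores cores, memory, I/O and support contracts.

```python
# Prices and processor counts as listed on slide 13. "Per processor" treats each
# listed processor as one unit and ignores cores, memory, I/O and support costs.
SUNFIRE_PROCESSORS = 36
SUNFIRE_PRICE_LOW, SUNFIRE_PRICE_HIGH = 450_000, 2_500_000  # USD
POWEREDGE_PROCESSORS = 1  # assumption: the single dual-core processor listed
POWEREDGE_PRICE = 1_500  # USD, "around"


def price_per_processor(price: float, processors: int) -> float:
    """Divide a system's price evenly across its processors."""
    return price / processors


low = price_per_processor(SUNFIRE_PRICE_LOW, SUNFIRE_PROCESSORS)
high = price_per_processor(SUNFIRE_PRICE_HIGH, SUNFIRE_PROCESSORS)
commodity = price_per_processor(POWEREDGE_PRICE, POWEREDGE_PROCESSORS)
boxes_for_one_e20k = SUNFIRE_PRICE_LOW // POWEREDGE_PRICE

print(f"Sunfire E20k: ${low:,.0f} to ${high:,.0f} per processor")  # $12,500 to $69,444
print(f"PowerEdge SC1435: ${commodity:,.0f} per processor")  # $1,500
print(f"Commodity boxes for the price of a low-end E20k: {boxes_for_one_e20k}")  # 300
```

The entry-level E20k price alone buys about 300 commodity boxes.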

14 Increasing the hardware resources on a host. Pros: simple to implement; fast turnaround time. Cons: finite limit; hardware does not scale linearly (diminishing returns for each incremental unit); requires downtime; increases downtime impact; incremental costs increase exponentially.

15 [Diagram: the App Server and the DB Server on separate Hosts]

16 Split services on separate nodes; each node performs different tasks. Pros: increases per-application availability; task-based specialization, optimization and tuning possible; reduces context switching; simple to implement for out-of-band processes; no changes to the app required; flexibility increases. Cons: sub-optimal resource utilization; may not increase overall availability; finite scalability.

17 [Diagram: a Load Balancer in front of three Web Servers, all backed by one DB Server]

18 Add more nodes for the same service: identical nodes, doing the same task. Load balancing: hardware balancers are faster; software balancers are more customizable.
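
A software load balancer can be as simple as rotating requests across identical nodes. The round-robin sketch below shows one illustrative policy, not the only one. The class name and host names are hypothetical, and production balancers also add health checks and weighting.

```python
import itertools
from typing import Iterable, List


class RoundRobinBalancer:
    """Software load balancer sketch: hands out identical backends in rotation."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._backends: List[str] = list(backends)
        if not self._backends:
            raise ValueError("need at least one backend")
        self._rotation = itertools.cycle(self._backends)

    def pick(self) -> str:
        """Return the next backend; each gets an equal share over a full rotation."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["web1", "web2", "web3"])  # hypothetical host names
print([balancer.pick() for _ in range(5)])  # ['web1', 'web2', 'web3', 'web1', 'web2']
```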

19 [Diagram: User 1 and User 2 reach three Web Servers through a Load Balancer; the Web Servers share one DB Server]

20 [Diagram: the same setup, with each user pinned to one Web Server (sticky sessions)] Asymmetrical load distribution; downtime.

21 [Diagram: User 1 and User 2 reach three Web Servers through a Load Balancer; the Web Servers keep sessions in a central Session Store] The Session Store is a SPOF; reads and writes generate network + disk IO.

22 [Diagram: User 1 and User 2 reach three Web Servers through a Load Balancer; sessions are replicated across the Web Servers]

23 Pros: no SPOF; easier to set up; fast reads. Cons: n x writes; network IO increases as nodes are added; stale data (rare).

24 [Diagram: User 1 and User 2 reach three Web Servers through a Load Balancer, backed by one DB Server]

25 No sessions: stuff state in a cookie and sign it! The cookie is sent with every request / response. Super slim sessions: keep a small amount of frequently used data in the cookie; pull the rest from the DB (or a central session store).
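
Here is a minimal sketch of "stuff state in a cookie and sign it". It assumes an HMAC-SHA256 tag over a base64-encoded JSON payload. `sign_state`, `verify_state` and `SECRET_KEY` are hypothetical names. Signing only detects tampering, so the client can still read the state, and this sketch has no expiry.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict, Optional

# Hypothetical server-side secret; in practice load a long random value from config.
SECRET_KEY = b"replace-with-a-long-random-secret"


def _tag(payload: str, key: bytes) -> bytes:
    """HMAC-SHA256 over the encoded payload, base64url-encoded."""
    mac = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac)


def sign_state(state: Dict[str, Any], key: bytes = SECRET_KEY) -> str:
    """Serialize state to JSON and return '<payload>.<tag>' for the cookie value."""
    raw = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw).decode("ascii")
    return payload + "." + _tag(payload, key).decode("ascii")


def verify_state(cookie: str, key: bytes = SECRET_KEY) -> Optional[Dict[str, Any]]:
    """Return the state if the tag matches, else None (tampered or malformed)."""
    payload, sep, tag = cookie.rpartition(".")
    if not sep:
        return None
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(_tag(payload, key), tag.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


cookie = sign_state({"user_id": 42, "lang": "en"})
print(verify_state(cookie))  # {'lang': 'en', 'user_id': 42}
```

Any server holding the same key can verify the cookie, so no session store is needed. The cost is that the payload travels with every request, which is why the slide pairs this with "super slim" sessions.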

26 Bad: sticky sessions. Good: clustered sessions for a small number of nodes and / or small write volume; central sessions for a large number of nodes or large write volume. Great: no sessions!

27 HTTP accelerators / reverse proxy: static content caching, redirect to a lighter HTTP server; async NIO on the user side, keep-alive connection pool. CDN: get closer to your user (Akamai, Limelight). IP anycasting. Async NIO.

28 App layer: add more nodes and load balance! Avoid sticky sessions. Avoid sessions!! Data store: tricky! Very tricky!!!

29 Section: Replication

30 [Diagram: App Layer over a single database holding tables T1, T2, T3, T4]

31 [Diagram: App Layer over five nodes, each holding T1, T2, T3, T4] Each node has its own copy of the data: a shared-nothing cluster.

32 Read : Write = 4:1. Scale reads at the cost of writes! Duplicate data: each node has its own copy. Master-Slave: writes are sent to one node and cascaded to the others. Multi-Master: writes can be sent to multiple nodes; can lead to deadlocks; requires conflict management.

33 [Diagram: App Layer over one Master replicating to four Slaves] n x writes; async vs. sync; SPOF; async: critical reads from the Master!

34 [Diagram: App Layer over two Masters replicating to three Slaves] n x writes; async vs. sync; no SPOF; conflicts!

35 Asynchronous: guaranteed, but out-of-band, replication from Master to Slave. The Master updates its own DB and returns a response to the client; replication from Master to Slave takes place asynchronously. Faster response to the client. Slave data is marginally behind the Master. Requires modifying the app to send critical reads and writes to the Master, and to load balance all other reads.
Synchronous: guaranteed, in-band replication from Master to Slave. The Master updates its own DB and confirms that all Slaves have updated theirs before returning a response to the client. Slower response to the client. Slaves have the same data as the Master at all times. Requires modifying the app to send writes to the Master and to load balance all reads.

36 Replication at RDBMS level: support may exist in the RDBMS or through a 3rd-party tool; faster and more reliable; the app must send writes to the Master, reads to any DB, and critical reads to the Master.
Replication at driver / DAO level: the driver / DAO layer ensures writes are performed on all connected DBs; reads are load balanced; critical reads are sent to a Master; in most cases RDBMS-agnostic; slower and in some cases less reliable.
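
Driver / DAO-level routing for master-slave replication might look like the sketch below. It assumes asynchronous replication, so critical reads must go to the Master. `ReplicaRouter` and the connection names are hypothetical. With synchronous replication, critical reads could also be load balanced across Slaves.

```python
import itertools
from typing import Iterator, List, Optional


class ReplicaRouter:
    """Hypothetical DAO-level router for one Master and N Slaves (async replication)."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self._slaves: Optional[Iterator[str]] = itertools.cycle(slaves) if slaves else None

    def for_write(self) -> str:
        # All writes go to the Master, which cascades them to the Slaves.
        return self.master

    def for_read(self, critical: bool = False) -> str:
        # Async Slaves may lag, so reads that must see the latest write go to
        # the Master; all other reads are load balanced over the Slaves.
        if critical or self._slaves is None:
            return self.master
        return next(self._slaves)


router = ReplicaRouter("db-master", ["db-slave-1", "db-slave-2"])  # hypothetical names
print(router.for_write())                     # db-master
print([router.for_read() for _ in range(3)])  # ['db-slave-1', 'db-slave-2', 'db-slave-1']
print(router.for_read(critical=True))         # db-master
```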

37 Per-server load at a 4:1 read/write ratio as replicas are added: one server handles 4R, 1W; two servers handle 2R, 1W each; four servers handle 1R, 1W each. [Diagram: read and write bars per server]
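
The numbers on slide 37 follow from slide 32's 4:1 read/write ratio. Reads are spread across replicas, but every write must be applied on every replica. A small calculation reproduces them, assuming perfectly even read distribution:

```python
from typing import Tuple


def per_server_load(reads: float, writes: float, servers: int) -> Tuple[float, float]:
    """Reads are divided among replicas; every write is applied on every replica."""
    if servers < 1:
        raise ValueError("servers must be >= 1")
    return reads / servers, writes


for n in (1, 2, 4):
    r, w = per_server_load(4, 1, n)
    print(f"{n} server(s): {r:g}R, {w:g}W each; writes are {w / (r + w):.0%} of the work")
# 1 server(s): 4R, 1W each; writes are 20% of the work
# 2 server(s): 2R, 1W each; writes are 33% of the work
# 4 server(s): 1R, 1W each; writes are 50% of the work
```

Writes grow from 20% to 50% of each server's work. Replication therefore scales reads at the cost of writes.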

38 Section: Partitioning

39 Vertical partitioning: divide data on tables / columns; scale to as many boxes as there are tables or columns; finite. Horizontal partitioning: divide data on rows; scale to as many boxes as there are rows! Limitless scaling.

40 [Diagram: App Layer over one node holding tables T1 to T5] Note: a node here typically represents a shared-nothing cluster.

41 [Diagram: App Layer over five nodes, one table (T1 to T5) per node] Facebook: the user table and the posts table can be on separate nodes. Joins need to be done in code (why have them?)

42 [Diagram: App Layer over three nodes, each holding T1 to T5 for the first, second and third million rows respectively]

43 Value based: split on the timestamp of posts; split on the first letter of the user name. Hash based: use a hash function to determine the cluster. Lookup map: first come, first served; round robin.
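
The partitioning schemes on slide 43 can be sketched as functions from a key to a partition. The partition count, the letter-to-bucket mapping and SHA-256 as the stable hash are assumptions made for illustration. Python's built-in `hash()` is avoided because string hashing is randomized per process.

```python
import hashlib
import itertools
from datetime import datetime
from typing import Dict

NUM_PARTITIONS = 4  # hypothetical number of clusters


def by_first_letter(user_name: str) -> int:
    """Value based: split on the first letter of the user name (a-z spread over partitions)."""
    first = user_name[:1].lower()
    if not ("a" <= first <= "z"):
        return 0  # hypothetical catch-all partition for names not starting with a-z
    return (ord(first) - ord("a")) * NUM_PARTITIONS // 26


def by_month(posted_at: datetime) -> str:
    """Value based: split posts on their timestamp, one partition per calendar month."""
    return posted_at.strftime("%Y-%m")


def by_hash(key: str) -> int:
    """Hash based: a stable hash of the key determines the cluster."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class LookupMap:
    """Lookup map: an explicit key -> partition directory; new keys are
    assigned round robin in first come, first served order."""

    def __init__(self) -> None:
        self._directory: Dict[str, int] = {}
        self._next_partition = itertools.cycle(range(NUM_PARTITIONS))

    def partition_for(self, key: str) -> int:
        if key not in self._directory:
            self._directory[key] = next(self._next_partition)
        return self._directory[key]


print(by_first_letter("sam"), by_month(datetime(2009, 3, 14)))  # 2 2009-03
```

Value-based splits keep related rows together but can skew (many names start with "s"). Hashing spreads keys evenly but makes range queries harder. A lookup map can rebalance freely at the cost of storing and consulting the directory.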

44 Section: Consistency

45 Consistency, Availability, Partition Tolerance (CAP). Source:

46 Transactions make you feel alone: no one else manipulates the data when you are. Transactional serializability: the behavior is as if a serial order exists. [Diagram "Transaction Serializability": transactions Ta to To arranged around Ti, grouped as "These Transactions Precede Ti", "These Transactions Follow Ti", and "Ti Doesn't Know About These Transactions and They Don't Know About Ti"] Source:

47 Transactions live in the "now" inside services: time marches forward; transactions commit, advancing time; transactions see the committed transactions. A service's biz-logic lives in the "now". [Diagram: a Service in which each transaction only sees a simple advancing of time with a clear set of preceding transactions] Source:

48 Messages contain unlocked data: assume no shared transactions; unlocked data may change, since unlocking it allows change. Messages are not from the "now"; they are from the past. There is no simultaneity at a distance! It is similar to the speed of light: knowledge travels at the speed of light, and by the time you see a distant object it may have changed! By the time you see a message, the data may have changed! Services, transactions, and locks bound simultaneity! Inside a transaction, things appear simultaneous (to others). Simultaneity only inside a transaction! Simultaneity only inside a service! Source:

49 All data from distant stars is from the past: 10 light years away means 10-year-old knowledge; the sun may have blown up 5 minutes ago, and we won't know for 3 minutes more. All data seen from a distant service is from the past: by the time you see it, it has been unlocked and may change. Each service has its own perspective: inside data is "now", outside data is "past"; my inside is not your inside, and my outside is not your outside. This is like going from Newtonian to Einsteinian physics. Newton's time marched forward uniformly, with instant knowledge. Classic distributed computing: many systems look like one (RPC, 2-phase commit, remote method calls). In Einstein's world, everything is relative to one's perspective. Today: no attempt to blur the boundary. Source:

50 Can't have the same data at many locations unless it is a snapshot. Changing distributed data needs versions: each version creates a snapshot. [Diagram: a Data Owning Service and Listening Partner Services 1, 5, 7 and 8, each holding Monday's, Tuesday's or Wednesday's version of the data] Source:

51 Given what I know here and now, make a decision. Remember the versions of all the data used to make this decision. Record the decision as being predicated on these versions. Other copies of the object may make divergent decisions: try to sort out conflicts within the family; if necessary, programmatically apologize; very rarely, whine and fuss for human help. Subjective consistency: given the information I have at hand, make a decision and act on it! Remember the information at hand! Ambassadors had authority: back before radio, it could be months between communications with the king. Ambassadors would make treaties and much more... They had binding authority. The mess was sorted out later! Source:

52 Eventually, all the copies of the object share their changes: "I'll show you mine if you show me yours!" Now apply subjective consistency: given the information I have at hand, make a decision and act on it! Everyone has the same information, so everyone comes to the same conclusion about the decisions to take. Eventual consistency: given the same knowledge, produce the same result! Everyone sharing their knowledge leads to the same result... This is NOT magic; it is a design requirement! Idempotence, commutativity, and associativity of the operations (decisions made) are all implied by this requirement. Source:
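
One way to meet slide 52's requirement is to make the shared knowledge a set of uniquely identified observations and to merge by set union. Set union is idempotent, commutative and associative. This technique was chosen for the sketch and is not one the slides prescribe; the observation shape is hypothetical.

```python
from functools import reduce
from itertools import permutations
from typing import FrozenSet, Tuple

# Hypothetical observation shape: (unique event id, amount), e.g. a debit or credit.
Observation = Tuple[str, int]
Knowledge = FrozenSet[Observation]


def merge(a: Knowledge, b: Knowledge) -> Knowledge:
    """Set union: idempotent, commutative and associative."""
    return a | b


def balance(known: Knowledge) -> int:
    """A deterministic decision computed only from the knowledge at hand."""
    return sum(amount for _, amount in known)


replica_1: Knowledge = frozenset({("tx1", 100), ("tx2", -30)})
replica_2: Knowledge = frozenset({("tx2", -30), ("tx3", 50)})
replica_3: Knowledge = frozenset({("tx4", -20)})

# Share knowledge in every possible order: all orders converge on one state.
outcomes = {reduce(merge, order) for order in permutations([replica_1, replica_2, replica_3])}
final = next(iter(outcomes))
print(len(outcomes), balance(final))  # 1 100
```

The unique event ids make duplicates harmless: "tx2" is known to two replicas but is counted once. Without them, re-delivering an observation would change the result.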

53 Section: Normalization

54 Normalization's goal is eliminating update anomalies: data can be changed without funny behavior; each data item lives in one place. De-normalization is OK if you aren't going to update! [Table residue: columns Emp #, Emp Name, Emp Phone, Mgr #, Mgr Name, Mgr Phone, with manager Sam's details repeated across rows] Classic problem with de-normalization: can't update Sam's phone # since there are many copies. Source:

55 [Schema diagram: one user profile spread across six tables]
affiliations table: affiliation_id, description, member_count (rows: 42, Microsoft, 18,656; 598, Georgia Tech, 23,488)
user table: user_id, first_name, last_name, sex, hometown, interested_in, religious_views, political_views, relationship_status (sample row: John Doe, Male, Atlanta, GA, married, interested in women; religious and political views null)
user_affiliations table: user_id (foreign key), affiliation_id (foreign key)
user_phone_numbers table: user_id (foreign key), phone_number, phone_type (Home, Work, Cell)
user_screen_names table: user_id (foreign key), screen_name, im_service (geeknproud@example.com on AIM; voip4life@example.org on Skype)
user_work_history table: user_id (foreign key), company_affiliation_id (foreign key), company_name, job_title (sample: Microsoft, Program Manager; Quality Assurance Engineer)

56 6 joins for 1 query! Do you think FB would do this? And how would you do joins with partitioned data? De-normalization removes joins, but increases data volume (though disk is cheap and getting cheaper), and can lead to inconsistent data if you are lazy. However, this is not really an issue.

57 Many kinds of computing are append-only: lots of observations are made about the world (debits, credits, purchase orders, customer change requests, etc.). As time moves on, more observations are added; you can't change the history, but you can add new observations. Derived results may be calculated: an estimate of the current inventory (frequently inaccurate). Historic rollups are calculated: monthly bank statements.

58 Transaction logs are the truth: high-performance and write-only; they describe ALL the changes to the data. The database is the current opinion: it describes the latest value of the data as perceived by the application. The database is a caching of the transaction log: it is the subset of the latest committed values represented in the transaction log. [Diagram: Log and DB] Source:
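
Slide 58's idea that the database is a cache of the transaction log can be sketched by rebuilding current state from an append-only log. The entry shape `(sequence, account, delta)` and the `replay` function are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical log entry shape: (sequence number, account, amount change).
LogEntry = Tuple[int, str, int]


def replay(log: List[LogEntry]) -> Dict[str, int]:
    """Rebuild the 'current opinion' (latest balances) from the log, in sequence order."""
    state: Dict[str, int] = {}
    for _, account, delta in sorted(log):
        state[account] = state.get(account, 0) + delta
    return state


log: List[LogEntry] = [(1, "alice", 100), (2, "bob", 40), (3, "alice", -30)]
print(replay(log))  # {'alice': 70, 'bob': 40}

# History is never edited; a correction is just another appended observation.
log.append((4, "bob", 10))
print(replay(log))  # {'alice': 70, 'bob': 50}
```

Because the state can always be recomputed from the log, the table of balances is disposable in the same way a cache is.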

59 [Diagram: the Data Owning Service and Listening Partner Services 1, 5, 7 and 8, shown in several stages, each holding Monday's, Tuesday's and / or Wednesday's versions of the data] Source:

60 Section: Caching

61 Makes scaling easier (cheaper). Core idea: read data from the persistent store into memory; store it in a hash table; read first from the cache, and if it is not there, load it from the persistent store.
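
The core idea on slide 61 is often called cache-aside: read from the cache first and load from the persistent store on a miss. Below is a minimal in-process sketch, with a plain dict standing in for the persistent store. `CacheAside` is a hypothetical name, and the sketch has no eviction or expiry.

```python
from typing import Any, Callable, Dict, Hashable


class CacheAside:
    """In-memory hash table in front of a slower persistent store (sketch)."""

    def __init__(self, load_from_store: Callable[[Hashable], Any]) -> None:
        self._table: Dict[Hashable, Any] = {}
        self._load = load_from_store
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        # Read first from the cache...
        if key in self._table:
            self.hits += 1
            return self._table[key]
        # ...if it is not there, load from the persistent store and remember it.
        self.misses += 1
        value = self._load(key)
        self._table[key] = value
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call after writing to the store so the next read reloads fresh data."""
        self._table.pop(key, None)


database = {"user:1": "Alice", "user:2": "Bob"}  # stands in for the persistent store
cache = CacheAside(database.__getitem__)
cache.get("user:1")
cache.get("user:1")
print(cache.hits, cache.misses)  # 1 1
```

Writes to the store must invalidate (or update) the cached entry. Otherwise readers keep seeing the old value until it is evicted.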

62 to 64 [Diagrams: App Server with a Cache]

66 In-memory distributed hash table. A Memcached instance manifests as a process (often on the same machine as the web server). The Memcached client maintains a hash table: which item is stored on which instance. The Memcached server maintains a hash table: which item is stored in which memory location.
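
The two hash tables on slide 66 can be modeled in a few lines. The client hashes a key to choose an instance, and each instance keeps its own key-to-value table. This is a toy model, not the memcached protocol or any real client library, and the class names are hypothetical. It assumes simple modulo placement, which remaps most keys when instances are added or removed.

```python
import hashlib
from typing import Dict, List, Optional


class CacheInstance:
    """Stands in for one cache server process with its own hash table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.memory: Dict[str, bytes] = {}  # key -> stored value

    def set(self, key: str, value: bytes) -> None:
        self.memory[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.memory.get(key)


class CacheClient:
    """Client side: a hash of the key decides which instance owns the item."""

    def __init__(self, instances: List[CacheInstance]) -> None:
        if not instances:
            raise ValueError("need at least one instance")
        self.instances = instances

    def owner(self, key: str) -> CacheInstance:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.instances[int.from_bytes(digest[:8], "big") % len(self.instances)]

    def set(self, key: str, value: bytes) -> None:
        self.owner(key).set(key, value)

    def get(self, key: str) -> Optional[bytes]:
        return self.owner(key).get(key)


client = CacheClient([CacheInstance(f"cache{i}") for i in range(3)])  # hypothetical
client.set("user:42", b"Alice")
print(client.owner("user:42").name, client.get("user:42"))
```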

67 Section: Data Engine Types

68 Amazon: S3, SimpleDB, Dynamo. Google: App Engine Datastore, BigTable. Microsoft: SQL Data Services, Azure Storage. Facebook: Cassandra. LinkedIn: Project Voldemort. Others: Ringo, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, CouchDB, HBase, Hypertable.

69 Basic concepts: no tables (containers and entities); no schema (each tuple has its own set of properties). Amazon SimpleDB: strings only. Microsoft Azure SQL Data Services: strings, blob, datetime, bool, int, double, etc.; no cross-container joins as of now. Google App Engine Datastore: strings, blob, datetime, bool, int, double, etc.

70 Google BigTable: sparse, distributed, multi-dimensional sorted map; indexed by row key, column key and timestamp; each value is an uninterpreted array of bytes. Amazon Dynamo: data partitioned and replicated using consistent hashing; decentralized replica sync protocol; consistency through versioning. Facebook Cassandra: used for Inbox search; open source. Scalaris: keys stored in lexicographical order; improved Paxos to provide ACID; memory resident, no persistence.
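
Slide 70 notes that Dynamo partitions and replicates data using consistent hashing. The sketch below is a simplified ring with virtual nodes, where a key is stored on the first N distinct nodes clockwise from its position. It omits Dynamo's failure handling and versioning, and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


def _position(label: str) -> int:
    """Map a label onto a 64-bit ring using a stable hash."""
    return int.from_bytes(hashlib.sha256(label.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Simplified consistent hashing with virtual nodes and an N-node preference list."""

    def __init__(self, nodes: List[str], vnodes: int = 8) -> None:
        # Each physical node owns several points ("virtual nodes") on the ring.
        self._ring: List[Tuple[int, str]] = sorted(
            (_position(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._nodes = {node for _, node in self._ring}

    def preference_list(self, key: str, n: int = 3) -> List[str]:
        """Walk clockwise from the key's position and collect n distinct nodes."""
        wanted = min(n, len(self._nodes))
        i = bisect.bisect(self._points, _position(key))
        owners: List[str] = []
        while len(owners) < wanted:
            node = self._ring[i % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


ring = ConsistentHashRing(["A", "B", "C", "D"])  # hypothetical node names
print(ring.preference_list("user:42"))  # three distinct nodes that hold this key
```

A new node only takes over the ring segments where its points land. Each existing key therefore either keeps its first owner or moves to the new node, unlike modulo placement.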

71 Real-life scaling requires trade-offs. No silver bullet. Need to learn new things. Need to un-learn. Balance!


More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL * Some History * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL * NoSQL Taxonomy * Towards NewSQL Overview * Some History * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL * NoSQL Taxonomy *TowardsNewSQL NoSQL

More information

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897 Flat Datacenter Storage Edmund B. Nightingale, Jeremy Elson, et al. 6.S897 Motivation Imagine a world with flat data storage Simple, Centralized, and easy to program Unfortunately, datacenter networks

More information

Distributed KIDS Labs 1

Distributed KIDS Labs 1 Distributed Databases @ KIDS Labs 1 Distributed Database System A distributed database system consists of loosely coupled sites that share no physical component Appears to user as a single system Database

More information

Tools for Social Networking Infrastructures

Tools for Social Networking Infrastructures Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes

More information

Advanced Database Technologies NoSQL: Not only SQL

Advanced Database Technologies NoSQL: Not only SQL Advanced Database Technologies NoSQL: Not only SQL Christian Grün Database & Information Systems Group NoSQL Introduction 30, 40 years history of well-established database technology all in vain? Not at

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data

More information

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related

More information

The NoSQL Ecosystem. Adam Marcus MIT CSAIL

The NoSQL Ecosystem. Adam Marcus MIT CSAIL The NoSQL Ecosystem Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in The Architecture of Open Source Applications

More information

Distributed Systems 16. Distributed File Systems II

Distributed Systems 16. Distributed File Systems II Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS

More information

SCALABLE DATABASES. Sergio Bossa. From Relational Databases To Polyglot Persistence.

SCALABLE DATABASES. Sergio Bossa. From Relational Databases To Polyglot Persistence. SCALABLE DATABASES From Relational Databases To Polyglot Persistence Sergio Bossa sergio.bossa@gmail.com http://twitter.com/sbtourist About Me Software architect and engineer Gioco Digitale (online gambling

More information

Datacenter replication solution with quasardb

Datacenter replication solution with quasardb Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION

More information

DEMYSTIFYING BIG DATA WITH RIAK USE CASES. Martin Schneider Basho Technologies!

DEMYSTIFYING BIG DATA WITH RIAK USE CASES. Martin Schneider Basho Technologies! DEMYSTIFYING BIG DATA WITH RIAK USE CASES Martin Schneider Basho Technologies! Agenda Defining Big Data in Regards to Riak A Series of Trade-Offs Use Cases Q & A About Basho & Riak Basho Technologies is

More information

Presented by Sunnie S Chung CIS 612

Presented by Sunnie S Chung CIS 612 By Yasin N. Silva, Arizona State University Presented by Sunnie S Chung CIS 612 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See http://creativecommons.org/licenses/by-nc-sa/4.0/

More information

CS5412: DIVING IN: INSIDE THE DATA CENTER

CS5412: DIVING IN: INSIDE THE DATA CENTER 1 CS5412: DIVING IN: INSIDE THE DATA CENTER Lecture V Ken Birman We ve seen one cloud service 2 Inside a cloud, Dynamo is an example of a service used to make sure that cloud-hosted applications can scale

More information

Time Series Live 2017

Time Series Live 2017 1 Time Series Schemas @Percona Live 2017 Who Am I? Chris Larsen Maintainer and author for OpenTSDB since 2013 Software Engineer @ Yahoo Central Monitoring Team Who I m not: A marketer A sales person 2

More information

PRIMARY-BACKUP REPLICATION

PRIMARY-BACKUP REPLICATION PRIMARY-BACKUP REPLICATION Primary Backup George Porter Nov 14, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons

More information

Column-Family Databases Cassandra and HBase

Column-Family Databases Cassandra and HBase Column-Family Databases Cassandra and HBase Kevin Swingler Google Big Table Google invented BigTableto store the massive amounts of semi-structured data it was generating Basic model stores items indexed

More information

4. Managing Big Data. Cloud Computing & Big Data MASTER ENGINYERIA INFORMÀTICA FIB/UPC. Fall Jordi Torres, UPC - BSC

4. Managing Big Data. Cloud Computing & Big Data MASTER ENGINYERIA INFORMÀTICA FIB/UPC. Fall Jordi Torres, UPC - BSC 4. Managing Big Data Cloud Computing & Big Data MASTER ENGINYERIA INFORMÀTICA FIB/UPC Fall - 2013 Jordi Torres, UPC - BSC www.jorditorres.eu Slides are only for presentation guide We will discuss+debate

More information

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 15. Distributed File Systems Paul Krzyzanowski Rutgers University Fall 2017 1 Google Chubby ( Apache Zookeeper) 2 Chubby Distributed lock service + simple fault-tolerant file system

More information

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612 Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612 Google Bigtable 2 A distributed storage system for managing structured data that is designed to scale to a very

More information

Exam 2 Review. October 29, Paul Krzyzanowski 1

Exam 2 Review. October 29, Paul Krzyzanowski 1 Exam 2 Review October 29, 2015 2013 Paul Krzyzanowski 1 Question 1 Why did Dropbox add notification servers to their architecture? To avoid the overhead of clients polling the servers periodically to check

More information

Everything You Need to Know About MySQL Group Replication

Everything You Need to Know About MySQL Group Replication Everything You Need to Know About MySQL Group Replication Luís Soares (luis.soares@oracle.com) Principal Software Engineer, MySQL Replication Lead Copyright 2017, Oracle and/or its affiliates. All rights

More information

Ghislain Fourny. Big Data 5. Wide column stores

Ghislain Fourny. Big Data 5. Wide column stores Ghislain Fourny Big Data 5. Wide column stores Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage 2 Where we are User interfaces

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lecture 26: Parallel Databases and MapReduce CSE 344 - Winter 2013 1 HW8 MapReduce (Hadoop) w/ declarative language (Pig) Cluster will run in Amazon s cloud (AWS)

More information

Distributed Architectures & Microservices. CS 475, Spring 2018 Concurrent & Distributed Systems

Distributed Architectures & Microservices. CS 475, Spring 2018 Concurrent & Distributed Systems Distributed Architectures & Microservices CS 475, Spring 2018 Concurrent & Distributed Systems GFS Architecture GFS Summary Limitations: Master is a huge bottleneck Recovery of master is slow Lots of success

More information

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI 2006 Presented by Xiang Gao 2014-11-05 Outline Motivation Data Model APIs Building Blocks Implementation Refinement

More information

MySQL HA Solutions. Keeping it simple, kinda! By: Chris Schneider MySQL Architect Ning.com

MySQL HA Solutions. Keeping it simple, kinda! By: Chris Schneider MySQL Architect Ning.com MySQL HA Solutions Keeping it simple, kinda! By: Chris Schneider MySQL Architect Ning.com What we ll cover today High Availability Terms and Concepts Levels of High Availability What technologies are there

More information

App Engine: Datastore Introduction

App Engine: Datastore Introduction App Engine: Datastore Introduction Part 1 Another very useful course: https://www.udacity.com/course/developing-scalableapps-in-java--ud859 1 Topics cover in this lesson What is Datastore? Datastore and

More information

CIB Session 12th NoSQL Databases Structures

CIB Session 12th NoSQL Databases Structures CIB Session 12th NoSQL Databases Structures By: Shahab Safaee & Morteza Zahedi Software Engineering PhD Email: safaee.shx@gmail.com, morteza.zahedi.a@gmail.com cibtrc.ir cibtrc cibtrc 2 Agenda What is

More information

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013 PNUTS: Yahoo! s Hosted Data Serving Platform Reading Review by: Alex Degtiar (adegtiar) 15-799 9/30/2013 What is PNUTS? Yahoo s NoSQL database Motivated by web applications Massively parallel Geographically

More information

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering Big Data Processing Technologies Chentao Wu Associate Professor Dept. of Computer Science and Engineering wuct@cs.sjtu.edu.cn Schedule (1) Storage system part (first eight weeks) lec1: Introduction on

More information

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Leveraging the NoSQL boom 2 Why NoSQL? In the last fourty years relational databases have been the default choice for serious

More information

MySQL High Availability. Michael Messina Senior Managing Consultant, Rolta-AdvizeX /

MySQL High Availability. Michael Messina Senior Managing Consultant, Rolta-AdvizeX / MySQL High Availability Michael Messina Senior Managing Consultant, Rolta-AdvizeX mmessina@advizex.com / mike.messina@rolta.com Introduction Michael Messina Senior Managing Consultant Rolta-AdvizeX, Working

More information

LazyBase: Trading freshness and performance in a scalable database

LazyBase: Trading freshness and performance in a scalable database LazyBase: Trading freshness and performance in a scalable database (EuroSys 2012) Jim Cipar, Greg Ganger, *Kimberly Keeton, *Craig A. N. Soules, *Brad Morrey, *Alistair Veitch PARALLEL DATA LABORATORY

More information

Real World Web Scalability. Ask Bjørn Hansen Develooper LLC

Real World Web Scalability. Ask Bjørn Hansen Develooper LLC Real World Web Scalability Ask Bjørn Hansen Develooper LLC Hello. 28 brilliant methods to make your website keep working past $goal requests/transactions/sales per second/hour/day Requiring minimal extra

More information

A Brief Introduction of TiDB. Dongxu (Edward) Huang CTO, PingCAP

A Brief Introduction of TiDB. Dongxu (Edward) Huang CTO, PingCAP A Brief Introduction of TiDB Dongxu (Edward) Huang CTO, PingCAP About me Dongxu (Edward) Huang, Cofounder & CTO of PingCAP PingCAP, based in Beijing, China. Infrastructure software engineer, open source

More information

CS /30/17. Paul Krzyzanowski 1. Google Chubby ( Apache Zookeeper) Distributed Systems. Chubby. Chubby Deployment.

CS /30/17. Paul Krzyzanowski 1. Google Chubby ( Apache Zookeeper) Distributed Systems. Chubby. Chubby Deployment. Distributed Systems 15. Distributed File Systems Google ( Apache Zookeeper) Paul Krzyzanowski Rutgers University Fall 2017 1 2 Distributed lock service + simple fault-tolerant file system Deployment Client

More information

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University HyperDex A Distributed, Searchable Key-Value Store Robert Escriva Bernard Wong Emin Gün Sirer Department of Computer Science Cornell University School of Computer Science University of Waterloo ACM SIGCOMM

More information

COSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables

COSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables COSC 6339 Big Data Analytics NoSQL (II) HBase Edgar Gabriel Fall 2018 HBase Column-Oriented data store Distributed designed to serve large tables Billions of rows and millions of columns Runs on a cluster

More information

CSE-E5430 Scalable Cloud Computing Lecture 9

CSE-E5430 Scalable Cloud Computing Lecture 9 CSE-E5430 Scalable Cloud Computing Lecture 9 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 15.11-2015 1/24 BigTable Described in the paper: Fay

More information

Performance and Forgiveness. June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences

Performance and Forgiveness. June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences Performance and Forgiveness June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences Margo Seltzer Architect Outline A consistency primer Techniques and costs of consistency

More information

Flexible Network Analytics in the Cloud. Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco

Flexible Network Analytics in the Cloud. Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco Flexible Network Analytics in the Cloud Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco Introduction Harsh realities of network analytics netbeam Demo

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

MarkLogic Server. Database Replication Guide. MarkLogic 6 September, Copyright 2012 MarkLogic Corporation. All rights reserved.

MarkLogic Server. Database Replication Guide. MarkLogic 6 September, Copyright 2012 MarkLogic Corporation. All rights reserved. Database Replication Guide 1 MarkLogic 6 September, 2012 Last Revised: 6.0-1, September, 2012 Copyright 2012 MarkLogic Corporation. All rights reserved. Database Replication Guide 1.0 Database Replication

More information

A Survey Paper on NoSQL Databases: Key-Value Data Stores and Document Stores

A Survey Paper on NoSQL Databases: Key-Value Data Stores and Document Stores A Survey Paper on NoSQL Databases: Key-Value Data Stores and Document Stores Nikhil Dasharath Karande 1 Department of CSE, Sanjay Ghodawat Institutes, Atigre nikhilkarande18@gmail.com Abstract- This paper

More information

HBase Solutions at Facebook

HBase Solutions at Facebook HBase Solutions at Facebook Nicolas Spiegelberg Software Engineer, Facebook QCon Hangzhou, October 28 th, 2012 Outline HBase Overview Single Tenant: Messages Selection Criteria Multi-tenant Solutions

More information

Lessons Learned While Building Infrastructure Software at Google

Lessons Learned While Building Infrastructure Software at Google Lessons Learned While Building Infrastructure Software at Google Jeff Dean jeff@google.com Google Circa 1997 (google.stanford.edu) Corkboards (1999) Google Data Center (2000) Google Data Center (2000)

More information

Large-Scale Web Applications

Large-Scale Web Applications Large-Scale Web Applications Mendel Rosenblum Web Application Architecture Web Browser Web Server / Application server Storage System HTTP Internet CS142 Lecture Notes - Intro LAN 2 Large-Scale: Scale-Out

More information

How Eventual is Eventual Consistency?

How Eventual is Eventual Consistency? Probabilistically Bounded Staleness How Eventual is Eventual Consistency? Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, Ion Stoica (UC Berkeley) BashoChats 002, 28 February

More information