Sources. P. J. Sadalage, M Fowler, NoSQL Distilled, Addison Wesley
- Joy Lawson
1 Big Data and NoSQL
2 Sources P. J. Sadalage, M Fowler, NoSQL Distilled, Addison Wesley
3 Very short history of DBMSs
- The seventies: IMS (end of the sixties, built for the Apollo program; today: Version 15) and IDS (then IDMS); hierarchical and network DBMSs, navigational
- The eighties, for twenty years: relational DBMSs
- The nineties: client/server computing, three tiers, thin clients
4 Object-Oriented Databases
- In the nineties, object-oriented databases were proposed to overcome the impedance mismatch
- They influenced relational databases, and then disappeared
5 Big Data
- Mid 2000s, Big Data:
  - Volume: DBMSs do not scale enough for some applications
  - Velocity: computational speed, development velocity
  - Variety: DBMSs require upfront schema design and data cleaning; schemas conflict with variety
6 Big Data Examples
- Managing and analysing:
  - Google searches
  - Twitter feeds
  - Facebook posts
  - Amazon sales
  - Connection data for a mobile phone company
  - Location data for a car-black-box company
7 Big Data platforms
- The Google stack:
  - Hardware: each Google Modular Data Center houses Linux servers with AC and disks
  - GFS: distributed and redundant FS
  - MapReduce
  - BigTable, on top of GFS
- Hadoop (open source):
  - HDFS, Hadoop MapReduce
  - HBase
- SQL on Hadoop: Apache Hive, IBM Jaql, Apache Pig, Cloudera Impala
8 Big Data systems: NoSQL systems
- NoSQL: giving up something to get something more
- Giving up:
  - ACID transactions, to gain distribution
  - Upfront schema, to gain Velocity and Variety
  - First normal form, to reduce the need for joins
- Different from NewSQL
9 Types of NoSQL systems
- Key-value stores (Amazon Dynamo, Riak, Voldemort)
- Document databases:
  - XML databases: MarkLogic, eXist
  - JSON databases: CouchDB, Membase, Couchbase, MongoDB
- Sparse table databases: HBase
- Graph databases (not really about Big Data): Neo4j
10 NewSQL
- NewSQL is a different approach to Velocity, much less disruptive than NoSQL
- Column databases
- In-memory databases
11 NoSQL
12 Why NoSQL
- Impedance mismatch
- The schema problem: restrictive, heavy to set up
- Integration databases -> application databases
- Cluster architecture: Google BigTable, Amazon Dynamo
13 NoSQL: reasons for success
- Support for cluster architecture (Velocity, Volume): Google BigTable, Amazon Dynamo
- Removal of schema restrictions (Variety, Velocity)
- Simple for simple tasks
14 NoSQL
- A set of ill-defined systems that are not RDBMSs
- Usually do not support SQL
- Are usually open source (not always)
- Often cluster-oriented (not always), hence no ACID
- Recent (after 2000)
- Schema-free
- Oriented toward a single application
- It is more a movement than a technology
15 Aggregate data models
- From many simple tables to just one collection of aggregated objects (a simplified object data model)
- The aggregate data model is essential in order to work without transactions and without joins
16 Aggregate data models
- NoSQL data models:
  - Aggregate data models: key-value, document, column family
  - Graph model
17 Graph model
- Set of triples <nodeid, property, nodeid> (FlockDB, Neo4j)
18 Aggregate orientation
19 Aggregate data models
- Key-value stores: the database is a collection of <key, value> pairs, where the value is opaque (Dynamo, Riak, Voldemort)
- Document databases: a collection of documents (XML or JSON) that can be searched by content (MarkLogic, MongoDB)
- Column-family stores: a set of <key, record> pairs, where columns are grouped in column families (BigTable, HBase, Cassandra)
20 Key-value stores implementation
- Implementation model:
  - Key-based distribution of the pairs on a huge farm of inexpensive machines
  - Constant-time access
  - Constant-time parallel execution on all the pairs
  - Flexible fault tolerance
  - MapReduce execution model
- Amazon Dynamo, Riak, Voldemort
21 Schemaless databases
- Schema first vs. schema later
- Homogeneous vs. non-homogeneous
22 Materialized views
- OLAP applications greatly benefit from materialized views
- Materialized views can be used to regain the flexibility of the relational model
23 Key-Value distribution
- Sharding + replication
- Sharding: splitting data among nodes according to a key
- Master-slave replication:
  - No update conflict
  - Read resilience
  - Master election
- P2P replication:
  - No single point of failure
  - The distributed consistency problem
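Key-based sharding combined with replication can be sketched in a few lines. This is only an illustration under stated assumptions: the node names, the helper names `shard_for` and `replicas_for`, and the use of an MD5 hash modulo the number of nodes are all hypothetical choices, and real systems typically use more elaborate schemes such as consistent hashing.

```python
import hashlib

# Hypothetical four-node cluster (names are placeholders).
NODES = ["node0", "node1", "node2", "node3"]


def shard_for(key, n_nodes=len(NODES)):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_nodes


def replicas_for(key, copies=3):
    """Return the primary node for the key plus the next nodes in ring order."""
    first = shard_for(key)
    return [NODES[(first + i) % len(NODES)] for i in range(copies)]
```

With three copies per key, every value survives the loss of any single node, at the price of having to keep the copies consistent.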
24 Levels of Consistency
- With respect to write-write conflicts: avoid losing an update
- Read consistency:
  - Fresh data
  - No intermediate data
  - Session consistency
- Transactional consistency: only write values that are based on currently valid data
25 The CAP Theorem: example
- We would like: Consistency + Availability + Partition tolerance
- Store three copies of a value for Availability
- 1 read, 3 writes:
  - Read from any 1 node
  - Before committing an update, wait for three writes to be completed
- 1 write, 3 reads:
  - As soon as one write is ok, commit
  - Always read 3 copies and return the newest value
- 2 writes, 2 reads: if you read 2, at least one is current
- Consistency + Availability + Partition tolerance = Impossible
26 The CAP Theorem
- You cannot have all of: Consistency, Availability, Partition tolerance
- A trade-off between consistency and latency
- Relaxing consistency: two writes in the same cart
- Relaxing durability
27 Consistency: single operation atomicity
- The problem: avoiding r/w and w/w conflicts on a single operation
- Quorum: in a P2P system, an operation is successful if it gets a quorum of confirmations
- The write quorum: W > N/2
- The read quorum: R + W > N
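The two quorum inequalities can be checked mechanically. The sketch below uses hypothetical helper names and simply enumerates, for N replicas, the (R, W) pairs that satisfy both W > N/2 and R + W > N.

```python
def write_quorum_ok(n, w):
    """Write quorum: a majority of the n replicas must confirm the write."""
    return w > n / 2


def read_quorum_ok(n, r, w):
    """Read quorum: every read set must overlap every write set."""
    return r + w > n


def consistent_configs(n):
    """All (r, w) pairs for n replicas that satisfy both inequalities."""
    return [
        (r, w)
        for w in range(1, n + 1)
        for r in range(1, n + 1)
        if write_quorum_ok(n, w) and read_quorum_ok(n, r, w)
    ]
```

For N = 3, the "2 writes, 2 reads" and "1 read, 3 writes" configurations pass both checks. The "1 write, 3 reads" configuration satisfies R + W > N but not W > N/2, so two conflicting writes could both succeed on different nodes.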
28 Consistency: update consistency
- The problem: only update a value if nobody else has changed it in the meantime
- Optimistic approach:
  - You read the data item together with a version stamp
  - Every time you update, you change the version
  - The update operation takes the previous version as a parameter, and fails if the stamp has changed: Compare-And-Set (CAS)
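A minimal, single-process sketch of the optimistic approach follows. The class `VersionedStore` and its methods are hypothetical names; the version stamp is assumed to be a simple counter, and no locking is shown, so this illustrates the protocol rather than a thread-safe implementation.

```python
class VersionedStore:
    """Toy in-memory store where each key carries a version stamp."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing key reads as version 0."""
        return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, new_value):
        """Write only if the stored version still equals expected_version."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False  # someone else updated the item in the meantime
        self._data[key] = (current_version + 1, new_value)
        return True
```

A client that receives `False` is expected to re-read the item, reapply its change to the fresh value, and retry.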
29 Consistency: replication optimism
- Assume we do not have a quorum, and two copies with a version id are updated in parallel: what happens?
  - When the version is a counter
  - When the version is a random GUID
- P2P consistency problem: deciding the temporal relationship between two different versions
- A local counter or GUID does not help
- The vector clock: assume nodes A, B, C; the version stamp is [A:7; B:5; C:9]
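The way a vector clock decides the temporal relationship between two versions can be sketched as follows. Clocks are represented here as plain dictionaries from node name to counter, and the function names `increment` and `compare` are hypothetical.

```python
def increment(clock, node):
    """Return a copy of the clock with the given node's counter advanced by one."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def compare(a, b):
    """Classify clock a relative to clock b.

    Returns "equal", "before" (a happened before b), "after",
    or "concurrent" (neither dominates: a genuine conflict).
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

Starting from [A:7; B:5; C:9], an update on node A gives [A:8; B:5; C:9] and a parallel update on node B gives [A:7; B:6; C:9]. Neither stamp dominates the other, so the two versions are reported as concurrent and must be reconciled.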
30 Parallelism: Map-Reduce
- Map(m): apply m in parallel to each object, to get a set of <key, value> pairs
- Shuffle-sort: collect all pairs with the same key on the same node, to get sets with shape {<k,v1>, ..., <k,vn>}
- Reduce(r): apply r to each set {<k,v1>, ..., <k,vn>} to produce a result
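31 Map-Reduce
[Diagram: INPUT -> Map -> Shuffle And Sort -> Reduce -> OUTPUT]
- Map m: d -> seq(k,v)
- Reduce r: seq(k,v) -> k,v
- INPUT: Data 1, Data 2, Data 3, Data 4
- Map output:
  - Data 1: (K3,v11), (K2,v12), (K3,v13)
  - Data 2: (K4,v21), (K1,v22), (K3,v23)
  - Data 3: (K4,v31), (K5,v32), (K4,v33)
  - Data 4: (K3,v41), (K2,v42), (K3,v43)
- Shuffle And Sort:
  - K1: v22
  - K2: v42, v12
  - K3: v11, v43, v23, v13, v41
  - K4: v33, v31, v21
  - K5: v32
- OUTPUT: K1 r(v22); K2 r(v42,v12); K3 r(v11,v43, ...); K4 r(v33,v31,v21); K5 r(v32)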
32 Example: word count
- Problem: counting the number of occurrences of each word in a big collection of documents
- Map: takes a pair (k, document), ignores k, and returns a pair (w, 1) for each word w in the document
- Shuffle & Sort: groups the Map output by w and produces pairs of the form (w, [1, ..., 1])
- Reduce: takes a pair (w, [1, ..., 1]) and outputs (w, 1 + ... + 1)
33 Example: word count
[Diagram: INPUT -> Map(m) -> Shuffle And Sort -> Reduce(r) -> OUTPUT]
- INPUT: NoSQL Parallel NoSQL Velocity NoSQL DBMS Velocity Map Velocity NoSQL Parallel NoSQL
- Map output: (NoSQL,1) (Parallel,1) (NoSQL,1) (Velocity,1) (NoSQL,1) (DBMS,1) (Velocity,1) (Map,1) (Velocity,1) (NoSQL,1) (Parallel,1) (NoSQL,1)
- Shuffle And Sort: DBMS [1]; Parallel [1,1]; NoSQL [1,1,1,1,1]; Velocity [1,1,1]; Map [1]
- OUTPUT: DBMS 1, Parallel 2, NoSQL 5, Velocity 3, Map 1
34 Pseudo code
Map(_, v):
    for each w in v do emit(w, 1)
Reduce(k, v):
    c = 0
    for x in v do c = c + 1
    emit(k, c)
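The word-count pseudo code can be turned into a small runnable sketch. Everything runs sequentially in one process here, so it illustrates only the data flow, not the parallel execution; `shuffle_and_sort` and `map_reduce` are hypothetical helper names, and splitting documents on whitespace is an assumption.

```python
from collections import defaultdict


def map_fn(_key, document):
    """Map: ignore the key and emit (word, 1) for each word in the document."""
    for word in document.split():
        yield (word, 1)


def shuffle_and_sort(pairs):
    """Group all emitted values by key, returning (key, [values]) sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_fn(key, values):
    """Reduce: count the entries in the list, as in the pseudo code."""
    count = 0
    for _ in values:
        count = count + 1
    return (key, count)


def map_reduce(documents):
    """Run map, shuffle-and-sort, and reduce over (key, document) pairs."""
    pairs = [pair for key, doc in documents for pair in map_fn(key, doc)]
    return dict(reduce_fn(key, values) for key, values in shuffle_and_sort(pairs))
```

In a real deployment each `map_fn` call would run on the node that holds the document, and each key group would be reduced on the node that receives it after the shuffle.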
35 Exercises
- Sales(Date, StoreId, ProdId, Amount): how to compute group_by({date}, {sum(amount)})?
- Sales + Stores(StoreId, Region): how to compute join(sales, stores)?
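One possible way to approach both exercises is sketched below; it is not necessarily the intended solution. The group-by maps each sale to (date, amount) and sums per key. The join is a reduce-side join: every record is tagged with its table name and keyed by StoreId, and the reducer pairs sales with stores. The driver `run` and all function names are hypothetical, and tuples stand in for table rows.

```python
from collections import defaultdict


def run(records, map_fn, reduce_fn):
    """Sequential map-reduce driver: map, group by key, reduce in key order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


def sales_by_date_map(sale):
    """Emit (date, amount) for a Sales(Date, StoreId, ProdId, Amount) row."""
    date, _store_id, _prod_id, amount = sale
    yield (date, amount)


def sum_reduce(date, amounts):
    """Sum all amounts collected for one date."""
    yield (date, sum(amounts))


def join_map(tagged_row):
    """Key every row by StoreId, keeping a tag that says which table it came from."""
    table, row = tagged_row
    store_id = row[1] if table == "sales" else row[0]
    yield (store_id, (table, row))


def join_reduce(_store_id, tagged_rows):
    """Pair each sale with each store row that shares the same StoreId."""
    sales = [row for table, row in tagged_rows if table == "sales"]
    stores = [row for table, row in tagged_rows if table == "stores"]
    for sale in sales:
        for store in stores:
            yield sale + store[1:]  # append Region to the sale row
```

The join input is the union of both tables, with each row wrapped as `("sales", row)` or `("stores", row)` before it is passed to `run`.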
36 Implementing map-reduce: Hadoop
- Input and output of each phase are stored in a distributed file system that manages the partitioning and the replication
- Spark approach: when possible, input and output are just kept in main memory
- The computation is divided among many small tasks
- A task manager assigns the tasks and, when a task fails, re-executes it
37 Dataflow systems Dataflow systems are similar to map-reduce systems but they implement a wider range of parallel patterns, with vertices that generalize the map and reduce vertices and edges that generalize the key-based communication between map and reduce
38 Key-Value Databases
- Basically, a persistent hash table
- Sharding + replication
- Consistency:
  - Single object
  - Riak, for each bucket (data space):
    - Newest write wins / create siblings
    - Setting read / write quorum
- Query:
  - By key
  - Full store scan (not always provided)
- Uses: session information, user profiles, shopping cart data by userid
39 Document Databases: MongoDB
- One instance, many databases, many collections
- JSON documents with _id field
- Sharding + replication
40 Consistency
- Master/slave replication
- Automated failover, server maintenance, disaster recovery, read scaling
- Master is dynamically re-elected on failure
- One can specify a write quorum
- One can specify whether reads can be directed to slaves
41 Querying
- CouchDB: query via views (virtual or materialized)
- MongoDB: selection, projection, aggregation
42 Column-family Stores
- A column family (similar to a table in relational databases) is a set of <key, record> pairs
- It can be vertically divided in keyspaces
- Records are not necessarily homogeneous
43 Consistency
- In Cassandra:
  - The DBA fixes the number of replicas for each keyspace
  - The programmer decides the quorum for read and write operations (1, majority, all)
- Transactions:
  - Atomicity at the row level
  - Possibility to use external transactional libraries
44 Queries (Cassandra)
- Row retrieval: GET Customer['johnsmith00012']
- Field (column) retrieval: GET Customer['johnsmith00012']['age']
- After you create an index on age: GET Customer WHERE age = 35
- Cassandra supports CQL: select-project (no join) SQL
45 Graph Databases
- A graph database stores a graph
- A graph is, essentially, a database with one ternary table: Edges(NodeId1, NodeId2, EdgeAttributes)
- You may also have Nodes(NodeId, NodeAttributes) (optional)
- Example: Neo4j
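The "one ternary table" view of a graph can be made concrete with a small sketch. Edges are modelled as (NodeId1, NodeId2, EdgeAttributes) tuples and treated as directed; the helper names `neighbours` and `reachable_within` are hypothetical, and this shows only the data model, not how a graph database stores or indexes it.

```python
def neighbours(edges, node_id):
    """Node ids directly reachable from node_id by following one edge."""
    return [dst for src, dst, _attrs in edges if src == node_id]


def reachable_within(edges, start, max_hops):
    """Node ids reachable from start in at most max_hops edge traversals."""
    frontier = {start}
    seen = {start}
    for _ in range(max_hops):
        # Expand the frontier by one hop, skipping nodes already visited.
        frontier = {n for f in frontier for n in neighbours(edges, f)} - seen
        seen |= frontier
    return seen - {start}
```

Each hop scans the whole edge list, which is the table-based equivalent of a self-join on Edges; graph databases are designed to make exactly this kind of traversal cheap.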
46 Graph model
47 Consistency
- Graph databases are usually transactional and not sharded
- Neo4j supports master-slave replication
- Data can be sharded at the application level with no database support, which is quite hard
48 Querying: Cypher MATCH (me {name:"giorgio"}) RETURN me
49 Querying: Cypher MATCH (expert) -[:WORKED_WITH]-> (neodb:database {name:"neo4j"}) RETURN neodb, expert
50 Querying: Cypher
MATCH (me {name:"giorgio"})
MATCH (expert) -[:WORKED_WITH]-> (neodb:database {name:"neo4j"})
MATCH path = shortestpath( (me)-[:friend*..5]-(expert) )
RETURN neodb, expert, path
51 Querying: Cypher
- MATCH: pattern matches
- WHERE: filtering conditions
- RETURN: what to return
- ORDER BY: properties to order by
- SKIP: nodes to skip from the top
- LIMIT: limit results
52 NoSQL systems advantages
- Support for cluster architecture: Volume and Velocity
- Aggregate model, schemaless architecture: velocity of development for simple applications
- Schemaless architecture: supports Variety
- Flexible consistency: supports Velocity
53 NoSQL systems problems
- Transactional support is limited to a single aggregate
- Flexible consistency is hard to manage
- No SQL, no optimization:
  - Complex data needs to be pre-aggregated
  - Different queries require the construction of different re-aggregations of the same data
54 Big Data architectural trends
- The data lake
- Polyglot systems
55 The Data Lake
- Standard Data Warehouse architecture:
  - Long phase of data design to decide the schema
  - Complex phase of data cleaning to get high quality data
  - Ready to play
- The Data Lake:
  - Just collect all the data you have in the Data Lake
  - Run ML algorithms on the Lake
56 Polyglot systems
- Combine transactional RDBMSs, DSSs and NoSQL systems
- Advantages: pay the price of schemas and transactions only where they are needed
- Problems: maintenance and security
57 SQL on top of MapReduce
Serdar Yegulalp compiled this list in 2014 (ask Google):
- Apache Hive: the original SQL-on-Hadoop solution
- Stinger: Hortonworks' development of Apache Hive
- Apache Drill: an open source implementation of Google's Dremel (aka BigQuery), to access multiple types of data stores
- Spark SQL: Apache's Spark project is for real-time, in-memory, parallelized processing of Hadoop data
- Apache Phoenix: its developers call it a "SQL skin for HBase"
- Cloudera Impala: another implementation of Dremel/Apache Drill for Hadoop
- HAWQ for Pivotal HD: Pivotal's version for its own Hadoop distribution
- Presto: built by Facebook's engineers, reminiscent of Apache
- Oracle Big Data SQL
- IBM BigSQL
58 Conclusion
- There is no winner: DBMSs, DSSs, parallel and distributed DBs, NoSQL systems are all here to stay
- There is a terrible trend of moving everything to NoSQL and Machine Learning due to hype: a great occasion for consultants, and for waste
- The only way of making a good choice is having a real understanding of:
  - The business problem to be solved
  - The current state of the technology
More informationCassandra- A Distributed Database
Cassandra- A Distributed Database Tulika Gupta Department of Information Technology Poornima Institute of Engineering and Technology Jaipur, Rajasthan, India Abstract- A relational database is a traditional
More informationHadoop Development Introduction
Hadoop Development Introduction What is Bigdata? Evolution of Bigdata Types of Data and their Significance Need for Bigdata Analytics Why Bigdata with Hadoop? History of Hadoop Why Hadoop is in demand
More informationWebinar Series TMIP VISION
Webinar Series TMIP VISION TMIP provides technical support and promotes knowledge and information exchange in the transportation planning and modeling community. Today s Goals To Consider: Parallel Processing
More informationAdvanced Data Management Technologies
ADMT 2017/18 Unit 15 J. Gamper 1/44 Advanced Data Management Technologies Unit 15 Introduction to NoSQL J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE ADMT 2017/18 Unit 15
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL
More informationNoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu
NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related
More informationA Study of NoSQL Database
A Study of NoSQL Database International Journal of Engineering Research & Technology (IJERT) Biswajeet Sethi 1, Samaresh Mishra 2, Prasant ku. Patnaik 3 1,2,3 School of Computer Engineering, KIIT University
More informationEventual Consistency 1
Eventual Consistency 1 Readings Werner Vogels ACM Queue paper http://queue.acm.org/detail.cfm?id=1466448 Dynamo paper http://www.allthingsdistributed.com/files/ amazon-dynamo-sosp2007.pdf Apache Cassandra
More informationA NoSQL Introduction for Relational Database Developers. Andrew Karcher Las Vegas SQL Saturday September 12th, 2015
A NoSQL Introduction for Relational Database Developers Andrew Karcher Las Vegas SQL Saturday September 12th, 2015 About Me http://www.andrewkarcher.com Twitter: @akarcher LinkedIn, Twitter Email: akarcher@gmail.com
More informationModern Database Concepts
Modern Database Concepts Introduction to the world of Big Data Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz What is Big Data? buzzword? bubble? gold rush? revolution? Big data is like teenage
More informationThings Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,
More informationCS 655 Advanced Topics in Distributed Systems
Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3
More information1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions
Big Data Hadoop Architect Online Training (Big Data Hadoop + Apache Spark & Scala+ MongoDB Developer And Administrator + Apache Cassandra + Impala Training + Apache Kafka + Apache Storm) 1 Big Data Hadoop
More information6.830 Lecture Spark 11/15/2017
6.830 Lecture 19 -- Spark 11/15/2017 Recap / finish dynamo Sloppy Quorum (healthy N) Dynamo authors don't think quorums are sufficient, for 2 reasons: - Decreased durability (want to write all data at
More informationCSE 344 JULY 9 TH NOSQL
CSE 344 JULY 9 TH NOSQL ADMINISTRATIVE MINUTIAE HW3 due Wednesday tests released actual_time should have 0s not NULLs upload new data file or use UPDATE to change 0 ~> NULL Extra OOs on Mondays 5-7pm in
More informationDatabases : Lecture 1 2: Beyond ACID/Relational databases Timothy G. Griffin Lent Term Apologies to Martin Fowler ( NoSQL Distilled )
Databases : Lecture 1 2: Beyond ACID/Relational databases Timothy G. Griffin Lent Term 2016 Rise of Web and cluster-based computing NoSQL Movement Relationships vs. Aggregates Key-value store XML or JSON
More informationHow do we build TiDB. a Distributed, Consistent, Scalable, SQL Database
How do we build TiDB a Distributed, Consistent, Scalable, SQL Database About me LiuQi ( 刘奇 ) JD / WandouLabs / PingCAP Co-founder / CEO of PingCAP Open-source hacker / Infrastructure software engineer
More informationBig Data Architect.
Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional
More informationStudy of NoSQL Database Along With Security Comparison
Study of NoSQL Database Along With Security Comparison Ankita A. Mall [1], Jwalant B. Baria [2] [1] Student, Computer Engineering Department, Government Engineering College, Modasa, Gujarat, India ank.fetr@gmail.com
More informationAdvances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis
Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis 1 NoSQL So-called NoSQL systems offer reduced functionalities compared to traditional Relational DBMSs, with the aim of achieving
More informationAn Introduction to Big Data Formats
Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION
More informationAN introduction to nosql databases
AN introduction to nosql databases Terry McCann @SQLshark Purpose of this presentation? It is important for a data scientist / data engineer to have the right tool for the right job. We will look at an
More informationMODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS
MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale
More informationCmprssd Intrduction To
Cmprssd Intrduction To Hadoop, SQL-on-Hadoop, NoSQL Arseny.Chernov@Dell.com Singapore University of Technology & Design 2016-11-09 @arsenyspb Thank You For Inviting! My special kind regards to: Professor
More informationCOSC 304 Introduction to Database Systems. NoSQL Databases. Dr. Ramon Lawrence University of British Columbia Okanagan
COSC 304 Introduction to Database Systems NoSQL Databases Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca Relational Databases Relational databases are the dominant form
More informationShen PingCAP 2017
Shen Li @ PingCAP About me Shen Li ( 申砾 ) Tech Lead of TiDB, VP of Engineering Netease / 360 / PingCAP Infrastructure software engineer WHY DO WE NEED A NEW DATABASE? Brief History Standalone RDBMS NoSQL
More information