Storing data in databases


Storing data in databases
The webinar will begin at 3pm.
You now have a menu in the top right corner of your screen. The red button with a white arrow allows you to expand and contract the webinar menu, in which you can write questions/comments.
We won't have time to answer questions while we are presenting, but will answer them at the end.
You will be on mute throughout; we can't hear you.

Storing data in databases Webinar 25 October 2016 Peter Smyth UK Data Service


Can you hear us?
If not:
Check your volume, and that your speaker/headset is plugged in.
Your invitation also included a phone number; you can call that to listen in.
o UK +44 (0) 330 221 9914
o US +1 (914) 614-3429
We are recording this webinar, so you can always listen to it later.

Overview of this webinar
Definition of a database
Why Excel isn't always good enough
Different database types and availability
Relational Databases
o A bit of history
o Data organisation
o Limitations
o Query examples
Document Databases
o MongoDB
o Query examples
Graph Database demo

Definition of Database
"A structured set of data held in a computer, especially one that is accessible in various ways." (Oxford University Press)
Structured = Ordered? Or arranged? The definition says nothing about the details of the structuring.
Accessible = Searchable: able to query the contents to see what is there.

Not a database! - Why not?

What about Excel?
Worksheets are tabular in nature, so they are very structured.
You can join sheets together using the VLOOKUP function.
There is a set of database-type functions (DSUM, DCOUNT etc.).
You can write queries to filter the rows.

Excel Restrictions
Sheets have a limit of about 1 million rows (2^20 = 1,048,576).
VLOOKUP can only return a single column.
The database functions can only return a single value.
Setting up queries is quite complex.

Why use a desktop database?
Size of data
Convenience of a desktop system
Flexibility in collecting and persisting data
Flexibility in querying and analysis

[Figure: Growing and shrinking data. A scale of data sizes (1 KB, 1 MB, 1 GB, 10+ GB) running from Desktop Application to Big Data Environment. Tweet data grows along the scale: a sent tweet, data from a tweet, all tweets from a user, all tweets from a user and friends. Smart meter data shrinks along it: all smart meter data, by day, by month, by month and geography.]

[Figure: Growing and shrinking data. The same diagram with a Desktop Database band added between Desktop Application and Big Data Environment; the scale is labelled 1 KB, 1 GB, 5 GB, 25 GB, 25+ GB.]

Types of Databases
There are many different types of databases. For the end user there are probably four main types:
Relational databases (MySQL, MS SQL, SQLite, Postgres, ...)
Document databases (MongoDB, CouchDB, ...)
Graph databases (Neo4j, Titan, ...)
Wide column stores (Cassandra, HBase, ...)

Types of Databases
Relational databases predominate by a long way. Data is held in tables, with defined relationships between the tables.
Document databases and wide column databases use storage architectures designed to overcome some of the scalability problems of relational databases. Since Big Data sources have become available, these are gaining in popularity.
Graph databases are designed to optimise a specific type of querying, where you are more interested in the relationships between different items than in the actual attributes of the items. They are often used with social networks.
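To make the graph-database point concrete, here is a minimal plain-Python sketch of a relationship-centred query: "who is two steps away from this person?". It does not use Neo4j or any graph database; the names and the `follows` structure are invented for illustration only.

```python
# Hypothetical social network: each person maps to the set of people they follow.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
    "dave": set(),
    "erin": {"alice"},
}


def friends_of_friends(graph, person):
    """Return people exactly two hops from `person`, excluding direct contacts."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        # Follow each relationship one step further.
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


result = sorted(friends_of_friends(follows, "alice"))
print(result)
```

Notice that the query never looks at any attribute of a person; it only walks the links between them. A graph database is built to make this kind of traversal cheap on much larger networks.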

Types of Databases
The link below provides a table of the different database systems available and their relative use. Both commercial and free database systems are included.
http://db-engines.com/en/ranking

Types of Databases (Table) Freely available options

The Relational Model Why do we have it? What is it good for? What are the pros and cons? What do we mean by relational?

The Relational Model - History
The term "relational database" was first used by E. F. Codd in 1970 in the paper "A Relational Model of Data for Large Shared Data Banks".
Although not necessarily the primary driver, it should be noted that at the time computer storage was very expensive.
The relational model can be very efficient when storing data. Typically, data items are stored only once.

The Relational Model - History
Storage prices fell from about $193K per GB in 1980 to about $0.03 per GB in 2014.
http://www.mkomo.com/cost-per-gigabyte-update

The Relational Model - How it works
If I wanted to record the details of a house and the people who lived there, I could create a single table, HouseHold_All, with these columns:
HouseHold_Id, Address, PostCode, Person_id, FirstName, LastName, DOB, Sex, Age, No_of_Rooms, No_of_Occupants, Type, Construction
I would need a single record for each person at that address.

The Relational Model - How it works
And populate it with data, like this:

HouseHold_Id  Address                 PostCode  Person_id  FirstName  LastName  DOB         Sex  Age  No_of_Rooms  No_of_Occupants  Type  Construction
1             Some street, Some Town  AA1 2BB   1          Alfie      Smith     17/09/1963  M    60   8            5                Semi  Brick
1             Some street, Some Town  AA1 2BB   2          Jane       Smith     05/02/1970  F    60   8            5                Semi  Brick
1             Some street, Some Town  AA1 2BB   3          John       Smith     03/01/2001  M    60   8            5                Semi  Brick
1             Some street, Some Town  AA1 2BB   4          Jack       Smith     10/10/2005  M    60   8            5                Semi  Brick
1             Some street, Some Town  AA1 2BB   5          Jenny      Smith     07/05/2009  F    60   8            5                Semi  Brick

These records all relate to the same household, but the data about the house itself is repeated for each person in the house.

The Relational Model - How it works
It makes more sense to use multiple tables and split the data between them. This eliminates the need to duplicate data.
The arrows represent relationships between the tables.
If I only wanted details about a person, I wouldn't need to refer to the other tables.

The Relational Model - How it works
All of the Occupant information is kept in a single table.
Details of the Property are only recorded once in the three smaller tables.
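The split into related tables can be sketched with Python's built-in sqlite3 module. This is a simplified illustration, not the schema from the slides: it uses only two tables (`household` and `person`) rather than an occupant table plus three property tables, the table and column names are assumptions, and the dates of birth are rewritten in ISO format so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE household (
    household_id INTEGER PRIMARY KEY,
    address      TEXT,
    postcode     TEXT,
    no_of_rooms  INTEGER
);
CREATE TABLE person (
    person_id    INTEGER PRIMARY KEY,
    household_id INTEGER REFERENCES household(household_id),
    first_name   TEXT,
    last_name    TEXT,
    dob          TEXT
);
""")

# The house is described once...
conn.execute(
    "INSERT INTO household VALUES (1, 'Some street, Some Town', 'AA1 2BB', 8)"
)
# ...and each occupant only carries the household_id that links back to it.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "Alfie", "Smith", "1963-09-17"),
        (2, 1, "Jane", "Smith", "1970-02-05"),
        (3, 1, "John", "Smith", "2001-01-03"),
        (4, 1, "Jack", "Smith", "2005-10-10"),
        (5, 1, "Jenny", "Smith", "2009-05-07"),
    ],
)

# A question about people alone touches only the person table.
names = [
    row[0]
    for row in conn.execute("SELECT first_name FROM person ORDER BY person_id")
]

# A question that needs both kinds of detail follows the relationship with a JOIN.
rows = conn.execute("""
    SELECT p.first_name, h.postcode
    FROM person AS p
    JOIN household AS h ON h.household_id = p.household_id
    WHERE p.dob >= '2000-01-01'
    ORDER BY p.person_id
""").fetchall()

print(names)
print(rows)
```

The address and postcode are stored in one row however many people live in the house, and the JOIN reassembles the combined view only when a query asks for it.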

The Relational Model - Advantages
Data is only stored once (across multiple tables if necessary).
Efficient for well known and structured data.
Well defined and understood query language (SQL), with variants available for all relational databases.
Schema on write allows comprehensive data checking before loading, making for cleaner data.
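Schema on write can be illustrated with sqlite3 as well. In this sketch the table definition carries the rules (a name must be present, sex must be one of two codes), so bad records are rejected at load time. The table, the rules, and the sample records are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE person (
    person_id  INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    sex        TEXT NOT NULL CHECK (sex IN ('M', 'F'))
)
""")

candidates = [
    (1, "Alfie", "M"),
    (2, "Jane", "F"),
    (3, None, "M"),        # missing name: violates NOT NULL
    (4, "Jack", "male"),   # unexpected code: violates the CHECK constraint
]

rejected = []
for record in candidates:
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?)", record)
    except sqlite3.IntegrityError as err:
        # The database refuses the row; keep the reason for later review.
        rejected.append((record[0], str(err)))

loaded = [
    row[0]
    for row in conn.execute("SELECT person_id FROM person ORDER BY person_id")
]
print("loaded:", loaded)
print("rejected:", rejected)
```

Only the records that satisfy the schema end up in the table, which is the "cleaner data" benefit. The cost is that the rules have to be known before any data is loaded.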

The Relational Model - Disadvantages
The need for multiple tables increases loading times.
Uses vertical scaling (not really relevant for desktop databases).
Schema on write cannot deal with unstructured data efficiently, if at all.

Document Databases Why do we have it? What is it good for? What are the pros and cons? What is meant by a document?

Document Database
A document does not mean a PDF or Word document.
A document is semi-structured data.
It is structured in that every data item in the document has a name associated with it.
It is "semi-" in that different documents in the same collection of documents don't have to have the same set of names.

JSON - Example semi-structured data
The most popular format for semi-structured data is JSON.
Most data that can be downloaded from a web-based API will be in JSON format (or at least offer JSON as a choice of format).

JSON - Example semi-structured data
The following is a simple example of JSON formatted data:
{
  "Name" : "Manchester",
  "PostCode" : "M13 9PL",
  "Established" : 1824
}
It is split over several lines just to aid reading. Everything between the { and } represents a single record, or document.
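As a minimal sketch, this is how the same record could be read in Python with the standard json module. Names and text values carry double quotation marks, as the JSON format requires; the variable names are chosen for illustration.

```python
import json

# The example record as a JSON string.
text = '{ "Name" : "Manchester", "PostCode" : "M13 9PL", "Established" : 1824 }'

# json.loads turns one JSON document into a Python dictionary.
document = json.loads(text)

print(document["Name"])
print(type(document["Established"]).__name__)

# json.dumps writes it back out; indent=2 splits it over several lines for reading.
print(json.dumps(document, indent=2))
```

Each name in the document becomes a dictionary key, and the unquoted number is read as an integer rather than as text.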

Document Databases
The semi-structured nature means that it is difficult to store the data in tables.
Not all fields need to be in each document.
Fields don't need to be in the same order.
{ 'id' : 1234, 'Name' : 'Peter', 'Tel' : '012345678' }
{ 'Name' : 'John', 'id' : 3523, 'Email' : ['John@abc.com', 'j.smith@xyz.com'], 'mob' : '012345678' }
It is even more difficult to create a schema for the data in advance.
Instead, data is stored as-is and a schema is created when the data is read: schema on read.
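Schema on read can be sketched in plain Python using the two example documents. Nothing is checked when the documents are stored; the structure is imposed by the reading code, which decides which fields it wants and what to do when one is missing. The `read_contact` function and its output field names are hypothetical, and the phone numbers are written as strings here because a number with a leading zero is not valid in Python or JSON.

```python
# Two documents from the same collection, with different fields and field order.
documents = [
    {"id": 1234, "Name": "Peter", "Tel": "012345678"},
    {
        "Name": "John",
        "id": 3523,
        "Email": ["John@abc.com", "j.smith@xyz.com"],
        "mob": "012345678",
    },
]


def read_contact(doc):
    """Apply a schema at read time: pick the fields of interest, with fallbacks."""
    return {
        "id": doc["id"],
        "name": doc["Name"],
        # Either 'Tel' or 'mob' may hold the phone number; both may be absent.
        "phone": doc.get("Tel") or doc.get("mob"),
        # 'Email' is optional; treat a missing field as an empty list.
        "emails": doc.get("Email", []),
    }


contacts = [read_contact(d) for d in documents]
with_email = [c["name"] for c in contacts if c["emails"]]

print(contacts)
print(with_email)
```

The flexibility comes at a price: every piece of code that reads the collection has to cope with missing or differently named fields, which is part of why querying semi-structured data is more complex.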

Document Databases - NoSQL
Non-relational databases like MongoDB typically do not use SQL to query the data.
When you install MongoDB you are provided with a simple shell interface from which you can query the database. Use of the shell to query requires a knowledge of JavaScript.
As an alternative, both Python and R have packages which interface to MongoDB to allow querying of the database using native Python-like or R-like constructs.
The unstructured nature of the data adds to the complexity of querying.

A Graph Database - Neo4j
The default installation of Neo4j provides a simple default Movies database.
It also comes with tutorials to help get you started.

Summary
The size of your data may be enough to make you decide on using a desktop database.
But it may not be the only consideration:
o How are you collecting the data over time?
o What is the structure of the data?
o How do you intend to use the data?
o Can you clean and structure the data as you collect it?
o Do you need to keep all of the raw data just in case?

Questions
Peter Smyth
Peter.smyth@manchester.ac.uk
ukdataservice.ac.uk/help/
Subscribe to the UK Data Service news list at https://www.jiscmail.ac.uk/cgibin/webadmin?a0=ukdataservice
Follow us on Twitter https://twitter.com/ukdataservice or Facebook https://www.facebook.com/ukdataservice