Ghislain Fourny. Big Data 2. Lessons learnt from the past

Similar documents
Ghislain Fourny. Big Data 2. Lessons learnt from the past

FOR THOSE WHO DO. Lenovo Annual Report

Ghislain Fourny. Information Systems for Engineers 1. Introduction

MKA PLC Controller OVERVIEW KEY BENEFITS KEY FEATURES

Creating An Effective Academic Poster. ~ A Student Petersheim Workshop

TITLE - Size 16 - Bold

BOOTSTRAP AFFIX PLUGIN

Information Retrieval 12. Wrap-Up

Example project Functional Design. Author: Marion de Groot Version

The L A TEX Template for MCM Version v6.2

TITLE. Tips for Producing a Newsletter IN THIS ISSUE

Paper Template for INTERSPEECH 2018

The Next Big Thing Prepared for Meeting C

Brand Guidelines MAY 2016

Timon Hazell, LEED AP Senior BIM Engineer. Galen S. Hoeflinger, AIA BIM Technologist Manager

City of Literature Branding

RHYMES WITH HAPPIER!

Brand identity guidelines

ALWAYS MOVING FORWARD MIDWAY S GRAPHIC IDENTITY STANDARDS MANUAL

The POGIL Project Publication Guidelines

TITLE SUBTITLE Issue # Title Subtitle. Issue Date. How to Use This Template. by [Article Author] Article Title. Page # Article Title.

A Road To Better User Experience. The lonely journey every front-end developer must walk.

An output routine for an illustrated book

Ghislain Fourny. Big Data 7. Syntax

Colors. F0563A Persimmon. 3A414C Cobalt. 8090A2 Slate Shale. C4CDD6 Alloy Coal. EFF3F5 Silver. EDF3F9 Horizon.

Connected TV Applications for TiVo. Project Jigsaw. Design Draft. 26 Feb 2013

Thomas F. Sturm A Tutorial for Poster Creation with Tcolorbox

Intermediate District 288. Brand Manual. Visual Identity Guide

Thomas F. Sturm A Tutorial for Poster Creation with Tcolorbox

Gestures: ingsa GESTURES

Insights. Send the right message to the right person at the right time.

VISUAL. Standards Guide

American Political Science Review (APSR) Submission Template ANONYMISED AUTHOR(S) Anonymised Institution(s) Word Count: 658

BRAND IDENTITY GUIDELINE

[Main Submission Title] (Font: IBM Plex Sans Bold, 36 point)

BRAND GUIDELINES All rights reserved.

COLORS COLOR USAGE LOGOS LOCK UPS PHOTOS ELEMENTS ASSETS POWERPOINT ENVIRONMENTAL COLLATERAL PROMO ITEMS TABLE OF CONTENTS

cosmos a tech startup

BBN ANG 183 Typography Lecture 5A: Breaking text

TUSCALOOSA CITY SCHOOLS Graphic Standards and Logo Use Guide

Brand identity design. Professional logo design + Branding guidelines + Stationery Designed by JAVIER

OCTOBER 16 NEWSLETTER. Lake Mayfield Campground OR-LOW GOOD TIMES

Personal brand identity desigend by JAVIER

The everyhook package

VISUAL IDENTITY STARTER KIT FOR ENSURING OUR COMMUNICATIONS ARE COHESIVE, CONSISTENT AND ENGAGING 23 OCTOBER 2008

src0-dan/mobile.html <!DOCTYPE html> Dan Armendariz Computer Science 76 Building Mobile Applications Harvard Extension School

Thesis GWU Example Dissertation. by Shankar Kulumani

Wandle Valley Branding Guidelines 1

WRAS WIAPS BRAND GUIDELINES 2015

I D E N T I TY STA N DA R D S M A N UA L Rev 10.13

Teach Yourself Microsoft Publisher Topic 2: Text Boxes

BRAND GUIDELINES VAN S AIRCRAFT, INC. VERSION V1.1

Compassion. Action. Change.

CITIZEN SCIENCE DATA FACTORY

This is the Title of the Thesis

Pablo- Alejandro Quiñones. User Experience Portfolio

IDENTITY STANDARDS LIVINGSTONE COLLEGE DR. JIMMY R. JENKINS, SR. PRESIDENT

Prototyping Robotic Manipulators For SPHERES

Making the New Notes. Christoph Noack OpenOffice.org User Experience Max Odendahl OpenOffice.org Development Christian Jansen Sun Microsystems

CHI LAT E X Ext. Abstracts Template

RPM FOUNDATION BRANDING GUIDELINES AND GRAPHIC STANDARDS

Version 1.4 March 15, Notes Bayer- Kogenate 2010 WFH Microsoft Surface Project (HKOG-39563) Information Architecture Wireframes

Project Title. A Project Report Submitted in partial fulfillment of the degree of. Master of Computer Applications

AMERICA'S CAR MUSEUM BRANDING GUIDELINES AND GRAPHIC STANDARDS

DISTRIBUTED MEMORY COMPUTING IN ECONOMICS USING MPI

lipsum Access to 150 paragraphs of Lorem Ipsum dummy text a

THE ESPRESSO BOOK MACHINE PUBLISH INSTANTLY AT THE MSU LIBRARIES

Visual Identity Standards

CORPORATE IDENTITY MANUAL

Brand Guide. Last Revised February 9, :38 PM

logo graphic will go here

MBCA Section Newsletter Required Content Guidelines

Creating Websites without Code. Jesse Clark, Webmaster University of Northern Colorado

CLASP Website Redesign Client Deliverables Spring 2007

Saturday January 6, pm

Ghislain Fourny. Information Systems for Engineers Fall Outlook

The pdfreview package

Overly Companies (OSA, BRICO)

Visual identity guideline. BrandBook BLOOMINGFELD. Brandbook 2016.

My tags Ornare sociosqu, magna, nunc, erat duis, elit malesuada, arcu, quam ut. > View all. Recommended content

Transforming IT-speak:

Viewport, custom CSS, fonts

DFSA - Web Site Revamp

Getting started with Managana creating for web and mobile devices

RML Example 48: Paragraph flow controls

file:///users/nma/desktop/chris_mac/chris_school/kcc_nmawebsite/_technology/sitebuild/htdocs/gargiulo/data/johndoe/spring/art128...

Identity Guidelines Version_1

Chaparral Sports Day. Basketball Ashley Guerrero(captain), Carrera, Rasuly, Hamilton Alba, Razel Alba, Bannister, Phillips, Richardson.

Whitepaper. Call to Action

AnyCorp. AnyCorp. Yours FREE with <benefit here>! Call < > $<000,000.00> *** $0 ***

B R A N D I DENTI T Y GUIDELINES 2018 REV. 3 LAST UPDATED / DEC 2017

BBN ANG 183 Typography Lecture 5A: Breaking text

Formatting Theses and Papers using Microsoft Word

Abstract. Author summary. Introduction

NATURAL BUILDING TECHNOLOGIES Document: Feedback Sheet Revision: A Date: 13/07/16 Queries:

Style guide. March 2017 CC BY 4.0 The Tor Project

The colophon Package, v1.1

Ground PV Structure. 6 panels horizontal 7 panels vertical. Installation Guide

Customer Journey EIV and emsfaa. January 2018

A High Capacity Html Steganography Method

Transcription:

Ghislain Fourny Big Data 2. Lessons learnt from the past

Mr. Databases: Edgar Codd Wikipedia

Data Independence (Edgar Codd) Logical data model Lorem Ipsum Dolor sit amet Physical storage Consectetur Adipiscing Elit. In Imperdiet Ipsum ante

Data Shapes Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vel erat nec dui aliquet vulputate sed quis nulla. Donec eget ultricies magna, eu dignissim elit. Nullam sed urna nec nisl rhoncus ullamcorper placerat et enim. Integer varius ornare libero quis consequat. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean eu efficitur orci. Aenean ac posuere tellus. Ut id commodo turpis. Praesent nec libero metus. Praesent at turpis placerat, congue ipsum eget, scelerisque justo. Ut volutpat, massa ac lacinia cursus, nisl dui volutpat arcu, quis interdum sapien turpis in tellus. Suspendisse potenti. Vestibulum pharetra justo massa, ac venenatis mi condimentum nec. Proin viverra tortor non orci suscipit rutrum. Phasellus sit amet euismod diam. Nullam convallis nunc sit amet diam suscipit dapibus. Integer porta hendrerit nunc. Quisque pharetra congue porta. Suspendisse vestibulum sed mi in euismod. Etiam a purus suscipit, accumsan nibh vel, posuere ipsum. Nulla nec tempor nibh, id venenatis lectus. Duis lobortis id urna eget tincidunt.

Data Shapes Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vel erat nec dui aliquet vulputate sed quis nulla. Donec eget ultricies magna, eu dignissim elit. Nullam sed urna nec nisl rhoncus ullamcorper placerat et enim. Integer varius ornare libero quis consequat. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean eu efficitur orci. Aenean ac posuere tellus. Ut id commodo turpis. Praesent nec libero metus. Praesent at turpis placerat, congue ipsum eget, scelerisque justo. Ut volutpat, massa ac lacinia cursus, nisl dui volutpat arcu, quis interdum sapien turpis in tellus. Suspendisse potenti. Vestibulum pharetra justo massa, ac venenatis mi condimentum nec. Proin viverra tortor non orci suscipit rutrum. Phasellus sit amet euismod diam. Nullam convallis nunc sit amet diam suscipit dapibus. Integer porta hendrerit nunc. Quisque pharetra congue porta. Suspendisse vestibulum sed mi in euismod. Etiam a purus suscipit, accumsan nibh vel, posuere ipsum. Nulla nec tempor nibh, id venenatis lectus. Duis lobortis id urna eget tincidunt.

Take-away Concepts

Take-away Concepts Table Collection

Take-away Concepts Attribute Column Field Property

Take-away Concepts Primary Key Row ID Name

Take-away Concepts Row Business Object Item Entity Document Record

Relational Algebra

Function

Relation

Relation as a list

Relation as a list, n-way

Relations seen from set theory Given a family of sets A " #$"$% A relation is a subset of their Cartesian product % R ( A ) = A # A, ).# A %

Relation as a list, n-way

Relational table A B C

Relational Algebra x

Selection

Projection

Grouping A B C D

Sorting 1 2 3 4 5

Cartesian Product

Joining A B A B A A A B

Normal Forms

From your Bachelor: Normal Forms Update anomaly Consistency Delete anomaly Insert anomaly

1st Normal Form (tabular) The Key

1st Normal Form: counter-example Legi Name Lecture City State PLZ 32-000-000 Alan Turing Bletchley Park UK MK3 6EB Lecture ID Lecture Name xxx-xxxx-xxx Cryptography 263-3010-00L Big Data 62-000-000 Georg Cantor Pfäffikon SZ 8808 Lecture ID 263-3010-00L Lecture Name Big Data 123-4567-89L Set theory 25-000-000 Felix Bloch Pfäffikon ZH 8330 Lecture ID Lecture Name 123-4567-89L Set theory

1st Normal Form Legi Name Lecture ID Lecture Name City State PLZ 32-000-000 Alan Turing xxx-xxxx-xxx Cryptography Bletchley Park UK MK3 6EB 32-000-000 Alan Turing 263-3010-00L Big Data Bletchley Park UK MK3 6EB 62-000-000 Georg Cantor 263-3010-00L Big Data Pfäffikon SZ 8808 62-000-000 Georg Cantor 123-4567-89L Set theory Pfäffikon SZ 8808 25-000-000 Felix Bloch 123-4567-89L Set theory Pfäffikon ZH 8330

2nd Normal Form (not joined) The Whole Key

2nd Normal Form: Counter-example Legi Name Lecture ID Lecture Name City State PLZ 32-000-000 Alan Turing xxx-xxxx-xxx Cryptography Bletchley Park UK MK3 6EB 32-000-000 Alan Turing 263-3010-00L Big Data Bletchley Park UK MK3 6EB 62-000-000 Georg Cantor 263-3010-00L Big Data Pfäffikon SZ 8808 62-000-000 Georg Cantor 123-4567-89L Set theory Pfäffikon SZ 8808 25-000-000 Felix Bloch 123-4567-89L Set theory Pfäffikon ZH 8330

2nd Normal Form: Example Legi Name City State PLZ 32-000-000 Alan Turing Bletchley Park UK MK3 6EB 32-000-000 Alan Turing Bletchley Park UK MK3 6EB 62-000-000 Georg Cantor Pfäffikon SZ 8808 62-000-000 Georg Cantor Pfäffikon SZ 8808 Legi Lecture ID 32-000-000 xxx-xxxx-xxx 32-000-000 263-3010-00L 62-000-000 263-3010-00L 62-000-000 123-4567-89L 25-000-000 123-4567-89L Lecture ID Lecture Name xxx-xxxx-xxx Cryptography 263-3010-00L Big Data 25-000-000 Felix Bloch Pfäffikon ZH 8330 123-4567-89L Set theory

3rd Normal Form Nothing But The Key

3rd Normal Form: Counter-Example Legi 32-000-000 Name City State PLZ Alan Turing Bletchley Park UK MK3 6EB 32-000-000 Alan Turing Bletchley Park UK MK3 6EB 62-000-000 Georg Cantor Pfäffikon SZ 8808 62-000-000 Georg Cantor Pfäffikon SZ 8808 25-000-000 Felix Bloch Pfäffikon ZH 8330

3rd Normal Form: Example Legi 32-000-000 Name City State Alan Turing Bletchley Park UK City State PLZ Bletchley Park UK MK3 6EB 32-000-000 Alan Turing Bletchley Park UK Bletchley Park UK MK3 6EB 62-000-000 Georg Cantor Pfäffikon SZ Pfäffikon SZ 8808 62-000-000 Georg Cantor Pfäffikon SZ Pfäffikon SZ 8808 25-000-000 Felix Bloch Pfäffikon ZH Pfäffikon ZH 8330

Data Denormalization 3NF 2NF

Data Denormalization 3NF 1NF

Data Denormalization 3NF 0NF

SQL Brush-Up

SQL is a declarative language Logical model "What, not how" Physical execution

SQL is a declarative language Logical model "What, not how" Query Plan

SQL is a declarative language Logical model "What, not how" Parallelism

SQL is a functional language "Expressions that nest"

Getting a table SELECT * FROM customers

Selection SELECT * FROM customers WHERE city = "Zurich"

Projection SELECT name, city FROM customers

Grouping SELECT city, count(name) FROM customers GROUP BY city

Selecting groups SELECT city, count(name) FROM customers GROUP BY city HAVING count(name) > 2

Join SELECT cities.country, customers.name FROM customers JOIN cities ON customers.city = cities.name

Nesting SELECT country, count(name) FROM SELECT cities.country AS country, customers.name AS name FROM customers JOIN cities ON customers.city = cities.name GROUP BY country HAVING count(name) > 10

Some terminology Data Schema DML: Data Manipulation Language (Query, insert, remove rows) DDL: Data Definition Language (Create or table/schema, drop it)

Some terminology Create Read Update Delete

Transactions

The good old times of databases: ACID Atomicity Consistency Isolation Durability

Atomicity Either the entire transaction is applied, or none of it (rollback).

Consistency After a transaction, the database is in a consistent state again.

Isolation A transaction feels like nobody else is writing to the database.

Durability Updates made do not disappear again.

The new era: the CAP theorem Yet another impossibility triangle

The new era: the CAP theorem Consistency Availability Partition tolerance

(Atomic) Consistency All nodes see the same data.

Availability It is possible to query the database at all times.

Partition tolerance The database continues to function even if the network gets partitioned.

Performance

No such thing as "one size fits all" Data shapes matter

Optimize for read vs. write intensive OnLine Transaction Processing OnLine Analytical Processing OLTP OLAP Write-intensive Read-intensive

Indices

Data Scale-Up

Data can have... Lots of rows

Data can have... Lots of columns

Data can have... Lots of nesting

The rest of the lecture: Scaling up Data in the Large Lots of rows Object Storage Lots of rows Distributed File Systems Lots of rows/columns Column storage Lots of rows Massive Parallel Processing Data in the Small Lots of nesting Syntax Lots of nesting Data Models Lots of nesting Querying Lots of nesting Document Stores Data in the Very Small Lots of rows Graph Databases Lots of columns Data Warehousing Lots of columns Business reporting