Introduction to Databases CSE 414. Lecture 2: Data Models

Similar documents
Announcements. Using Electronics in Class. Review. Staff Instructor: Alvin Cheung Office hour on Wednesdays, 1-2pm. Class Overview

CSE 344 JANUARY 5 TH INTRO TO THE RELATIONAL DATABASE

Introduction to Data Management CSE 344. Lecture 2: Data Models

CSE 344 JANUARY 8 TH SQLITE AND JOINS

Introduction to Database Systems CSE 414. Lecture 3: SQL Basics

Introduction to Database Systems CSE 414. Lecture 3: SQL Basics

Lecture 2: Introduction to SQL

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

CMPT 354: Database System I. Lecture 2. Relational Model

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Lecture 3: SQL Part II

DS Introduction to SQL Part 1 Single-Table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Introduction to Database Systems CSE 444

DS Introduction to SQL Part 2 Multi-table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

INF 212 ANALYSIS OF PROG. LANGS SQL AND SPREADSHEETS. Instructors: Crista Lopes Copyright Instructors.

Introduction to Data Management CSE 414

The Relational Data Model

MTAT Introduction to Databases

Before we talk about Big Data. Lets talk about not-so-big data

Where Are We? Next Few Lectures. Integrity Constraints Motivation. Constraints in E/R Diagrams. Keys in E/R Diagrams

Principles of Database Systems CSE 544. Lecture #1 Introduction and SQL

SQL. Assit.Prof Dr. Anantakul Intarapadung

Announcements. Two Classes of Database Applications. Class Overview. NoSQL Motivation. RDBMS Review: Serverless

Principles of Database Systems CSE 544. Lecture #1 Introduction and SQL

Class Overview. Two Classes of Database Applications. NoSQL Motivation. RDBMS Review: Client-Server. RDBMS Review: Serverless

Announcements. Multi-column Keys. Multi-column Keys (3) Multi-column Keys. Multi-column Keys (2) Introduction to Data Management CSE 414

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model

Lecture 3 Additional Slides. CSE 344, Winter 2014 Sudeepa Roy

Announcements. Multi-column Keys. Multi-column Keys. Multi-column Keys (3) Multi-column Keys (2) Introduction to Data Management CSE 414

DATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Database Systems CSE 303. Lecture 02

Database Systems CSE 303. Lecture 02

Introduction to Data Management CSE 344

JSON - Overview JSon Terminology

CSE 544 Principles of Database Management Systems

Announcements. Agenda. Database/Relation/Tuple. Schema. Discussion. CSE 444: Database Internals

Principles of Database Systems CSE 544. Lecture #2 SQL The Complete Story

Database Systems CSE 414

Announcements. JSon Data Structures. JSon Syntax. JSon Semantics: a Tree! JSon Primitive Datatypes. Introduction to Database Systems CSE 414

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2006 Lecture 3 - Relational Model

Lecture 16. The Relational Model

CS143: Relational Model

CSE 544 Principles of Database Management Systems

CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA

Informatics 1: Data & Analysis

COMP 430 Intro. to Database Systems

Announcements. Agenda. Database/Relation/Tuple. Discussion. Schema. CSE 444: Database Internals. Room change: Lab 1 part 1 is due on Monday

CS 564 Midterm Review

Data! CS 133: Databases. Goals for Today. So, what is a database? What is a database anyway? From the textbook:

user specifies what is wanted, not how to find it

Database Systems CSE 414

Overview of Data Management

COMP 430 Intro. to Database Systems

Keys, SQL, and Views CMPSCI 645

Introduction to Database Systems. The Relational Data Model

Introduction to Database Systems. The Relational Data Model. Werner Nutt

The Relational Data Model

SQL Simple Queries. Chapter 3.1 V3.01. Napier University

Relational Databases

Insertions, Deletions, and Updates

ACS-3902 Fall Ron McFadyen 3D21 Slides are based on chapter 5 (7 th edition) (chapter 3 in 6 th edition)

SQLite, Firefox, and our small IMDB movie database. CS3200 Database design (sp18 s2) Version 1/17/2018

Chapter 4. Basic SQL. SQL Data Definition and Data Types. Basic SQL. SQL language SQL. Terminology: CREATE statement

Introduction to Relational Databases

Chapter 1: Introduction

Announcements. Database Design. Database Design. Database Design Process. Entity / Relationship Diagrams. Introduction to Data Management CSE 344

SQL. Structured Query Language

Introduction to Database Systems CSE 414

SQL Functionality SQL. Creating Relation Schemas. Creating Relation Schemas

Database Fundamentals Chapter 1

CSE 544 Data Models. Lecture #3. CSE544 - Spring,

Introduction to Data Management CSE 344

Lecture 4. Lecture 4: The E/R Model

The Entity-Relationship Model (ER Model) - Part 2

CMPT 354: Database System I. Lecture 3. SQL Basics

Part I: Structured Data

Entity/Relationship Modelling

Part I: Structured Data

Lectures 2&3: Introduction to SQL

Relational Model. CS 377: Database Systems

Overview of Data Management

Introduction to Database Systems

COSC344 Database Theory and Applications. Lecture 6 SQL Data Manipulation Language (1)

Database Design Theory and Normalization. CS 377: Database Systems

Lecture 5. Lecture 5: The E/R Model

Database Technology Introduction. Heiko Paulheim

The Relational Model

Information Systems Engineering. SQL Structured Query Language DDL Data Definition (sub)language

Ø Last Thursday: String manipulation & Regular Expressions Ø guest lecture from the amazing Sam Lau Ø reviewed in section and in future labs & HWs

ECE 650 Systems Programming & Engineering. Spring 2018

L12: ER modeling 5. CS3200 Database design (sp18 s2) 2/22/2018

CSC 261/461 Database Systems Lecture 13. Fall 2017

Introduction to SQL. ECE 650 Systems Programming & Engineering Duke University, Spring 2018

Chapter 5. Relational Model Concepts 9/4/2012. Chapter Outline. The Relational Data Model and Relational Database Constraints

CSE 544 Principles of Database Management Systems. Lecture 4: Data Models a Never-Ending Story

Introduction to Database Systems CSE 414. Lecture 19: E/R Diagrams

Transcription:

Introduction to Databases CSE 414 Lecture 2: Data Models CSE 414 - Autumn 2018 1

Class Overview Unit 1: Intro Unit 2: Relational Data Models and Query Languages Data models, SQL, Relational Algebra, Datalog Unit 3: Non-relational data Unit 4: RDMBS internals and query optimization Unit 5: Parallel query processing Unit 6: DBMS usability, conceptual design Unit 7: Transactions CSE 414 - Autumn 2018 2

Review What is a database? A collection of files storing related data What is a DBMS? An application program that allows us to manage efficiently the collection of data files CSE 414 - Autumn 2018 3

Data Models Recall our example: want to design a database of books: author, title, publisher, pub date, price, etc How should we describe this data? Data model = mathematical formalism (or conceptual way) for describing the data CSE 414 - Autumn 2018 4

Data Models Relational Data represented as relations Semi-structured (JSon) Data represented as trees Key-value pairs Used by NoSQL systems Graph Object-oriented Unit 2 Unit 3 CSE 414 - Autumn 2018 5

Example: storing FB friends Peter Person1 Person2 is_friend Mary John Or Peter John 1 John Mary 0 Mary Phil 1 Phil Peter 1 Phil As a graph As a relation We will learn the tradeoffs of different data models later this quarter CSE 414 - Autumn 2018 6

3 Elements of Data Models Instance The actual data Schema Describe what data is being stored Query language How to retrieve and manipulate data CSE 414 - Autumn 2018 7

Turing Awards in Data Management Charles Bachman, 1973 IDS and CODASYL Ted Codd, 1981 Relational model Jim Gray, 1998 Transaction processing Michael Stonebraker, 2014 INGRES and Postgres CSE 414 - Autumn 2018 8

Relational Model Data is a collection of relations / tables: columns / attributes / fields cname country no_employees for_profit rows / tuples / records GizmoWorks USA 20000 True Canon Japan 50000 True Hitachi Japan 30000 True HappyCam Canada 500 False mathematically, relation is a set of tuples each tuple appears 0 or 1 times in the table order of the rows is unspecified CSE 414 - Autumn 2018 9

The Relational Data Model Degree (arity) of a relation = #attributes Each attribute has a type. Examples types: Strings: CHAR(20), VARCHAR(50), TEXT Numbers: INT, SMALLINT, FLOAT MONEY, DATETIME, Few more that are vendor specific Statically and strictly enforced CSE 414 - Autumn 2018 10

Keys Key = one (or multiple) attributes that uniquely identify a record CSE 414 - Autumn 2018 11

Keys Key = one (or multiple) attributes that uniquely identify a record Key cname country no_employees for_profit GizmoWorks USA 20000 True Canon Japan 50000 True Hitachi Japan 30000 True HappyCam Canada 500 False CSE 414 - Autumn 2018 12

Keys Key = one (or multiple) attributes that uniquely identify a record Key Not a key cname country no_employees for_profit GizmoWorks USA 20000 True Canon Japan 50000 True Hitachi Japan 30000 True HappyCam Canada 500 False CSE 414 - Autumn 2018 13

Keys Key = one (or multiple) attributes that uniquely identify a record Key Not a key Is this a key? cname country no_employees for_profit GizmoWorks USA 20000 True Canon Japan 50000 True Hitachi Japan 30000 True HappyCam Canada 500 False CSE 414 - Autumn 2018 14

Keys Key = one (or multiple) attributes that uniquely identify a record Key Not a key Is this a key? No: future updates to the database may create duplicate no_employees cname country no_employees for_profit GizmoWorks USA 20000 True Canon Japan 50000 True Hitachi Japan 30000 True HappyCam Canada 500 False CSE 414 - Autumn 2018 15

Multi-attribute Key Key = fname,lname (what does this mean?) fname lname Income Department Alice Smith 20000 Testing Alice Thompson 50000 Testing Bob Thompson 30000 SW Carol Smith 50000 Testing CSE 414 - Autumn 2018 16

Multiple Keys Key Another key SSN fname lname Income Department 111-22-3333 Alice Smith 20000 Testing 222-33-4444 Alice Thompson 50000 Testing 333-44-5555 Bob Thompson 30000 SW 444-55-6666 Carol Smith 50000 Testing We can choose one key and designate it as primary key E.g.: primary key = SSN CSE 414 - Autumn 2018 17

Foreign Key Company(cname, country, no_employees, for_profit) Country(name, population) Company cname country no_employees for_profit Canon Japan 50000 Y Hitachi Japan 30000 Y Country Foreign key to Country.name name USA Japan population 320M 127M CSE 414 - Autumn 2018 18

Keys: Summary Key = columns that uniquely identify tuple Usually we underline A relation can have many keys, but only one can be chosen as primary key Foreign key: Attribute(s) whose value is a key of a record in some other relation Foreign keys are sometimes called semantic pointer CSE 414 - Autumn 2018 19

SQL Query Language Structured Query Language Developed by IBM in the 70s Most widely used language to query relational data Other relational query languages Datalog, relational algebra CSE 414 - Autumn 2018 20

Our First DBMS SQL Lite Will switch to SQL Server later in the quarter CSE 414 - Autumn 2018 21

Demo 1 CSE 414 - Autumn 2018 22

Discussion Tables are NOT ordered they are sets or multisets (bags) Tables are FLAT No nested attributes Tables DO NOT prescribe how they are implemented / stored on disk This is called physical data independence CSE 414 - Autumn 2018 23

Table Implementation How would you implement this? cname country no_employees for_profit GizmoWorks USA 20000 True Canon Japan 50000 True Hitachi Japan 30000 True HappyCam Canada 500 False CSE 414 - Autumn 2018 24

Table Implementation How would you implement this? cname country no_employees for_profit GizmoWorks USA 20000 True Canon Japan 50000 True Hitachi Japan 30000 True HappyCam Canada 500 False Row major: as an array of objects GizmoWorks USA 20000 True Canon Japan 50000 True Hitachi Japan 30000 True HappyCam Canada 500 False CSE 414 - Autumn 2018 25

Table Implementation How would you implement this? cname country no_employees for_profit GizmoWorks USA 20000 True Canon Japan 50000 True Hitachi Japan 30000 True HappyCam Canada 500 False Column major: as one array per attribute GizmoWorks Canon Hitachi HappyCam USA Japan Japan Canada 20000 50000 30000 500 CSE 414 - Autumn 2018 True True True False 26

Table Implementation How would you implement this? cname country no_employees for_profit GizmoWorks USA 20000 True Canon Japan 50000 True Hitachi Japan 30000 True HappyCam Canada 500 False Physical data independence The logical definition of the data remains unchanged, even when we make changes to the actual implementation CSE 414 - Autumn 2018 27

First Normal Form cname country no_employees for_profit Canon Japan 50000 Y Hitachi Japan 30000 Y All relations must be flat: we say that the relation is in first normal form CSE 414 - Autumn 2018 28

First Normal Form cname country no_employees for_profit Canon Japan 50000 Y Hitachi Japan 30000 Y All relations must be flat: we say that the relation is in first normal form E.g. we want to add products manufactured by each company: CSE 414 - Autumn 2018 29

First Normal Form cname country no_employees for_profit Canon Japan 50000 Y Hitachi Japan 30000 Y All relations must be flat: we say that the relation is in first normal form E.g., we want to add products manufactured by each company: cname country no_employees for_profit products Canon Japan 50000 Y pname price category SingleTouch 149.99 Photography Gadget 200 Toy Hitachi Japan 30000 Y pname price category CSE 414 - Autumn 2018 30 AC 300 Appliance

First Normal Form cname country no_employees for_profit Canon Japan 50000 Y Hitachi Japan 30000 Y All relations must be flat: we say that the relation is in first normal form E.g., we want to add products manufactured by each company: cname country no_employees for_profit products Non-1NF! Canon Japan 50000 Y pname price category SingleTouch 149.99 Photography Gadget 200 Toy Hitachi Japan 30000 Y pname price category CSE 414 - Autumn 2018 31 AC 300 Appliance

First Normal Form Company cname country no_employees for_profit Canon Japan 50000 Y Hitachi Japan 30000 Y Products Now it s in 1NF pname price category manufacturer SingleTouch 149.99 Photography Canon AC 300 Appliance Hitachi Gadget 200 Toy Canon CSE 414 - Autumn 2018 32

Demo 1 (cont d) CSE 414 - Autumn 2018 33

Data Models: Summary Schema + Instance + Query language Relational model: Database = collection of tables Each table is flat: first normal form Key: may consists of multiple attributes Foreign key: semantic pointer Physical data independence CSE 414 - Autumn 2018 34