Database Applications (15-415)


Database Applications (15-415). The Entity Relationship Model. Lecture 2, January 12, 2016. Mohammad Hammoud

Today. Last Session: Course overview and a brief introduction to databases and database systems. Today's Session: Introduction to databases and database systems (continued); main steps involved in designing databases; constructs of the entity relationship (ER) model; integrity constraints that can be expressed in the ER model. Announcements: The first Problem Solving Assignment (PS1) is now posted on the course webpage. It is due on Jan 21st by midnight. Thursday, Jan 14th is the first recitation; a case study on the ER model will be solved together.

Outline: A Primer on Databases; Database Design; ER Model: Constructs and Constraints

A Motivating Scenario. Qatar Foundation (QF) has a large collection of data (say 500GB) on employees, students, universities, research centers, etc. Each requirement on this data falls under a concern:
This data is accessed concurrently by several people: Performance (Concurrency Control)
Queries on data must be answered quickly: Performance (Response Time)
Changes made to the data by different users must be applied consistently: Correctness (Consistency)
Access to certain parts of data (e.g., salaries) must be restricted: Correctness (Security)
This data should survive system crashes/failures: Correctness (Durability and Atomicity)

Managing Data using File Systems. What about managing QF data using local file systems? We would have to deal with: files of fixed-length and variable-length records as well as formats; main memory vs. disk; computer systems with 32-bit vs. 64-bit addressing schemes; special programs (e.g., C++ and Java programs) for answering user questions; special measures to maintain atomicity; special measures to maintain consistency of data; special measures to maintain data isolation; special measures to offer software and hardware fault-tolerance; special measures to enforce security policies in which different users are granted different permissions to access diverse subsets of data. This becomes tedious and inconvenient, especially at large scale, with evolving/new user queries and a higher probability of failures!

Data Base Management Systems. Special software is accordingly needed to make the preceding tasks easier. This software is known as a Data Base Management System (DBMS). DBMSs automatically provide: data independence; efficient data access; data integrity and security; data administration; concurrent accesses and crash recovery; reduced application development and tuning time.

Some Definitions. A database is a collection of data which describes one or many real-world enterprises. E.g., a university database might contain information about entities like students and courses, and relationships like a student's enrollment in a course. A DBMS is a software package designed to store and manage databases. E.g., DB2, Oracle, MS SQL Server, MySQL and Postgres. A database system = (Big) Data + DBMS + Application Programs.

Data Models. The user of a DBMS is ultimately concerned with some real-world enterprise (e.g., a university). The data to be stored and managed by a DBMS describes various aspects of the enterprise. E.g., the data in a university database describes student, faculty and course entities and the relationships among them. A data model is a collection of high-level data description constructs that hide many low-level storage details. A widely used data model called the entity-relationship (ER) model allows users to pictorially denote entities and the relationships among them.

The Relational Model. The relational model of data is one of the most widely used models today. The central data description construct in the relational model is the relation. A relation is basically a table (or a set) with rows (or records or tuples) and columns (or fields or attributes). Every relation has a schema, which describes the columns of a relation. Conditions that records in a relation must satisfy can be specified. These are referred to as integrity constraints.

The Relational Model: An Example. Let us consider the student entity in a university database.
Students Schema: Students(sid: string, name: string, login: string, dob: string, gpa: real)
Integrity Constraint: Every student has a unique sid value.
An instance of a Students relation (each column is an attribute or field; each row is a record or tuple):
sid     name    login                 dob        gpa
512412  Khaled  khaled@qatar.cmu.edu  18-9-1995  3.5
512311  Jones   jones@qatar.cmu.edu   1-12-1994  3.2
512111  Maria   maria@qatar.cmu.edu   3-8-1995   3.85
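The schema and the integrity constraint above can be tried out directly. The following is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as the DBMS and assuming that the uniqueness of sid is declared as a primary key; neither choice comes from the slides. The extra row with the repeated sid is made up for illustration.

```python
import sqlite3

# An in-memory database is enough for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students (
        sid   TEXT PRIMARY KEY,  -- integrity constraint: sid values are unique
        name  TEXT,
        login TEXT,
        dob   TEXT,
        gpa   REAL
    )
    """
)

# The instance shown on the slide.
rows = [
    ("512412", "Khaled", "khaled@qatar.cmu.edu", "18-9-1995", 3.5),
    ("512311", "Jones", "jones@qatar.cmu.edu", "1-12-1994", 3.2),
    ("512111", "Maria", "maria@qatar.cmu.edu", "3-8-1995", 3.85),
]
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", rows)


def insert_duplicate_sid():
    """Try to insert a (made-up) student whose sid already exists."""
    try:
        conn.execute(
            "INSERT INTO Students VALUES ('512412', 'Other', 'other@example.org', '1-1-1996', 3.0)"
        )
    except sqlite3.IntegrityError:
        return "rejected"
    return "accepted"


duplicate_result = insert_duplicate_sid()
student_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(duplicate_result, student_count)
```

The DBMS, not the application, refuses the second row with sid 512412, which is the point of declaring the constraint in the schema.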

Levels of Abstraction. The data in a DBMS is described at three levels of abstraction: the conceptual (or logical), physical and external schemas. The conceptual schema describes data in terms of a specific data model (e.g., the relational model of data). The physical schema specifies how data described in the conceptual schema are stored on secondary storage devices. The external schemas (or views) allow data access to be customized at the level of individual users or groups of users (there can be one or many views). [Figure: View 1, View 2 and View 3 on top of the Conceptual Schema, on top of the Physical Schema, on top of the Disk.]

Views. A view is conceptually a relation. Records in a view are computed as needed and usually not stored in a DBMS. Example: University Database.
Conceptual Schema: Students(sid: string, name: string, login: string, dob: string, gpa: real); Courses(cid: string, cname: string, credits: integer); Enrolled(sid: string, cid: string, grade: string)
Physical Schema: Relations stored as heap files; index on first column of Students
External Schema (View): Course_info(cid: string, enrollment: integer). Students can be allowed to find out course enrollments. The view can be computed from the relations in the conceptual schema (so as to avoid data redundancy and inconsistency).
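As a rough illustration of "computed as needed", the Course_info view can be defined over Enrolled. This is a sketch under stated assumptions: SQLite via Python's `sqlite3` module is assumed as the DBMS, the view definition (a count per cid) is one plausible reading of "enrollment", and the credits value and grades are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Sample rows; the credits value and the grades are illustrative assumptions.
    INSERT INTO Courses  VALUES ('15-415', 'Database Applications', 12);
    INSERT INTO Enrolled VALUES ('512412', '15-415', 'A');
    INSERT INTO Enrolled VALUES ('512311', '15-415', 'B');

    -- The view stores no rows of its own; it is a named query over Enrolled.
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment
        FROM Enrolled
        GROUP BY cid;
    """
)

course_info = conn.execute("SELECT cid, enrollment FROM Course_info").fetchall()

# A new enrollment is visible through the view without touching the view itself.
conn.execute("INSERT INTO Enrolled VALUES ('512111', '15-415', 'A')")
after_new_enrollment = conn.execute("SELECT cid, enrollment FROM Course_info").fetchall()
print(course_info, after_new_enrollment)
```

Because the enrollment number is derived rather than stored, it cannot drift out of sync with Enrolled, which is the redundancy and inconsistency argument made on the slide.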

Iterating: Data Independence. One of the most important benefits of using a DBMS is data independence. With data independence, application programs are insulated from how data are structured and stored. Data independence entails two properties. Logical data independence: users are shielded from changes in the conceptual schema (e.g., add/drop a column in a table). Physical data independence: users are shielded from changes in the physical schema (e.g., add an index or change the record order).

Queries in a DBMS. The ease with which information can be queried from a database determines its value to users. A DBMS provides a specialized language, called the query language, in which queries can be posed. The relational model supports powerful query languages. Relational calculus: a formal language based on mathematical logic. Relational algebra: a formal language based on a collection of operators (e.g., selection and projection) for manipulating relations. Structured Query Language (SQL): builds upon relational calculus and algebra; allows creating, manipulating and querying relational databases; can be embedded within a host language (e.g., Java).
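To make the two relational algebra operators named above concrete, here is a small sketch in plain Python. Relations are modelled as lists of dictionaries, which is an assumption made for readability and not how a DBMS stores them; the helper names `select` and `project` are hypothetical. The data is the Students instance from the earlier example.

```python
students = [
    {"sid": "512412", "name": "Khaled", "gpa": 3.5},
    {"sid": "512311", "name": "Jones", "gpa": 3.2},
    {"sid": "512111", "name": "Maria", "gpa": 3.85},
]


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, columns):
    """Projection: keep only the named columns and drop duplicate rows."""
    result = []
    for row in relation:
        projected = {column: row[column] for column in columns}
        if projected not in result:  # relations are sets, so no duplicates
            result.append(projected)
    return result


# Names of students whose gpa exceeds 3.4: a selection followed by a projection.
good_students = project(select(students, lambda row: row["gpa"] > 3.4), ["name"])
print(good_students)
```

In SQL the same request would read `SELECT DISTINCT name FROM Students WHERE gpa > 3.4`, which shows how SQL builds on these operators.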

Concurrent Execution and Transactions. An important task of a DBMS is to schedule concurrent accesses to data so as to improve performance. [Figure: an interleaved schedule of two transactions, T1 and T2, with the actions R(A), W(A), R(B), W(B), R(C), W(C).] An atomic sequence of database actions (reads/writes) is referred to as a transaction. When several users access a database concurrently, the DBMS must order their requests carefully to avoid conflicts. E.g., a check might be cleared while the account balance is being computed! The DBMS ensures that conflicts do not arise by using a locking protocol (shared vs. exclusive locks).
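The difference between shared and exclusive locks can be sketched with a toy lock table. This is only an illustration of the usual compatibility rule (shared locks coexist, an exclusive lock excludes everything else); the class `LockTable` and its methods are hypothetical, and real lock managers also handle waiting, lock upgrades and release, which are left out here.

```python
# Compatibility of a held lock mode with a requested mode:
# "S" = shared (read), "X" = exclusive (write).
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


class LockTable:
    """Toy lock table: grants a request only if it conflicts with no other holder."""

    def __init__(self):
        self.held = {}  # data item -> list of (transaction, mode)

    def request(self, txn, item, mode):
        holders = self.held.setdefault(item, [])
        for other_txn, other_mode in holders:
            if other_txn != txn and not COMPATIBLE[(other_mode, mode)]:
                return False  # a real DBMS would make txn wait here
        holders.append((txn, mode))
        return True


locks = LockTable()
t1_reads_a = locks.request("T1", "A", "S")   # granted
t2_reads_a = locks.request("T2", "A", "S")   # granted: readers can share
t2_writes_a = locks.request("T2", "A", "X")  # denied: T1 still holds S on A
t2_writes_b = locks.request("T2", "B", "X")  # granted: nobody holds B
t1_reads_b = locks.request("T1", "B", "S")   # denied: T2 holds X on B
print(t1_reads_a, t2_reads_a, t2_writes_a, t2_writes_b, t1_reads_b)
```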

Ensuring Atomicity. Transactions can be interrupted before running to completion for a variety of reasons (e.g., due to a system crash). The DBMS ensures atomicity (the all-or-nothing property) even if a crash occurs in the middle of a transaction. This is achieved by maintaining a log (i.e., history) of all writes to the database. Before a change is made to the database, the corresponding log entry is forced to a safe location (this protocol is called Write-Ahead Log or WAL). After a crash, the effects of partially executed transactions are undone using the log.
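The log-then-write ordering and the undo step can be sketched in a few lines. This is a deliberately simplified model and not how any particular DBMS implements WAL: the class `TinyDB` is hypothetical, the "safe location" is just an in-memory list, and a missing key is represented by `None`, so `None` cannot be stored as a real value.

```python
class TinyDB:
    """Toy key-value store with an undo log written before every change."""

    def __init__(self):
        self.data = {}
        self.log = []           # entries: (transaction, key, old value)
        self.committed = set()

    def write(self, txn, key, value):
        # Write-ahead: record the old value BEFORE changing the data.
        self.log.append((txn, key, self.data.get(key)))
        self.data[key] = value

    def commit(self, txn):
        self.committed.add(txn)

    def recover(self):
        """After a crash, undo the writes of transactions that never committed."""
        for txn, key, old_value in reversed(self.log):
            if txn in self.committed:
                continue
            if old_value is None:
                self.data.pop(key, None)   # the key did not exist before
            else:
                self.data[key] = old_value


db = TinyDB()
db.write("T1", "A", 100)
db.commit("T1")
db.write("T2", "A", 50)   # T2 changes A ...
db.write("T2", "B", 7)    # ... and creates B, then the system "crashes"
db.recover()
print(db.data)
```

Only T1's effects survive recovery; T2's partial work is rolled back by walking the log backwards.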

The Architecture of a Relational DBMS. [Figure: layered architecture diagram. Components shown: Web Forms, Application Front Ends and SQL Interface, which issue SQL Commands; the Query Evaluation Engine (Plan Executor, Parser, Operator Evaluator, Optimizer); Files and Access Methods; Buffer Manager; Disk Space Manager; Concurrency Control (Transaction Manager, Lock Manager); Recovery Manager; and the Database (Index Files, Data Files, System Catalog).]

People Who Work With Databases. There are five classes of people associated with databases:
1. End users: store and use data in DBMSs; usually not computer professionals.
2. Application programmers: develop applications that facilitate the usage of DBMSs for end users; computer professionals who know how to leverage host languages, query languages and DBMSs altogether.
3. Database Administrators (DBAs): design the conceptual and physical schemas; ensure security and authorization; ensure data availability and recovery from failures; perform database tuning.
4. Implementers: build DBMS software for vendors like IBM and Oracle; computer professionals who know how to build DBMS internals.
5. Researchers: innovate new ideas which address evolving and new challenges/problems.

The Architecture of a Relational DBMS (with the people involved). [Figure: the same architecture diagram, annotated from top to bottom with End Users (e.g., university staff, travel agents, etc.), Application Programmers & DBAs, and Implementers and Researchers.]

Summary. We live in a world of data. The explosion of data is occurring along the 3Vs dimensions. DBMSs are needed for ensuring logical and physical data independence and ACID properties, among others. The data in a DBMS is described at three levels of abstraction. A DBMS typically has a layered architecture.

Summary. Studying DBMSs is one of the broadest and most exciting areas in computer science! This course provides an in-depth treatment of DBMSs with an emphasis on how to design, create, refine, use and build DBMSs and real-world enterprise databases. Various classes of people who work with databases hold responsible jobs and are well-paid!


Database Design. The main steps are:
Requirements Analysis: users' needs
Conceptual Design: a high-level description of the data (e.g., using the ER model)
Logical Design: the conversion of an ER design into a relational database schema
Schema Refinement: normalization (i.e., restructuring tables to ensure some desirable properties)
Physical Design: building indexes and clustering some tables
Security Design: access controls


Entities and Entity Sets. Entity: a real-world object distinguishable from other objects in an enterprise (e.g., University, Students and Faculty). An entity is described using a set of attributes. Entity set: a collection of similar entities (e.g., all employees). All entities in an entity set have the same set of attributes (until we consider ISA hierarchies, anyway!). Each entity set has a key. Each attribute has a domain.

Tools and an ER Diagram. The notation introduces symbols for entities (entity sets) and for attributes. [Figure: the entity set Employees with attributes ssn, name and lot.] ssn is the primary key, hence underlined.

Relationship and Relationship Sets. Relationship: an association among two or more entities (e.g., Mohammad is teaching 15-415). A relationship is described using a set of attributes. Relationship set: a collection of similar relationships. The same entity set could participate in different relationship sets, or in different roles in the same set.

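More Tools and ER Diagrams. The notation adds symbols for relationships (rel. sets) and for mapping constraints (labels N, M, P). [Figure, A Binary Relationship: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Works_In (since).] [Figure, A Self-Relationship: Employees (ssn, name, lot) participates twice in Reports_To, in the roles supervisor and subordinate.]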

Ternary Relationships. Suppose that departments have offices at different locations and we want to record the locations at which each employee works. Consequently, we must record an association between an employee, a department and a location. [Figure: Works_In (since) connects Employees (ssn, name, lot), Departments (did, dname, budget) and Locations (address, capacity).] This is referred to as a Ternary Relationship (vs. Self & Binary Relationships).

Key Constraints. Consider the Employees and Departments entity sets with a Manages relationship set. An employee can work in many departments. A department can have many employees. Each department can have at most one manager. This restriction is an example of a key constraint. [Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since).] Key constraints are denoted by thin arrows.
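One way to see what the key constraint buys is to write Manages as a table whose key is did alone. This is a sketch, assuming SQLite via Python's `sqlite3` module and assuming a separate Manages table; the column types and all sample rows (names, budgets, dates) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees   (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);

    -- did alone is the key of Manages: at most one manager per department.
    CREATE TABLE Manages (
        did   INTEGER PRIMARY KEY REFERENCES Departments(did),
        ssn   TEXT NOT NULL REFERENCES Employees(ssn),
        since TEXT
    );

    -- Illustrative sample data (not from the slides).
    INSERT INTO Employees   VALUES ('111', 'Alice', 10), ('222', 'Bob', 11);
    INSERT INTO Departments VALUES (1, 'Toys', 1000.0), (2, 'Books', 2000.0);
    INSERT INTO Manages     VALUES (1, '111', '2016-01-12'), (2, '111', '2016-01-12');
    """
)


def try_second_manager():
    """Try to give department 1 a second manager."""
    try:
        conn.execute("INSERT INTO Manages VALUES (1, '222', '2016-02-01')")
    except sqlite3.IntegrityError:
        return "rejected"
    return "accepted"


second_manager = try_second_manager()
departments_managed_by_111 = conn.execute(
    "SELECT COUNT(*) FROM Manages WHERE ssn = '111'"
).fetchone()[0]
print(second_manager, departments_managed_by_111)
```

The constraint is one-sided: employee 111 may manage two departments, but no department may have two managers.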

Cardinalities. Entities can be related to one another as one-to-one, one-to-many and many-to-many. This is said to be the cardinality of a given entity in relation to another. [Figure: One-to-One (1, 1); One-to-Many (1, N); Many-to-One; Many-to-Many (N, M).]

Cardinalities: Examples. [Figure: the three examples below, drawn once in the book's notation and once in the 1-to-N notation.]
COUNTRY (1) has CAPITAL (1): one-to-one
PERSON (1) owns CAR (N): one-to-many
STUDENT (N) takes SECTION (M): many-to-many

A Working Example. Requirements: Students take courses offered by instructors; a course may have multiple sections; one instructor per section. How to start? Nouns -> entity sets. Verbs -> relationship sets.

[Figure: entity sets STUDENT (ssn, name, ...) and INSTRUCTOR (issn).] Primary key = unique identifier; it is underlined.

[Figure: adds the entity set COURSE (c-id, c-name).] But: what about sections of a course (with different instructors)?

[Figure: adds the entity set SECTION (s-id).] But: s-id is not unique... (see later)

Q: how to record that students take courses?

[Figure: adds the relationship set takes between STUDENT (N) and SECTION (M).]

[Figure: adds the relationship set teaches between SECTION (N) and INSTRUCTOR (1).]

[Figure: adds the relationship set has between SECTION (N) and COURSE (1). The final diagram: STUDENT N takes M SECTION; SECTION N has 1 COURSE; SECTION N teaches 1 INSTRUCTOR.]

Participation Constraints. Consider again the Employees and Departments entity sets as well as the Manages relationship set. Should every department have a manager? If so, this is a participation constraint. Such a constraint entails that every Departments entity must appear in an instance of the Manages relationship. The participation of Departments in Manages is said to be total (vs. partial). [Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since) and Works_In (since).] Total participation is denoted by a thick line. Total participation + key constraint are denoted by a thick arrow.
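When total participation is combined with the key constraint (every department has exactly one manager), one possible relational rendering folds the manager into the department row and forbids it from being empty. This is a sketch under assumptions: SQLite via Python's `sqlite3` module, a combined table named `Dept_Mgr` (a hypothetical name), and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);

    -- Each department row carries its manager.
    -- PRIMARY KEY (did): at most one manager (key constraint).
    -- NOT NULL on ssn:   at least one manager (total participation).
    CREATE TABLE Dept_Mgr (
        did    INTEGER PRIMARY KEY,
        dname  TEXT,
        budget REAL,
        ssn    TEXT NOT NULL REFERENCES Employees(ssn),
        since  TEXT
    );

    -- Illustrative sample data (not from the slides).
    INSERT INTO Employees VALUES ('111', 'Alice', 10);
    INSERT INTO Dept_Mgr  VALUES (1, 'Toys', 1000.0, '111', '2016-01-12');
    """
)


def try_insert(sql):
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return "rejected"
    return "accepted"


no_manager = try_insert(
    "INSERT INTO Dept_Mgr (did, dname, budget) VALUES (2, 'Books', 2000.0)"
)
unknown_manager = try_insert(
    "INSERT INTO Dept_Mgr VALUES (3, 'Games', 500.0, '999', '2016-01-12')"
)
print(no_manager, unknown_manager)
```

This trick only works because of the key constraint; total participation in a many-to-many relationship set cannot be expressed this simply.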

Total vs. Partial Participations. For each relationship, is the participation of each side total or partial?
COUNTRY has CAPITAL: Total, Total
PERSON owns CAR: Partial, Total
STUDENT takes SECTION: Partial, Total

Weak Entities. A weak entity can be identified uniquely only by considering the primary key of another (owner) entity. The owner entity set and the weak entity set must participate in a one-to-many relationship set (one owner, many weak entities). The weak entity set must have total participation in this identifying relationship set. The set of attributes of a weak entity set that uniquely identify a weak entity for a given owner entity is called a partial key.

Weak Entities: An Example. [Figure: Employees (ssn, name, lot) connected through Policy (cost) to Dependents (pname, age).] Dependents has no unique key of its own. Dependents is a weak entity with partial key pname. Policy is an identifying relationship set. pname + ssn are the primary key of Dependents. Partial keys are underlined using broken lines. Weak entities and identifying relationships are drawn using thick lines.
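The dependence of Dependents on its owner can be sketched relationally with a composite key and a cascading delete. This is one possible rendering and not the only one: it assumes SQLite via Python's `sqlite3` module, merges Dependents and Policy into a single table named `Dep_Policy` (a hypothetical name), and uses made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);

    -- pname is only a partial key; together with the owner's ssn it
    -- identifies a dependent. Deleting the owner deletes its dependents.
    CREATE TABLE Dep_Policy (
        pname TEXT,
        age   INTEGER,
        cost  REAL,
        ssn   TEXT NOT NULL,
        PRIMARY KEY (pname, ssn),
        FOREIGN KEY (ssn) REFERENCES Employees(ssn) ON DELETE CASCADE
    );

    -- Illustrative sample data: two employees each have a dependent named Sam.
    INSERT INTO Employees  VALUES ('111', 'Alice', 10), ('222', 'Bob', 11);
    INSERT INTO Dep_Policy VALUES ('Sam', 5, 100.0, '111'), ('Sam', 7, 120.0, '222');
    """
)

dependents_before = conn.execute("SELECT COUNT(*) FROM Dep_Policy").fetchone()[0]

# Removing the owner entity removes the weak entities that depend on it.
conn.execute("DELETE FROM Employees WHERE ssn = '111'")
dependents_after = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
print(dependents_before, dependents_after)
```

Two dependents may share the name Sam because pname is unique only for a given owner.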

ISA ('is a') Hierarchies. Entities in an entity set can sometimes be classified into subclasses (this is somewhat similar to inheritance in OOP languages). If we declare B ISA A, every B entity is also considered to be an A entity. [Figure: Employees (ssn, name, lot) connected through ISA to Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid).] Employees is specialized into subclasses. Hourly_Emps and Contract_Emps are generalized by Employees.

Overlap and Covering Constraints. [Figure: an entity set A with subclasses B and C.] Overlap constraints: can an entity belong to both B and C? Covering constraints: can an A entity belong to neither B nor C?

Overlap Constraints: Examples. Can John be in Hourly_Emps and Contract_Emps? Intuitively, no. Can John be in Contract_Emps and in Senior_Emps? Intuitively, yes. This is written as: Contract_Emps OVERLAPS Senior_Emps.

Covering Constraints: Examples. Does everyone in Employees belong to one of its subclasses? Intuitively, no. Does every Motor_Vehicles entity have to be either a Motorboats entity or a Cars entity? Intuitively, yes. This is written as: Motorboats AND Cars COVER Motor_Vehicles.
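Both constraints are statements about sets of entities, so they can be checked with ordinary set operations. The sketch below treats each entity set as a Python set of identifiers; the function names `overlaps` and `covers` and all the member names are hypothetical and chosen only for illustration.

```python
def overlaps(subclass_b, subclass_c):
    """True if some entity belongs to both subclasses."""
    return bool(subclass_b & subclass_c)


def covers(superclass, *subclasses):
    """True if every superclass entity belongs to at least one subclass."""
    return set().union(*subclasses) >= superclass


# Illustrative instances (made-up members).
employees = {"john", "mary", "ali"}
hourly_emps = {"john"}
contract_emps = {"mary"}
senior_emps = {"mary"}

motor_vehicles = {"boat1", "car1", "car2"}
motorboats = {"boat1"}
cars = {"car1", "car2"}

hourly_contract_overlap = overlaps(hourly_emps, contract_emps)  # disjoint here
contract_senior_overlap = overlaps(contract_emps, senior_emps)  # mary is in both
employees_covered = covers(employees, hourly_emps, contract_emps)  # ali is in neither
vehicles_covered = covers(motor_vehicles, motorboats, cars)
print(hourly_contract_overlap, contract_senior_overlap, employees_covered, vehicles_covered)
```

The checks inspect one instance; the constraints in an ER diagram are stronger, since they state what every legal instance must satisfy.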

More Details on ISA Hierarchies. Attributes are inherited (i.e., if B ISA A, the attributes defined for a B entity are the attributes for A plus B). We can have many levels of an ISA hierarchy. Reasons for using ISA: to add descriptive attributes specific to a subclass; to identify entities that participate in a relationship.
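Since the slides compare ISA hierarchies to OOP, attribute inheritance can be sketched with Python dataclasses. The class names mirror the entity sets in the example; the attribute types and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:
    ssn: str
    name: str
    lot: int


@dataclass
class HourlyEmp(Employee):      # Hourly_Emps ISA Employees
    hourly_wages: float
    hours_worked: float


@dataclass
class ContractEmp(Employee):    # Contract_Emps ISA Employees
    contractid: str


# Made-up sample entity.
john = HourlyEmp(ssn="111", name="John", lot=10, hourly_wages=25.0, hours_worked=40.0)

# A subclass entity has the superclass attributes plus its own.
hourly_attributes = [field.name for field in fields(HourlyEmp)]
john_is_employee = isinstance(john, Employee)
print(hourly_attributes, john_is_employee)
```

Every HourlyEmp is also an Employee, which matches "if we declare B ISA A, every B entity is also considered to be an A entity".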

Aggregation. Aggregation allows indicating that a relationship set (identified through a dashed box) participates in another relationship set. [Figure: Employees (ssn, name, lot) is connected by Monitors (until) to the Sponsors relationship set (since), which is shown in a dashed box and connects Projects (pid, started_on, pbudget) and Departments (did, dname, budget).]

Next Class. Continue the ER Model and start with the Relational Model.