4541.564; Spring 2015 Prof. Sang-goo Lee (11:00pm: Mon & Wed: Room 301-203) ADVANCED DATABASES Copyright by S.-g. Lee Review - 1
General Info.
  Textbook
    - Database System Concepts, 6th Ed., Silberschatz, et al., McGraw Hill, 2011.
    - Other references as needed.
  Class Web Page
    - http://ids.snu.ac.kr/wiki/lectures (username & password required)
    - Class notes will be posted before class (for personal use only)
  Term Projects
    - 2 development projects
    - 1 presentation project
  Evaluation
    - Exams (midterm & final): 50%
    - Development Projects: 20%
    - Reports & Presentations: 20%
    - Others: 10%
  Tentative Schedule
    1. Review
    2. Chap 21: Information Retrieval
    3. Chap 14: Transactions
    4. Chap 15: Concurrency Control
    5. Chap 16: Recovery
    6. Chap 17: DB Architectures
    7. Chap 18: Parallel DB
    8. Chap 19: Distributed DB
    9. Chap 20: DW & Data Mining
    10. Chap 22: Object-based DB
    11. RDF & SPARQL
    12. Relational Completeness
    13. Student presentations
INTRO
Data, Database
  Data
    - A formal description of an entity, event, phenomenon, or idea that is worth recording
  Database
    - An integrated collection of persistent data representing the information of interest for the various programs that compose the computerized information system of an organization
    - Data are separated from the programs that use them
DBMS
  Database Management System
    - Collection of interrelated data and a set of programs to access those data
  Information System
    - DB + DBMS + application programs + utilities
  File System
    - Part of the OS
    - Stores programs, data, documents, or anything else (on disk)
Instances and Schemas
  Similar to types and variables in programming languages
  Schema
    - The logical structure of the database (e.g., the database consists of information about a set of customers and accounts and the relationship between them)
    - Analogous to the type information of a variable in a program
    - Physical schema: database design at the physical level
    - Logical schema: database design at the logical level
  Instance
    - The actual content of the database at a particular point in time
    - Analogous to the value of a variable
Data Models
  The underlying structure of a database
  Collection of conceptual tools for describing
    - data
    - data relationships
    - data semantics
    - consistency constraints
  Examples
    - Entity Relationship Model
    - Relational Model
    - Object Oriented Model
Database Languages
  Data Definition Language (DDL)
    - Specifies the DB schema
    - e.g., create table, drop column
  Data Manipulation Language (DML)
    - Operates on the contents of the DB
    - retrieve, insert, delete, change, etc.
  Query
    - A statement requesting the retrieval of information
    - Query language: part of the DML
    - Data model dependent
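The DDL/DML split can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `customer` table and its rows are made up for the example, and SQLite is an assumption, not the system used in the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the schema (hypothetical table for illustration).
conn.execute("CREATE TABLE customer (name TEXT PRIMARY KEY, city TEXT)")

# DML: operates on the contents (insert, change, delete).
conn.execute("INSERT INTO customer VALUES ('HS Kim', 'Suwon')")
conn.execute("INSERT INTO customer VALUES ('KS Lee', 'Busan')")
conn.execute("UPDATE customer SET city = 'Seoul' WHERE name = 'KS Lee'")
conn.execute("DELETE FROM customer WHERE name = 'HS Kim'")

# Query: the retrieval part of the DML.
rows = conn.execute("SELECT name, city FROM customer").fetchall()
print(rows)
```

The schema statement runs once; the DML statements change only the instance.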
Storage Management
  DBMS must effectively and efficiently manage storage (disk) space
  Storage manager
    - A program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system
  Topics
    - Physical storage
    - Physical storage media hierarchy
    - RAID
    - Storage access
    - File organization
    - Storage structures for object-oriented databases
DB users
  DBA (DB Administrator)
    - schema definition
    - storage structure, access method definition
    - schema & physical organization
    - security & authorization
    - backup and recovery
  Application programmers
  Sophisticated users
    - use DML
  Naïve users
    - use interfaces provided by application programs
Overall System Structure
  [Figure: overall system structure. Labels recoverable from the diagram:]
    - Inputs: application interface, application programs, query, database scheme
    - Query processor: embedded DML precompiler, DML compiler, DDL interpreter, application programs object code, query evaluation engine
    - Storage manager: transaction manager, buffer manager, file manager
    - Disk storage: data files, indices, statistical data, data dictionary
Entity Relationship Model
  Entity
    - entity set vs. entity (instance)
    - weak entity sets
  Relationship
    - relationship cardinality
    - binary, ternary, n-ary
  Attribute
    - multivalued attributes, derived attributes
  Generalization/specialization
    - total-partial, exclusive-overlap
  Aggregation
ER Diagram
  [Figure: ER diagram]
RELATIONAL MODEL & SQL
Relational Model
  E. F. Codd, "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, June 1970, pp. 377-387.
  Key features
    - Uses a single structure called a relation
    - Set (& math) oriented model
    - Physical data independence
  Definition of a relation R
    - Let $D_1, \ldots, D_n$ be domains; then $R \subseteq D_1 \times \cdots \times D_n$
    - That is, $R$ is a set of tuples $\langle d_1, \ldots, d_n \rangle$ with $d_1 \in D_1, \ldots, d_n \in D_n$
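As a small sketch of this definition, a relation can be modeled as a Python set of tuples that is a subset of the Cartesian product of its domains. The two tiny domains below are assumptions chosen only to keep the product small.

```python
from itertools import product

# Hypothetical domains D1 (names) and D2 (cities).
D1 = {"HS Kim", "KS Lee"}
D2 = {"Suwon", "Busan"}

# The Cartesian product D1 x D2 contains every possible pairing.
cartesian = set(product(D1, D2))

# A relation R picks out only the pairings that actually hold.
R = {("HS Kim", "Suwon"), ("KS Lee", "Busan")}

is_relation = R <= cartesian  # subset test: R is a relation over D1, D2
print(len(cartesian), is_relation)
```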
Relations and Tables
  A tuple in a relation represents a relationship among a set of values
  Implemented as tables
    relation $R = \{ \langle a_1, b_1, c_1 \rangle, \langle a_2, b_2, c_2 \rangle, \ldots, \langle a_n, b_n, c_n \rangle \}$  =>  table R

    Name      Address   Telephone     <- column (field, attribute)
    HS Kim    Suwon     323-3232
    KS Lee    Busan     323-5454      <- row (record, tuple)
    MH Choi   Seoul     553-3235
    KH Na     Yongin    545-5488
Relational Database
  A relational database
    - a set of relations
    - a collection of tables
  Keys
    - superkey, candidate key, and primary key
    - keys are constraints on the allowable relation instances for a given schema
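A minimal sketch of the key notions, assuming relations are lists of Python dicts. Because a key is a constraint on all legal instances, checking one instance can only refute a key, never prove it. The last row is a hypothetical addition so that Address repeats.

```python
def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all of attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

rows = [
    {"Name": "HS Kim", "Address": "Suwon", "Telephone": "323-3232"},
    {"Name": "KS Lee", "Address": "Busan", "Telephone": "323-5454"},
    {"Name": "MH Choi", "Address": "Seoul", "Telephone": "553-3235"},
    {"Name": "JW Park", "Address": "Seoul", "Telephone": "777-1234"},  # hypothetical row
]

name_ok = is_superkey(rows, ["Name"])                   # consistent with being a key
address_ok = is_superkey(rows, ["Address"])             # refuted: Seoul appears twice
name_addr_ok = is_superkey(rows, ["Name", "Address"])   # superkey, but not minimal
print(name_ok, address_ok, name_addr_ok)
```

A candidate key is a minimal superkey: {Name, Address} passes the check, but it is not a candidate key because {Name} alone already passes.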
Relational Algebra
  Query languages
    - Network, Hierarchical: navigational languages
    - Relational: relational algebra, relational calculus, SQL, QUEL
  Relational algebra
    - operands: relations
    - operators: fundamental operators + additional operators
    - (algebra: operators and operands)
Formal Definition
  A basic expression in the relational algebra consists of one of the following:
    - A relation in the database
    - A constant relation
  Let $E_1$ and $E_2$ be relational-algebra expressions; the following are all relational-algebra expressions:
    - $E_1 \cup E_2$
    - $E_1 - E_2$
    - $E_1 \times E_2$
    - $\sigma_P(E_1)$, where $P$ is a predicate on attributes in $E_1$
    - $\Pi_S(E_1)$, where $S$ is a list consisting of some of the attributes in $E_1$
    - $\rho_N(E_1)$, where $N$ is the new name for the result of $E_1$
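Selection and projection can be sketched in a few lines of Python, assuming a relation is stored as a set of tuples of (attribute, value) pairs so that duplicates disappear automatically. The helper names and the `loan` data are illustrative assumptions.

```python
def make_relation(rows):
    """Store a relation as a set of tuples of sorted (attribute, value) pairs."""
    return {tuple(sorted(row.items())) for row in rows}

def select(rel, pred):
    """sigma_P: keep the tuples that satisfy the predicate."""
    return {t for t in rel if pred(dict(t))}

def project(rel, attrs):
    """Pi_S: keep only the listed attributes; the set removes duplicates."""
    return {tuple((a, v) for a, v in t if a in attrs) for t in rel}

# Hypothetical sample data.
loan = make_relation([
    {"loan_number": "L-15", "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": "L-16", "branch_name": "Perryridge", "amount": 1300},
    {"loan_number": "L-11", "branch_name": "Round Hill", "amount": 900},
])

perryridge_loans = project(
    select(loan, lambda t: t["branch_name"] == "Perryridge"),
    {"loan_number"},
)
branches = project(loan, {"branch_name"})  # two distinct branches remain
print(perryridge_loans, branches)
```

Because each operator takes relations and returns a relation, expressions nest freely, which is the closure property the formal definition relies on.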
Additional Operations
  We define additional operations that do not add any power to the relational algebra, but that simplify common queries.
    - Set intersection
    - Natural join
    - Division
    - Assignment
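One way to see that an additional operation adds no power: set intersection can be rewritten with set difference alone, $r \cap s = r - (r - s)$. A quick check with Python sets, using made-up loan numbers:

```python
# Two hypothetical union-compatible relations with one attribute each.
r = {("L-15",), ("L-16",), ("L-11",)}
s = {("L-16",), ("L-11",), ("L-93",)}

# Intersection expressed with the fundamental difference operator only.
via_difference = r - (r - s)

print(via_difference == (r & s))
```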
SQL
  Structured Query Language
    - IBM's System R project: Sequel
    - Relational-calculus (RC) based: SQL is declarative
  DML & DDL
    - SELECT / UNION / INTERSECT / EXCEPT
    - INSERT / DELETE / UPDATE
    - CREATE / DROP / ADD
SQL examples
  List in alphabetic order the names of all customers having a loan at the Perryridge branch.
    select distinct customer-name
    from borrower, loan
    where borrower.loan-number = loan.loan-number
      and branch-name = 'Perryridge'
    order by customer-name
  Find the number of depositors for each branch.
    select branch-name, count (distinct customer-name)
    from depositor, account
    where depositor.account-number = account.account-number
    group by branch-name
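The two queries can be tried out with Python's sqlite3 module. This is a sketch under assumptions: hyphens in identifiers are replaced by underscores (hyphens are not valid in unquoted SQL names), and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER);
CREATE TABLE borrower (customer_name TEXT, loan_number TEXT);
CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER);
CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
INSERT INTO loan VALUES ('L-15','Perryridge',1500), ('L-16','Perryridge',1300),
                        ('L-11','Round Hill',900);
INSERT INTO borrower VALUES ('Hayes','L-15'), ('Adams','L-16'),
                            ('Smith','L-11'), ('Hayes','L-16');
INSERT INTO account VALUES ('A-101','Downtown',500), ('A-201','Downtown',900),
                           ('A-102','Perryridge',400), ('A-305','Perryridge',350),
                           ('A-215','Mianus',700);
INSERT INTO depositor VALUES ('Johnson','A-101'), ('Johnson','A-201'),
                             ('Hayes','A-102'), ('Turner','A-305'), ('Smith','A-215');
""")

# Query 1: customers with a loan at Perryridge, in alphabetic order.
names = conn.execute("""
    SELECT DISTINCT customer_name
    FROM borrower, loan
    WHERE borrower.loan_number = loan.loan_number
      AND branch_name = 'Perryridge'
    ORDER BY customer_name
""").fetchall()

# Query 2: number of distinct depositors per branch.
counts = dict(conn.execute("""
    SELECT branch_name, COUNT(DISTINCT customer_name)
    FROM depositor, account
    WHERE depositor.account_number = account.account_number
    GROUP BY branch_name
""").fetchall())

print(names, counts)
```

Johnson holds two Downtown accounts but is counted once there because of `COUNT(DISTINCT ...)`.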
INTEGRITY CONSTRAINTS
Integrity Constraints
  Integrity Constraints (IC) are rules that the data in the DB must abide by
    - ICs define the semantics of the DB
  Domain Constraints
    - restrict the values of a column
  Referential Integrity
    - Foreign Key Constraint
    - Let $r_1(R_1)$, $r_2(R_2)$ be relations with primary keys $K_1$ and $K_2$, respectively.
    - $\alpha \subseteq R_2$ is a foreign key referencing $K_1$ if it is required that for every $t_2 \in r_2$, there must be a tuple $t_1 \in r_1$ such that $t_1[K_1] = t_2[\alpha]$
    - Equivalently, $\Pi_\alpha(r_2) \subseteq \Pi_{K_1}(r_1)$
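A foreign key constraint can be observed in action with sqlite3. This is a sketch with assumed table and column names; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Hypothetical schemas: loan.b_name references the primary key of branch.
conn.execute("CREATE TABLE branch (b_name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE loan (loan_no TEXT PRIMARY KEY, "
    "b_name TEXT REFERENCES branch(b_name))"
)

conn.execute("INSERT INTO branch VALUES ('Perryridge')")
conn.execute("INSERT INTO loan VALUES ('L-15', 'Perryridge')")  # matching t1 exists

try:
    # No branch tuple has b_name = 'Nowhere', so this violates the constraint.
    conn.execute("INSERT INTO loan VALUES ('L-99', 'Nowhere')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

loan_count = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
print(rejected, loan_count)
```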
Integrity Constraints
  Assertion
    - A general constraint, expressed as $\forall x\, P(x)$ but written in SQL as $\neg \exists x\, (\neg P(x))$
      CREATE ASSERTION sum-constraint CHECK
        (NOT EXISTS
          (SELECT * FROM branch
           WHERE (SELECT SUM(amount) FROM loan
                  WHERE loan.b_name = branch.b_name)
                 >= (SELECT SUM(amount) FROM account
                     WHERE account.b_name = branch.b_name)))
  Trigger
    - Action tied to a DB event (insert/delete/update)
      DEFINE TRIGGER overdraft ON UPDATE OF account T (IF NEW T.balance < 0 insert
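A trigger of this kind can be sketched in SQLite syntax through sqlite3. The action below, writing to a hypothetical `overdraft_log` table, is an assumption made for illustration only; it is not the action of the slide's trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acct_no TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE overdraft_log (acct_no TEXT, balance INTEGER)")  # hypothetical

# Event: UPDATE of balance. Condition: new balance is negative. Action: log it.
conn.execute("""
CREATE TRIGGER overdraft AFTER UPDATE OF balance ON account
WHEN NEW.balance < 0
BEGIN
    INSERT INTO overdraft_log VALUES (NEW.acct_no, NEW.balance);
END
""")

conn.execute("INSERT INTO account VALUES ('A-101', 100)")
conn.execute("INSERT INTO account VALUES ('A-102', 100)")
conn.execute("UPDATE account SET balance = balance - 150 WHERE acct_no = 'A-101'")
conn.execute("UPDATE account SET balance = balance - 50 WHERE acct_no = 'A-102'")

log = conn.execute("SELECT acct_no, balance FROM overdraft_log").fetchall()
print(log)
```

Only the first update fires the action, since the second leaves a non-negative balance.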
Functional Dependencies
  Basic Concept
    - $R$: relation scheme. Let $\alpha \subseteq R$, $\beta \subseteq R$
    - The functional dependency $\alpha \rightarrow \beta$ holds on $R$ if, in any legal relation $r(R)$, for all pairs of tuples $t_1$ and $t_2$ in $r$: if $t_1[\alpha] = t_2[\alpha]$ then $t_1[\beta] = t_2[\beta]$
  Keys
  Trivial FD
  Inference Rules
    - Reflexivity, Transitivity, Augmentation
  Closure & Cover
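The closure of an attribute set under a set of FDs can be computed with a simple fixed-point loop. The sketch below assumes single-letter attribute names, so that a string such as "AB" stands for the set {A, B}; the FD set is a small made-up example.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("A", "B"), ("B", "C")]  # A -> B, B -> C

a_plus = closure("A", fds)
b_plus = closure("B", fds)
print(sorted(a_plus), sorted(b_plus))
```

Here the closure of {A} is all of (A, B, C), so A is a superkey of R = (A, B, C), while B is not.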
Decomposition
  Redundancy causes problems: anomalies
    - Solution => decompose the schema so that each piece of information is represented only once
  Definition: Let $R$ be a relation scheme
    - $\{R_1, \ldots, R_n\}$ is a decomposition of $R$ if $R = R_1 \cup \cdots \cup R_n$ (i.e., all of $R$'s attributes are represented)
    - Binary decomposition is mostly used: $R$ into $\{R_1, R_2\}$ where $R = R_1 \cup R_2$
  Examples
    - student(id, name, dept, dept_chair, dept_phone, year)
        => student(ID, name, year, dept)
           department(dept, chair, phone)
    - Lending = (b_name, asset, b_city, loan#, c_name, amount)
        => Branch = (b_name, asset, b_city)
           Loan = (loan#, c_name, amount)
Lossy Decomposition
  Careless decomposition leads to loss of information:
    - Decomposition of $R = (A, B)$ into $R_1 = (A)$ and $R_2 = (B)$
    - Can we recover the original information content?
  [Figure: an example instance $r$ over $(A, B)$, its projections $\Pi_A(r)$ and $\Pi_B(r)$, and the join $\Pi_A(r) \bowtie \Pi_B(r)$, which differs from $r$]
  Lossy!
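The loss can be reproduced with a tiny instance. The values below are assumptions chosen for illustration, not the ones in the slide's figure. With no common attribute, the natural join of the two projections is their Cartesian product.

```python
from itertools import product

# Hypothetical instance r over R = (A, B).
r = {("a", 1), ("a", 2), ("b", 1)}

proj_A = {(a,) for a, _ in r}   # Pi_A(r)
proj_B = {(b,) for _, b in r}   # Pi_B(r)

# R1 and R2 share no attribute, so the natural join is the Cartesian product.
rejoined = {(a, b) for (a,), (b,) in product(proj_A, proj_B)}

spurious = rejoined - r  # tuples that were not in the original relation
print(sorted(rejoined), spurious)
```

The rejoined relation has more tuples than r, yet it carries less information: it is no longer possible to tell which pairings were actually recorded.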
Lossless-join Decomposition
  For $r(R)$ and decomposition $\{R_1, R_2\}$, it is always the case that $r \subseteq \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r)$
  Definition: Decomposition $\{R_1, R_2\}$ is a lossless-join decomposition of $R$ if $r = \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r)$
  The information content of the original relation $r$ is always the basis
  [Figure: example relation $r$ and its decomposition into $r_1$ and $r_2$]
Lossless-join Decomposition
  Lemma: $\{R_1, R_2\}$ is a lossless decomposition if $R_1 \cap R_2 \rightarrow R_1$, or $R_1 \cap R_2 \rightarrow R_2$
    - i.e., if one of the two subschemas holds the key of the other subschema
  [Figure: example relation $r$ and its decomposition into $r_1$ and $r_2$]
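The lemma's condition can be checked mechanically with an attribute-closure computation. This is a sketch assuming single-letter attribute names and a made-up FD set. A `False` result only means that this sufficient condition does not establish losslessness.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Check whether R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 follows from fds."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

fds = [("A", "B"), ("B", "C")]  # A -> B, B -> C on R = (A, B, C)

good = is_lossless_binary("AB", "BC", fds)  # common attribute B determines (B, C)
bad = is_lossless_binary("AC", "BC", fds)   # common attribute C determines nothing else
print(good, bad)
```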
Boyce-Codd Normal Form
  We want a way to decide whether a particular relation $R$ is in good form.
  Definition: A relation schema $R$ is in BCNF (with respect to a set $F$ of FDs) if for each FD $\alpha \rightarrow \beta$ in $F^+$ ($\alpha \subseteq R$ and $\beta \subseteq R$), at least one of the following holds:
    - $\alpha \rightarrow \beta$ is trivial (i.e., $\beta \subseteq \alpha$)
    - $\alpha$ is a superkey for $R$
  Example
    - $R = (A, B, C)$, $F = \{A \rightarrow B;\ B \rightarrow C\}$, Key = $\{A\}$
    - $R$ is not in BCNF
    - Decompose into $R_1 = (A, B)$, $R_2 = (B, C)$
    - $R_1$ and $R_2$ are in BCNF
    - Lossless-join decomposition
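The example can be verified with a short BCNF test. Two assumptions apply in this sketch: attribute names are single letters, and only the FDs in the given set are examined, which is enough to test the schema those FDs are stated on. The FD sets for R1 and R2 are supplied by hand rather than computed by projection.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(schema, fds):
    """Every FD must be trivial or have a superkey on its left side."""
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) >= set(schema)
        if not (trivial or superkey):
            return False
    return True

r_ok = is_bcnf("ABC", [("A", "B"), ("B", "C")])  # B -> C: B is not a superkey
r1_ok = is_bcnf("AB", [("A", "B")])              # hand-projected FDs for R1
r2_ok = is_bcnf("BC", [("B", "C")])              # hand-projected FDs for R2
print(r_ok, r1_ok, r2_ok)
```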
QUERY PROCESSING
Query Processing & Optimization
  Basic Steps in Query Processing
    1. Parsing and translation
    2. Optimization
    3. Evaluation
Issues in Query Processing
  Evaluation of individual operations
    - Select, sort, join
  Evaluation of expressions
  Equivalence of expressions
    - $\Pi_{name,\,title}(\sigma_{dept = \text{"Music"}}(instructor \bowtie (teaches \bowtie course)))$
      vs.
      $\Pi_{name,\,title}((\sigma_{dept = \text{"Music"}}(instructor)) \bowtie teaches \bowtie course)$
  Cost based optimization
  Measures of query cost
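The equivalence of the two expressions can be checked on a small instance. In this sketch relations are lists of dicts, and the schemas are simplified assumptions (instructor, teaches, and course share only their join attributes), as are all the rows.

```python
def natural_join(r, s):
    """Combine tuples that agree on all attributes the two relations share."""
    joined = []
    for t1 in r:
        for t2 in s:
            if all(t1[a] == t2[a] for a in t1.keys() & t2.keys()):
                joined.append({**t1, **t2})
    return joined

def select(r, pred):
    return [t for t in r if pred(t)]

def project(r, attrs):
    return {tuple(t[a] for a in attrs) for t in r}

# Hypothetical sample data.
instructor = [{"ID": 1, "name": "Mozart", "dept": "Music"},
              {"ID": 2, "name": "Einstein", "dept": "Physics"},
              {"ID": 3, "name": "Gold", "dept": "Physics"}]
teaches = [{"ID": 1, "course_id": "MU-199"},
           {"ID": 2, "course_id": "PHY-101"},
           {"ID": 3, "course_id": "PHY-101"}]
course = [{"course_id": "MU-199", "title": "Music Video Production"},
          {"course_id": "PHY-101", "title": "Physical Principles"}]

def is_music(t):
    return t["dept"] == "Music"

# Plan A: join everything first, then select.
all_joined = natural_join(instructor, natural_join(teaches, course))
plan_a = project(select(all_joined, is_music), ["name", "title"])

# Plan B: select first, so the joins see fewer instructor tuples.
music_only = select(instructor, is_music)
plan_b = project(natural_join(natural_join(music_only, teaches), course),
                 ["name", "title"])

print(plan_a == plan_b, len(all_joined), len(music_only))
```

Both plans return the same answer, but plan B feeds one instructor tuple into the joins instead of three, which is the kind of difference a cost-based optimizer looks for.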
Indexing
  Ordered Indices
    - primary vs. secondary
    - dense vs. sparse
    - multilevel index
  B+-Tree Index Files
  Hashing
    - Static Hashing
    - Dynamic Hashing
  Index Definition in SQL
  Multiple-Key Access
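A sparse ordered index keeps one entry per block rather than one per record. The sketch below assumes a data file sorted on the search key and a block size of 3; the account records are made up. Lookup finds the last index entry not greater than the key, then scans that block.

```python
import bisect

# Hypothetical data file, sorted on the search key (account number).
records = [("A-101", 500), ("A-102", 400), ("A-201", 900), ("A-215", 700),
           ("A-217", 750), ("A-222", 700), ("A-305", 350)]

BLOCK_SIZE = 3  # records per block (assumed)
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Sparse index: only the first search key of each block is indexed.
sparse_index = [block[0][0] for block in blocks]

def lookup(key):
    """Find the block that may hold key, then scan it sequentially."""
    i = bisect.bisect_right(sparse_index, key) - 1
    if i < 0:
        return None  # key is smaller than every indexed key
    for k, value in blocks[i]:
        if k == key:
            return value
    return None

found = lookup("A-217")
missing = lookup("A-300")
too_small = lookup("A-100")
print(sparse_index, found, missing, too_small)
```

A dense index would instead hold an entry for every search key, trading extra space for lookups that never need the in-block scan.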