Announcements
CS511 Design of Database Management Systems
HW: incremental release starting last Sunday.
Class will be rescheduled next week: time: Wednesday Tuesday 5pm; place: 1310 DCL.

Lecture 05: Object Relational DB: POSTGRES
Kevin C. Chang

Time: early 80s
Background: after the initial implementations and products of relational DBMSs
Motivation: RDBMSs are not powerful enough for non-administrative, data-intensive applications such as CAD/CAM and GIS
Buzz terms: object-oriented, extensible

POSTGRES: Post-INGRES
Stonebraker, U.C. Berkeley
1977-1985: INGRES
  among the first relational DB implementations
  Ingres Inc., later acquired by Computer Associates
1986-1994: POSTGRES
  among the first object-relational DB implementations
  Illustra, acquired by Informix
  PostgreSQL (the SQL version)

Post-Relational DB Projects
Postgres: U.C. Berkeley
Starburst: IBM Almaden
  highly extensible
  after System R (relational) and R* (distributed)
  ultimately found its way into IBM DB2 UDB
Exodus: U. Wisconsin
  not a complete DB; an OO-style storage manager toolkit
  followed by Shore at Wisconsin and Predator at Cornell

RDBMS: the Relational Root
Data model (Codd, 1970s):
  a database is a set of relations
  a relation of n attributes is a set of n-tuples
  an n-tuple is $(v_1, \ldots, v_n)$, where each $v_i$ is in domain $S_i$
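To make Codd's definition concrete, here is a minimal Python sketch (not from the original slides) that models each domain $S_i$ as a membership predicate and a relation as a set of n-tuples. The helper `make_relation`, the predicates `is_str`/`is_int`, and the sample `Emp` data are all hypothetical.

```python
from typing import Any, Callable, FrozenSet, Iterable, Tuple

# Assumption: each domain S_i is modeled as a membership predicate.
Domain = Callable[[Any], bool]


def make_relation(domains: Tuple[Domain, ...], rows: Iterable[Iterable[Any]]) -> FrozenSet[tuple]:
    """Build a relation: a set of n-tuples (v_1, ..., v_n) with each v_i in S_i."""
    n = len(domains)
    relation = set()
    for row in rows:
        t = tuple(row)
        if len(t) != n:
            raise ValueError(f"expected a {n}-tuple, got {t!r}")
        for i, (v, in_domain) in enumerate(zip(t, domains), start=1):
            if not in_domain(v):
                raise ValueError(f"{v!r} is not in domain S_{i}")
        relation.add(t)  # set semantics: a repeated tuple is stored once
    return frozenset(relation)


def is_str(v: Any) -> bool:
    return isinstance(v, str)


def is_int(v: Any) -> bool:
    return isinstance(v, int) and not isinstance(v, bool)


# Hypothetical Emp(name, salary) relation; the duplicate row collapses.
emp = make_relation((is_str, is_int), [("smith", 50), ("jones", 60), ("smith", 50)])
print(sorted(emp))  # [('jones', 60), ('smith', 50)]
```

Because a relation is a set, the repeated `("smith", 50)` row is stored only once, and a value outside its domain (for example a string salary) is rejected.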
Relational Model: Normal Forms
Basic: 1NF (First Normal Form)
  implicitly required by the relational model definition: only simple domains of atomic elements (Codd)
  simple domains represent the base (built-in) types
  Why?
Stronger normal forms: 4NF, Boyce-Codd Normal Form, 3NF, 2NF
  Why?

Normalizing Relations: Example
Unnormalized relation of book objects:
  Books: title          authors          date
         great future   {smith, jones}   4/01/01
         career         {jones}          7/12/00
Normalized relations, by decomposition:
  Books: title          day   month   year
         great future   4     1       01
         career         7     12      00
  Books: title          authors
         great future   smith
         great future   jones
         career         jones
Problems of the relational model?

Relational Model Problems
"A relational DB is like a garage that forces you to take your car apart and store the pieces in little drawers." (some researcher)
Object notion lost by decomposition
  non-intuitive: an object is decomposed into several relations
  inefficient: a lot of online assembling by joins
Base types are too restrictive
  integers and strings are very primitive
  data types are typically application-specific
Relational algebra is the only allowed operation
  simple, declarative, but also restrictive
  application = host language + embedded SQL
How to remedy these problems?

Quest for a Richer Model
  Object-oriented data model
  Extensible ADTs
  Programming-language constructs

ORDBMS vs. OODBMS
Question: How important is the relation?
ORDBMS (this lecture): RDBMS + OO features; query-based
OODBMS: OO PL + database features (persistent objects); programming-based
Meeting in the middle

Stonebraker's Matrix
              Simple Data    Complex Data
  Query       RDBMS          ORDBMS
  No Query    File System    OODBMS
Prediction: ORDBMS will dominate
  evidence: the big DB players are all on this side
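A minimal Python sketch of the decomposition in the Books example, using hypothetical helpers `normalize` and `reassemble`. It splits the set-valued authors attribute and the composite date into atomic values, then rebuilds each book object with a naive nested-loop join on title, which is the "online assembling by joins" that the problems slide complains about. It assumes dates are written day/month/year, as the example rows suggest, and keeps the year as the two-digit string shown on the slide.

```python
from typing import Dict, List, Set, Tuple

# Unnormalized Books: "authors" is set-valued and "date" is a composite
# day/month/year string, so neither attribute is atomic (not 1NF).
unnormalized_books: List[Tuple[str, Set[str], str]] = [
    ("great future", {"smith", "jones"}, "4/01/01"),
    ("career", {"jones"}, "7/12/00"),
]


def normalize(books):
    """Decompose into two relations whose attribute values are all atomic."""
    books_rel = set()    # (title, day, month, year)
    authors_rel = set()  # (title, author): one tuple per author
    for title, authors, date in books:
        day, month, year = date.split("/")
        books_rel.add((title, int(day), int(month), year))
        for author in authors:
            authors_rel.add((title, author))
    return books_rel, authors_rel


def reassemble(books_rel, authors_rel) -> Dict[str, tuple]:
    """Rebuild each book object by joining the two relations on title."""
    objects = {}
    for title, day, month, year in books_rel:
        # Naive nested-loop join: scan every author tuple for each book.
        authors = {a for (t, a) in authors_rel if t == title}
        objects[title] = (authors, day, month, year)
    return objects


books_rel, authors_rel = normalize(unnormalized_books)
print(sorted(books_rel))    # [('career', 7, 12, '00'), ('great future', 4, 1, '01')]
print(sorted(authors_rel))  # [('career', 'jones'), ('great future', 'jones'), ('great future', 'smith')]
```

Recovering the original "great future" object needs the join inside `reassemble`; with larger relations that assembly cost is paid on every read of the object.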
Supporting Extensible Types by ADT
DB issues for defining a new type: storage? parsing? optimization? execution? access methods?

ADT Support
Storage: space requirement, conversion of representations
Parsing: must know user-defined types/methods (table-driven)
Optimization:
  selectivity/cost of user-defined predicates/methods
  match user-defined predicates to access methods

ADT Support (cont.)
Execution:
  methods called via function pointers or similar constructs
  dynamic linking to user-defined code
  untrusted functions
Access methods:
  support user-defined access methods
  POSTGRES: access method = 13 user-defined functions
  GiST is along this line

ADTs for RDBMS
ADT extends the notion of base types
  does not fundamentally change a relational system
  fits naturally into RDBMS query processing
How is ADT different from the OO features?
Reference: Stonebraker, "Inclusion of New Types in Relational Data Base Systems," RDS, p. 516

Object Orientation Concepts
Classes:
  classes as types
  encapsulation: interface + implementation
  inheritance: building class hierarchies
Objects:
  complex objects: built from constructors, e.g., set-of, array, nested objects
  object identity (OID): a system-generated, unique object reference; enables (efficient) object linking and navigation
POSTGRES data model: OO constructs

POSTGRES Data Model
Classes as relations
  object (class instance) = tuple
  object-id = tuple-id
  method = attribute or function of attributes
  inheritance (multiple parents)
ADT constructs: types, functions
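The "ADT Support" points (table-driven parsing, execution through function pointers) can be sketched as a small catalog of user-registered types and operators. This is a toy Python model under stated assumptions, not POSTGRES's actual catalog: `define_type`, `define_operator`, `parse_literal`, `apply_operator`, and the `box` type are hypothetical, and the `&&` (overlaps) operator name is borrowed only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class TypeEntry:
    name: str
    input_fn: Callable[[str], Any]   # external text -> internal representation
    output_fn: Callable[[Any], str]  # internal representation -> external text


# Hypothetical system catalogs, filled in by users at run time.
TYPE_CATALOG: Dict[str, TypeEntry] = {}
OPERATOR_CATALOG: Dict[Tuple[str, str, str], Callable[[Any, Any], Any]] = {}


def define_type(name, input_fn, output_fn):
    TYPE_CATALOG[name] = TypeEntry(name, input_fn, output_fn)


def define_operator(op, left_type, right_type, fn):
    OPERATOR_CATALOG[(op, left_type, right_type)] = fn


def parse_literal(type_name, text):
    # Table-driven parsing: the parser looks up the type's input function.
    return TYPE_CATALOG[type_name].input_fn(text)


def apply_operator(op, left_type, right_type, a, b):
    # Execution: user code is called through a function reference from the catalog.
    return OPERATOR_CATALOG[(op, left_type, right_type)](a, b)


# User-defined box type written as "(x1,y1,x2,y2)"; assumes x1 <= x2 and y1 <= y2.
def box_in(text):
    x1, y1, x2, y2 = (float(v) for v in text.strip("()").split(","))
    return (x1, y1, x2, y2)


def box_out(box):
    return "(" + ",".join(f"{v:g}" for v in box) + ")"


def box_overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


define_type("box", box_in, box_out)
define_operator("&&", "box", "box", box_overlaps)

rows = [parse_literal("box", s) for s in ["(0,0,2,2)", "(5,5,6,6)"]]
query = parse_literal("box", "(1,1,3,3)")
hits = [box_out(r) for r in rows if apply_operator("&&", "box", "box", r, query)]
print(hits)  # ['(0,0,2,2)']
```

The query evaluator never needs to know what a box is; it only consults the catalogs. This evaluation is a sequential scan, which is why the slides also call for user-defined access methods (the GiST direction) so that such predicates can use an index.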
POSTGRES Functions
Arbitrary C functions
  e.g., overpaid(employee)
  arbitrary semantics: not optimized
  no fancy access methods: typically a sequential scan
Binary operators
  hints to provide semantics
  extensible access methods: extensible B+-tree or user-defined index
PostQuel procedures
  parameterized queries as functions
  e.g., sal-lookup(name): retrieve Emp.salary where Emp.name = name

POSTGRES Storage System
"We were guided by a missionary zeal to do something different"
No-overwrite system
Logging: old values are not overwritten, so no value logging is necessary
  the log only needs to keep transaction state (commit/abort/in progress)
  crash recovery: how?
Vacuum-cleaner daemon to archive historical data
Advantages:
  recovery is cheap
  time travel is easy

Storage System: Problems
Flushing differential data (why?) by commit time can be costly unless main memory is stable
  more costly than sequentially writing out logs. Why?
Reads have to stitch together the current picture
And, yes, there are lots of details unexplored or unexplained

POSTGRES Queries: POSTQUEL
Path expressions for object navigation
  e.g., Emp.manager.name: what relational operation does this replace?
Transitive closure queries: the * construct
  what does this compute?
  answer(a) :- parent(a, john)
  answer(a) :- parent(a, x), answer(x)
Time travel queries
  querying the DB snapshot at time T: Emp[T]
  more generally: temporal DB

Other Novelties
Comprehensive rule systems
  ON event WHERE conditions THEN DO actions
  applications: consistency enforcement (integrity constraints), views (derived classes), class versioning (new version = parent + differential)
FastPath to provide low-level access
  direct calls to DB internal functions, e.g., access to OID assignment

Post-POSTGRES Development
Commercial product: Illustra
  acquired by Informix, merged into Universal Server
  "object-relational" coined by Informix for Illustra
Open-source software: PostgreSQL
  Postgres95 (1994-95): 250k lines of C code, ported to SQL support by two UCB graduate students
  PostgreSQL: based on Postgres95, http://www.postgresql.org/
  releases every 3-5 months; current: V. 8.x
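Why no-overwrite storage makes time travel queries such as Emp[T] easy can be shown with a toy version store. This Python sketch rests on simplifying assumptions: each write is its own committed transaction, timestamps come from an integer logical clock, and deletes and the vacuum/archive daemon are not modeled. `NoOverwriteStore`, `Version`, `tmin`, and `tmax` are hypothetical names.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Version:
    value: Any
    tmin: int                   # commit time that created this version
    tmax: Optional[int] = None  # commit time that superseded it; None = still current


class NoOverwriteStore:
    """Keeps every version of a record instead of updating it in place."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}
        self._clock = itertools.count(1)  # logical commit timestamps 1, 2, 3, ...

    def write(self, key: str, value: Any) -> int:
        """Each write is treated as its own committed transaction (a simplification)."""
        t = next(self._clock)
        chain = self._versions.setdefault(key, [])
        if chain and chain[-1].tmax is None:
            chain[-1].tmax = t  # close the old version; its value is not overwritten
        chain.append(Version(value, tmin=t))
        return t

    def read(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the current value, or the value visible at time `as_of` (like Emp[T])."""
        for v in reversed(self._versions.get(key, [])):
            if as_of is None and v.tmax is None:
                return v.value
            if as_of is not None and v.tmin <= as_of and (v.tmax is None or as_of < v.tmax):
                return v.value
        return None


store = NoOverwriteStore()
t1 = store.write("smith", 50)
t2 = store.write("smith", 60)
print(store.read("smith"))            # 60
print(store.read("smith", as_of=t1))  # 50
```

Old values stay in place, so recovery only needs to know which transactions committed, and a past snapshot is just a different visibility test. The cost appears in `read`, which walks the version chain: the "reads have to stitch together the current picture" problem.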
Your Question: How about Joins?
"The POSTGRES paper discusses the Nested Dot Notation as an alternative to joins. Does this mean that the POSTGRES system will not require joins at all in any scenario? How will this notation translate to better storage structures so as to beat the performance of joins?"

Questing for the Right Models
Speaking about knowledge representation:
"The simple relational model is by far the only successful KR paradigm. When the relational model came along, the network guys resisted and their companies went under. When the OO model came along, the relational guys absorbed its best, and their companies prospered again!" (Jeffrey Ullman)

What's Next?
Overview of relational DBMS implementation

End of Talk

Your Question: How about Joins?
"What is the best way to represent a many-to-many relationship (typically modeled with three tables in a relational database) in an object-oriented DBMS such as POSTGRES, without losing query performance on any of the main tables/classes in the relationship?"
e.g., Student, Enrollment, Class
How about Student.Classes and Class.Students? How to make this efficient?
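One possible answer to the Student.Classes / Class.Students question, sketched in Python under stated assumptions (this is not how POSTGRES implements it): store the relationship redundantly as OID sets on both sides. Navigation in either direction then becomes a set of dereferences instead of a scan of an Enrollment table, and the price is that every enrollment update must touch two objects. `ObjectStore`, `enroll`, `students_via_join`, and the sample names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple, Union


@dataclass
class Student:
    oid: int
    name: str
    classes: Set[int] = field(default_factory=set)   # OIDs of Class objects


@dataclass
class Class:
    oid: int
    title: str
    students: Set[int] = field(default_factory=set)  # OIDs of Student objects


class ObjectStore:
    """Hypothetical store that assigns system-generated OIDs."""

    def __init__(self) -> None:
        self.objects: Dict[int, Union[Student, Class]] = {}
        self._next_oid = 1

    def new(self, cls, label: str):
        obj = cls(self._next_oid, label)
        self.objects[obj.oid] = obj
        self._next_oid += 1
        return obj

    def deref(self, oid: int):
        return self.objects[oid]


def enroll(student: Student, klass: Class) -> None:
    # Both directions are updated together so the redundant sets stay consistent.
    student.classes.add(klass.oid)
    klass.students.add(student.oid)


def students_via_join(enrollment: Set[Tuple[int, int]], class_oid: int) -> Set[int]:
    # Relational alternative: scan a (student_oid, class_oid) Enrollment table.
    return {s for (s, c) in enrollment if c == class_oid}


store = ObjectStore()
alice = store.new(Student, "alice")
bob = store.new(Student, "bob")
cs511 = store.new(Class, "CS511")
enroll(alice, cs511)
enroll(bob, cs511)
print(sorted(store.deref(o).name for o in cs511.students))  # ['alice', 'bob']

enrollment = {(alice.oid, cs511.oid), (bob.oid, cs511.oid)}
print(students_via_join(enrollment, cs511.oid) == cs511.students)  # True
```

Both representations give the same answer; the reference-set version trades update cost and redundancy for cheap navigation from either side, which is roughly what the nested dot notation relies on.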