Databases = Categories

Similar documents
Categorical databases

Categorical databases

A Functorial Query Language. Ryan Wisnesky Harvard University DCP 2014

arxiv: v4 [cs.db] 3 Feb 2013

CIS 330: Applied Database Systems. ER to Relational Relational Algebra

The language of categories

Categorical models of type theory

Topos Theory. Lectures 3-4: Categorical preliminaries II. Olivia Caramello. Topos Theory. Olivia Caramello. Basic categorical constructions

Chapter 2: Intro to Relational Model

8) A top-to-bottom relationship among the items in a database is established by a

Chapter 2: Intro to Relational Model

Category Theory 3. Eugenia Cheng Department of Mathematics, University of Sheffield Draft: 8th November 2007

3.1 Constructions with sets

Crash Course in Monads. Vlad Patryshev

MIS Database Systems Relational Algebra

COMBINATORIAL METHODS IN ALGEBRAIC TOPOLOGY

Relational Algebra. Study Chapter Comp 521 Files and Databases Fall

Chapter 2 Introduction to Relational Models

Relational Algebra. Chapter 4, Part A. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

I. An introduction to Boolean inverse semigroups

14.1 Encoding for different models of computation

6. Relational Algebra (Part II)

An introduction to Category Theory for Software Engineers*

Relational Algebra. [R&G] Chapter 4, Part A CS4320 1

Plan of the lecture. G53RDB: Theory of Relational Databases Lecture 1. Textbook. Practicalities: assessment. Aims and objectives of the course

Database Applications (15-415)

Relational Model. Rab Nawaz Jadoon DCS. Assistant Professor. Department of Computer Science. COMSATS IIT, Abbottabad Pakistan

Lecture 4: examples of topological spaces, coarser and finer topologies, bases and closed sets

1 Introduction CHAPTER ONE: SETS

Relational Databases

The three faces of homotopy type theory. Type theory and category theory. Minicourse plan. Typing judgments. Michael Shulman.

More about Databases. Topics. More Taxonomy. And more Taxonomy. Computer Literacy 1 Lecture 18 30/10/2008

A MODEL CATEGORY STRUCTURE ON THE CATEGORY OF SIMPLICIAL CATEGORIES

Why Study the Relational Model? The Relational Model. Relational Database: Definitions. The SQL Query Language. Relational Query Languages

Database Technology Introduction. Heiko Paulheim

COSC344 Database Theory and Applications. σ a= c (P) Lecture 3 The Relational Data. Model. π A, COSC344 Lecture 3 1

More Relational Algebra

Arithmetic universes as generalized point-free spaces

Relational Model History. COSC 416 NoSQL Databases. Relational Model (Review) Relation Example. Relational Model Definitions. Relational Integrity

Relational Model History. COSC 304 Introduction to Database Systems. Relational Model and Algebra. Relational Model Definitions.

Unit 4 Relational Algebra (Using SQL DML Syntax): Data Manipulation Language For Relations Zvi M. Kedem 1

Data Modeling and Integration Using the Open Source Tool AQL

Relational Database: The Relational Data Model; Operations on Database Relations

Relational Algebra Part I. CS 377: Database Systems

Denotational Semantics. Domain Theory

Metrics on diagrams and persistent homology

DBMS. Relational Model. Module Title?

File Processing Approaches

Lecture Notes for 3 rd August Lecture topic : Introduction to Relational Model. Rishi Barua Shubham Tripathi

Relational Algebra and SQL

Databases. Jörg Endrullis. VU University Amsterdam

CS317 File and Database Systems

Databases. Relational Model, Algebra and operations. How do we model and manipulate complex data structures inside a computer system? Until

Abstract algorithms. Claus Diem. September 17, 2014

DC62 Database management system JUNE 2013

SQL. Dean Williamson, Ph.D. Assistant Vice President Institutional Research, Effectiveness, Analysis & Accreditation Prairie View A&M University

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages

Lecture 1: Overview

SILVER OAK COLLEGE OF ENGINEERING & TECHNOLOGY ADITYA SILVER OAK INSTITUTE OF TECHNOLOGY

Spaces with algebraic structure

Homotopy theory of higher categorical structures

Relational Algebra Homework 0 Due Tonight, 5pm! R & G, Chapter 4 Room Swap for Tuesday Discussion Section Homework 1 will be posted Tomorrow

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Programming with Math and Logic

Unit- III (Functional dependencies and Normalization, Relational Data Model and Relational Algebra)

SOME TYPES AND USES OF DATA MODELS

The Relational Model and Normalization

CS 377 Database Systems

Database Management Systems. Chapter 4. Relational Algebra. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Interpretations and Models. Chapter Axiomatic Systems and Incidence Geometry

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Category Theory & Functional Data Abstraction

The design of a programming language for provably correct programs: success and failure

Relational Model and Algebra. Introduction to Databases CompSci 316 Fall 2018

Announcements. CSCI 334: Principles of Programming Languages. Exam Study Session: Monday, May pm TBL 202. Lecture 22: Domain Specific Languages

Algebra of Sets. Aditya Ghosh. April 6, 2018 It is recommended that while reading it, sit with a pen and a paper.

Chapter 3B Objectives. Relational Set Operators. Relational Set Operators. Relational Algebra Operations

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

Relational algebra. Iztok Savnik, FAMNIT. IDB, Algebra

Introduction to Relational Databases. Introduction to Relational Databases cont: Introduction to Relational Databases cont: Relational Data structure

Ian Kenny. November 28, 2017

An Evolution of Mathematical Tools

Relational Model, Relational Algebra, and SQL


Notes on categories, the subspace topology and the product topology

1 Relational Data Model

Theory of Computations Spring 2016 Practice Final Exam Solutions

Relational Model. IT 5101 Introduction to Database Systems. J.G. Zheng Fall 2011

On the Recognizability of Arrow and Graph Languages

Symmetric Product Graphs

Relational Algebra. Relational Query Languages

SOFTWARE ENGINEERING DESIGN I

Algebraic Topology: A brief introduction

Comp 5311 Database Management Systems. 2. Relational Model and Algebra

Relational Query Languages. Relational Algebra. Preliminaries. Formal Relational Query Languages. Relational Algebra: 5 Basic Operations

Unit 4 Relational Algebra (Using SQL DML Syntax): Data Manipulation Language For Relations Zvi M. Kedem 1

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

Lecture 2: SML Basics

Formal Methods of Software Design, Eric Hehner, segment 24 page 1 out of 5

Transcription:

Databases = Categories David I. Spivak dspivak@math.mit.edu Mathematics Department Massachusetts Institute of Technology Presented on 2010/09/16 David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 1 / 39

Introduction My background My background Mathematics Thesis: algebraic topology Category theory and information science David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 2 / 39

Introduction The problem to be addressed The problem to be addressed Importance of databases: Information is the currency of the 21st century. The majority of computer cycles in the world operate on databases. Organizing information is the only way to make sense of our world. Problems with databases: They do not integrate well with programming languages. Data migration is difficult and prone to errors. Understanding a schema can be quite tedious. It s time to put databases on a firm mathematical footing. Science always benefits from good mathematical underpinnings. Need appropriate language and protocol for schemas and data. But don t we already have that in the Relational Algebra? David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 3 / 39

The relational model Introduction The problem to be addressed Mathematical definition of relation: Given sets (to be data types for columns) A 1, A 2,..., A n, a relation on them is a subset R A 1 A 2 A n. The elements of this subset are the rows of a table with the n specified columns. Relations are the current mathematical foundation of databases. Brought to bear on databases by E.F. Codd. Relations tell us what tables are, they give us an algebra for taking subsets, unions, intersections, joins, etc., and they give us a terminology and organizing principle. The mathematics of relations is over 150 years old. The relational algebra is simply out of date. It is not flexible enough to really describe what databases do. Most importantly, relations are about single tables whereas databases are massive conglomerates of interconnected tables. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 4 / 39

Introduction The status of database theory The problem to be addressed Juggling theory and practice Databases have been successful since the beginning. Because of a need for continuity, foundations have not been often reconsidered. The basic theory (relations) is quite simple. It has been extended again and again to give new organizational and functional capacities. Q: How well do the following fit in with the pure theory of relations? Nulls Skolem variables Schemas Queries Views Refactoring Data migration Optimizers David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 5 / 39

A need for unity Introduction The problem to be addressed Database management Each of the above (nulls, Skolem, queries, views, etc.) is an essential aspect of how databases are used. Each has its own mini-theory. But the pieces don t fit together very well. Clunky. Programming languages PL is more theoretical and unified. Perhaps the reason that PL isn t integrating well with database is the lack of coherency on our side. Headaches. The integration of PL and databases, the integration of different aspects of a DBMS, and the integration of different database systems all these headaches can be relieved by unifying database foundations. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 6 / 39

Introduction Category theory: a unified modeling language Category theory: a unified modeling language A need for unity in mathematics In the first half of the 20th century, a similar problem faced mathematics. Each subfield had its own jargon and way of doing business. It was noticed that there were similarities in all of them. Different subfields needed each other in order to advance. Solution: category theory. Invented in the 1940s to connect topology and algebra. Powerful, expressive, and scalable, yet axiomatically simple. Since its inception it has completely changed the way math is done. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 7 / 39

Introduction Category theory in practice Category theory: a unified modeling language In mathematics categories abound: Sets, Partially ordered sets, Graphs and trees, Monoids, groups, finite groups Topological spaces, vector spaces. Also used extensively in PL and linguistics A category specifies a language. Yet it is formal, structured, and well-defined. As such it serves as a nice foundation for semantics, both in Linguistics and PL. Functors relate categories. This is perhaps most important. A precise definition for translating notions between categories. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 8 / 39

Introduction Category theory: a unified modeling language My goal My goal was to find a categorical description of databases. A good language for discussing all aspects of databases in a unified way. It turns out that this is not only possible, it s quite simple. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 9 / 39

Introduction Schemas are categories; categories are schemas Schemas are categories; categories are schemas The reason it s simple is that categories and database schemas do the same thing. A schema gives a language for modeling a situation; Types Attributes This is precisely what a category does. Objects Morphisms. Schema = Category, Instance = functor. In this talk, I ll explain these ideas and some consequences. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 10 / 39

Introduction Schemas are categories; categories are schemas Outline of the talk Define categories, give examples, and show relation to schemas. Define functors, give examples, and show relation to data. Discuss morphisms of schemas and the associated data migration functors. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 11 / 39

Categories and schemas Categories Categories Idea: A category models objects of a certain sort and the relationships between them. g A f B C i D j k E h Think of it like a graph: the nodes are objects and the arrows are relationships. Some paths can be equated with others (example: j.k = i 3 ). David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 12 / 39

Data of a category C Categories and schemas Categories A category C consists of the following data: 1 A set Ob(C), called the set of objects of C. Objects x Ob(C) may be written as x or simply as x. 2 For each x, y Ob(C) a set Arr C (x, y), called the set of arrows in C from x to y. An element of Arr C (x, y) may be written f : x y or x f y. 3 For each x, y, z Ob(C) a composition law Arr C (x, y) Arr C (y, z) Arr C (x, z). We write f.g = h to denote that following arrow f then arrow g is the same as following arrow h: x f y g h z David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 13 / 39

Rules for a category C Categories and schemas Categories These data must satisfy the following requirements: 1 For every object y Ob(C) there is an identity arrow id y : y y in Arr C (y, y) such that for any f : x y, the equations id x.f = f and f.id y = f hold: x f id x x y f 2 Composition is associative. That is, given the following equation holds: w f x g y h z, f.(g.h) = (f.g).h x f f y id y David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 14 / 39 y

Categories and schemas Categories Examples of categories 1: Sets Any set X can be considered as a category. Its objects are the elements of X and it has no arrows (except an identity arrow for each object). X = a b d e f c Set. Objects are sets, arrows are total functions. f : X Y Set. Objects are sets, arrows are partial functions. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 15 / 39

Categories and schemas Categories Examples of categories 2: Posets and Graphs A partial order, (X, ). Turn it into a category whose objects are the elements of X and Arr(x, y) has one element if x y; else empty. For example: Linear order: A B 1 C A B 1, B 2 C B 2 [n] = 0 1 n Any graph can be turned into a category in several ways. A f B g h C David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 16 / 39

Categories and schemas Categories Examples of categories 3: Others Functional programming languages Hask. Objects are Haskell data-types, arrows are Haskell functions. Similar for ML. The category of topological spaces. Linguistic categories. A tight connection with databases. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 17 / 39

Categories and schemas What is a database? What is a database? A database consists of a bunch of tables and relationships between them. The rows of a table are called records or tuples. The columns are called attributes. A column may be pure data or may be a key. A table may have foreign key columns that link it to other tables. A foreign key of table A links into the primary key of table B. A schema may have business rules. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 18 / 39

Foreign Keys Categories and schemas Foreign Keys Example: Employee Employee Id First Last Mgr Dpt 101 David Hilbert 103 q10 102 Bertrand Russell 102 x02 103 Alan Turing 103 q10 Department Department Id Name Secr y q10 Sales 101 x02 Production 102 Note the primary key columns and foreign key columns. Perhaps we should enforce certain integrity constraints (business rules): The manager of an employee E must be in the same department as E, The secretary of a department D must be in D. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 19 / 39

Categories and schemas Foreign Keys Data columns as foreign keys Theoretically we can consider a data-type as a 1-column table. Example: String a b. z aa ab. So any data column can be considered a foreign key to a 1-column table. Conclusion: each column in a table is a key one primary, the rest foreign. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 20 / 39

Categories and schemas Categorical normal form Categorical normal form 1 Every table T has a unique primary key column id T, chosen at the outset. (Can be row-number or can be typed.) 2 Every data type D is considered as a 1-column table. The cells in id D are the values of that data type. 3 Every column c of every table T refers to some other table T. All values in the c-column can be found in the primary key column of T. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 21 / 39

Example again Categories and schemas Categorical normal form Employee Employee Id First Last Mgr Dpt 101 David Hilbert 103 q10 102 Bertrand Russell 102 x02 103 Alan Turing 103 q10 Department Department Id Name Secr y q10 Sales 101 x02 Production 102 Mgr Employee Dpt Secr y Department First Last String Name David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 22 / 39

Categories and schemas Database schema as a category Database schema as a category A database schema is a system of tables linked by foreign keys. This is just a category! C = Mgr Dpt Employee Department Secr y First String Last Name Objects are tables, arrows are columns. Primary key column of a table is identity arrow of an object. Declaring integrity constraints (e.g. Mgr.Dpt=Dpt) is declaring composition law. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 23 / 39

Functors and data Functors Functors Idea: A functor is a graph morphism that is required to respect the composition law. Definition: A functor F : C D consists of A function Ob(F ): Ob(C) Ob(D) and a function Arr(F ): Arr(C) Arr(D), that respect the source and target of every arrow, the identity arrow of every node, and the composition law. Note that the composition of functors is a functor. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 24 / 39

Functors and data Functors Examples of functors How many functors F : C D? C:= a e b F X f D:= Y g h Z Vals: Hask Set. The set of values for each data type. A f B C g Cat is the category whose objects are categories and morphisms are functors. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 25 / 39

Functors and data Schema=Category, Data=Functor Schema=Category, Data=Functor Let C = Mgr Dpt Employee Department Secr y First String Last Name Mgr.Dpt = Dpt; Secr y.dpt = id Department A functor δ : C Set consists of A set for each object of C and a function for each arrow of C, such that the declared equations hold In other words, δ fills the schema with data. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 26 / 39

Data as a functor Functors and data Schema=Category, Data=Functor C = δ : C Set m f d E D s l n S m 101,102,103 f a,b,...,alan,alao,... l d s n q10,x02 m.d = d; s.d = id D A category C is a schema. An object x Ob(C) is a table. A functor δ : C Set fills the tables with compatible data. For each table x, the set δ(x) is its set of rows. The composition law in C is enforced by δ as business rules. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 27 / 39

Functors and data C Set, the category of instances of C. C Set, the category of instances of C. Given a schema C there is a category of instances on C; call it C Set. The objects of C Set are the instances of schema C. More precisely, an object of C Set is a functor δ : C Set. The morphisms of C Set are updates. Called natural transformations of functors δ δ. Not too hard, but not worth the effort here. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 28 / 39

Functors and data Recap Recap Any category C is a database schema. Any functor δ : C Set is an instance of that schema. Updating instances takes place in the new category C Set. What does this do for you? David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 29 / 39

Functorial Data migration Morphisms between schemas Morphisms between schemas A functor F : C D is called a morphism of schemas. C:= x a y z F D:= x a y z Many choices of F above, one of which is indicated. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 30 / 39

Functorial Data migration A less obvious morphism Morphisms between schemas C:= x a y z F D:= a x y z David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 31 / 39

Functorial Data migration Data migration 1: pull-back Data migration 1: pull-back Given F : C D and δ : D Set, C F D δ Set, F.δ compose to get F.δ : C Set. Let δ vary (i.e. let updates take place). Now F : C D gives a functor D Set C Set. C:= x a y z F D:= a x δ Set y z David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 32 / 39

Functorial Data migration The existential and universal push-forwards The existential and universal push-forwards Given a functor F : C D we get a migration functor called F : D Set C Set. This migration functor has a left adjoint F and a right adjoint F. Both of these push data forward from C to D F : C Set D Set F : C Set D Set. I won t go into how these work but just give some quick examples. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 33 / 39

Functorial Data migration The universal push-forward F makes joins The universal push-forward F makes joins a 1 C:= a 2 a 3 b 1 b 2 b 3 c 1 d 1 d 2 F D:= a b 1 b 2 b 3 c 1 d 1 d 2 We begin with three tables (with 3,1, and 2 data columns respectively) arranged in a detail to master hierarchy. Apply the universal push-forward F to join them into one table. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 34 / 39

Functorial Data migration The universal push-forward F makes joins More joins using F SSN First T1 T2 Last Salary F SSN First U Last Salary Employee Converter USD Euro F Employee Euro David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 35 / 39

Functorial Data migration The existential push-forward F makes unions The existential push-forward F makes unions C:= SSN First T1 T2 Last Salary F D:= SSN First U Last Salary Given any instance δ : C Set, get an instance F δ : D Set. The rows in table U will be the union of the rows in T1 and T2. It will automatically use Skolem variables for the unknown cells. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 36 / 39

Conclusions Advantage from mathematical basis Advantage from mathematical basis Unity Putting large swath of database concepts under one framework. This framework works nicely with mathematics, programming languages, lingusitics. Opens up a world of possibility. Visualizability The ability to visualize SQL statements in terms of graph (category) morphisms. Should be easy to create a system to convert drawings of categories into schemas. This allows a larger set of every day people to work with database systems. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 37 / 39

Conclusions Advantage of rigor / soundness. Advantage from mathematical basis The ability to encode business rules in the structure This ensures a higher quality in data migration. It also makes for more seamless transitions to new DBAs. Rules for schema migration are more precise. Typically there are many rules for database mapping. These rules are static and over-cautious. With good foundation, one can decide more dynamically what is ok. One can employ theorem provers and checkers. The schema and the data have a mathematical structure. Theorem checkers can support your results or stated characteristics. Prove that a certain query-plan will be fastest in a local situation. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 38 / 39

Conclusions Summary Summary Category theory is useful. It can be used to model any well-defined situation. It unifies and clarifies. It is about as hard to learn as discrete mathematics is. Categories and database schemas are the same thing! No wonder categories are useful in modeling. Mathematical language has many advantages. It is generally worth the effort to learn. I am available to work on this connection. David I. Spivak (MIT) Databases = Categories Presented on 2010/09/16 39 / 39