A Functorial Query Language. Ryan Wisnesky Harvard University DCP 2014


A Functorial Query Language
Ryan Wisnesky (ryan@cs.harvard.edu)
Harvard University, DCP 2014

History
In the early 1990s, Rosebrugh et al. noticed that finitely presented categories could be thought of as database schemas. A finitely presented category (a schema) is a directed multigraph with path equations. An instance I on a schema C is a functor I : C → Set. That is, for each node X, a set of IDs I(X), and for each edge e : X → Y, a function I(e) : I(X) → I(Y), such that paths equated in C are sent to equal functions.
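A rough Python sketch may make the definition concrete. It is not how FQL represents instances; the classes `Schema` and `Instance` and the helpers `eval_path` and `satisfies_equations` are hypothetical names introduced here. A schema is modelled as a set of edges plus path equations, an instance assigns a set of IDs to each node and a function to each edge, and the data form a functor when every edge is a well-typed total function and every path equation holds on every ID. The IDs are taken from the FQL instance shown later in the talk.

```python
from dataclasses import dataclass

@dataclass
class Schema:
    edges: dict       # edge name -> (source node, target node)
    equations: list   # pairs of paths; a path is (start node, [edge names])

@dataclass
class Instance:
    ids: dict         # node -> set of IDs
    funcs: dict       # edge name -> dict from source IDs to target IDs

def eval_path(inst, path, x):
    """Follow the edges of a path (start node, [edge names]) starting at ID x."""
    _, edge_names = path
    for e in edge_names:
        x = inst.funcs[e][x]
    return x

def satisfies_equations(schema, inst):
    """True if each edge is a total function between the right ID sets and
    both sides of every path equation agree on every starting ID."""
    for e, (src, tgt) in schema.edges.items():
        f = inst.funcs[e]
        if set(f) != inst.ids[src] or not set(f.values()) <= inst.ids[tgt]:
            return False
    for lhs, rhs in schema.equations:
        for x in inst.ids[lhs[0]]:
            if eval_path(inst, lhs, x) != eval_path(inst, rhs, x):
                return False
    return True

# The Employee/Department schema and the IDs of the FQL instance from this talk.
S = Schema(
    edges={"manager": ("Employee", "Employee"),
           "worksIn": ("Employee", "Department"),
           "secretary": ("Department", "Employee")},
    equations=[(("Employee", ["manager", "worksIn"]), ("Employee", ["worksIn"])),
               (("Department", ["secretary", "worksIn"]), ("Department", []))],
)
I = Instance(
    ids={"Employee": {101, 102, 103}, "Department": {"q10", "x02"}},
    funcs={"manager": {101: 103, 102: 102, 103: 103},
           "worksIn": {101: "q10", 102: "x02", 103: "q10"},
           "secretary": {"q10": 101, "x02": 102}},
)
print(satisfies_equations(S, I))  # True
```

The empty path `("Department", [])` plays the role of the identity, so the second equation reads "following secretary then worksIn returns to the starting department".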

Example Schema & Instance
Schema: nodes Employee and Department; edges manager : Employee → Employee, worksIn : Employee → Department, secretary : Department → Employee; path equations
  Employee.manager.worksIn = Employee.worksIn
  Department.secretary.worksIn = Department
Instance:
  Employee
  ID   manager  worksIn
  101  103      q10
  102  102      x02
  103  103      q10

  Department
  ID   secretary
  q10  102
  x02  101

History, continued
A schema mapping F : C → D takes nodes(C) → nodes(D) and edges(C) → paths(D) in a way that respects C's path equations. That is, a mapping is a functor. Given a mapping F : C → D, there are three adjoint data migration operations:
- Δ_F : D-Inst → C-Inst (similar to projection)
- Σ_F : C-Inst → D-Inst (similar to disjoint union)
- Π_F : C-Inst → D-Inst (similar to cartesian product)
Σ_F is left adjoint to Δ_F, and Π_F is right adjoint to Δ_F. We call this the functorial data model.
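Of the three operations, Δ_F is the simplest: it is precomposition with F, so a C-node c receives the IDs I(F(c)) and a C-edge g receives the composite function along the path F(g). The sketch below assumes instances stored as plain dictionaries; the mapping F, its source schema (nodes E and P, one edge w : E → P), and the helpers `follow` and `delta` are hypothetical and only illustrate the idea. Σ_F and Π_F are not shown, since as adjoints they need colimits and limits rather than simple precomposition.

```python
# Instance on the target schema D (IDs from the Employee/Department example).
D_ids = {"Employee": {101, 102, 103}, "Department": {"q10", "x02"}}
D_funcs = {"manager": {101: 103, 102: 102, 103: 103},
           "worksIn": {101: "q10", 102: "x02", 103: "q10"},
           "secretary": {"q10": 101, "x02": 102}}

# Hypothetical mapping F : C -> D.  C has nodes E and P and one edge w : E -> P;
# F sends E to Employee, P to Department, and w to the path manager.worksIn.
F_nodes = {"E": "Employee", "P": "Department"}
F_edges = {"w": ("E", ["manager", "worksIn"])}   # C-edge -> (its source, image path)

def follow(funcs, path, x):
    """Apply the functions along a path of D-edges to the ID x."""
    for e in path:
        x = funcs[e][x]
    return x

def delta(F_nodes, F_edges, ids, funcs):
    """Delta_F(I) = I o F: each C-node c gets I(F(c)); each C-edge g gets the
    composite function I(F(g)) along its image path."""
    new_ids = {c: set(ids[d]) for c, d in F_nodes.items()}
    new_funcs = {g: {x: follow(funcs, path, x) for x in ids[F_nodes[src]]}
                 for g, (src, path) in F_edges.items()}
    return new_ids, new_funcs

C_ids, C_funcs = delta(F_nodes, F_edges, D_ids, D_funcs)
print(C_funcs["w"] == {101: "q10", 102: "x02", 103: "q10"})  # True
```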

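Example Migrations
Let F map the schema with two nodes A and B (and no edges) to the schema with one node C, so F(A) = F(B) = C.
- Start from the instance A = {a1, a2, a3}, B = {b1, b2}.
- Σ_F gives C = {(a1,a), (a2,a), (a3,a), (b1,b), (b2,b)}, i.e. Σ_F(C) = A + B.
- Π_F gives C = {(a1,b1), (a1,b2), (a2,b1), (a2,b2), (a3,b1), (a3,b2)}, i.e. Π_F(C) = A × B.
- Δ_F applied to the instance C = {c1, c2, c3} gives Δ_F(A) = C = {c1, c2, c3} and Δ_F(B) = C = {c1, c2, c3}.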
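For schemas with no edges, as in this example, the three migrations reduce to set operations on the fibres of F: Σ_F takes a tagged disjoint union, Π_F takes a cartesian product, and Δ_F copies J(F(c)) to each c. The following is a minimal sketch under that assumption (discrete schemas only; with edges, Σ_F and Π_F are colimits and limits over comma categories). The function names are hypothetical, and the tags use the node names A and B where the slide writes a and b.

```python
from itertools import product

F = {"A": "C", "B": "C"}   # F : {A, B} -> {C}; both nodes are sent to C

def fibre(F, d):
    """All source nodes that F sends to d, in a fixed order."""
    return sorted(c for c in F if F[c] == d)

def sigma(F, I, d):
    """Sigma_F(I)(d): disjoint union of I(c) over c with F(c) = d, tagged by c."""
    return [(x, c) for c in fibre(F, d) for x in I[c]]

def pi(F, I, d):
    """Pi_F(I)(d): cartesian product of I(c) over c with F(c) = d."""
    return list(product(*(I[c] for c in fibre(F, d))))

def delta(F, J):
    """Delta_F(J)(c) = J(F(c))."""
    return {c: list(J[F[c]]) for c in F}

I = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
J = {"C": ["c1", "c2", "c3"]}
print(sigma(F, I, "C"))  # [('a1', 'A'), ('a2', 'A'), ('a3', 'A'), ('b1', 'B'), ('b2', 'B')]
print(pi(F, I, "C"))     # [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b1'), ('a2', 'b2'), ('a3', 'b1'), ('a3', 'b2')]
print(delta(F, J))       # {'A': ['c1', 'c2', 'c3'], 'B': ['c1', 'c2', 'c3']}
```

The sizes match the slide: |Σ_F(C)| = 3 + 2 = 5 and |Π_F(C)| = 3 × 2 = 6.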

Advantages
Schemas and mappings form a bicartesian closed category (BCCC), so we can take products, coproducts, exponentials, etc. of schemas. For each schema T, the T-instances and their homomorphisms (which are natural transformations) form a topos (a BCCC with a subobject classifier). Data integrity constraints are built into schemas, and path equations are a natural and expressive class of constraints.
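To make "homomorphisms are natural transformations" concrete: a homomorphism h : I → J gives one function per node, and for every edge e : X → Y the square h_Y ∘ I(e) = J(e) ∘ h_X must commute. Checking the generating edges suffices, because every path is a composite of edges. The sketch below assumes dictionary-encoded instances; `is_homomorphism` and the one-edge schema are hypothetical.

```python
def is_homomorphism(edges, I_funcs, J_funcs, h):
    """Check naturality. h maps each node to a dict from I-IDs to J-IDs;
    for every edge e : X -> Y and every x in I(X) we need
    h_Y(I(e)(x)) == J(e)(h_X(x))."""
    for e, (X, Y) in edges.items():
        for x, y in I_funcs[e].items():
            if h[Y][y] != J_funcs[e][h[X][x]]:
                return False
    return True

# Hypothetical schema with one edge f : X -> Y, and two small instances.
edges = {"f": ("X", "Y")}
I_funcs = {"f": {1: "u", 2: "u"}}
J_funcs = {"f": {"p": "v", "q": "w"}}
h = {"X": {1: "p", 2: "p"}, "Y": {"u": "v"}}
print(is_homomorphism(edges, I_funcs, J_funcs, h))  # True
```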

Disadvantages
Instances must be considered up to isomorphism, not equality, so every constant is a meaningless ID. There is no obvious query language, nor is it obvious how to implement the three data migration operations using, for example, SQL. So Rosebrugh et al. moved on to a different categorical data model, that of sketches. We pick up where they left off and address these challenges.

Historical Aside
There have been many other uses of category theory for information management. Wong, Tannen, Buneman, and others used category theory to develop the nested relational calculus, but their work is not related to ours. Alagic and Bernstein defined a notion of a good data model using category theory; the functorial data model is good by their definition. The Δ, Σ, and Π data migration operations appear in a different guise in categorical logic and type theory. Key phrase: quantification is adjoint to substitution.

Contributions
I have been working with David Spivak to extend the functorial data model and to build practical tools based on it. The second half of this talk will be a demo. Key results:
- A way to store concrete data (attributes) in instances.
- A functorial query language, FQL.
- An implementation of FQL in SQL, and vice versa.
Project webpage: wisnesky.net/fql.html

Attributes
We associate to each node in a schema a set of attribute names and domains (strings, integers, etc.). An instance contains additional columns for attributes. The category theory required to describe schemas and instances with attributes is verbose but straightforward. The key challenge is making sure the useful properties of the functorial data model continue to hold. Example: isomorphisms of instances preserve attributes.

Employees with Attributes
Schema: as before (edges manager, worksIn, secretary and the same two path equations), plus attributes first and last on Employee and name on Department.
  Employee
  ID   manager  worksIn  first  last
  101  103      q10      Al     Akin
  102  102      x02      Bob    Bo
  103  103      q10      Carl   Cork

  Department
  ID   secretary  name
  q10  102        CS
  x02  101        Math
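Since IDs are meaningless, renaming them gives an isomorphic instance, but attribute values are real data and must survive the renaming. A possible check, sketched under the assumption of dictionary-encoded instances, is that h is a bijection on each node, commutes with every edge, and leaves every attribute value unchanged. `is_attribute_preserving_iso` is a hypothetical helper, and the renamed IDs e1, e2, e3 are invented for the example; the manager and last columns come from the table above.

```python
def is_attribute_preserving_iso(edges, attrs, I, J, h):
    """I and J are dicts with "ids" (node -> set), "funcs" (edge -> dict) and
    "attrs" (attribute -> dict from IDs to values); `attrs` maps each attribute
    to its node. h maps each node to a dict from I's IDs to J's IDs."""
    for X, ids in I["ids"].items():
        hx = h[X]
        image = set(hx.values())
        # h_X must be a bijection I(X) -> J(X).
        if set(hx) != ids or image != J["ids"][X] or len(image) != len(hx):
            return False
    for e, (X, Y) in edges.items():            # naturality on every edge
        for x, y in I["funcs"][e].items():
            if h[Y][y] != J["funcs"][e][h[X][x]]:
                return False
    for a, X in attrs.items():                 # attribute values must agree
        for x, v in I["attrs"][a].items():
            if J["attrs"][a][h[X][x]] != v:
                return False
    return True

edges = {"manager": ("Employee", "Employee")}
attrs = {"last": "Employee"}
I = {"ids": {"Employee": {101, 102, 103}},
     "funcs": {"manager": {101: 103, 102: 102, 103: 103}},
     "attrs": {"last": {101: "Akin", 102: "Bo", 103: "Cork"}}}
J = {"ids": {"Employee": {"e1", "e2", "e3"}},
     "funcs": {"manager": {"e1": "e3", "e2": "e2", "e3": "e3"}},
     "attrs": {"last": {"e1": "Akin", "e2": "Bo", "e3": "Cork"}}}
h = {"Employee": {101: "e1", 102: "e2", 103: "e3"}}
print(is_attribute_preserving_iso(edges, attrs, I, J, h))  # True
```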

FQL
Schemas with attributes still form a BCCC. We can define schemas and mappings using the categorical abstract machine language or, equivalently, using the simply typed λ-calculus (STLC). Instances with attributes still form a topos. We can define instances and homomorphisms using higher-order logic (HOL), i.e. STLC plus equality at all types. Some minor details about finite vs. infinite domains apply. Migrations of the form Σ_F ∘ Π_G ∘ Δ_H are closed under composition, provided F is a discrete op-fibration. An FQL query is a migration of the above form.
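A functor F : C → D is a discrete op-fibration when every arrow f : F(c) → d in D has exactly one lift g : c → c' in C with F(g) = f; the projection from the category of elements of an instance is a standard example. The sketch below is deliberately restricted: it assumes free schemas (no path equations) and a mapping that sends each edge to a single edge, in which case unique lifting of edges implies unique lifting of paths. `is_discrete_opfibration` and the example schemas are hypothetical.

```python
def is_discrete_opfibration(C_edges, D_edges, F_nodes, F_edges):
    """Assumes free schemas and F sending each edge to one edge.
    C_edges/D_edges: edge -> (source, target); F_nodes: C-node -> D-node;
    F_edges: C-edge -> D-edge. Checks that F is a graph morphism and that
    every D-edge out of F(c) has exactly one lift out of c."""
    for g, (s, t) in C_edges.items():
        if D_edges[F_edges[g]] != (F_nodes[s], F_nodes[t]):
            return False
    for c, d in F_nodes.items():
        for f, (src, _) in D_edges.items():
            if src == d:
                lifts = [g for g, (s, _) in C_edges.items()
                         if s == c and F_edges[g] == f]
                if len(lifts) != 1:
                    return False
    return True

# Hypothetical example: D has one edge f : X -> Y; C has x1, x2 over X and y over Y.
D_edges = {"f": ("X", "Y")}
F_nodes = {"x1": "X", "x2": "X", "y": "Y"}
C_edges = {"g1": ("x1", "y"), "g2": ("x2", "y")}
F_edges = {"g1": "f", "g2": "f"}
print(is_discrete_opfibration(C_edges, D_edges, F_nodes, F_edges))  # True
```

Dropping g2 leaves x2 with no lift of f, so the check would then fail.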

FQL Syntax

schema S = {
    nodes
        Employee, Department;
    attributes
        name : Department -> string,
        first : Employee -> string,
        last : Employee -> string;
    arrows
        manager : Employee -> Employee,
        worksin : Employee -> Department,
        secretary : Department -> Employee;
    equations
        Employee.manager.worksIn = Employee.worksIn,
        Department.secretary.worksIn = Department
}

instance I = {
    nodes
        Employee -> { 101, 102, 103 },
        Department -> { q10, x02 };
    attributes
        first -> { (101, Alan), (102, Camille), (103, Andrey) },
        last -> { (101, Turing), (102, Jordan), (103, Markov) },
        name -> { (q10, AppliedMath), (x02, PureMath) };
    arrows
        manager -> { (101, 103), (102, 102), (103, 103) },
        worksin -> { (101, q10), (102, x02), (103, q10) },
        secretary -> { (q10, 101), (x02, 102) };
} : S

FQL Syntax, continued

//From products example
//products of schemas
schema S = { }
schema T = { }
schema A = (S * T)
mapping p1 = fst S T
mapping p2 = snd S T
mapping p = (p1 * p2) //is id
//products of instances
instance I = { } : S
instance J = { } : S
instance A = (I * J)
transform K = A.fst
transform L = A.snd
transform M = A.(K * L) //is id

//From co-products example
//co-products of schemas
schema S = { }
schema T = { }
schema A = (S + T)
mapping p1 = inl S T
mapping p2 = inr S T
mapping p = (p1 + p2) //is id
//co-products of instances
instance I = { } : S
instance J = { } : S
instance A = (I + J)
transform K = A.inl
transform L = A.inr
transform M = A.(K + L) //is id
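The product of two instances on the same schema is computed node by node, with each edge acting componentwise; pairing the two projections then gives the identity, which is what the `//is id` comment asserts for `A.(K * L)`. Co-products work dually with tagged disjoint unions. This is a hypothetical Python sketch of the semantics, not the SQL that FQL actually emits; the helper names and the one-edge schema are assumptions.

```python
from itertools import product

def instance_product(edges, I, J):
    """Pointwise product: (I * J)(X) = I(X) x J(X); each edge acts componentwise.
    Instances are dicts with "ids" (node -> set) and "funcs" (edge -> dict)."""
    ids = {X: set(product(I["ids"][X], J["ids"][X])) for X in I["ids"]}
    funcs = {e: {(x, y): (I["funcs"][e][x], J["funcs"][e][y]) for (x, y) in ids[src]}
             for e, (src, _) in edges.items()}
    return {"ids": ids, "funcs": funcs}

def fst(A):
    """Projection transform I * J -> I, one component per node."""
    return {X: {p: p[0] for p in A["ids"][X]} for X in A["ids"]}

def snd(A):
    """Projection transform I * J -> J, one component per node."""
    return {X: {p: p[1] for p in A["ids"][X]} for X in A["ids"]}

def pair(K, L):
    """Pairing <K, L> of two transforms out of the same instance."""
    return {X: {z: (K[X][z], L[X][z]) for z in K[X]} for X in K}

# Hypothetical schema with one edge f : X -> Y.
edges = {"f": ("X", "Y")}
I = {"ids": {"X": {1, 2}, "Y": {"u"}}, "funcs": {"f": {1: "u", 2: "u"}}}
J = {"ids": {"X": {"p"}, "Y": {"v", "w"}}, "funcs": {"f": {"p": "w"}}}
A = instance_product(edges, I, J)
M = pair(fst(A), snd(A))
print(all(M[X][z] == z for X in A["ids"] for z in A["ids"][X]))  # True
```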

FQL / SQL
Let SPCU denote the select-project-product-union relational algebra. Let guidgen denote the operation taking n-ary tables to (n+1)-ary tables by adding a new column with globally unique IDs.
- Every FQL query can be implemented using SPCU+guidgen.
- Every SPCU query under bag semantics can be implemented using FQL.
- FQL can be extended with an operation, relationalize, such that every SPCU query under set semantics can be implemented using FQL+relationalize.
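A guidgen-style operation can be sketched in a few lines. The choices here are assumptions, not details from the talk: the new column is placed first, and IDs come from one process-wide counter so that they stay unique across every call (a real database would more likely use a sequence or UUIDs).

```python
from itertools import count

_next_id = count()   # shared counter, so IDs are unique across all calls

def guidgen(rows):
    """Turn an n-ary table into an (n+1)-ary one by prepending a fresh ID column."""
    return [(next(_next_id),) + tuple(row) for row in rows]

t1 = guidgen([("Alan", "Turing"), ("Camille", "Jordan")])
t2 = guidgen([("Andrey", "Markov")])
print(t1)  # [(0, 'Alan', 'Turing'), (1, 'Camille', 'Jordan')]
print(t2)  # [(2, 'Andrey', 'Markov')]
```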

FQL to SQL
- Δ migrations are compositions of tables, implementable with SPC.
- Σ migrations are unions of compositions of tables, implementable with SPCU.
- Π migrations are implementable with SPC+guidgen, but are much more complex to describe than Δ and Σ. Their construction requires comma categories and implementing diagram limits using joins.
- Products and co-products are implementable with SPCU.
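To illustrate "Δ migrations are compositions of tables": if each arrow is stored as a two-column table, then Δ along a mapping that sends an edge to the path manager.worksIn produces that edge's table as a join of the two arrow tables. The table layout (columns `src` and `tgt`) is a hypothetical choice for this sketch, run here on SQLite from Python's standard library rather than on H2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical layout: one two-column table (src, tgt) per arrow of the schema.
con.execute("CREATE TABLE manager (src TEXT, tgt TEXT)")
con.execute("CREATE TABLE worksIn (src TEXT, tgt TEXT)")
con.executemany("INSERT INTO manager VALUES (?, ?)",
                [("101", "103"), ("102", "102"), ("103", "103")])
con.executemany("INSERT INTO worksIn VALUES (?, ?)",
                [("101", "q10"), ("102", "x02"), ("103", "q10")])

# Delta along a mapping that sends an edge to the path manager.worksIn:
# the new edge's table is the composition (a join) of the two arrow tables.
rows = con.execute(
    "SELECT m.src, w.tgt FROM manager AS m JOIN worksIn AS w ON m.tgt = w.src "
    "ORDER BY m.src"
).fetchall()
print(rows)  # [('101', 'q10'), ('102', 'x02'), ('103', 'q10')]
```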

SQL to FQL
Consider a relational schema with two relations, R(c1, ..., cn) and R'(c'1, ..., c'n'). It is encoded in FQL with an active domain and an attribute: nodes R, R', and adom; arrows c1, ..., cn : R → adom and c'1, ..., c'n' : R' → adom; and an attribute att on adom. Using this encoding, Δ implements projection, Π implements selection and product, and Σ implements union.
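The encoding can be sketched for a single relation: each row gets a fresh ID, each distinct value gets an ID in adom, att maps adom IDs back to values, and each column becomes an arrow from row IDs into adom. Projection is then Δ along the inclusion of the sub-schema that keeps only some arrows, and duplicates survive, matching the bag semantics above. `encode` and `project` are hypothetical helpers; this is an illustration of the idea, not FQL's translator.

```python
def encode(rows, columns):
    """Encode a relation R(c1, ..., cn): one fresh ID per row (bag semantics),
    an active-domain node with one ID per distinct value, attribute att on
    adom giving each adom ID its value, and one arrow R -> adom per column."""
    values = sorted({v for row in rows for v in row})
    adom = {v: i for i, v in enumerate(values)}       # value -> adom ID
    att = {i: v for v, i in adom.items()}              # att : adom -> value
    row_ids = list(range(len(rows)))
    arrows = {c: {r: adom[row[k]] for r, row in enumerate(rows)}
              for k, c in enumerate(columns)}
    return row_ids, arrows, att

def project(row_ids, arrows, att, keep):
    """Delta along the inclusion of the sub-schema with only the `keep` arrows:
    row IDs are unchanged, the other arrows are forgotten; decode through att."""
    return [tuple(att[arrows[c][r]] for c in keep) for r in row_ids]

row_ids, arrows, att = encode([("a", "x"), ("b", "x"), ("a", "y")], ["c1", "c2"])
print(project(row_ids, arrows, att, ["c1"]))  # [('a',), ('b',), ('a',)]
```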

FQL IDE Demo
- Download fql.jar from wisnesky.net/fql.html.
- Run it by double-clicking or with java -jar fql.jar (requires Java 7).
- Internally, FQL emits SQL and runs it using the H2 SQL engine (h2database.com).
- The FQL IDE also allows additional operations that cannot be implemented in SQL.
- The FQL IDE can execute against external instances using JDBC.
- FQL is case insensitive.

[Demo screenshots: Home Screen; Employees; Delta - Mapping; Delta - Projection; Sigma - Mapping; Sigma - Union; Pi - Mapping; Pi - Product]

Other FQL IDE Features
- Translates from SPCU (in SQL syntax) to FQL.
- A category of elements view displays an instance as a graph in which every ID is a node.
- An observables view shows all the different attributes associated with an ID.
- Generates FQL from attribute correspondences.
- Supports enumerated (finite) types.
- Compiles FQL to embedded dependencies, an alternative relational language.

Conclusion
We are excited about the functorial data model as an alternative basis for studying problems in information management. It has a number of useful properties that the relational model does not:
- Its schemas are naturally based on graphs and have built-in constraints.
- It is naturally ID- and bag-based, and can be extended to work with sets.
- It can implement a number of information integration scenarios that relational tools like Clio and Rondo cannot (see my thesis).
- It can implement SPCU via an encoding.
As the FQL IDE demonstrates, functorial data migration is more than just generalized abstract nonsense. Send questions/comments to ryan@cs.harvard.edu.