Static checking of domain constraints in applications interacting with relational database by means of dependently-typed lambda calculus

Similar documents
ADAM CHLIPALA. Research interests. Education. Refereed conference papers

The Relational Model. Suan Lee

Chapter 2: Intro to Relational Model

Topic 7: Algebraic Data Types

Programming with Patterns. ACM SIGPLAN Developer Tracks on Functional Programming (DEFUN 2009)

Chapter 2: Intro to Relational Model

Beluga: A Framework for Programming and Reasoning with Deductive Systems (System Description)

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

Control Flow Analysis with SAT Solvers

Functional Programming. Overview. Topics. Definition n-th Fibonacci Number. Graph

CS 11 Haskell track: lecture 1

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine.

And Parallelism. Parallelism in Prolog. OR Parallelism

SQL: Data Definition Language. csc343, Introduction to Databases Diane Horton Fall 2017

COSC344 Database Theory and Applications. σ a= c (P) Lecture 3 The Relational Data. Model. π A, COSC344 Lecture 3 1

Where Are We? Next Few Lectures. Integrity Constraints Motivation. Constraints in E/R Diagrams. Keys in E/R Diagrams

Database Technology Introduction. Heiko Paulheim

G Programming Languages Spring 2010 Lecture 6. Robert Grimm, New York University

DATABASE TECHNOLOGY - 1MB025 (also 1DL029, 1DL300+1DL400)

Domain Constraints Referential Integrity Assertions Triggers. Authorization Authorization in SQL

Core Membership Computation for Succinct Representations of Coalitional Games

Dependent Object Types - A foundation for Scala s type system

Haskell 98 in short! CPSC 449 Principles of Programming Languages

Extracting the Range of cps from Affine Typing

COSE212: Programming Languages. Lecture 3 Functional Programming in OCaml

Databases. Jörg Endrullis. VU University Amsterdam

Relational Query Languages. Preliminaries. Formal Relational Query Languages. Example Schema, with table contents. Relational Algebra

An Evolution of Mathematical Tools

Lecture 8: Summary of Haskell course + Type Level Programming

CSCI-GA Scripting Languages

Relational Databases

Toward a Verified Rela.onal Database Management System

Theory and Algorithms for the Generation and Validation of Speculative Loop Optimizations

Chapter 7 Constraints and Triggers. Spring 2011 Instructor: Hassan Khosravi

Behavioral Equivalence

Types. Benjamin C. Pierce University of Pennsylvania. Programming Languages Mentoring Workshop, Jan. 2012

Constructing Control Flow Graph for Java by Decoupling Exception Flow from Normal Flow

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages

The alternator. Mohamed G. Gouda F. Furman Haddix

Adam Chlipala University of California, Berkeley ICFP 2006

King Fahd University of Petroleum and Minerals

On Meaning Preservation of a Calculus of Records

Programming with Math and Logic

Relational Algebra. Note: Slides are posted on the class website, protected by a password written on the board

Discrete Mathematics Lecture 4. Harper Langston New York University

LOGIC AND DISCRETE MATHEMATICS

Slides for Faculty Oxford University Press All rights reserved.

Lecture Notes for 3 rd August Lecture topic : Introduction to Relational Model. Rishi Barua Shubham Tripathi

Relational Database Systems Part 01. Karine Reis Ferreira

CS 430 Spring Mike Lam, Professor. Data Types and Type Checking

CS2300: File Structures and Introduction to Database Systems

doc. RNDr. Tomáš Skopal, Ph.D. RNDr. Michal Kopecký, Ph.D.

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics

Type Inference Systems. Type Judgments. Deriving a Type Judgment. Deriving a Judgment. Hypothetical Type Judgments CS412/CS413

Type-directed Syntax Transformations for ATS-style Programs with Proofs

Programming Languages

Query formalisms for relational model relational calculus

Learning Queries for Relational, Semi-structured, and Graph Databases

1 (10) 2 (8) 3 (12) 4 (14) 5 (6) Total (50)

SQL: Concepts. Todd Bacastow IST 210: Organization of Data 2/17/ IST 210

2.2.2.Relational Database concept

Runtime Behavior of Conversion Interpretation of Subtyping

Streams, Delayed Evaluation and a Normal Order Interpreter. CS 550 Programming Languages Jeremy Johnson

Lecture 1: Conjunctive Queries

Introductory logic and sets for Computer scientists

Element Algebra. 1 Introduction. M. G. Manukyan

Programming for Data Science Syllabus

Functional Programming. Big Picture. Design of Programming Languages

CMSC 331 Final Exam Section 0201 December 18, 2000

CSC312 Principles of Programming Languages : Functional Programming Language. Copyright 2006 The McGraw-Hill Companies, Inc.

Cut points in PEG. Extended Abstract. Roman R. Redziejowski.

Programming Languages Third Edition

Denotational Semantics. Domain Theory

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages

On the Hardness of Counting the Solutions of SPARQL Queries

SQL Functionality SQL. Creating Relation Schemas. Creating Relation Schemas

1 Introduction One of the contributions of Java is in its bytecode verier, which checks type safety of bytecode for JVM (Java Virtual Machine) prior t

An Introduction to Type Inference

CSC443 Winter 2018 Assignment 1. Part I: Disk access characteristics

Relational Data Model

INCONSISTENT DATABASES

Recursion. Tjark Weber. Functional Programming 1. Based on notes by Sven-Olof Nyström. Tjark Weber (UU) Recursion 1 / 37

Consistency and Set Intersection

Research on the Static Analysis Method of the Localization Embedded Platform Software Code Zhijie Gaoa, Ling Lu, Wen Jiao

Behavioral Equivalence

COP4020 Programming Languages. Functional Programming Prof. Robert van Engelen

Types and Programming Languages. Lecture 5. Extensions of simple types

Type Checking and Type Inference

Analysis of dependent types in Coq through the deletion of the largest node of a binary search tree

RAISE in Perspective

Inference rule for Induction

Functional Programming and Haskell

CS W Introduction to Databases Spring Computer Science Department Columbia University

CMPS 277 Principles of Database Systems. Lecture #11

Comp 5311 Database Management Systems. 4b. Structured Query Language 3

Lecture 8. Database Management and Queries

Static type safety guarantees for the operators of a relational database querying system. Cédric Lavanchy

ARELAY network consists of a pair of source and destination

Relational terminology. Databases - Sets & Relations. Sets. Membership

1KOd17RMoURxjn2 CSE 20 DISCRETE MATH Fall

Transcription:

Static checking of domain constraints in applications interacting with relational database by means of dependently-typed lambda calculus Maxim Aleksandrovich Krivchikov Ph.D., senior researcher, Lomonosov Moscow State University maxim.krivchikov@gmail.com Evgeny Vladimirovich Shulgin graduate, Lomonosov Moscow State University shulginev@yandex.ru 1 Introduction In the modern software development practice, almost every information system uses a database management system (DBMS) to store data. The most widespread database model nowadays is the relational model, which describes data as the tuples of attribute values. This model is implemented by relational database management systems. One of the aspects of data modeling in relational DBMS is the consistency constraints. Consistency constraints can be represented either as predicates over the attributes of a single tuple, or as uniqueness requirements of the subset of attributes in the whole relation, as a requirement of the existence of tuple with specified attributes in a different relation, or as a subroutine call before appending the tuple to the relation. Such subroutine call in existing relational DBMS may perform arbitrary database queries. Requests for data modification (appending or removal of a tuple or updating the attribute values) which result in a violation of consistency constraints are rejected by DBMS. The common design models for information systems interacting with DBMS are three-tier or client-server architectures. First tier (data tier) corresponds to the database. Second tier (logic tier or client) is usually implemented as an application server or a separate application written in a general purpose programming language (e.g., Java, C#, Python). Interaction with the database is implemented in the application by means of SQL language or a software library which constructs SQL queries. The application code must be consistent with the data model, which means that all possible cases of consistency constraint violation must be handled in the application. During the software lifecycle data model and constraints are usually modified to reflect the changes in requirements. Therefore, the consistency constraints handling in the application must be controlled continuously. The present paper describes the research-in-progress of the relational database model in terms of dependently-typed lambda calculus. The model can represent the consistency constraints on the relational

58 database and statically check that application code respects the consistency constraints. 2 Background We will start the discussion by an example from ISTINA system developed in Lomonosov Moscow State University. We will discuss the domain constraints in Dissertation Councils subsystem which is concerned with the informational support of public doctoral theses defense. The data model of the simplified Dissertation Councils subsystem in example consists of the following relations: 1. Researcher profile ( Worker relation) which in present paper contains a single attribute Worker.id which uniquely identifies the profile of the researcher in the system (usually called primary key ). 2. Dissertation Council ( DissertationCouncil relation) with the following attributes: DissertationCouncil.id the primary key; DissertationCouncil.number the unique sequence of letters and numbers separated by space and period characters, identifying the dissertation council in official documents. 3. Record of membership of the scientist in the dissertation council ( DissertationCouncilMembership relation) with the following attributes: DissertationCouncilMembership.id the primary key; DissertationCouncilMembeship.worker the reference to researcher profile (a foreign key ); DissertationCouncilMembership.council the reference to the dissertation council (a foreign key ); DissertationCouncilMembership.startdate the effective date of entry of the scientist to the dissertation council; DissertationCouncilMembership.enddate the effective date of resignation of the scientist from the dissertation council. The consistency constraints on these relations include: primary keys uniqueness (by definition of primary key); foreign key existence (by definition of foreign key); dissertation council number uniqueness (domain requirement); a scientist can simultaneously be a member of no more than two dissertation councils in Lomonosov Moscow State University (domain requirement). Modern software development frameworks include the database abstraction libraries. With such libraries, an entry may be inserted into the database by means of the function call of the library. For example, in Django framework for Python programming language the insertion of new dissertation council membership record is represented by the following line of code: DissertationCouncilMembership.objects.create(worker=w, council=c, startdate=datetime.now()) If the previously mentioned constraints are defined in DBMS, the evaluation of this statement may result in an error (exception). Such exceptions are usually handled in

59 the software code. Statically typed programming languages (for example, in Java programming language) usually provide tools for checking that all possible types of exceptions which can be raised inside the function are handled. Nevertheless, exceptions produced by DBMS often are rather generic (e.g. integrity constraint (SOME_POSSIBLY_MEANINGFUL_NAME) violated child record found ) and hence hard to process. Also, the handling of all possible exceptions cannot ensure that all data constraints are handled in exception handlers (which results in undescriptive error messages). More importantly, it is not possible to ensure that some fragment of code will not violate any consistency constraints due to performed checks of input data. The model presented in this paper allows representing the results of constraint check as so-called witness object to prove that the insertion of a new row will not result in constraint violation. The type of such witness object binds it with the specific data set for which the checks are performed and with the details of checks. A function which inserts the record into the database with static constraint control accepts such proof as an additional parameter: P = check_no_more_than_one_membership(worker=w, council=c, startdate=datetime.now()) DissertationCouncilMembership.objects.create(worker=w, council=c, startdate=datetime.now(), proof=p) The model is based on dependently-typed lambda calculus and implemented in Agda proof assistant. The model itself does not use any features specific to Agda except type inference in examples. Presumably, the model can be implemented with minimal modifications in any dependently-typed language based on the Calculus of Constructions supporting mutually inductive type definitions. 3 Related work The existing work on relational DBMS specification in dependently-typed programming languages is concerned primarily with the task of verified DBMS implementation. For example, [1] presents a lightweight DBMS implementation in Coq proof assistant based on B+-trees. It consists of the relational algebra specification, its implementation in terms of proof trees and a set of verified query optimizations. The source code repository [2] contains Agda code of database schema specification and an implementation of relational algebra. The initial design of our model was inspired by this code. In [3] the strong typing for a relational database is provided in Haskell programming language. Haskell type system is less expressive than dependentlytyped lambda calculus. Another example of relational DBMS specification in Coq proof assistant is available in [4]. The authors of [4] state that the specification and not the implementation as the primary objective of the research. Ur programming language [5] is worth mentioning as an example of richly typed programming language supporting the typing of SQL queries. None of the mentioned papers are con-

60 cerned with the representation and static checking of data consistency constraints which is the focus of the present paper. 4 Table schema and single-table constraints The table schema in the present paper is defined analogously to [6]. The finite enumeration AttrType specifies possible attribute type names (for example, integer number, floating-point number, string). The el function matches each attribute type with the corresponding Agda type. Table schema is defined as a list of pairs (attribute name, attribute type). Type of table data Row which depends on schema is defined as a tuple of elements of corresponding Agda types. For example, schema s = (( Primary Key, NATURAL), ( number, STRING)) corresponds to row type Row s = N * String. Constraints on separate table attributes are defined by means of Constraint type which depends on the schema. Each element of Constraint s type defines a single constraint on table rows. The sequence of rows in the table is defined as an inductive type with a constructor for an empty table and a constructor adding a row to the table together with the object proving that every constraint is satisfied for this row in combination with the previous rows. We can say that rows are chained in a table by proofs of the fact that all constraints hold. The every constraint is represented in the model in the terms of tuple with each element corresponding to the proof of the specific constraint conditions. A type representing the proof of constraint condition is constructed by means of checkconstraint function. For the definition of the table structure and checkconstraint function, the dependent types are required. The simplest type of constraint is the constraint on the range of an attribute because the constraint checking involves a single attribute value. Such constraint AttrValue in model is constructed with the following parameters: attribute index in schema n, the proof of attribute presence in schema isattr (static bounds checking), type of predicates over the type of the attribute predicate and the decision procedure for the predicate decision. The constraint is satisfied if the proof of predicate is supplied. Constraints of uniqueness Unique and primary key PrimaryKey are defined just by an index of the corresponding attribute in the schema. These constraints are checked by iterating over each row in the table and verifying the uniqueness of the attribute value of the new row. Primary key constraint check includes also the requirement on a type of the attribute (natural number). In the future, the last constraint will be checked at the time of constraint declaration. The set of single table constraints in the model can be extended to include the compound primary keys and arbitrary predicates on either a single row data or the contents of the whole table. The empty ( null ) values and the corresponding not null constraint is presently not supported, but the support may be added by means of technical changes. A more complicated task is the support of row update and row removal operations. For each change, the whole table must be reconstructed starting from the updated (or removed) row and each arbitrary whole-table predicate must be checked again. But as we mentioned before, the present paper is not concerned with

61 the efficient DBMS implementation. For the static constraint checking, we need to derive from the model the types of constraint checks on each predicate type. 5 Multiple-table constraints As opposed to single-table constraints discussed previously, the foreign key constraint is defined on the set of tables. In the present section, we will call table model the pair of table schema and a list of the constraints over this schema. The set of database tables can be described as a list of pairs (table name, table model). This definition has some degree of similarity to a single table schema from the previous section: in both cases, it consists of the pair of the name (which can be used to declare constraints) and description of data. Following this similarity, the model elements described in the present section implement the constraint mechanism in the same way as in the previous section with a single exception: in this case data is not the tuple of the attribute values but the set of table rows. It is worth noting that such twodimensional approach significantly increases the complexity of definitions from the technical standpoint. The main points of the approach stay unchanged: data is described as a log of insertions, each log entry is chained to the list of previous rows by means of constraint validity proofs. The insertion log entry for database consists of the name of the changing table, the contents of an inserted row, the previous state of the table and proof of successful constraint checks. Foreign key constraints for the given set of tables d consists of the five-tuple (t₁, a₁, t₂, a₂, pk₂), where tᵢ represent table names, aᵢ correspond to attribute indices in the corresponding tables, and pk₂ is the proof of the property that a₂ is the primary key of the table t₂. The type of proofs of the foreign key constraint preservation after the row insertion into table t₁ consists of the proof of the existence of row with the specified primary key in table t₂. We are investigating the possibility of the addition of update and delete operations in case of multiple tables. The main obstacle in this task is the implementation of cut operation which leaves only proofs relevant to the insertion operation for the specified table. 6 Statically checked Insert operation On our way to useful in practice implementation of the proposed approach, we started by implementing the function StaticInsert which inserts the row in the table with static constraint checks. This function presently supports only single-table constraints. Our tests show that for constant data Agda type inference is able to provide the proof of constraints automatically. But for real code examples from ISTINA system, which works with tables with an unknown consistent state, such automated type inference is impossible. The future work includes the development of a method to obtain such proofs from database queries in application source code, as in the second example in Related work section.

62 7 Conclusion The present paper includes the preliminary results of our research. An objective of this research is static (compile-time) checking of the fact that all domain constraints are respected in the source code. Such static checking will be implemented by means of source code translation to Agda proof assistant language in terms of the present model. The proposed model may, therefore, be used to check the code for different programming languages, including domain-specific languages for database interaction. Acknowledgements The first author is supported by RFBR Grant 16-07-01178a. References 1. Toward a Verified Relational Database Management System / G. Malecha, G. Morrisett, A. Shinnar, R. Wisnesky. / Proceeding. POPL '10 Proceedings of the 37 th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages 2010 Pages 237-248 DOI:10.1145/1706299.1706329 2. Agda Relation Algebra source code 2014 URL: https://github.com/sabauma/agdarelation-algebra (online; accessed: 2017-05-13) 3. Strong types for relational databases / A.Silva, J.Visser / Proceeding. Haskell '06 Proceedings of the 2006 ACM SIGPLAN workshop on Haskell 2006 Pages 25-36 DOI:10.1145/1159842.1159846 4. A Coq Formalization of the Relational Data Model / V. Benzaken, E. Contejean, S Dumbrava / Programming Languages and Systems Volume 8410 of the series Lecture Notes in Computer Science 2014 Pages 189-208 DOI:10.1007/978-3-642-54833-8_11 5. Adam Chlipala. Ur: Statically-Typed Metaprogramming with Type-Level Record Computation. Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI'10). June 2010.