Static type safety guarantees for the operators of a relational database querying system. Cédric Lavanchy

June 6, 2008

Contents

- 1 Previous work
- 2 Goal
- 3 Theory bases
  - 3.1 Typing a relation
    - 3.1.1 Relation types
    - 3.1.2 Relational operators
    - 3.1.3 Typing rules
  - 3.2 Transforming the relational query
    - 3.2.1 ISO standard and pre-treatment
    - 3.2.2 Transforming rules
    - 3.2.3 Building SQL query from relational algebra
- 4 Implementation
  - 4.1 Giving a static type to a relation
    - 4.1.1 Structural types
    - 4.1.2 Constructing and typing the relation
  - 4.2 Executing query and accessing data
  - 4.3 Special case: order by
- 5 How to use it
  - 5.1 Database description
  - 5.2 Constructing, executing request and going through results
- 6 Future work
- Bibliography
- A Incorrect switch of relational operators
  - A.1 $\pi_{c_1,\dots,c_n}(R_1) \bowtie R_2$ and $R_1 \bowtie \pi_{c_1,\dots,c_n}(R_2)$, or with the outer join operators $\mathbin{⟕}$, $\mathbin{⟖}$ or $\mathbin{⟗}$
  - A.2 $R_1 \mathbin{⟕} \sigma_{pred}(R_2)$
- B Implementation problems
  - B.1 Operators not available
  - B.2 Manifest not automatically generated
  - B.3 Java proxy limitation

Chapter 1 Previous work

Until now, there have been two ways to write and execute SQL queries in Scala. The first is to use the JDBC classes (java.sql.*) and write the query as a String; the result is returned as a ResultSet. The second is the DBC2 library, which is built on the Java classes but lets us write SQL embedded in Scala. With it, SQL queries are written with Scala functions almost as if they were written directly for the DBMS (database management system); an example is shown in Table 1.1.

    select(*) from (TABLE1, TABLE2) where col1 == 1 and col2 == ABC

Table 1.1: Example of how to write a request with DBC2

Unfortunately, with both solutions, requests do not have a static type. The SQL request is represented as a String. It does have a relational type in the database, and JDBC can access that type, but JDBC does not verify anything against it. The only type available is the one the database itself provides as metadata of the returned ResultSet. Consequently, we cannot ensure that the result returned by the DBMS has the correct type, meaning the expected one. In fact, we could try to access a field that does not exist in our relation.

With JDBC, executing the query on the database returns a ResultSet that gives access to the column values of each row. To read a field of that ResultSet, we call the method corresponding to the type the programmer expects (getDouble(column_name), ...). Such a call is correct at compile time but may fail at runtime: the column may not exist in the result, or it may have a different type (Boolean instead of Double, for example). The error is detected at runtime, not during compilation.

By default, a ResultSet can only move forward through its rows. It is possible to produce a scrollable ResultSet, but ResultSets are not compatible with Scala sequences, so they cannot be handled with for-comprehensions. Unlike JDBC, DBC2 results are compatible with Scala sequences, which makes their elements easier to access.
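The last limitation can be illustrated with a minimal sketch that wraps a forward-only cursor as a Scala `Iterator`, which is what for-comprehensions need. `ForwardCursor` is a hypothetical stand-in that mimics only the two `java.sql.ResultSet` methods used here (`next()` and `getObject(String)`), so the sketch runs without a database. The adapter and helper names are illustrative and belong neither to JDBC nor to DBC2.

```scala
// Hypothetical stand-in for the two java.sql.ResultSet methods used below.
trait ForwardCursor {
  def next(): Boolean                   // move to the next row; false when exhausted
  def getObject(column: String): AnyRef // read a column of the current row
}

object CursorAdapter {
  // Expose a forward-only cursor as an Iterator, so it works in for-comprehensions.
  // Each row is copied into a Map before the cursor moves on.
  def rows(cursor: ForwardCursor, columns: Seq[String]): Iterator[Map[String, AnyRef]] =
    new Iterator[Map[String, AnyRef]] {
      private var advanced = false // has cursor.next() been called for the pending row?
      private var hasRow = false   // result of the last cursor.next()

      def hasNext: Boolean = {
        if (!advanced) { hasRow = cursor.next(); advanced = true }
        hasRow
      }

      def next(): Map[String, AnyRef] = {
        if (!hasNext) throw new NoSuchElementException("no more rows")
        advanced = false
        columns.map(c => c -> cursor.getObject(c)).toMap
      }
    }

  // In-memory cursor, only used to exercise the adapter without a database.
  def inMemory(data: Vector[Map[String, AnyRef]]): ForwardCursor = new ForwardCursor {
    private var i = -1
    def next(): Boolean = { i += 1; i < data.length }
    def getObject(column: String): AnyRef = data(i)(column)
  }
}
```

With this adapter, `for (row <- CursorAdapter.rows(cursor, Seq("age"))) println(row("age"))` works. Each value still comes back as an `AnyRef`, though: the adapter fixes iteration, not typing.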

Chapter 2 Goal

The aim of this project is to provide type-safe access to databases in Scala. In previous versions of DBC, type safety offers no strong guarantees, and errors are caught only during execution: the safety is dynamic, not static. No verification at all is done during compilation. We can access fields that do not exist without any compilation error, and we have no guarantee that the result returned by the database has the type the programmer expects.

The goal of this project is to improve that safety. The first step is to give the query a static type: the type that corresponds to the query written by the programmer, computed by the compiler. Given that static type and the metadata returned by the DBMS, we can check whether the two are equal. This check is done during execution and confirms that the type found by the compiler and the one reported by the DBMS agree.

We then want to transform the result returned by the DBMS into a Set[T], where T is the static type found by the compiler. That way, each row of the result has type T, and its fields (the database columns) can be accessed by name by calling Scala methods. If we access a field f that does not exist in the computed type of the relation, the compiler reports that type T has no member f. This ensures that we never access non-existent columns in the result of a query.
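The runtime check described above can be sketched as a comparison between two column maps. In this illustration, both the type computed by the compiler and the type reported by the DBMS are represented as maps from column name to JVM class name. With JDBC, the second map could be filled from `ResultSetMetaData` (`getColumnCount()`, `getColumnLabel(int)`, `getColumnClassName(int)`). The object and method names are hypothetical.

```scala
object SchemaCheck {
  // A relation type {c1: T1, ..., cn: Tn} as column name -> JVM class name.
  type RelType = Map[String, String]

  // Compare the type computed at compile time with the one reported by the DBMS.
  // Returns the list of mismatches; an empty list means the two types are equal.
  def mismatches(expected: RelType, actual: RelType): List[String] = {
    val missing = (expected.keySet -- actual.keySet).toList.sorted.map(c => s"missing column $c")
    val extra = (actual.keySet -- expected.keySet).toList.sorted.map(c => s"unexpected column $c")
    val wrongType = expected.keySet.intersect(actual.keySet).toList.sorted.collect {
      case c if expected(c) != actual(c) => s"column $c: expected ${expected(c)}, got ${actual(c)}"
    }
    missing ++ extra ++ wrongType
  }
}
```

Returning every mismatch rather than a Boolean makes a failed check easier to report. A real implementation might also need to normalize the case of column names, since some DBMSs report unquoted identifiers in upper case.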

Chapter 3 Theory bases

In this chapter I define the theoretical bases required to statically type a relational request. First, we define the notion of type (section 3.1.1) and the operators present in DBC3 (section 3.1.2). The typing rules for those operators are then presented in section 3.1.3. This section shows how to give the request a static type, but more work (section 3.2) is needed before it can be executed on a database. Relational algebra is used as defined by Codd [1] and described in Date's book [2].

3.1 Typing a relation

3.1.1 Relation types

A tuple is a set of ordered triples of the form $\{A_i : T_i = v_i\}$, where $A_i$ is an attribute name, $T_i$ is a type name, and $v_i$ is a value of type $T_i$. The heading of such a tuple is its set of attributes (here, an attribute is the ordered pair $\{A_i : T_i\}$, uniquely identified by its name $A_i$). The type of a tuple is determined by its heading. All tuples in a relation have the same heading, and therefore the same type; this common tuple heading is the relation heading. The relation type has the same attributes (names and types) as its heading, so a relation type is defined as $\{c_1 : T_1, \dots, c_n : T_n\}$, where $c_1, \dots, c_n$ are the attribute names and $T_1, \dots, T_n$ their types.

3.1.2 Relational operators

In relational algebra, there are two groups of primitives:

- traditional set operators: union, intersection, difference and Cartesian product;
- special relational operators: restrict (selection), projection, join and divide.

Many other operators can be built from those eight. In this project we also define operators that represent some SQL operations, such as order by and the outer joins (left, right and full). Two operators are not available: rename and divide. The reason they are not used is given in appendix B.1.

The problem with the order by clause is that it changes the type of the resulting relation. All other relational operators are unordered (they behave like sets) and operate on relations that are also unordered. The order by operator, however, imposes an order on the relation, which then has the properties of a list rather than a set: it transforms the relation it receives into an ordered list of tuples. Since the other operators are only defined on unordered relations, order by must be placed at the outermost position of the request. For that reason, no typing rule is given for order by, and it is not treated as an operator in the same way as the others. It is nevertheless implemented (section 4.3).

Operators

- $\sigma_{pred}(R)$: selection of the tuples of relation $R$ for which the predicate pred returns true.
- $\pi_{c_1,\dots,c_n}(R)$: projection of relation $R$, keeping only the columns $c_1$ to $c_n$.
- $R_1 \times R_2$: Cartesian product of $R_1$ and $R_2$. In his book, C. J. Date assumes that the Cartesian product is applied to relations with distinct attributes; our implementation needs the same constraint.
- $R_1 \bowtie R_2$: natural join of $R_1$ and $R_2$ (equivalent to an inner join¹ with an adapted on-clause). If $R_1$ and $R_2$ have no common attributes, this operation is the Cartesian product.
- $R_1 \mathbin{⟕} R_2$: left outer join of $R_1$ and $R_2$. The result contains every tuple of the left relation, even when the join condition finds no matching tuple in the right relation; in that case the row contains NULL in the columns of the right relation.
- $R_1 \mathbin{⟖} R_2$: right outer join of $R_1$ and $R_2$. It is the left outer join with the relations swapped: the result contains every tuple of the right relation.
- $R_1 \mathbin{⟗} R_2$: full outer join of $R_1$ and $R_2$. This operator combines the left and right outer joins: the result contains every tuple of both its left and its right relation.
- $R_1 \cup R_2$: union of $R_1$ and $R_2$. It contains all tuples of $R_1$ and of $R_2$.
- $R_1 \cap R_2$: intersection of $R_1$ and $R_2$. It contains the tuples that are in both $R_1$ and $R_2$.
- $R_1 - R_2$: difference of $R_1$ and $R_2$. It contains the tuples that are in $R_1$ and not in $R_2$.
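To illustrate these definitions, the following sketch evaluates the operators on small in-memory relations, with `None` standing for NULL. It is a toy under stated assumptions (integer values only, hypothetical names), not DBC3 code. It follows the conventions above: × requires disjoint headings, ⋈ matches on all common attributes, and the outer joins pad unmatched tuples with NULL. As in SQL, a NULL never matches in a join condition.

```scala
object InMemoryRA {
  // A tuple maps attribute names to values; None stands for SQL NULL.
  type Tuple = Map[String, Option[Int]]
  final case class Relation(heading: List[String], rows: Set[Tuple])

  // Build a relation from a heading and rows of Int values.
  def relation(heading: String*)(rows: List[Int]*): Relation =
    Relation(heading.toList, rows.map(r => heading.zip(r.map(Option(_))).toMap).toSet)

  // σ_pred(R): keep the tuples for which pred holds.
  def select(r: Relation)(pred: Tuple => Boolean): Relation = r.copy(rows = r.rows.filter(pred))

  // π_{c1,...,cn}(R): keep only the listed columns (duplicates collapse, as in a set).
  def project(r: Relation, cols: String*): Relation =
    Relation(cols.toList, r.rows.map(t => cols.map(c => c -> t(c)).toMap))

  private def common(a: Relation, b: Relation): List[String] = a.heading.filter(b.heading.contains)

  // Two tuples match when they agree on every common attribute; NULL matches nothing.
  private def matches(x: Tuple, y: Tuple, on: List[String]): Boolean =
    on.forall(c => x(c).isDefined && x(c) == y(c))

  // Extend a tuple to the full heading, filling missing attributes with NULL.
  private def pad(t: Tuple, heading: List[String]): Tuple =
    heading.map(c => c -> t.getOrElse(c, None)).toMap

  // Shared implementation of ⋈, ⟕, ⟖ and ⟗.
  private def join(a: Relation, b: Relation, keepLeft: Boolean, keepRight: Boolean): Relation = {
    val on = common(a, b)
    val heading = a.heading ++ b.heading.filterNot(on.contains)
    val inner = for (x <- a.rows; y <- b.rows if matches(x, y, on)) yield x ++ y
    val leftOnly =
      if (keepLeft) a.rows.filterNot(x => b.rows.exists(matches(x, _, on))).map(pad(_, heading))
      else Set.empty[Tuple]
    val rightOnly =
      if (keepRight) b.rows.filterNot(y => a.rows.exists(matches(_, y, on))).map(pad(_, heading))
      else Set.empty[Tuple]
    Relation(heading, inner ++ leftOnly ++ rightOnly)
  }

  def naturalJoin(a: Relation, b: Relation): Relation = join(a, b, keepLeft = false, keepRight = false)
  def leftOuterJoin(a: Relation, b: Relation): Relation = join(a, b, keepLeft = true, keepRight = false)
  def rightOuterJoin(a: Relation, b: Relation): Relation = join(a, b, keepLeft = false, keepRight = true)
  def fullOuterJoin(a: Relation, b: Relation): Relation = join(a, b, keepLeft = true, keepRight = true)

  // R1 × R2: following Date, the headings are assumed to be disjoint.
  def product(a: Relation, b: Relation): Relation = {
    require(common(a, b).isEmpty, "Cartesian product needs disjoint headings")
    naturalJoin(a, b) // with no common attribute, ⋈ degenerates to ×
  }

  // ∪, ∩ and − require both operands to have the same heading.
  private def sameHeading(a: Relation, b: Relation): Unit =
    require(a.heading.toSet == b.heading.toSet, "operands must have the same type")
  def union(a: Relation, b: Relation): Relation = { sameHeading(a, b); Relation(a.heading, a.rows ++ b.rows) }
  def intersect(a: Relation, b: Relation): Relation = { sameHeading(a, b); Relation(a.heading, a.rows & b.rows) }
  def difference(a: Relation, b: Relation): Relation = { sameHeading(a, b); Relation(a.heading, a.rows -- b.rows) }
}
```

Run on the relations $R_1$ to $R_4$ of the examples that follow, this sketch gives the same tuple counts (for instance, 5 tuples for the full outer join and 5 for $R_1 \cup R_3$, since (7, 8, 9) appears only once).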

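Example relations (heading in parentheses, followed by the tuples):

- $R_1$ (a, b, c): (1, 2, 3), (4, 5, 6), (7, 8, 9)
- $R_2$ (b, c, d): (2, 3, 10), (5, 5, 11), (7, 9, 12)
- $R_3$ (a, b, c): (5, 4, 3), (6, 2, 1), (7, 8, 9)
- $R_4$ (e): (5), (6)

Examples of operator results:

- $\sigma_{a<2}(R_1)$ (a, b, c): (1, 2, 3)
- $\pi_{a,b}(R_1)$ (a, b): (1, 2), (4, 5), (7, 8)
- $R_1 \times R_4$ (a, b, c, e): (1, 2, 3, 5), (1, 2, 3, 6), (4, 5, 6, 5), (4, 5, 6, 6), (7, 8, 9, 5), (7, 8, 9, 6)
- $R_1 \bowtie R_2$ (a, b, c, d): (1, 2, 3, 10)
- $R_1 \mathbin{⟕} R_2$ (a, b, c, d): (1, 2, 3, 10), (4, 5, 6, NULL), (7, 8, 9, NULL)
- $R_1 \mathbin{⟖} R_2$ (a, b, c, d): (1, 2, 3, 10), (NULL, 5, 5, 11), (NULL, 7, 9, 12)
- $R_1 \mathbin{⟗} R_2$ (a, b, c, d): (1, 2, 3, 10), (4, 5, 6, NULL), (7, 8, 9, NULL), (NULL, 5, 5, 11), (NULL, 7, 9, 12)
- $R_1 \cup R_3$ (a, b, c): (1, 2, 3), (4, 5, 6), (7, 8, 9), (5, 4, 3), (6, 2, 1)

¹ SQL inner join: http://en.wikipedia.org/wiki/join_(SQL)#Types_of_inner_joins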

- $R_1 \cap R_3$ (a, b, c): (7, 8, 9)
- $R_1 - R_3$ (a, b, c): (1, 2, 3), (4, 5, 6)

3.1.3 Typing rules

Notation. Let $T_1$ be the relation type $\{c_1 : T_{11}, \dots, c_n : T_{1n}\}$ and $T_2$ the relation type $\{d_1 : T_{21}, \dots, d_m : T_{2m}\}$. Then $T_1$ with $T_2$ is the union of the columns of $T_1$ and $T_2$, i.e. $\{c_1 : T_{11}, \dots, c_n : T_{1n}, d_1 : T_{21}, \dots, d_m : T_{2m}\}$. If $T_1$ and $T_2$ have common attributes (same name and type), $T_1$ with $T_2$ contains only one instance of each of them.

Rules

$$\frac{R : T}{\sigma_{pred}(R) : T} \qquad \frac{R : T}{\theta_{pred}(R) : T} \qquad \frac{R : T}{\phi_{c_1,\dots,c_n}(R) : T}$$

$$\frac{R : T \qquad P = \{c_1 : T_1, \dots, c_n : T_n\} \qquad T <: P}{\pi_{c_1 : T_1,\dots,c_n : T_n}(R) : P}$$

$$\frac{R_1 : T_1 \qquad R_2 : T_2}{R_1 \times R_2 : T_1 \text{ with } T_2} \qquad \frac{R_1 : T_1 \qquad R_2 : T_2}{R_1 \bowtie R_2 : T_1 \text{ with } T_2}$$

$$\frac{R_1 : T_1 \qquad R_2 : T_2}{R_1 \mathbin{⟕} R_2 : T_1 \text{ with } T_2} \qquad \frac{R_1 : T_1 \qquad R_2 : T_2}{R_1 \mathbin{⟖} R_2 : T_1 \text{ with } T_2} \qquad \frac{R_1 : T_1 \qquad R_2 : T_2}{R_1 \mathbin{⟗} R_2 : T_1 \text{ with } T_2}$$

$$\frac{R_1 : T \qquad R_2 : T}{R_1 \cup R_2 : T} \qquad \frac{R_1 : T \qquad R_2 : T}{R_1 \cap R_2 : T} \qquad \frac{R_1 : T \qquad R_2 : T}{R_1 - R_2 : T}$$

3.2 Transforming the relational query

The transformation to SQL is done in two phases. First, the relational query is transformed by switching its operators. Once that transformation is done, we have a normalized query, which can then be translated into a SQL query.

3.2.1 ISO standard and pre-treatment

The ISO standard defines a SQL query as a box: an ordered combination of relational operators in which only the operators' parameters are missing. This is illustrated in Figure 3.1. The corresponding relational request is $\pi_{\square}(\sigma_{\square}(\square))$.

Figure 3.1: SQL query represented as a box to fill, where each $\square$ is an operator parameter. Another complete select query cannot be written into a $\square$.

Given a relational request, we therefore need to transform it so that it corresponds to the model above. To do that, we defined the hierarchy of operators presented in Figure 3.2. If the request matches that hierarchy, it can be translated into an ISO-standard SQL query. The figure shows that the $\times$, $\bowtie$, $\mathbin{⟕}$, $\mathbin{⟖}$, $\mathbin{⟗}$ and Table operators cannot contain any other operators in their operands, and likewise that $\sigma$ cannot contain $\cup$, $\cap$, $-$ or $\pi$ in its operand. Below is an example of a request that does not match the hierarchy, together with the reason its translation into SQL is not correct.

Figure 3.2: Hierarchy for relational requests

Example. Consider the following request, which does not match the hierarchy:

$$\sigma_{col1>0}(R_1) \bowtie R_2$$

The corresponding SQL query would be:

    select * from (select * from R1 where col1 > 0) natural join R2

This query is incorrect according to the ISO standard because it has a select statement inside the from clause.

3.2.2 Transforming rules

In some cases, two relational operators cannot be switched, because switching them would change the result of the query. Those cases are listed in appendix A, each with an example that shows the difference in results.

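Notation. $\mathit{UID}(R_1, R_2)$ is any of $R_1 \cup R_2$, $R_1 \cap R_2$ and $R_1 - R_2$. $R[a_1, \dots, a_n]$ denotes the table $R$ with fields $a_1, \dots, a_n$.

Extract $\sigma_{pred}$

$\sigma_{pred}$ cannot be extracted from the right (resp. left) operand of a left (resp. right) outer join, nor from either operand of a full outer join. A more complete explanation, with examples, is given in appendix A.2.

$$\sigma_{pred}(R_1) \times R_2 \rightarrow \sigma_{pred}(R_1 \times R_2) \qquad R_1 \times \sigma_{pred}(R_2) \rightarrow \sigma_{pred}(R_1 \times R_2)$$

$$\sigma_{pred}(R_1) \bowtie R_2 \rightarrow \sigma_{pred}(R_1 \bowtie R_2) \qquad R_1 \bowtie \sigma_{pred}(R_2) \rightarrow \sigma_{pred}(R_1 \bowtie R_2)$$

$$\sigma_{pred}(R_1) \mathbin{⟕} R_2 \rightarrow \sigma_{pred}(R_1 \mathbin{⟕} R_2) \qquad R_1 \mathbin{⟖} \sigma_{pred}(R_2) \rightarrow \sigma_{pred}(R_1 \mathbin{⟖} R_2)$$

Extract $\cup$, $\cap$ and $-$

$$\frac{\mathit{Rel} = \mathit{UID}(R_1, R_2)}{\sigma_{pred}(\mathit{Rel}) \rightarrow \mathit{UID}(\sigma_{pred}(R_1), \sigma_{pred}(R_2))} \qquad \frac{\mathit{Rel} = \mathit{UID}(R_1, R_2)}{\pi_{a_1,\dots,a_n}(\mathit{Rel}) \rightarrow \mathit{UID}(\pi_{a_1,\dots,a_n}(R_1), \pi_{a_1,\dots,a_n}(R_2))}$$

$$\frac{\mathit{Rel}_1 = \mathit{UID}(R_1, R_2)}{\mathit{Rel}_1 \times \mathit{Rel}_2 \rightarrow \mathit{UID}(R_1 \times \mathit{Rel}_2, R_2 \times \mathit{Rel}_2)} \qquad \frac{\mathit{Rel}_2 = \mathit{UID}(R_1, R_2)}{\mathit{Rel}_1 \times \mathit{Rel}_2 \rightarrow \mathit{UID}(\mathit{Rel}_1 \times R_1, \mathit{Rel}_1 \times R_2)}$$

$$\frac{\mathit{Rel}_1 = \mathit{UID}(R_1, R_2)}{\mathit{Rel}_1 \bowtie \mathit{Rel}_2 \rightarrow \mathit{UID}(R_1 \bowtie \mathit{Rel}_2, R_2 \bowtie \mathit{Rel}_2)} \qquad \frac{\mathit{Rel}_2 = \mathit{UID}(R_1, R_2)}{\mathit{Rel}_1 \bowtie \mathit{Rel}_2 \rightarrow \mathit{UID}(\mathit{Rel}_1 \bowtie R_1, \mathit{Rel}_1 \bowtie R_2)}$$

$$\frac{\mathit{Rel}_1 = \mathit{UID}(R_1, R_2)}{\mathit{Rel}_1 \mathbin{⟕} \mathit{Rel}_2 \rightarrow \mathit{UID}(R_1 \mathbin{⟕} \mathit{Rel}_2, R_2 \mathbin{⟕} \mathit{Rel}_2)} \qquad \frac{\mathit{Rel}_2 = \mathit{UID}(R_1, R_2)}{\mathit{Rel}_1 \mathbin{⟕} \mathit{Rel}_2 \rightarrow \mathit{UID}(\mathit{Rel}_1 \mathbin{⟕} R_1, \mathit{Rel}_1 \mathbin{⟕} R_2)}$$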

$$\frac{\mathit{Rel}_1 = \mathit{UID}(R_1, R_2)}{\mathit{Rel}_1 \mathbin{⟖} \mathit{Rel}_2 \rightarrow \mathit{UID}(R_1 \mathbin{⟖} \mathit{Rel}_2, R_2 \mathbin{⟖} \mathit{Rel}_2)} \qquad \frac{\mathit{Rel}_2 = \mathit{UID}(R_1, R_2)}{\mathit{Rel}_1 \mathbin{⟖} \mathit{Rel}_2 \rightarrow \mathit{UID}(\mathit{Rel}_1 \mathbin{⟖} R_1, \mathit{Rel}_1 \mathbin{⟖} R_2)}$$

$$\frac{\mathit{Rel}_1 = \mathit{UID}(R_1, R_2)}{\mathit{Rel}_1 \mathbin{⟗} \mathit{Rel}_2 \rightarrow \mathit{UID}(R_1 \mathbin{⟗} \mathit{Rel}_2, R_2 \mathbin{⟗} \mathit{Rel}_2)} \qquad \frac{\mathit{Rel}_2 = \mathit{UID}(R_1, R_2)}{\mathit{Rel}_1 \mathbin{⟗} \mathit{Rel}_2 \rightarrow \mathit{UID}(\mathit{Rel}_1 \mathbin{⟗} R_1, \mathit{Rel}_1 \mathbin{⟗} R_2)}$$

Extract $\pi_{a_1,\dots,a_n}$

The two rules for $R_1 \times R_2$ in which $R_1$ or $R_2$ is a projection are special cases. The problem is that, after switching the two operators, the resulting relation does not have the same type as before the switch. The solution is to extend the columns of the projection with the columns of $R_2$ (resp. $R_1$). With that change, the switch becomes possible and, in this case, the two requests give identical results. The same solution cannot be applied to the same structure with $\bowtie$, $\mathbin{⟕}$, $\mathbin{⟖}$ and $\mathbin{⟗}$, because the results after the switch differ from those before it (see appendix A). In the rule for $R_1[b_1, \dots, b_k] \times \pi_{a_1,\dots,a_n}(R_2)$, the projection becomes $\pi_{a_1,\dots,a_n,b_1,\dots,b_k}$ rather than $\pi_{b_1,\dots,b_k,a_1,\dots,a_n}$, because column order is irrelevant in relational algebra.

$$\sigma_{pred}(\pi_{a_1,\dots,a_n}(R)) \rightarrow \pi_{a_1,\dots,a_n}(\sigma_{pred}(R))$$

$$\pi_{a_1,\dots,a_n}(R_1) \times R_2[b_1,\dots,b_k] \rightarrow \pi_{a_1,\dots,a_n,b_1,\dots,b_k}(R_1 \times R_2)$$

$$R_1[b_1,\dots,b_k] \times \pi_{a_1,\dots,a_n}(R_2) \rightarrow \pi_{a_1,\dots,a_n,b_1,\dots,b_k}(R_1 \times R_2)$$

3.2.3 Building SQL query from relational algebra

We now have a well-typed, normalized query expressed in relational algebra. However, the goal is to execute it on a database that does not support requests in such an algebra, so the query must be transformed into the corresponding SQL query. To do this, I define the rules presented in this section, which show how to transform the relational request into an SQL one.

Transformation to SQL. The rules take a relational expression and return a SQL request as an abstract syntax tree. That tree has the shape of the SQL box presented in Figure 3.1. The buildsql method is the one defined by these rules.

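Rules for $R_1 \cup R_2$

$$\frac{s_1 = \mathit{buildsql}(R_1) \qquad s_2 = \mathit{buildsql}(R_2)}{R_1 \cup R_2 \rightarrow \mathrm{Union}(s_1, s_2)}$$

Rules for $R_1 \cap R_2$

$$\frac{s_1 = \mathit{buildsql}(R_1) \qquad s_2 = \mathit{buildsql}(R_2)}{R_1 \cap R_2 \rightarrow \mathrm{Intersect}(s_1, s_2)}$$

Rules for $R_1 - R_2$

$$\frac{s_1 = \mathit{buildsql}(R_1) \qquad s_2 = \mathit{buildsql}(R_2)}{R_1 - R_2 \rightarrow \mathrm{Except}(s_1, s_2)}$$

Rules for $\sigma_{pred}$

$$\frac{\mathrm{Select}(cols, from, \_) = \mathit{buildsql}(\mathit{Rel})}{\sigma_{pred}(\mathit{Rel}) \rightarrow \mathrm{Select}(cols, from, pred)} \qquad \frac{\mathrm{Select}(cols, from, wherecond) = \mathit{buildsql}(\sigma_{p_2}(R))}{\sigma_{p_1}(\sigma_{p_2}(R)) \rightarrow \mathrm{Select}(cols, from, wherecond \text{ and } p_1)}$$

Rules for $\pi_{a_1,\dots,a_n}$

$$\frac{\mathrm{Select}(\_, from, wherecond) = \mathit{buildsql}(\mathit{Rel})}{\pi_{columns}(\mathit{Rel}) \rightarrow \mathrm{Select}(columns, from, wherecond)} \qquad \frac{\mathrm{Select}(cols, from, wherecond) = \mathit{buildsql}(\pi_{colsB}(R))}{\pi_{colsA}(\pi_{colsB}(R)) \rightarrow \mathrm{Select}(colsA \mathbin{:::} cols, from, wherecond)}$$

Rules for $R_1 \times R_2$

$$\frac{\mathrm{Select}(cols, from_1, \_) = \mathit{buildsql}(R_1) \qquad \mathrm{Select}(cols, from_2, \_) = \mathit{buildsql}(R_2)}{R_1 \times R_2 \rightarrow \mathrm{Select}(cols, from_1 \mathbin{:::} from_2, \_)}$$

Rules for $R_1 \bowtie R_2$

$$\frac{\mathrm{Select}(cols, from_1, \_) = \mathit{buildsql}(R_1) \qquad \mathrm{Select}(cols, from_2, \_) = \mathit{buildsql}(R_2)}{R_1 \bowtie R_2 \rightarrow \mathrm{Select}(cols, \mathrm{Join}(\text{"inner"}, from_1, from_2), \_)}$$

Rules for $R_1 \mathbin{⟕} R_2$

$$\frac{\mathrm{Select}(cols, from_1, \_) = \mathit{buildsql}(R_1) \qquad \mathrm{Select}(cols, from_2, \_) = \mathit{buildsql}(R_2)}{R_1 \mathbin{⟕} R_2 \rightarrow \mathrm{Select}(cols, \mathrm{Join}(\text{"left outer"}, from_1, from_2), \_)}$$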

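Rules for $R_1 \mathbin{⟖} R_2$

$$\frac{\mathrm{Select}(cols, from_1, \_) = \mathit{buildsql}(R_1) \qquad \mathrm{Select}(cols, from_2, \_) = \mathit{buildsql}(R_2)}{R_1 \mathbin{⟖} R_2 \rightarrow \mathrm{Select}(cols, \mathrm{Join}(\text{"right outer"}, from_1, from_2), \_)}$$

Rules for $R_1 \mathbin{⟗} R_2$

$$\frac{\mathrm{Select}(cols, from_1, \_) = \mathit{buildsql}(R_1) \qquad \mathrm{Select}(cols, from_2, \_) = \mathit{buildsql}(R_2)}{R_1 \mathbin{⟗} R_2 \rightarrow \mathrm{Select}(cols, \mathrm{Join}(\text{"full outer"}, from_1, from_2), \_)}$$

Rules for Table(name)

That operator is described in subsection 4.1.2.

$$\frac{}{\mathrm{Table}(name) \rightarrow \mathrm{Select}(\_, name, \_)}$$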
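The rules above can be sketched as a recursive function from a normalized relational tree to a small SQL syntax tree, followed by a renderer. This is a simplified illustration that rests on several assumptions:

- predicates are plain SQL strings;
- an empty column list stands for `*`;
- joins are rendered with SQL `natural ... join` keywords, since ⋈ is a natural join;
- a tree that does not match the hierarchy of Figure 3.2 is rejected with an exception.

All names are hypothetical. As a design choice of this sketch, a nested projection simply keeps the outer column list.

```scala
object BuildSQL {
  // Relational nodes (hypothetical names); predicates are plain SQL strings.
  sealed trait RA
  final case class Table(name: String) extends RA
  final case class Restrict(rel: RA, pred: String) extends RA           // σ_pred(rel)
  final case class Project(rel: RA, cols: List[String]) extends RA      // π_cols(rel)
  final case class Times(left: RA, right: RA) extends RA                // left × right
  final case class JoinOp(kind: String, left: RA, right: RA) extends RA // ⋈, ⟕, ⟖, ⟗
  final case class Union(left: RA, right: RA) extends RA
  final case class Intersect(left: RA, right: RA) extends RA
  final case class Minus(left: RA, right: RA) extends RA

  // SQL syntax tree: the "box" of Figure 3.1, plus set operations between boxes.
  sealed trait SQL
  final case class SqlSelect(cols: List[String], from: List[String], where: Option[String]) extends SQL
  final case class SqlSetOp(op: String, left: SQL, right: SQL) extends SQL

  private def notNormalized(r: RA): Nothing =
    throw new IllegalArgumentException(s"request does not match the hierarchy: $r")

  def buildSQL(r: RA): SQL = r match {
    case Table(name)     => SqlSelect(Nil, List(name), None)
    case Union(a, b)     => SqlSetOp("union", buildSQL(a), buildSQL(b))
    case Intersect(a, b) => SqlSetOp("intersect", buildSQL(a), buildSQL(b))
    case Minus(a, b)     => SqlSetOp("except", buildSQL(a), buildSQL(b))
    case Restrict(rel, p) => buildSQL(rel) match {
      case SqlSelect(cols, from, None)    => SqlSelect(cols, from, Some(p))
      case SqlSelect(cols, from, Some(w)) => SqlSelect(cols, from, Some(s"$w and $p")) // σ over σ
      case _                              => notNormalized(r)
    }
    case Project(rel, cols) => buildSQL(rel) match {
      case SqlSelect(_, from, where) => SqlSelect(cols, from, where)
      case _                         => notNormalized(r)
    }
    case Times(a, b) => (buildSQL(a), buildSQL(b)) match {
      case (SqlSelect(Nil, f1, None), SqlSelect(Nil, f2, None)) => SqlSelect(Nil, f1 ::: f2, None)
      case _ => notNormalized(r)
    }
    case JoinOp(kind, a, b) => (buildSQL(a), buildSQL(b)) match {
      case (SqlSelect(Nil, f1, None), SqlSelect(Nil, f2, None)) =>
        SqlSelect(Nil, List(s"${f1.mkString(", ")} $kind join ${f2.mkString(", ")}"), None)
      case _ => notNormalized(r)
    }
  }

  def render(s: SQL): String = s match {
    case SqlSelect(cols, from, where) =>
      val selected = if (cols.isEmpty) "*" else cols.mkString(", ")
      s"select $selected from ${from.mkString(", ")}" + where.fold("")(w => s" where $w")
    case SqlSetOp(op, l, r) => s"(${render(l)}) $op (${render(r)})"
  }
}
```

Applied to the normalized query of chapter 5, this sketch produces `(select * from Student, Person where sname = 'Moi') union (select * from Student, Person where grade > 4)`. The non-normalized example of section 3.2.1 ($\sigma_{col1>0}(R_1) \bowtie R_2$) is rejected.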

Chapter 4 Implementation

4.1 Giving a static type to a relation

4.1.1 Structural types

Structural types [4] have been available since Scala 2.6. They are defined as shown in Figure 4.1.

    { def c1: T1; def c2: T2 }

Figure 4.1: Example of definition of a structural type

Structural types have several advantages over traits and classes. First, their definitions are short and easy to write. Second, they look very similar to the relation types defined by C. J. Date (described in section 3.1.1). Finally, they use structural subtyping: if a structural type has the same fields as another one plus some others, the first type is a subtype of the second. In relational algebra, this property is very useful, if not indispensable. If we used traits instead, then each time we project a relation on a subset of its columns, we would have to define the new type and its inheritance link with the previous one explicitly, in order to satisfy the projection constraint. The following example shows why structural types are much simpler to use than traits in our library.

    type T1 = { def c1: Int }
    type T2 = { def c1: Int; def c2: Double }

Because these are structural types, the constraint T1 >: T2 (>: means "is a supertype of") is satisfied even though there is no explicit inheritance link. With traits, we would have to write:

    trait T1 { def c1: Int }
    trait T2 extends T1 { def c2: Double }
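This subtyping behaviour can be checked with a few lines of Scala. The sketch below, with hypothetical names, asks the compiler for evidence that `T2 <: T1`. It then uses the same mechanism to encode two of the typing rules of section 3.1.3 on a toy typed AST:

- the product of `RA[A]` and `RA[B]` has type `RA[A with B]`;
- a projection to `P` only compiles when the evidence `T <:< P` exists.

The sketch works only at the level of types and never calls a structural member, so it does not rely on reflection.

```scala
object StructuralTyping {
  // Relation types written as structural types, as in Figure 4.1.
  type T1 = { def c1: Int }
  type T2 = { def c1: Int; def c2: Double }

  // Compiles only because T2 has every member of T1: no `extends` clause is needed.
  val t2IsSubtypeOfT1: T2 <:< T1 = implicitly[T2 <:< T1]

  // A typed relational AST in the spirit of section 4.1.2 (hypothetical names).
  sealed trait RA[T]
  final case class Table[T](name: String) extends RA[T]
  // R1 : A, R2 : B  gives  R1 × R2 : A with B
  final case class Times[A, B](left: RA[A], right: RA[B]) extends RA[A with B]
  // R : T and T <: P  gives  π(R) : P; the evidence parameter enforces T <: P at compile time.
  final case class Project[T, P](rel: RA[T], cols: List[String])(implicit ev: T <:< P) extends RA[P]

  type Student = { def sname: String; def grade: Int; def semester: Int }
  type Person = { def pname: String; def age: Int; def married: Int }

  val both: RA[Student with Person] = Times(Table[Student]("Student"), Table[Person]("Person"))
  val ages: Project[Student with Person, { def age: Int }] =
    Project[Student with Person, { def age: Int }](both, List("age"))
  // Project[Student with Person, { def salary: Int }](both, List("salary")) is rejected
  // by the compiler, because Student with Person has no member `salary`.
}
```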

4.1.2 Constructing and typing the relation

A relational query is represented by an abstract syntax tree whose nodes are case classes. The only way to construct a relational query is to use the case-class constructors. All these classes inherit from a superclass that has a type parameter. That type corresponds to the type of the relational query at that point of the tree, and it is computed using the typing rules presented in section 3.1.3. For example, if $R_1$ : RA[T1] and $R_2$ : RA[T2], then $R_1 \times R_2$ has type RA[T1 with T2]. Here, with corresponds to mixin composition [3] in Scala.

Our type system assumes that the operand relations already have a type when it types a new relation. This raises a problem when an operator acts directly on a table: relational algebra has no rule to infer the type of a table. Instead, tables have a predetermined type that depends on their content. We therefore need a Scala representation of such types as relational queries. To do that, we introduce an operator Table[T](name). It wraps a basic type into a relation, which then has type T and can be used as an operand of other relations. Every table of the database that is used in queries must be represented this way.

Later, we will need a representation of the type while the program runs. Unfortunately, the compiler erases types during compilation, so we no longer have access to that type at execution time. Fortunately, the class Manifest contains exactly the information we need: the sequence of members (val, var and def) and the erasure type (Predef.Class[U]). With it, we can access a representation of the type. Importantly, it is not the programmer's job to create these manifests; it is the compiler's. So each node of the relational tree has a field that contains the Manifest.
So that the user does not have to create and pass the Manifest while writing the query, it is declared as an implicit parameter of methods and constructors. If the compiler can construct such an instance, it passes it to the method automatically.

4.2 Executing query and accessing data

The query, as a string, is executed on the database with the JDBC classes (java.sql.*), which return a ResultSet. We want to be able to access all columns of all tuples in any order, e.g. the last one, then the second, the third, the fourteenth, and so on. Depending on the ResultSet we receive, this is not always possible. Moreover, columns are not accessible directly by their names: we must call getDouble, getObject, and so on, with the column name as argument. Instead of writing

    while (myResultSet.next()) { myResultSet.getDouble("col1") }

we want to write

    for (t <- result) { t.col1 }

where result is a Set[T]. The advantage of this solution is that if col1 is not a member of T, the compiler detects the error. The problem is that T is a structural type with no implementation, so it cannot be instantiated. The solution is to use Java reflection: we construct a proxy that acts as if it were an instance of T. When a method is called on the instance of T, it is the proxy that receives and handles the call. The proxy contains an Array[AnyRef] and a mapping String => Int. The array corresponds to one row of the ResultSet and contains the column values of that row. Because the array is accessed by index, we need a way to turn a column name into the corresponding index, and this is what the mapping does. Once the name has been converted into an index, the proxy can return the corresponding value.

4.3 Special case: order by

As explained in section 3.1.2, the order by clause changes the type of the resulting relation. Because of that, it must be inserted at the outermost position of the relational (and SQL) query. The first idea we tried was to implement it as a relational operator and move it outside the other operators. That solution was not very practical because it changes the type of the resulting relation (section 3.1.2). The only way to represent it as a relational operator while still being able to transform the query into SQL is to let the programmer place it directly at the outermost position. This is not safe, because programmers can make mistakes. So we chose another solution: order by is not implemented as a relational operator but is inserted when the user executes the query on the database. To do that, we offer the user two execute methods. The first, without order by, takes only the relational request and the database name as arguments. The second takes one more argument: the list of columns on which the ordering is done.
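Two ideas from this section can be sketched without the reflection machinery: the proxy's name-to-index lookup, and appending the ordering outside the whole query. In this illustration, `Row.get` plays the role of the member call intercepted by the proxy, and `withOrdering` stands for the second execute variant reduced to SQL generation. All names are hypothetical; the library itself works on ResultSet rows and relational requests rather than plain strings.

```scala
object RowAccess {
  // One row: the column values in order, plus a name -> index map shared by all rows.
  final class Row(values: Array[AnyRef], index: Map[String, Int]) {
    // Corresponds to the proxy intercepting a member call such as `t.age`.
    def get(column: String): AnyRef = index.get(column) match {
      case Some(i) => values(i)
      case None    => throw new NoSuchElementException(s"no column named $column")
    }
  }

  // Build rows from the column labels and the raw values of each row.
  def rows(labels: IndexedSeq[String], data: Seq[Array[AnyRef]]): Seq[Row] = {
    val index = labels.zipWithIndex.toMap // computed once, shared by every row
    data.map(values => new Row(values, index))
  }

  // Ordering is not a relational operator: it is appended outside the whole query.
  def withOrdering(sql: String, orderBy: List[String]): String =
    if (orderBy.isEmpty) sql else s"$sql order by ${orderBy.mkString(", ")}"
}
```

Sharing a single index map across all rows keeps the per-row cost to one array. Appending the clause to the final SQL string guarantees that it sits at the outermost position.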

Chapter 5 How to use it

This chapter explains how to use the DBC3 library, through an example based on the database presented in section 5.1.

5.1 Database description

The database is composed of two tables: Student and Person.

Person (pname: String, age: Int, married: Int)

    pname               age  married
    Dupont               18        0
    Durant               22        0
    DeLapatefeuilletee   57        1
    Lenoir               23        1

Student (sname: String, grade: Int, semester: Int)

    sname   grade  semester
    Dupont      3         4
    Durant      6         8
    Lenoir      4         6

We want to execute the following relational request on this database:

$$(\sigma_{sname='Moi'}(Student) \cup \sigma_{grade>4}(Student)) \times Person$$

5.2 Constructing, executing request and going through results

The relevant classes are in the package scala.dbc3, but only the Relational object needs to be imported. To use the library on our database, we first need to describe its structure:

    def student = table[{ def sname: String; def grade: Int; def semester: Int }]("Student")

    def person = table[{ def pname: String; def age: Int; def married: Int }]("Person")

Now that the needed tables of the database have a Scala representation, we can write the query:

    val request = Product(Union(Select(student, "sname='Moi'"), Select(student, "grade>4")), person)

The compiler finds the following type for this request:

    { def sname: String; def grade: Int; def semester: Int; def pname: String; def age: Int; def married: Int }

The query then needs to be normalized as explained in section 3.2.1. The transformation produces the following query:

    Union(Select(Product(student, person), "sname='Moi'"), Select(Product(student, person), "grade>4"))

This new query can be transformed into the SQL query:

    (select * from student, person where sname = 'Moi') union (select * from student, person where grade > 4)

Both transformations are done by the execute method; the programmer does not need to perform them. The query can then be executed on the database using the method:

    def execute[T](request: RA[T]): Set[T]

It returns a Set of tuples, each of which is an instance of T. To print the age value of each tuple, we can write:

    for (t <- execute(request)) println(t.age)

Executing this code prints:

    57
    18
    22
    23
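The normalization step of this example can be reproduced with a small rewrite system. The sketch implements only the rules this example needs: extracting ∪ and σ from ×, distributing σ and π over ∪, and moving σ below π. It omits the extraction of π from × with column adaptation, the other join operators and the outer-join restrictions. Only ∪ is modelled; note that projection distributes over ∪ but, in general, not over ∩ or −. Node names are hypothetical, and predicates are plain strings.

```scala
object Normalize {
  sealed trait RA
  final case class Table(name: String) extends RA
  final case class Restrict(rel: RA, pred: String) extends RA      // σ_pred(rel)
  final case class Project(rel: RA, cols: List[String]) extends RA // π_cols(rel)
  final case class Times(left: RA, right: RA) extends RA           // left × right
  final case class Union(left: RA, right: RA) extends RA           // left ∪ right

  // One rewrite at the root, following a subset of the rules of section 3.2.2.
  private def step(r: RA): Option[RA] = r match {
    case Restrict(Union(a, b), p)   => Some(Union(Restrict(a, p), Restrict(b, p)))
    case Restrict(Project(a, c), p) => Some(Project(Restrict(a, p), c))
    case Project(Union(a, b), c)    => Some(Union(Project(a, c), Project(b, c)))
    case Times(Union(a, b), s)      => Some(Union(Times(a, s), Times(b, s)))
    case Times(s, Union(a, b))      => Some(Union(Times(s, a), Times(s, b)))
    case Times(Restrict(a, p), s)   => Some(Restrict(Times(a, s), p))
    case Times(s, Restrict(a, p))   => Some(Restrict(Times(s, a), p))
    case _                          => None
  }

  // Normalize the children first, then rewrite the root; repeat until nothing changes.
  def normalize(r: RA): RA = {
    val withChildren = r match {
      case Restrict(a, p) => Restrict(normalize(a), p)
      case Project(a, c)  => Project(normalize(a), c)
      case Times(a, b)    => Times(normalize(a), normalize(b))
      case Union(a, b)    => Union(normalize(a), normalize(b))
      case t: Table       => t
    }
    step(withChildren) match {
      case Some(next) => normalize(next)
      case None       => withChildren
    }
  }
}
```

On the request of this chapter, `Times(Union(Restrict(student, ...), Restrict(student, ...)), person)`, the sketch first extracts the union from the product and then pulls each selection above its product. This yields the same shape as the normalized query shown above.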

Chapter 6 Future work

The main drawback of the DBC3 library is that it does not let us write SQL queries, only relational ones. A good improvement would therefore be to make DBC2 (SQL embedded in Scala) statically typed. One way to do that would be to transform the SQL query into a relational query, which would then be typed as explained in this report. That is not easy (and perhaps not possible) with DBC2 as it stands, because we would have to transform strings or symbols into Scala types. Doing it would require changing DBC2 so that it types SQL requests during their construction, as we did for relational ones. That implies modifying the syntax of DBC2. This is a compatibility problem for old programs, but it has the advantage that the compiler computes a static type.

Another solution is for the user to write requests with DBC2 and, at the end, give the static type he expects. If he does not care about the result types, he can use an existential type; otherwise, he specifies the expected type (example below). The types are then checked during execution (dynamically).

    val r1: Select[T] forSome { type T } = select()...
    val r2: Select[{ def c1: Int; def c2: String }] = select...


Appendix A Incorrect switch of relational operators

A.1 $\pi_{c_1,\ldots,c_n}(R_1) \bowtie R_2$ and $R_1 \bowtie \pi_{c_1,\ldots,c_n}(R_2)$, or the same with the outer join operators ⟕, ⟖ or ⟗

In this example we adapt the type of the resulting projection as described in subsection 3.2.2, in order to keep the final type. Without that adaptation we can easily see that $\pi_{a,b}(R_1) \bowtie R_2 : \{a : Int, b : Int, c : Int, d : Int\}$ and $\pi_{a,b}(R_1 \bowtie R_2) : \{a : Int, b : Int\}$ do not have the same type, so the resulting relations differ. But even after adapting the projection's columns, the results before and after the switch are different:

$R_1$:
    a  b  c
    1  2  3
    4  5  6
    7  8  9

$R_2$:
    b  c  d
    2  3  10
    5  5  11
    7  9  12

$\pi_{a,b}(R_1) \bowtie R_2$:
    a  b  c  d
    1  2  3  10
    4  5  5  11

$\pi_{a,b,c,d}(R_1 \bowtie R_2)$:
    a  b  c  d
    1  2  3  10

The same holds for $R_1 \bowtie \pi_{c_1,\ldots,c_n}(R_2)$ and for the outer join operators ⟕, ⟖ and ⟗.

A.2 $R_1 \mathbin{⟕} \sigma_{\mathit{pred}}(R_2)$

Let $R_1$ have type {a : Int, b : Int}, $R_2$ have type {b : Int, c : Int}, and let $\mathit{pred}$ (the predicate of $\sigma$) be $c <> \mathrm{NULL}$.

$R_1$:
    a  b
    1  2
    3  4
    5  6

$R_2$:
    b  c
    2  3
    4  5
    7  9

$R_1 \mathbin{⟕} \sigma_{c <> \mathrm{NULL}}(R_2)$:
    a  b  c
    1  2  3
    3  4  5
    5  6  NULL

$\sigma_{c <> \mathrm{NULL}}(R_1 \mathbin{⟕} R_2)$:
    a  b  c
    1  2  3
    3  4  5

The same holds for $\sigma_{\mathit{pred}}(R_1) \mathbin{⟖} R_2$, and for $R_3 \mathbin{⟗} R_4$ if either $R_3$ or $R_4$ contains a $\sigma_{\mathit{pred}}$. The switch is correct if the predicate of the selection does not act on columns that can have NULL values after the outer join. Even if it uses those columns, the switch could still be correct in some cases (when the predicate does not use NULL). But to implement that, we must find a way to analyze the predicate and detect these special cases.
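The counterexamples of A.1 and A.2 can be replayed with a small in-memory model. In this sketch (hypothetical names, not DBC3 code) a row maps column names to Option[Int], with None standing for NULL, and join conditions never match NULL, as in SQL. It rebuilds the four result tables above from the example relations and shows that each switch changes the result.

```scala
// Hypothetical in-memory relations; None plays the role of SQL NULL.
object SwitchSketch {
  type Row = Map[String, Option[Int]]
  final case class Rel(cols: List[String], rows: List[Row])

  def rel(cols: String*)(rows: List[Int]*): Rel =
    Rel(cols.toList, rows.toList.map(vs => cols.zip(vs.map(v => Option(v))).toMap))

  // Two rows join when they agree on all shared columns; NULL never matches.
  private def joinable(a: Row, b: Row, common: List[String]): Boolean =
    common.forall(c => a(c).isDefined && a(c) == b(c))

  def project(r: Rel, cols: String*): Rel =
    Rel(cols.toList, r.rows.map(row => cols.map(c => c -> row(c)).toMap).distinct)

  def select(r: Rel)(pred: Row => Boolean): Rel = Rel(r.cols, r.rows.filter(pred))

  def naturalJoin(l: Rel, r: Rel): Rel = {
    val common = l.cols.filter(r.cols.contains)
    val rows = for (a <- l.rows; b <- r.rows if joinable(a, b, common)) yield a ++ b
    Rel(l.cols ++ r.cols.filterNot(common.contains), rows)
  }

  // Left outer join: left rows without a partner are padded with NULL (None).
  def leftOuterJoin(l: Rel, r: Rel): Rel = {
    val common = l.cols.filter(r.cols.contains)
    val extra = r.cols.filterNot(common.contains)
    val rows = l.rows.flatMap { a =>
      val matches = r.rows.filter(b => joinable(a, b, common)).map(b => a ++ b)
      if (matches.nonEmpty) matches else List(a ++ extra.map(c => c -> (None: Option[Int])))
    }
    Rel(l.cols ++ extra, rows)
  }

  // A.1: projection switched with a natural join
  val r1 = rel("a", "b", "c")(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
  val r2 = rel("b", "c", "d")(List(2, 3, 10), List(5, 5, 11), List(7, 9, 12))
  val beforeA1 = naturalJoin(project(r1, "a", "b"), r2)          // 2 rows
  val afterA1  = project(naturalJoin(r1, r2), "a", "b", "c", "d") // 1 row

  // A.2: selection switched with a left outer join
  val s1 = rel("a", "b")(List(1, 2), List(3, 4), List(5, 6))
  val s2 = rel("b", "c")(List(2, 3), List(4, 5), List(7, 9))
  val cNotNull: Row => Boolean = row => row("c").isDefined
  val beforeA2 = leftOuterJoin(s1, select(s2)(cNotNull)) // keeps (5, 6, NULL)
  val afterA2  = select(leftOuterJoin(s1, s2))(cNotNull) // drops it

  def main(args: Array[String]): Unit =
    List(beforeA1, afterA1, beforeA2, afterA2).foreach(r => println(r.rows))
}
```

Keeping the column list in Rel, rather than deriving it from the rows, lets the outer join pad unmatched rows correctly even when the right-hand relation is empty after a selection.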

Appendix B Implementation problems

B.1 Operators not available

Divide

The divide operator can be expressed with other primitives. However, starting from a divide operation, it is hard to rewrite it with primitives, because many transformations are needed to go from the divide operator to an equivalent combination of primitives. Moreover, it has no direct equivalent in SQL. For these reasons the operator is not available in the library; a programmer who needs it should write it directly with primitives.

Rename

The problem with the rename operator is that we cannot give it a statically type-safe implementation. Let P be the type before renaming and T the type after renaming. The required constraints are that P and T have the same number of members, that the members of P and T have the same types, and that exactly one member of P has a different name in T. The Scala compiler cannot express such constraints; we could only check them dynamically (during execution). For that reason I decided not to provide the rename operator. In most cases, renaming can be avoided by giving adequate names to the database tables and columns in the first place.

One way to implement the operator would be to use annotations: when the structural type of the relation is defined, we can put an annotation in front of the renamed field that gives its original name (the name inside the database). That solution ensures that the new type has the same number of members as the previous one, with the same member types, but we cannot be sure that the annotated original name is correct.

B.2 Manifest not automatically generated

Pre-treatment

During the pre-treatment phase, in most cases, the manifest needed to rebuild the request (after a switch) is already available in the previous request. In some cases (presented below), however, not all required manifests are available in the previous request, and those manifests are not currently generated by the compiler. These switching rules therefore cannot be implemented until the compiler generates the appropriate manifests.

$$\pi_{a_1,\ldots,a_n}(R_1) \bowtie R_2 \xrightarrow{[b_1,\ldots,b_k]} \pi_{a_1,\ldots,a_n,b_1,\ldots,b_k}(R_1 \bowtie R_2)$$

$$R_1 \bowtie \pi_{a_1,\ldots,a_n}(R_2) \xrightarrow{[b_1,\ldots,b_k]} \pi_{a_1,\ldots,a_n,b_1,\ldots,b_k}(R_1 \bowtie R_2)$$

Short syntax

A short syntax for building relational queries cannot be implemented either, for the same reason as the pre-treatment special cases above.

B.3 Java proxy limitation

The Scala implementation of proxies is not complete, so I had to use Java proxies in section 4.2. Those proxies work with Java interfaces and classes, and consequently with Scala classes and traits, but they do not work with structural types. This is a problem for the DBC3 library, because describing the database with traits is not practical. If we want to project a relation of type P onto a type T, then T must satisfy the constraint T >: P. As explained in section 4.1, with structural types it suffices that T has fewer members than P, all of them members of P. With traits it is not that simple: inheritance links must be declared explicitly in the type definitions, so P must be defined as a trait that extends T. Having to do that for every type that could appear in a request would make the library impossible to use. This is a current limitation of the Scala compiler.
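For the divide operator, the usual rewrite into primitives is $R \div S = \pi_x(R) - \pi_x((\pi_x(R) \times S) - R)$ for $R(x, y)$ and $S(y)$. The sketch below applies it to plain Scala sets rather than to DBC3 relations, and adds a run-time check of the three rename constraints listed in B.1, with a schema modelled as a hypothetical map from column names to type names. Neither function is part of the library, and the sample data is made up.

```scala
// Hypothetical helpers on plain Scala sets and maps; not DBC3 operators.
object OperatorSketch {
  // Division R(x, y) ÷ S(y) written with primitives only:
  //   π_x(R) − π_x((π_x(R) × S) − R)
  def divide[X, Y](r: Set[(X, Y)], s: Set[Y]): Set[X] = {
    val xs = r.map(_._1)                                // π_x(R)
    val candidates = for (x <- xs; y <- s) yield (x, y) // π_x(R) × S
    val missing = candidates -- r                       // pairs absent from R
    xs -- missing.map(_._1)                             // x values with no absent pair
  }

  // Dynamic version of the rename constraints; a schema maps column names to type names.
  def isValidRename(p: Map[String, String], t: Map[String, String]): Boolean = {
    val onlyP = p.keySet -- t.keySet
    val onlyT = t.keySet -- p.keySet
    p.size == t.size &&                             // same number of members
    onlyP.size == 1 && onlyT.size == 1 &&           // exactly one member renamed
    p(onlyP.head) == t(onlyT.head) &&               // renamed member keeps its type
    (p.keySet & t.keySet).forall(k => p(k) == t(k)) // other members keep their types
  }

  def main(args: Array[String]): Unit = {
    println(divide(Set(("ann", "db"), ("ann", "pl"), ("bob", "db")), Set("db", "pl")))
    println(isValidRename(Map("sname" -> "String", "grade" -> "Int"), Map("name" -> "String", "grade" -> "Int")))
  }
}
```

As the text notes, such a rename check can only run at execution time; the compiler never sees it.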
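The limitation in B.3 can be illustrated with java.lang.reflect.Proxy, assumed here to be the kind of Java proxy used in section 4.2. Proxy.newProxyInstance takes a list of interfaces, and a Scala trait compiles to a JVM interface, so a tuple described by a trait can be proxied, whereas a structural type has no class object to pass. The sketch also shows the cost described above: a Person value can be projected to Aged only because Person is declared to extend Aged. The row map and all names are hypothetical.

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}

object ProxySketch {
  // With traits, T >: P must be declared: Person has to extend Aged explicitly.
  trait Aged { def age: Int }
  trait Person extends Aged { def pname: String }

  // Builds a tuple object for a trait from a hypothetical column-name -> value row.
  def proxyFor[T](iface: Class[T], row: Map[String, Any]): T = {
    val handler = new InvocationHandler {
      def invoke(proxy: AnyRef, method: Method, args: Array[AnyRef]): AnyRef =
        row.get(method.getName) match {
          case Some(v) => v.asInstanceOf[AnyRef] // Int values are stored boxed
          case None if method.getName == "toString" => s"${iface.getSimpleName}$row"
          case None => throw new UnsupportedOperationException(method.getName)
        }
    }
    // The second argument must list interfaces; a structural type such as
    // { def age: Int } has no Class that could be passed here.
    iface.cast(Proxy.newProxyInstance(iface.getClassLoader, Array[Class[_]](iface), handler))
  }

  def main(args: Array[String]): Unit = {
    val p = proxyFor(classOf[Person], Map("pname" -> "Ann", "age" -> 23))
    val projected: Aged = p // compiles only because Person extends Aged
    println(s"${p.pname} ${projected.age}")
  }
}
```

Every method call on the proxy, including equals and hashCode, goes through invoke; the sketch only answers column accessors and toString.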