RAQUEL's Relational Operators

Contents

Introduction
General Principles
Operator Parameters
Ordinary & High-Level Operators
Operator Valency
Default Tuples
The Relational Algebra Operators in RAQUEL
The Syntax of Operators
The Current Relational Algebra Operators in RAQUEL
  Ordinary Operators : Monadic
    Meta
    Nest
    Project
    Rename
    Unnest
  Ordinary Operators : Dyadic
    Difference
    Intersect
    Natural Join (Inner, Semi, & Outer)
    Set Comparators
    Union
  High-Level Operators : Monadic
    Extend
    GroupBy
    Restrict
  High-Level Operators : Dyadic
    Divide (Inner, Semi, & Outer)
    Generalised Join (Inner, Semi, & Outer)
Operators in Development
  Compose
  Dist

Introduction

This document lists and briefly describes all the relational algebra operators in RAQUEL, with more detail given for the less familiar operators. A fully comprehensive description of them all, including mathematical specifications, is given in the document The Semantics of the Relational Algebra Operators.

To support the operator descriptions, the document starts by describing the general ideas on which RAQUEL relational algebra is based, illustrating them with examples of relational operators. Note that RAQUEL applies the same algebraic principles to operators at all 3 levels of abstraction: relational, scalar, and schema.

General Principles

The first principle is that of Closure under the Algebra. This means that when an operator is applied to its operand(s), it produces a result that may itself be used as the operand of another operator. This circularity makes it possible to use the operators to create arbitrarily complex and powerful expressions, since there is no (finite) limit to the number of times this may be done, i.e. no limit to the number of times closure may occur. Closure is intrinsically simple. It is also the same principle as is used in arithmetic, which everyone learns in junior school and has been practising ever since. Thus algebra is easy to learn and use, as well as being powerful and flexible.

The second principle is that operators have no side effects. They use their operand(s) to return a result, but have no effect on their operand(s) or on any other variable that may be in the same scope at the same level of abstraction as the operator. They are mathematical functions.

The third principle is that an operator takes the value(s) of its operand(s) and returns a value. Thus if an operand is a variable, the current value of that variable is used by the operator. If the operand is a literal value, then that value is used by the operator. If the operand is an operator expression, then the operand value is that of the expression when evaluated.

The fourth principle is that although the value of an operator's result may have a different type to the type(s) of its operands, the result and the operands must all be at the same level of abstraction. Thus operands and results must all have relational types at the relational level; likewise scalar types at the scalar level and schema types at the schema level.

Operator expressions are used by assignments (at the same level of abstraction) to carry out some action on the DB. A RAQUEL statement is simply an assignment expression.
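The first two principles can be illustrated outside RAQUEL with a minimal Python sketch. The representation is an assumption made purely for illustration: a relvalue is modelled as a frozenset of tuples, each tuple being a frozenset of (attribute, value) pairs, and the helper names rel, restrict and project are hypothetical, not RAQUEL syntax.

```python
# Hypothetical model (not RAQUEL): a relvalue is a frozenset of tuples,
# and a tuple is a frozenset of (attribute, value) pairs.

def rel(*rows):
    """Build a relvalue from dict rows."""
    return frozenset(frozenset(r.items()) for r in rows)

def restrict(r, condition):
    """Keep the tuples for which condition(tuple_as_dict) is True."""
    return frozenset(t for t in r if condition(dict(t)))

def project(r, names):
    """Keep only the named attributes; duplicate tuples collapse."""
    return frozenset(frozenset((a, v) for a, v in t if a in names) for t in r)

emp = rel({"name": "Ann", "dept": "D1", "pay": 30},
          {"name": "Bob", "dept": "D1", "pay": 20},
          {"name": "Cy", "dept": "D2", "pay": 30})

# Closure: the result of restrict is itself a valid operand of project.
result = project(restrict(emp, lambda t: t["pay"] == 30), {"dept"})

# No side effects: emp is immutable here and still holds its three tuples.
print(sorted(dict(t)["dept"] for t in result))
print(len(emp))
```

Using immutable frozensets is a design choice of the sketch: it makes the "no side effects" principle hold by construction rather than by convention.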

Operator Parameters

Some operators require parameters. A parameter is an additional input to the operator which is not an operand. Parameters are necessary when the operator per se is insufficient to define precisely the operation that it is to carry out on its operand(s).

For example, Restrict needs a parameter because the operator itself only defines in general terms how tuples in its operand are to be selected to appear in its result. Its parameter completes its definition by specifying a truth-valued expression to be applied to every individual operand tuple; tuples for which the parameter returns the value True appear in the result, and the remainder do not. By comparison, Union needs no parameter because the operator defines precisely how the values of its 2 operands are to be combined to yield its result.

Thus the purpose of a parameter is to refine the meaning of an operator, to make its general meaning sufficiently specific that an invocation of the operator can be executed.

Where appropriate, an operator's parameters may be partitioned into 2 (or more) groupings. This is syntactic sugar to make the parameters more intelligible when read. For example, the GroupBy operator's parameters are partitioned into 2 groupings; the first specifies how the operand is to be split or grouped into a set of relational values (= relvalues), and the other how aggregation is to be applied to each relvalue. For ease of reading, partitioned parameters appear in sequence.

Ordinary and High-Level Operators

Parameters fall into 4 categories :
1. A set of names. For example, the Project operator takes a set of attribute names as a parameter.
2. A set of names with an assignment to another name. For example, the Nest operator takes as a parameter a set of attribute names that are assigned to another attribute name (designating the attribute that is to have relational values whose attributes are the set of attributes).
3. An algebra expression. For example, the Restrict operator takes a truth-valued expression as a parameter.
4. An algebra expression with an assignment to a name. For example, the GroupBy operator takes as a parameter an aggregate expression that is assigned to a name (designating the attribute that is to have the value of the aggregate expression).

Categories 3 and 4 can be considered special instances of a more general case. The reason for this is as follows. RAQUEL uses assignment algebra as well as operator algebra, and Closure under the Algebra applies to both forms. At every level of abstraction, an assignment expression can constitute the operand of an operator and an operator expression can constitute the operand of an assignment. Thus closure applies to an expression that combines the 2 kinds of expression. Category 3 is an operator expression, category 4 is an assignment expression, and together they form the expression category. In a parallel way, and to achieve generality and simplicity, categories 1 and 2 are also considered special instances of a more general name set category.

Since RAQUEL uses expressions at every level of abstraction, an expression parameter is no different in principle to any other RAQUEL expression, although it may be at a different level of abstraction to that of the operator to which it pertains. The execution of an operation whose operator has an expression parameter is dependent on the execution of that parameter. This is mathematically different in nature to the use of a name set parameter by an operator. A name identifies and represents something. A name is merely replaced by what it represents.

Operators that take an expression parameter are termed High-Level Operators. Operators that take a name set parameter or have no parameter are termed Ordinary Operators.

For example, Restrict is a high-level operator. Its parameter is applied independently to every tuple in Restrict's operand to determine whether that tuple should appear in the result or not. Project is an ordinary operator because its parameter is a set of names, namely the names of those attributes that are to appear in the result of the projection [1].

The definition of High-Level is generalised to include operators whose expression parameter consists of a single operator. For example, the Divide operator takes a set-comparison operator as its parameter in order to determine the precise nature of the division to be executed.

[1] Wherever logically possible, RAQUEL does permit a name set in a parameter to be prefixed with ~, meaning that the set of names to be used is all those derivable from the operand except those prefixed by ~. For example, Rel Project[ ~ A, B ] means that all the attributes are to be projected out of Rel except attributes A and B. However this is not regarded as evaluating the parameter, but rather as a notational convenience that RAQUEL provides to ease the writing of a set of names.

Operator Valency

The valency of an operator is the number of operands that it takes. Currently RAQUEL operators are all either monadic (take one operand) or dyadic (take 2 operands). It is possible for future additional operators to be :
1. Ambiadic, i.e. take a varying number of operands. A future Compose operator could fall into this category.
2. Niladic, i.e. take no operands. It may be thought that a niladic operator will always return the same relvalue, as there is no input to the operator to cause the result to vary. However a niladic operator with a parameter can generate different results depending on the value of the parameter(s). For example, suppose there were an Index[ n ] operator, where n were a numeric-valued parameter, such that Index returned a single-attribute relvalue of n tuples containing the values 1..n; its result would then vary with n. A view (= virtual relational variable) is semantically equivalent to a niladic parameterless operator, but can return different relvalues when it is called because it represents a variable whose relvalue can change.

It is possible that a future additional operator could be triadic (take 3 operands) or be of an even higher valency, but this would cause problems for RAQUEL because its syntax is linear and operands are recognised by their position adjacent to the operator (dyadic operators are infix) and not by any punctuation marks (e.g. parentheses or commas). This is to simplify RAQUEL syntax and make it similar to human language text, where punctuation acts to organise words, not to separate one word from another [2]. Therefore it is currently intended that any future additional operator whose valency is higher than dyadic would utilise an operand that is a single-attribute relvalue whose attribute values are the operands. Since the most likely future additions are also ambiadic (e.g. Dist, a distributed high-level operator that could apply to a significant number of operands), this approach is desirable rather than a problem.

Default Tuples

In addition to operands and parameters, it is possible for an operator to have a third kind of input, a so-called Default Tuple.

By definition, the outer variant of a dyadic operator takes 2 operands of different relational types (= reltypes). The result is a relvalue whose reltype combines the 2 operand reltypes in a way that depends on the nature of the operator.
Depending on the operand relvalues, the operator may need to combine a tuple value from one operand with a tuple value from neither operand in order to generate a valid result tuple. The tuple value from neither operand has to be input to the operator, and the purpose of the default tuple is to be that input.

Points to note :
1. RAQUEL uses 2-valued logic. Therefore it does not permit NULLs or NULL values. Therefore a default tuple value must consist of a set of genuine attribute values of the correct types, such that it can be combined with an operand tuple to produce a valid result tuple [3].
2. RAQUEL does not provide for tuples as entities independent of relations. The notion of a tuple is considered to be integral with the notion of a relation, just as the notion of an array-element is considered to be integral with the notion of an array; one cannot have one without the other. Therefore RAQUEL only handles tuple values within relvalues. Hence a default tuple value is represented by a relvalue of one tuple.

[2] Note that human languages which do not read left-to-right as English does, but right-to-left or vertically, are still using linear text.

[3] Even a DB language which does permit NULLs should logically permit genuine attribute values as well as NULLs in the default tuple value specified by the user. Therefore SQL is remiss in preventing genuine values from being used and enforcing a NULL-only policy.

The Relational Algebra Operators in RAQUEL

Relational algebra operators may be usefully split into 6 categories according to the nature of the function that they execute. Each category is further split into 2 subcategories according to whether an operator's function is applied by attribute or by tuple.

The following table lists all RAQUEL's current operators by category. Two operators are omitted because they are still in development. They are a proposed Distributed operator and a Compose operator; see the last section of the document.

Nature of Function Executed : Extracting a subrelvalue from a relvalue.
  By Tuple : Restrict.
  By Attribute : Project.

Nature of Function Executed : Performing computations on a relvalue's attribute values.
  By Tuple : Extend.
  By Attribute : GroupBy.

Nature of Function Executed : Merging 2 relvalues.
  By Tuple : Union, Intersect, Difference.
  By Attribute : Natural Join in all inner, outer, & semi variants. Generalised Join in all inner, outer, & semi variants.

Nature of Function Executed : Tuple set comparisons of 2 relvalues.
  By Tuple : =, ~=, Sub, Sub=, Super, Super=, Member, ~Member, Disjoint, ~Disjoint.
  By Attribute : Divide in all inner, outer and semi variants, and with any of the tuple set comparisons as a parameter.

Nature of Function Executed : Obtaining meta data about a relvalue.
  By Tuple : Meta.
  By Attribute : Meta.

Nature of Function Executed : Restructuring a relvalue.
  By Tuple : Not Applicable.
  By Attribute : Nest, Unnest, Rename.

It can be seen that there is one empty subcategory, viz. the By Tuple subcategory of the Restructuring a relvalue category. The restructuring of a relvalue exploits attribute names, but tuple names do not exist and hence there can be no tuple counterpart to restructuring.

Comments :
1. It is conjectured that the categories are fundamental to relational algebra, although the operators in each category and its subcategories could vary in number and nature depending on the approach taken to allocating function to operators.
2. Sometimes Extend and Project are considered inverses of each other. This does not appear in the above categorisation, because the primary purpose of Extend is considered to be affording the calculation of values per tuple, with the addition of attributes merely a by-product needed to support this.

The Syntax of Operators

The following syntax is used in describing the individual operators :
commas are used to separate names in a list;
semi-colons are used to separate values and (operator and assignment) expressions;
square brackets, [ ], are used to encompass parameters;
curly brackets, { }, are used to encompass literal tuple values and relvalues;
parentheses, ( ), are used to override the natural precedence of operators and assignments in order to enable an expression of the desired kind to be written easily.

The following terms are used to express the syntax :
Operator-Expression is an operator expression that returns a value at the scalar or nested relvalue level of abstraction.
Condition-Expression is an operator expression that returns a truth value at the scalar or nested relvalue level of abstraction.
Aggregate-Expression is an operator expression that returns a single attribute value from a single-attribute relvalue.
Attribute is the name of a single attribute.
Attribute-Set is a list of one or more attribute names, the names being separated by commas, and the list order not being significant.
ROP1 and ROP2 are relational operand values, the left-hand operand and the right-hand operand (where relevant) respectively. They are the value of a named relational variable (= relvar), a literal relvalue, or a relational algebra expression that evaluates to a relvalue.
{ Tuple } is either a literal tuple value or an expression that evaluates to a tuple value, expressed as a relvalue of a single tuple.
Abcde123 (underlined). Underlining indicates that a term (which may be any of the above) can be repeated as often as required, the repetitions being separated by a comma or semi-colon as appropriate.
Abcde123 (struck through). A line through indicates that a term (which may be any of the above) is optional and may be omitted.

The Current Relational Algebra Operators in RAQUEL

For convenience of indexing, ordinary operators are listed first and then high-level operators; within that, monadic and then dyadic operators; and within that, in alphabetic order of RAQUEL operator names.

Ordinary Operators : Monadic

Meta

ROP1 Meta[ Meta-Set ]

The purpose of the Meta operator is to return a result that is a relvalue containing meta data about the relational operand. The set of meta data to be returned is specified by the parameter Meta-Set. The result has one attribute for each kind of meta data specified by Meta-Set. The attribute names are the standard names of specific kinds of meta data.

Meta-Set comprises names of specific meta data (e.g. Cardinality, AttributeName), and/or names that encompass 2 or more specific kinds of meta data (e.g. Attribute, which encompasses AttributeName + AttributeType).

The type of each result attribute depends on that needed to hold its meta data. Meta data expressible as a scalar value uses a suitable scalar type; meta data expressed as a set of values uses a suitable reltype. Consequently the result has as many tuples as required to express the result.

In general the operand is a relational algebra expression. Relvars and literal relvalues are special cases of an expression. Other instances of an expression must be evaluated to derive the required meta data.

Nest

ROP1 Nest[ Attribute <-- Attribute-Set ]

The purpose of the Nest operator is to return a relvalue containing one or more reltype attributes that are formed from sets of one or more attributes in the operand. Those operand attributes that do not become part of a reltype attribute in the result remain in the result. The parameter specifies the operand attributes that are to be nested in the result's reltype attributes.

Project

ROP1 Project[ Attribute-Set ]

The purpose of the Project operator is to return a result that is a subrelvalue of the operand, in that its attributes are a subset of the operand's attributes. The parameter specifies the set (which may be empty) of attributes to appear in the result. Because the result must be a relvalue, and so cannot contain duplicate tuples, it may have fewer tuples than the operand.

If Attribute-Set is preceded by ~, then the attributes to appear in the result are : (all ROP1's attributes) Difference Attribute-Set.

If Attribute-Set is empty and preceded by ~, then all ROP1's attributes appear in the result; i.e. the result is identical to ROP1.

If Attribute-Set is empty, or ~ is used to remove all a relation's attributes, then the result is a relvalue with no attributes.

Rename

ROP1 Rename[ Attribute <-- Attribute ]

The purpose of the Rename operator is to return a result that is the same as the operand except that one or more attributes have been renamed. Each assignment comprising the parameter has the old attribute name on the right and the new name on the left.

Unnest

ROP1 Unnest[ Attribute-Set ]

The purpose of the Unnest operator is to return a relvalue in which one or more reltype attributes in the operand have been replaced by the set of attributes comprising those reltype(s). The remaining operand attributes remain in the result. The parameter specifies the names of those reltype attributes that are to be unnested.

Attribute-Set may be preceded by ~ ; the effect is to specify the relevant attribute names in the same way as with the Project operator.

Ordinary Operators : Dyadic

Difference

ROP1 Diff ROP2

The purpose of the Diff operator is to merge two operands together by tuple so that the result has only those tuples from the left operand that are not also in the right operand. In order to achieve this, the two operands must have the same reltype so that their tuples can be compared.

Intersect

ROP1 Intersect ROP2

The purpose of the Intersect operator is to merge two operands together by tuple so that the result has all the tuples common to both operands. In order to achieve this, the two operands must have the same reltype so that their tuples can be compared.
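The ordinary operators described above can be sketched in Python under an assumed toy model, in which a relvalue is a frozenset of tuples and a tuple is a frozenset of (attribute, value) pairs. All function names and the complement flag (standing in for the ~ prefix) are hypothetical; the model stores no reltype, so the heading of an empty relvalue cannot be recovered and attribute types are not checked.

```python
def rel(*rows):
    """Build a relvalue from dict rows (toy model, not RAQUEL)."""
    return frozenset(frozenset(r.items()) for r in rows)

def heading(r):
    """Attribute names, derived from the tuples (empty for an empty relvalue)."""
    return {a for t in r for a, _ in t}

def project(r, names, complement=False):
    """Project[ names ]; complement=True plays the role of the ~ prefix."""
    keep = heading(r) - set(names) if complement else set(names)
    return frozenset(frozenset((a, v) for a, v in t if a in keep) for t in r)

def rename(r, new_from_old):
    """Rename[ new <-- old ]; the mapping is keyed by the new name."""
    old_to_new = {old: new for new, old in new_from_old.items()}
    return frozenset(frozenset((old_to_new.get(a, a), v) for a, v in t) for t in r)

def nest(r, new_attr, names):
    """Nest[ new_attr <-- names ]: gather the named attributes into one
    relvalue-valued attribute, one result tuple per distinct remainder."""
    groups = {}
    for t in r:
        rest = frozenset((a, v) for a, v in t if a not in names)
        inner = frozenset((a, v) for a, v in t if a in names)
        groups.setdefault(rest, set()).add(inner)
    return frozenset(rest | {(new_attr, frozenset(inners))}
                     for rest, inners in groups.items())

def unnest(r, attr):
    """Unnest[ attr ]: replace a relvalue-valued attribute by its attributes."""
    out = set()
    for t in r:
        rest = frozenset((a, v) for a, v in t if a != attr)
        for inner in dict(t)[attr]:
            out.add(rest | inner)
    return frozenset(out)

emp = rel({"name": "Ann", "dept": "D1"},
          {"name": "Bob", "dept": "D1"},
          {"name": "Cy", "dept": "D2"})

depts = project(emp, {"name"}, complement=True)   # like Project[ ~ name ]
nested = nest(emp, "staff", {"name"})             # like Nest[ staff <-- name ]
others = rel({"dept": "D2"}, {"dept": "D3"})

print(len(depts))                        # fewer tuples than emp
print(unnest(nested, "staff") == emp)    # Unnest undoes this particular Nest
print(depts - others, depts & others)    # Diff and Intersect as set operations
```

In this model Diff and Intersect reduce to Python's set difference and intersection, which mirrors the requirement that both operands have the same reltype so that whole tuples can be compared.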

Natural Join

Natural Join is provided in all the inner, outer and semi variants. It can be considered as a single operator with a variety of permissible inputs, or as a family of operators whose semantics and syntax are designed to form a coherent set. Since the semi and outer joins can be considered as extensions of the inner join, the inner join is described first and the semi and outer joins are described in terms of the inner join.

Natural Inner Join :-

ROP1 Join[ Attribute-Set ] ROP2

Its purpose is to merge two operands together by attribute; i.e. the attributes of the result are a set union of the attributes of the 2 operands. In order to derive the tuples that form the resulting relvalue, every tuple in one operand is matched with every tuple in the other operand. When a pair of tuples match, they are set unioned to form one tuple in the result.

The set of attributes to be used in the matching are named in the parameter; each attribute must have the same name and type in both operands. The matching consists of a distributed logical And of the results of = comparisons. For each attribute named in the parameter, an = comparison is made of that attribute's value in one tuple with the same named attribute's value in the other tuple.

The parameter must name all the attributes common to both operands. If that set is empty, there must be no common attributes, and the result is a Cartesian Product of the operands.

Attribute-Set may not be preceded by ~. The Difference of an operand's attributes with Attribute-Set depends on the operand chosen. For simplicity and symmetry, RAQUEL does not allow an operand to be chosen.

If the parameter is elided, the operator derives the common attributes to be used.

Natural Semi Joins :-

ROP1 Join[[ Attribute-Set ] ROP2     (Left)
ROP1 Join[ Attribute-Set ]] ROP2     (Right)

Their purpose is to carry out a natural inner join, and then project out from that result the attributes of the left or right operand to form the semi join result. For a left semi join the result's attributes are those of the left operand, and for a right semi join those of the right operand.
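The inner and semi joins just described can be sketched in Python as follows. This is an illustrative model only: relvalues are frozensets of tuples, tuples are frozensets of (attribute, value) pairs, the function names are hypothetical, and the join always behaves as if the parameter were elided, i.e. it derives the common attributes itself.

```python
def rel(*rows):
    """Build a relvalue from dict rows (toy model, not RAQUEL)."""
    return frozenset(frozenset(r.items()) for r in rows)

def heading(r):
    """Attribute names, derived from the tuples (empty for an empty relvalue)."""
    return {a for t in r for a, _ in t}

def project(r, names):
    return frozenset(frozenset((a, v) for a, v in t if a in names) for t in r)

def join(r1, r2):
    """Natural inner join: match every tuple pair on all common attributes
    (a logical And of = comparisons) and set-union the matching pairs."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            # all() over no attributes is True, giving a Cartesian Product.
            if all(d1[a] == d2[a] for a in common):
                out.add(t1 | t2)
    return frozenset(out)

def left_semi_join(r1, r2):
    """Inner join, then project out the left operand's attributes."""
    return project(join(r1, r2), heading(r1))

def right_semi_join(r1, r2):
    """Inner join, then project out the right operand's attributes."""
    return project(join(r1, r2), heading(r2))

emp = rel({"name": "Ann", "dept": "D1"},
          {"name": "Bob", "dept": "D1"},
          {"name": "Cy", "dept": "D2"})
dept = rel({"dept": "D1", "city": "Leeds"},
           {"dept": "D3", "city": "York"})

inner = join(emp, dept)
print(len(inner))                          # tuples for Ann and Bob only
print(left_semi_join(emp, dept) == rel({"name": "Ann", "dept": "D1"},
                                       {"name": "Bob", "dept": "D1"}))
print(len(join(rel({"x": 1}, {"x": 2}), rel({"y": "a"}))))  # no common attributes
```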

Natural Outer Joins :-

ROP1 Join[[ Attribute-Set ]{ R-Tuple } ROP2                 (Left)
ROP1 Join{ L-Tuple }[ Attribute-Set ]] ROP2                 (Right)
ROP1 Join{ L-Tuple }[[ Attribute-Set ]]{ R-Tuple } ROP2     (Full)

Their purpose is to carry out a natural inner join and then, for every tuple value in an operand that is missing from the inner join result, union that value with the default tuple value, and union that tuple result with the inner join result to yield the outer join result.

For a left outer join, a missing tuple value is one from the left operand not appearing in the inner join result. Hence the default tuple value has a type identical to that of the right operand but omitting the common attributes, and contains user-specified attribute values.

For a right outer join, a missing tuple value is one from the right operand not appearing in the inner join result. Hence the default tuple value has a type identical to that of the left operand but omitting the common attributes, and contains user-specified attribute values.

In a full outer join, a missing tuple value is one from either the left or right operand not appearing in the inner join result. A missing tuple value from the left operand is dealt with as for a left outer join, and a missing tuple value from the right operand is dealt with as for a right outer join.

In the above syntax, L-Tuple and R-Tuple represent the left and right default tuple values respectively. They are placed within the brackets { and } so that they become a relvalue of one tuple.

Set Comparators

ROP1 <comparator> ROP2

where <comparator> is a set comparator expressed by a keyword. The comparators are :
Super ( ⊃ )    Super= ( ⊇ )
=    ~=
Sub ( ⊂ )    Sub= ( ⊆ )
Disjoint    ~Disjoint

For completeness, the 2 membership comparators are also provided :
Member    ~Member
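The left outer join with a default tuple, and the set comparators listed above, can be sketched in Python. Everything here is an assumption made for illustration: relvalues are frozensets of tuples, tuples are frozensets of (attribute, value) pairs, and left_outer_join with its right_default argument is a hypothetical stand-in for Join[[ Attribute-Set ]{ R-Tuple }. The point of the sketch is that unmatched tuples are padded with genuine user-supplied values rather than NULLs.

```python
def rel(*rows):
    """Build a relvalue from dict rows (toy model, not RAQUEL)."""
    return frozenset(frozenset(r.items()) for r in rows)

def heading(r):
    """Attribute names, derived from the tuples (empty for an empty relvalue)."""
    return {a for t in r for a, _ in t}

def project(r, names):
    return frozenset(frozenset((a, v) for a, v in t if a in names) for t in r)

def join(r1, r2):
    """Natural inner join on all common attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(t1 | t2)
    return frozenset(out)

def left_outer_join(r1, r2, right_default):
    """Inner join, plus every unmatched left tuple unioned with the default
    tuple. right_default must be a relvalue of exactly one tuple holding the
    right operand's non-common attributes."""
    inner = join(r1, r2)
    missing = r1 - project(inner, heading(r1))
    (default_tuple,) = right_default   # raises ValueError unless one tuple
    return inner | frozenset(t | default_tuple for t in missing)

emp = rel({"name": "Ann", "dept": "D1"}, {"name": "Cy", "dept": "D2"})
dept = rel({"dept": "D1", "city": "Leeds"})

outer = left_outer_join(emp, dept, rel({"city": "Unknown"}))
print(sorted((dict(t)["name"], dict(t)["city"]) for t in outer))

# The set comparators map onto Python's frozenset comparisons.
a = rel({"dept": "D1"})
b = rel({"dept": "D1"}, {"dept": "D2"})
print(a < b, a <= b, b > a, b >= a)          # Sub, Sub=, Super, Super=
print(a == b, a != b)                        # = and ~=
print(a.isdisjoint(rel({"dept": "D3"})))     # Disjoint
```

A right outer join would be symmetric, taking a one-tuple default for the left operand's non-common attributes, and a full outer join would take both defaults.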

The left-hand operand must be a relvalue of one tuple, or an error arises. (Mathematically, a membership comparison is of a single value with a set of values. In relational terms this translates to the comparison of a tuple with a relation of tuples. In RAQUEL tuples only exist within relations; therefore the single tuple translates to a relvalue of one tuple value). Union ROP1 Union ROP2 The purpose of the Union operator is to merge two operands together by tuple so that the result has all the tuples of both operands. In order to achieve this, the two operands must have the same reltype so that their tuples can be compared. High-Level Operators : Monadic Extend ROP1 Extend[ Attribute <-- Operator-Expression ] The purpose of the Extend operator is to enable individual attribute values in the operand to be manipulated. The manipulations are carried out tuple by tuple, and enable zero, one or more operand attribute values in the same tuple to contribute to the calculation of a new attribute value. Every such value created per tuple appears in an additional attribute attached to the operand relvalue to form the result relvalue. The purpose of the parameter is to supply a set of value assignments, where each assignment specifies the expression to be used to calculate a value and the name of the additional attribute that is to hold that value. GroupBy ROP1 GroupBy[ Attribute-Set ] With[ Attribute <-- Aggregate-Expression ] The purpose of the GroupBy operator is to enable the values in individual attributes of the operand to be partitioned into single-attribute relvalues and each such relvalue aggregated to form a single new scalar attribute value. The operand is partitioned into a set of disjoint sub-relvalues, such that in each sub-relvalue, every tuple has identical attribute values in all the Attribute-Set attributes of the first parameter. For each assignment in the second parameter, Aggregate-Expression is applied to every sub-relvalue. 
Aggregate-Expression references one or more operand attributes⁴ (each of which is treated as a single-attribute relvalue) and returns a single value which is held in Attribute (which is a new attribute not already in the operand).

The result's attributes are the first parameter's Attribute-Set unioned with the second parameter's Attributes. The result has one tuple for each sub-relvalue in the operand. If Attribute-Set is an empty set, then the complete operand is treated as a single sub-relvalue, and the result consists of one tuple. If the operand has no tuples, then the Aggregate-Expressions need to be such that they can cope with sub-relvalues that have no tuples; otherwise an error occurs. If the Aggregate-Expressions can cope, then the result consists of one tuple.

The Aggregate-Expression has 2 basic forms :

Bag[ Attribute ] Aggregate-Operator
This form is used when any replicated values in a sub-relvalue's attribute (named in the Bag parameter) are to be retained for processing by the Aggregate-Operator.

Project[ Attribute ] Aggregate-Operator
This form is used when any replicated values in a sub-relvalue's attribute (named in the Project parameter) are to be removed before processing by the Aggregate-Operator.

In effect, the operand names of the Bag and Project operators are elided. These names would be those of the sub-relvalues, were they to exist. The Project operator is the standard, ordinary, monadic Project operator. The Bag operator returns a bag of tuples (not a set of tuples, i.e. not a relvalue) and so the Aggregate-Operator needs to be able to handle a bag value.

The aggregate-operators available are :

- The traditional operators Sum, Min, and Max.
- The operator Meta. This is typically used with the parameter Cardinality to return a count of the number of tuples in its operand⁵.
- The operator Nest. This returns a single nested relvalue of the sub-relvalue attribute⁶.
- The logical operators Any and All. These correspond to a Distributed Or and a Distributed And respectively, and are applicable to attribute values of type Truth⁷.

4 Strictly speaking, this is not logically necessary as long as there is an aggregate operator that returns a single value.

5 The traditional Count operator is anomalous because it returns a value derived from the structure of the relvalue to which it is applied. The other operators derive their result from the individual attribute values within the relvalue.

6 A nested relvalue appears as a single value. It may be manipulated by relational operators in the same way that a scalar value may be manipulated by scalar operators. Therefore a relvalue's attribute value may be either a scalar value or a nested relvalue.
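The GroupBy behaviour described above can be illustrated with a small Python sketch. It is not RAQUEL code and not the RAQUEL implementation: the function name `group_by`, the representation of a relvalue as a list of dicts, and the triple `(form, attribute, aggregate)` used to stand for `Bag[ Attribute ] Aggregate-Operator` and `Project[ Attribute ] Aggregate-Operator` are all assumptions made for illustration. Python's `len` is used as a stand-in for Meta with the Cardinality parameter.

```python
from collections import defaultdict


def group_by(relvalue, attribute_set, with_assignments):
    """Partition `relvalue` on `attribute_set`, then aggregate each partition.

    relvalue: list of dicts, one dict per tuple (hypothetical representation).
    attribute_set: list of grouping attribute names; may be empty.
    with_assignments: maps each new attribute name to a triple
        (form, attribute, aggregate), where form is "Bag" (keep replicated
        values) or "Project" (remove replicated values first).
    """
    groups = defaultdict(list)
    for tup in relvalue:
        groups[tuple(tup[a] for a in attribute_set)].append(tup)

    # Interpretation of the text: with an empty Attribute-Set the whole
    # operand is a single sub-relvalue, even when it holds no tuples.
    if not attribute_set and not groups:
        groups[()] = []

    result = []
    for key, sub_relvalue in groups.items():
        out = dict(zip(attribute_set, key))
        for new_attr, (form, attr, aggregate) in with_assignments.items():
            values = [t[attr] for t in sub_relvalue]  # a bag of values
            if form == "Project":
                values = set(values)                  # duplicates removed
            out[new_attr] = aggregate(values)
        result.append(out)
    return result


employees = [
    {"name": "Ann", "dept": "A", "salary": 10},
    {"name": "Bob", "dept": "A", "salary": 10},
    {"name": "Cy", "dept": "B", "salary": 30},
]

totals = group_by(employees, ["dept"], {
    "total": ("Bag", "salary", sum),               # replicated 10 counted twice
    "distinct_total": ("Project", "salary", sum),  # replicated 10 counted once
    "headcount": ("Bag", "name", len),             # stand-in for Meta Cardinality
})
```

In this sketch an aggregate such as `sum` copes with an empty sub-relvalue (it returns 0), so an empty operand with an empty grouping set yields one tuple, whereas `min` raises an error, which mirrors the distinction drawn in the text.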

It is also possible for the result of an aggregation to be an operand in a scalar expression. For example :

Bag[ Attribute ] Aggregate-Operator Monadic-Scalar-Operator
( Bag[ Attribute ] Aggregate-Operator ) Dyadic-Scalar-Operator ( Project[ Attribute ] Aggregate-Operator )

Further generalisations, say to permit manipulation of single-attribute relvalues before they are aggregated, are future possibilities.

7 A future possibility is to replace Any, All, Sum, Min, and Max with a Distributed operator that takes an appropriate scalar operator as a parameter (e.g. Or to obtain the effect of Any). This could provide a simpler approach and a greater range of functionality.

Restrict

ROP1 Restrict[ Condition-Expression ]

The purpose of the Restrict operator is to return a result that is a sub-relvalue of the operand, i.e. a subset of its tuples. Therefore the parameter Condition-Expression is a truth-valued expression which, when applied to each tuple in the operand, determines whether that tuple will be in the result or not.

High-Level Operators : Dyadic

Divide

ROP1 Divide ROP2

Traditionally the attributes of the divisor operand, ROP2, are a subset (not necessarily a proper subset) of those of the dividend operand, ROP1, and the result would comprise those attributes of ROP1 not common with those of ROP2. The tuples in the result would be derived as follows. ROP1 is split into a set of disjoint sub-relvalues via a standard GroupBy operation that uses its non-common attributes to determine the split. For each sub-relvalue, the relvalue formed by a Projection of its common attributes is checked to see if it is a superset of ROP2 or equal to it; where this is the case, the corresponding sub-tuple of non-common attributes is put into the result⁸.

8 This differs from The Third Manifesto definition of Divide because it uses GroupBy and set comparisons in its definition; also it only has 2 operands, not three.

In RAQUEL, Divide has been generalised in 2 ways.

Firstly, any of the set comparisons may be employed, not just Super= (i.e. ⊇). Therefore a division is written as

ROP1 Divide[ Set-Comparator ] ROP2

where Set-Comparator is the relational set comparator to be used. It is still permissible to write

ROP1 Divide ROP2

but then the operator uses the Super= comparison by default.

Secondly, each operand consists of a set of common attributes (i.e. the attributes have the same name and type in both operands) and a disjoint set of non-common attributes. It is permissible for the common set to be empty, or at the other extreme to comprise all the attributes of one or both operands. If the common set is empty, the non-common sets comprise all the attributes in the operands. If the common set comprises all the attributes in an operand, its non-common set is empty. Operand ROP2 is also split into a set of disjoint sub-relvalues via the same standard GroupBy operation, using ROP2's non-common attributes to determine its split. The traditional set of set comparisons is replaced by a Cartesian Product of set comparisons, one for each combination of a ROP1 sub-relvalue of common attributes and a ROP2 sub-relvalue of common attributes. Wherever a comparison returns True, the ROP1 sub-tuple of non-common attributes is unioned with the ROP2 sub-tuple of non-common attributes and put into the result.

The above 2 generalisations mean that Divide can be considered as a form of join operator. It merges its operands by attribute as joins do, but using set value comparisons instead of the individual attribute value comparisons used by traditional join operators. Consequently the common attributes must be omitted from the result, whose attributes therefore comprise the non-common attributes of ROP1 unioned with the non-common attributes of ROP2.

Considered as a form of join, the above Divide is an inner Divide. Hence it is extended to include semi and outer variants, in order to be consistent with the other join operators.
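The generalised inner Divide described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the RAQUEL implementation: relvalues are modelled as non-empty lists of dicts, the helper names `split` and `divide` are hypothetical, and the set comparator is passed as a Python function on sets (`operator.ge` standing for Super=, `operator.le` for Sub=, `operator.eq` for =, and so on).

```python
import operator
from collections import defaultdict


def split(relvalue, common):
    """Group tuples by their non-common attributes.

    Each group key is the sub-tuple of non-common attribute values; each
    group value is the set of that group's common-attribute sub-tuples.
    """
    groups = defaultdict(set)
    for tup in relvalue:
        key = frozenset((a, v) for a, v in tup.items() if a not in common)
        groups[key].add(frozenset((a, tup[a]) for a in common))
    return groups


def divide(rop1, rop2, compare=operator.ge):
    """Inner Divide; `compare` defaults to Super= (superset or equal).

    Assumes both operands are non-empty, so that their attribute names
    can be read from the first tuple of each.
    """
    common = set(rop1[0]) & set(rop2[0])
    groups1, groups2 = split(rop1, common), split(rop2, common)
    result = []
    # Cartesian Product of set comparisons, one per pair of sub-relvalues.
    for key1, set1 in groups1.items():
        for key2, set2 in groups2.items():
            if compare(set1, set2):
                # Union the two sub-tuples of non-common attributes.
                result.append(dict(key1 | key2))
    return result


supplies = [
    {"supplier": "S1", "part": "P1"},
    {"supplier": "S1", "part": "P2"},
    {"supplier": "S2", "part": "P1"},
]
wanted = [{"part": "P1"}, {"part": "P2"}]
needs = [
    {"project": "J1", "part": "P1"},
    {"project": "J1", "part": "P2"},
    {"project": "J2", "part": "P1"},
]

traditional = divide(supplies, wanted)   # suppliers of every wanted part
generalised = divide(supplies, needs)    # supplier/project pairs
```

With `wanted` as the divisor, ROP2 has no non-common attributes and the sketch behaves like the traditional Divide. With `needs` as the divisor, ROP2 is itself split by `project`, and each supplier is paired with every project whose required parts it covers.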
Divide can be considered as a single operator with a variety of permissible inputs, or as a family of inner, semi, and outer operators whose semantics and syntax are designed to form a coherent set.

Semi Divides :-

ROP1 Divide[[ Set-Comparator ] ROP2 (Left)
ROP1 Divide[ Set-Comparator ]] ROP2 (Right)

where Set-Comparator is the relational set comparator to be used.

The left and right semi Divides return results whose attributes are those of their left and right operands respectively. An inner Divide is executed and then, for each result tuple, the set of corresponding tuples that appear in the left operand (for a left semi Divide) or in the right operand (for a right semi Divide) is put in the semi Divide result.

Outer Divides :-

ROP1 Divide[[ Set-Comparator ]{ R-Tuple } ROP2 (Left)
ROP1 Divide{ L-Tuple }[ Set-Comparator ]] ROP2 (Right)
ROP1 Divide{ L-Tuple }[[ Set-Comparator ]]{ R-Tuple } ROP2 (Full)

where Set-Comparator is the relational set comparator to be used.

The left, right, and full outer Divides return results whose attributes are those of an inner Divide result (i.e. no common attributes) but which include (sub-)tuple values from their operand(s) omitted from an inner Divide result. An inner Divide is executed, and then for each sub-tuple of non-common attribute values missing from the result, that operand sub-tuple is unioned with the corresponding default tuple, and the resulting tuple is unioned with the inner Divide result. The sub-tuples considered missing are determined by whether a left, right, or full outer Divide is required.

For a left outer Divide, a missing sub-tuple value is one from the left operand not appearing in the inner Divide result. Hence the default tuple value has a type identical to that of the right operand but omitting the common attributes, and contains user-specified attribute values.

For a right outer Divide, a missing sub-tuple value is one from the right operand not appearing in the inner Divide result. Hence the default tuple value has a type identical to that of the left operand but omitting the common attributes, and contains user-specified attribute values.

In a full outer Divide, a missing sub-tuple value is one from either the left or right operand not appearing in the inner Divide result. A sub-tuple value missing from the left operand is dealt with as for a left outer Divide, and a sub-tuple value missing from the right operand is dealt with as for a right outer Divide.

Generalised Join (or Theta Join)

Generalised Join is provided in all the inner, outer and semi variants. It can be considered as a single operator with a variety of permissible inputs, or as a family of operators whose semantics and syntax are designed to form a coherent set. Since the semi and outer joins can be considered as extensions of the inner join, the inner join is described first and the semi and outer joins described in terms of the inner join.

Generalised Inner Join :-

ROP1 Gen[ Condition-Expression ] ROP2

The purpose is to merge two operands together by attribute; i.e. the result should have all the attributes of both operands. The reltypes of ROP1 and ROP2 must be disjoint. In order to derive the tuples that form the resulting relvalue, each tuple in one operand is compared with each tuple in the other operand; wherever the comparison is true, the 2 matching tuples are set unioned together to form one tuple in the result.

The parameter is a truth-valued operator expression that defines the tuple comparison to be made; so the expression must compare attribute values from ROP1 with attribute values from ROP2. Sub-expressions may manipulate attribute values before they are compared; And, Or, and Not may be included in comparisons. If the parameter consists of the truth-value True, then the result is a Cartesian Product of ROP1 and ROP2. The parameter may not be elided.

Generalised Semi Joins :-

ROP1 Gen[[ Condition-Expression ] ROP2 (Left)
ROP1 Gen[ Condition-Expression ]] ROP2 (Right)

Their purpose is to carry out a generalised inner join, and then project out from that result the attributes of the left or right operand to form the semi join result. For a left semi join the result's attributes are those of the left operand, and for a right semi join those of the right operand.

Generalised Outer Joins :-

ROP1 Gen[[ Condition-Expression ]{ R-Tuple } ROP2 (Left)
ROP1 Gen{ L-Tuple }[ Condition-Expression ]] ROP2 (Right)
ROP1 Gen{ L-Tuple }[[ Condition-Expression ]]{ R-Tuple } ROP2 (Full)

Their purpose is to carry out a generalised inner join and then, for every tuple value in an operand that is missing from the inner join result, union that value with the default tuple value, and union the resulting tuple with the inner join result to yield the outer join result.

For a left outer join, a missing tuple value is one from the left operand not appearing in the inner join result.
Hence the default tuple value has a type identical to that of the right operand, and contains user-specified attribute values.

For a right outer join, a missing tuple value is one from the right operand not appearing in the inner join result. Hence the default tuple value has a type identical to that of the left operand, and contains user-specified attribute values.

In a full outer join, a missing tuple value is one from either the left or right operand not appearing in the inner join result. A tuple value missing from the left operand is dealt with as for a left outer join, and a tuple value missing from the right operand is dealt with as for a right outer join.

In the above syntax, L-Tuple and R-Tuple represent the left and right default tuple values respectively. They are placed within the brackets { and } so that they become a relvalue of one tuple.

Operators in Development

Compose

ROP1 Compose[ Attribute-Set1; Attribute-Set2 ] With[ Attribute <-- Operator-Expression ] By Set Key[ Attribute-Set ] Guard[ Condition-Expression ] ROP2

The syntax may change, particularly as regards simplifying and generalising the parameters that terminate composition.

The current intent is that Attribute-Set1 and Attribute-Set2 are the two sets of named attributes on which composition is carried out. They comprise the only mandatory parameter. The optional With parameter enables values to be derived (e.g. the total length of a path) as each iteration of the composition is carried out; the derived values are held in new attributes attached to the composition result. The By, Key, and Guard parameters are optional means of controlling the number of iterations in the composition. If all are omitted, then composition is continued until a transitive closure is achieved.

The optional operand ROP2 allows the first iteration to use a second relvalue instead of using 2 copies of ROP1. This can be useful if it is desired to start the composition using a subset of ROP1's tuples; ROP2 is used in effect to pick out that subset.

There are 3 current concerns with the prototype. Firstly, the control of the number of iterations is overly complicated and should be simplified.
Secondly, more consideration is needed to determine how the operator should avoid executing a permanent loop and terminate satisfactorily. Thirdly, it would be useful to generalise the attribute comparisons used in composition; currently natural join type comparisons are used, but it would be useful to also permit generalised join type comparisons.

Dist

{ } Dist[ Operator ]

{ } is a 1-attribute relvalue that contains an operand in each tuple. Dist applies Operator in a distributed fashion over all the operands. Operator is constrained to be a dyadic operator that is both commutative and associative.
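The idea behind Dist can be sketched in Python as a fold of a dyadic operator over a collection of operands. This is an illustration only: the function name `dist` and the use of a plain Python collection in place of a 1-attribute relvalue are assumptions, and the source does not say what Dist returns for an empty operand set, so the sketch simply raises an error in that case.

```python
import operator
from functools import reduce


def dist(operands, op):
    """Apply the dyadic operator `op` across all `operands`.

    Because the operands come from a relvalue (a set of tuples) they have
    no defined order, which is why `op` must be commutative and
    associative: the result is then the same whatever order is used.
    """
    items = list(operands)
    if not items:
        # Assumption: behaviour for zero operands is not defined in the text.
        raise ValueError("dist needs at least one operand in this sketch")
    return reduce(op, items)


total = dist([3, 4, 5], operator.add)               # effect of Sum
any_true = dist([True, False, False], operator.or_)  # effect of Any
merged = dist([frozenset({1}), frozenset({2, 3})], operator.or_)  # set union
```

The last call shows that the operands need not be scalars; any values for which the chosen dyadic operator is commutative and associative can be combined in the same way.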