TELEIOS FP Deliverable D5.1. An implementation of ad-hoc array queries on top of MonetDB


TELEIOS FP Deliverable D5.1
An implementation of ad-hoc array queries on top of MonetDB

Ying Zhang, Martin Kersten, Milena Ivanova, Holger Pirk, Stefan Manegold and Consortium Members

February 29, 2012

Status: Final
Scheduled Delivery Date: 29 February 2012

Executive Summary

The objectives of WP5 are (i) to develop query processing and optimization techniques for ad-hoc and continuous/stream queries for EO image data implemented as arrays on top of MonetDB and (ii) to develop a functional/performance benchmark that allows us to evaluate our implementation. WP5 is organized in three tasks that each yield one deliverable, as follows:

Task 5.1 (Month 6 to Month 18): Query processing and optimization for ad-hoc array queries on top of MonetDB
    Deliverable 5.1 (Prototype/Public, Month 18): An implementation of ad-hoc array queries on top of MonetDB
Task 5.2 (Month 19 to Month 30): Query processing and optimization for continuous/stream queries on top of MonetDB
    Deliverable 5.2 (Prototype/Public, Month 30): An implementation of continuous/stream queries on top of MonetDB
Task 5.3 (Month 19 to Month 36): Benchmarking and evaluation of the developed system
    Deliverable 5.3 (Report/Public, Month 36): The evaluation of the developed implementation

In this report, accompanying Deliverable 5.1, we present our implementation of SciQL on top of MonetDB. SciQL is the query language for science applications that we developed in WP2 (Deliverable 2.2 [KNZ+11]). SciQL is an extension of SQL that introduces arrays and respective processing capabilities as first-class citizens next to relational tables.

The source code for Deliverable 5.1 is provided as archive TELEIOS_deliverable_5.1-4.tar.bz2. See Appendix A for more details.

Document Information

Contract Number: FP
Acronym: TELEIOS
Full title: TELEIOS: Virtual Observatory Infrastructure for Earth Observation Data
Project URL:
EU Project Officer: Francesco Barbato
Deliverable Number: D5.1
Deliverable Name: An implementation of ad-hoc array queries on top of MonetDB
Task Number: T5.1
Task Name: Query processing and optimization for ad-hoc array queries on top of MonetDB
Work package Number: WP5
Date of delivery: Contract M18, Actual 29 February 2012
Status: Draft / Final
Nature: Prototype / Report
Distribution Type: Public / Restricted
Responsible Partner: CWI
QA Partner: NKUA
Contact Person: Martin Kersten
Phone:
Fax: -

Project Information

This document is part of a research project funded by the IST Programme of the Commission of the European Communities as project number FP

The Beneficiaries in this project are:

Partner: National and Kapodistrian University of Athens, Department of Informatics and Telecommunications
Acronym: NKUA
Contact: Prof. Manolis Koubarakis, National and Kapodistrian University of Athens, Department of Informatics and Telecommunications, Panepistimiopolis, Ilissia, GR Athens, Greece (koubarak@di.uoa.gr) Tel: , Fax:

Partner: Fraunhofer Institute for Computer Graphic Research
Acronym: Fraunhofer
Contact: MSc. Thorsten Reitz, Fraunhofer Institute for Computer Graphic Research, Fraunhofer Strasse 5, D Darmstadt, Germany (thorsten.reitz@igd.fraunhofer.de) Tel: , Fax:

Partner: German Aerospace Center, The Remote Sensing Technology Institute, Photogrammetry and Image Analysis Department, Image Analysis Team
Acronym: DLR
Contact: Prof. Mihai Datcu, German Aerospace Center, The Remote Sensing Technology Institute, Oberpfaffenhofen, D Wessling, Germany (mihai.datcu@dlr.de) Tel: , Fax:

Partner: Stichting Centrum voor Wiskunde en Informatica, Database Architecture Group
Acronym: CWI
Contact: Prof. Martin Kersten, Stichting Centrum voor Wiskunde en Informatica, P.O. Box 94097, NL-1090 GB Amsterdam, Netherlands (martin.kersten@cwi.nl) Tel: , Fax:

Partner: National Observatory of Athens, Institute for Space Applications and Remote Sensing
Acronym: NOA
Contact: Dr. Charis Kontoes, National Observatory of Athens, Institute for Space Applications and Remote Sensing, Vas. Pavlou and I. Metaxa, GR Athens, Greece (kontoes@space.noa.gr) Tel: , Fax:

Partner: Advanced Computer Systems A.C.S S.p.A
Acronym: ACS
Contact: Mr. Ugo Di Giammatteo, Advanced Computer Systems A.C.S S.p.A, Via Della Bufalotta 378, RM Rome, Italy (udig@acsys.it) Tel: , Fax:

Contents

1 Introduction
2 MonetDB
    Overview
    Technical Features
    Physical Data Model
    Execution Model
    System Architecture & Software Stack
    Front-end
    Back-end
    Kernel
3 SciQL
    Introduction
    Language Model
    Array Definitions
    Array Modifications
    Array and Table Coercions
    Query Model
    Cell Selections
    Array Slicing
    Array Views
    Aggregate Tiling
4 Implementation
    Extending the SQL Parser
    Extending the SQL Catalog
    Translation to relational algebra and MAL
    Array storage & creation
    Array slicing
    Structural grouping
5 TELEIOS NOA Use Case
    Implementation
    Loading
    Cropping and georeference
    Classification
    Output generation
    Assessment
6 Summary
A Source Code

List of Figures

2.1 Decomposed Storage Model: Vertical Fragmentation in BATs in MonetDB
2.2 The original MonetDB/SQL Architecture
3.1 SciQL fixed arrays with different forms
3.2 Results of updating the four fixed arrays by the first four queries in Section 3.2.2
3.3 Result of shifting and zero filling the last column of matrix
SciQL Array Tiling
Results of computing AVG() over the tiles
The revised & extended MonetDB Architecture to support SciQL
Fire classification algorithm in SciQL


1. Introduction

This document reports on the implementation of SciQL, the query language for science applications proposed in TELEIOS deliverable D2.2 "An array data model and query language for EO image databases" [KNZ+11], on top of MonetDB. MonetDB is the open-source column-store DBMS developed at CWI. SciQL is an extension of SQL that introduces arrays and respective processing capabilities as first-class citizens next to relational tables and their query algebra.

This document is structured as follows. To set the stage, Section 2 introduces MonetDB and its software architecture to the extent relevant for the SciQL implementation presented here. In Section 3, we recap the design intentions and most important features of SciQL. Section 4 discusses the extensions to the MonetDB software architecture that are required to implement SciQL, and provides detailed descriptions of the implementation of key features of SciQL. In Section 5, we describe the implementation of the NOA fire monitoring use case in SciQL, and provide an initial assessment of our SciQL implementation. Section 6 summarizes and concludes this document.

The source code for Deliverable 5.1 is provided as archive TELEIOS_deliverable_5.1-4.tar.bz2. See Appendix A for more details.

2. MonetDB

Before describing the implementation of SciQL on top of MonetDB, we give a brief technical description of the architecture of MonetDB. Far from complete, this description focuses on the software components that are relevant for the SciQL implementation.

2.1 Overview

MonetDB is an open-source database management system (DBMS) for high-performance applications in data mining, business intelligence, OLAP, scientific databases, XML Query, text and multimedia retrieval, that has been developed at CWI since 1993 [MKB09]. MonetDB was designed primarily for data warehouse applications. These applications are characterized by large databases, which are mostly queried to provide business intelligence or decision support. Similar applications also appear frequently in the area of e-science, where observations are collected into a warehouse for subsequent scientific analysis. This makes MonetDB a good candidate to provide a data management solution for such applications.

The design of MonetDB is built around the concept of bulk processing: simple operations applied to large volumes of data make efficient use of the hardware for large-scale data processing. This focus on bulk processing is reflected at all levels of the architecture and the functionality offered to the user. Although MonetDB/SQL provides a full-fledged SQL interface, it has not been tuned towards high-volume transaction processing with its accompanying multi-level ACID properties.

MonetDB often achieves a significant speed improvement for both relational/SQL and XML/XQuery databases over other open-source systems. It achieves this through innovations at all layers of a DBMS, e.g., a storage model based on vertical fragmentation (column store), a modern CPU-tuned query execution architecture, automatic and self-tuning indexes, run-time query optimization, and a modular software architecture.

2.2 Technical Features

From a user's point of view, MonetDB is a full-fledged relational DBMS that supports the SQL:2003 standard and provides standard client interfaces such as ODBC and JDBC, as well as application programming interfaces for various programming languages including C, Python, Java, Ruby, Perl, and PHP.

MonetDB is designed to exploit the large main memories of modern computer systems effectively and efficiently during query processing, while the database is persistently stored on disk. With respect to performance, MonetDB mainly focuses on analytical and scientific workloads that are read-dominated and where updates mostly consist of appending new data to the database in large chunks at a time. However, MonetDB also provides complete support for transactions in compliance with the SQL:2003 standard.

Internally, the design, architecture and implementation of MonetDB reconsider all aspects and components of classical database architecture and technology to achieve the aforementioned performance benefits by effectively exploiting the potential of modern hardware and enabling extensibility to support new application requirements.

MonetDB is one of the first publicly available DBMSs designed to exploit column-store technology. Traditionally, relational database systems store their data row-wise, i.e., per data tuple, the respective values of all attributes are stored together in a database record. In contrast, a column-store stores the data column-wise, i.e., per attribute, the respective values of all tuples are stored together in one array. The first benefit of column-wise over row-wise storage is reduced costs for I/O and data transport. In particular, in scientific and analytical workloads, queries often access the attribute values of many or all tuples of a table, but only for a small subset of all attributes of a table; as opposed to traditional transactional workloads, where queries mostly access only one or at most very few tuples, but then all attribute values of those tuples.

MonetDB consequently exploits the column-store architecture beyond data storage and I/O efficiency. In MonetDB, all query processing internally happens on a columnar data representation. Multi-attribute tuples are only reconstructed just before the final query result is returned to the client. This approach enables a very lean query evaluation architecture that is highly tuned to minimize computational (CPU) costs. Moreover, carefully designed cache-conscious data structures and algorithms make optimal use of hierarchical memory systems [BKM08].

The design of MonetDB also supports extensibility of the whole system at various levels. Via extension modules, implemented in C or MonetDB's MAL language, new data types and new algorithms can be added to the system to support special application requirements that go beyond the SQL standard, or enable efficient exploitation of domain-specific data characteristics. Additionally, by opening the traditionally closed and monolithic query optimization and execution engine, MonetDB provides a modular multi-tier query optimization framework. Optimizer pipelines can be configured and extended to effectively exploit domain-specific data and workload characteristics.

In addition, MonetDB offers novel techniques that efficiently support a priori unknown or rapidly changing workloads over large data volumes.
Both the fine-grained flexible intermediate result caching technique "recycling" [IKNG09] and the adaptive incremental indexing technique "database cracking" [IKM07] require minimal overhead and investment to provide maximal benefit for the actual workload and the portion of the data that is actually accessed.

Finally, the core architecture of MonetDB has proved to provide efficient support not only for the relational data model and SQL, but also for, e.g., XML and XQuery [BGvK+06]. In this line, support for RDF and SPARQL, as well as arrays, will be developed in the context of the TELEIOS project.

2.3 Physical Data Model

The storage model deployed in MonetDB is a significant deviation from traditional database systems. Instead of storing all attributes of each relational tuple together in one record (a.k.a. row-store), MonetDB represents relational tables using vertical fragmentation (a.k.a. column-store), by storing each column in a separate (surrogate,value) table, called a BAT (Binary Association Table). The left column, often the surrogate or OID (object-identifier), is called the head, and the right column, usually holding the actual attribute values, is called the tail. In this way, every relational table is internally represented as a collection of BATs as sketched in Figure 2.1.

For a relation R of k attributes, there exist k BATs, each BAT storing the respective attribute as (OID,value) pairs. The system-generated OID identifies the relational tuple that the attribute value belongs to, i.e., all attribute values of a single tuple are assigned the same OID. OID values form a dense ascending sequence representing the position of a value in the column. Thus, for base BATs, the OID columns are not materialized, but rather implicitly given by the position. This makes base BATs essentially equal to typed arrays in C with optional metadata. For each relational tuple t of R, all attributes of t are stored in the same position in their respective column representations.
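The decomposed storage model described above can be illustrated with a minimal Python sketch. It is not MonetDB code: plain Python lists stand in for BAT tails, the list position plays the role of the implicit dense OID, and the helper names `decompose` and `reconstruct` as well as the sample rows are assumptions made only for this illustration.

```python
# Illustrative sketch only: plain Python lists stand in for the tails of BATs.

def decompose(rows, attributes):
    """Split row-wise tuples into one column (tail) per attribute.

    The OID of a value is simply its list position, so no head column is
    stored, mirroring the dense, implicit OIDs of base BATs.
    """
    return {name: [row[i] for row in rows] for i, name in enumerate(attributes)}


def reconstruct(columns, oid, attributes):
    """Rebuild one tuple by reading the same position in every column."""
    return tuple(columns[name][oid] for name in attributes)


attributes = ("a", "b", "c")
rows = [(1, 5, 9), (2, 4, 8), (3, 6, 7)]  # insertion order defines OIDs 0, 1, 2

columns = decompose(rows, attributes)
second_tuple = reconstruct(columns, 1, attributes)

print(columns["c"])   # tail of the column for attribute c
print(second_tuple)   # tuple with OID 1, glued together from three columns
```

Because all columns share the insertion order, reconstructing a tuple needs no lookup structure: it is one positional read per column.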
The position is determined by the insertion order of the tuples. This tuple-order alignment across all base columns allows the column-oriented system to perform tuple reconstructions efficiently in the presence of tuple order-preserving operators.

[Figure 2.1: Decomposed Storage Model: Vertical Fragmentation in BATs in MonetDB]

For fixed-width data types (e.g., integer, decimal and floating point numbers), MonetDB uses a plain C array of the respective type to store the value column of a BAT. For variable-width data types (e.g., strings), MonetDB applies a kind of dictionary encoding. All distinct values of a column are stored in a BLOB and the value column of the BAT is an integer array that contains references to the position of the respective value in the BLOB.

MonetDB resorts to the operating system's memory-mapped file support to load the data into main memory and exploit extended virtual memory. Thus, all data structures are represented in the same binary format on disk and in memory.

Similarly to all advanced column-stores [S +, BZN05], MonetDB uses late tuple reconstruction: when a query is fired, the relevant columns are loaded from disk to memory but are glued together in an N-ary tuple format only prior to producing the final result. This way, intermediate results are also in a column format. This approach allows the query engine to exploit CPU- and cache-optimized vector-like operator implementations throughout the whole query evaluation, relying on a bulk processing model as opposed to the typical Volcano approach, which minimizes function calls, type casting, various metadata handling costs, etc. Intermediate results need to be materialized, but those can efficiently be reused [IKNG09].

2.4 Execution Model

The MonetDB kernel is an abstract machine, programmed in the MonetDB Assembly Language (MAL). The core of MAL is formed by a closed low-level two-column relational algebra on BATs. N-ary relational algebra plans are translated into two-column BAT algebra and compiled to MAL programs. These MAL programs are then evaluated in an operator-at-a-time manner, i.e., each operation is evaluated to completion over its entire input data, before subsequent data-dependent operations are invoked. Each BAT algebra operator maps to a simple MAL instruction, which has zero degrees of freedom in its behavior (obviously, it may be parameterized where necessary): it does not take complex expressions as parameter.
Complex operations are broken into a sequence of BAT algebra operators that each perform a simple operation on an entire column of values ("bulk processing"). This allows the implementation of the BAT algebra to avoid an expression interpreting engine; rather, all BAT algebra operations in the implementation map onto simple array operations. For instance, the BAT algebra expression

    R:bat[:oid,:oid] := select(B:bat[:oid,:int], V:int);

can be implemented at the C code level like:

    /* collect the positions (OIDs) of all tail values of B equal to V */
    for (i = j = 0; i < n; i++)
        if (B.tail[i] == V)
            R.tail[j++] = i;

The BAT algebra operators have the advantage that tight for-loops without function calls create high instruction locality, which eliminates the instruction cache miss problem. Such simple loops are amenable to compiler optimization (loop pipelining, blocking, strength reduction), and CPU out-of-order speculation.

2.5 System Architecture & Software Stack

[Figure 2.2: The original MonetDB/SQL Architecture. Labels shown in the figure: Client, SQL Query, SQL Parser, Syntax Tree, SQL Compiler, SQL Catalog, Relational Algebra, MAL Generator, MAL Program, MAL Optimizers, MAL Interpreter, GDK Kernel, BATs, Resultset; the legend distinguishes functional components from data structures.]

MonetDB's query processing scheme is centered around three software layers as depicted in Figure 2.2. The remainder of this section describes the three layers and their components.
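Before turning to the individual layers, the operator-at-a-time style of Section 2.4 can be made concrete with a minimal Python sketch. It mirrors the C loop for the select operator and chains it with a positional fetch. Lists stand in for BAT tails, and `select_eq` and `fetch` are hypothetical helper names chosen for this illustration; they are not MAL or GDK functions.

```python
# Illustrative sketch only: each operator runs to completion over a whole
# column and materializes its result before the next operator starts.

def select_eq(tail, value):
    """Return the positions (OIDs) whose tail value equals `value`."""
    return [oid for oid, v in enumerate(tail) if v == value]


def fetch(tail, oids):
    """Return the tail values found at the given positions."""
    return [tail[oid] for oid in oids]


b = [7, 3, 7, 9, 7]                 # column used in the selection
names = ["p", "q", "r", "s", "t"]   # second column, aligned by position

qualifying = select_eq(b, 7)        # first operator: fully materialized
result = fetch(names, qualifying)   # second operator consumes that result

print(qualifying)
print(result)
```

Neither helper interprets an expression tree: each is a single tight loop over an array, which is the property the BAT algebra relies on.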

2.5.1 Front-end

The top layer or front-end provides the user-level data model and respective query language. The back-end and kernel are designed to be generic enough to support multiple user-level data models and query languages, like relational tables with SQL and XML with XQuery. While RDF with SPARQL is on our agenda, this document reports on the implementation of arrays with SciQL.

In general, the front-end's task is to map the user-space data model to MonetDB's BATs and to translate the user-space query language to MAL. For the latter, the user-level query language is first parsed into an internal representation (e.g., SQL into relational algebra), which is then optimized using domain-specific rules. These domain-specific optimizations, which we refer to as strategic optimization, aim primarily at reducing the amount of data to be processed, i.e., the size of intermediate results. In the case of SQL and relational algebra, such optimizations include heuristics like pushing down selections and exploiting join indexes.

As an indicative example, consider the following simple table and query.

    CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
    INSERT INTO t VALUES (1,5,9),(2,4,8),(3,6,7);

    SELECT * FROM t;
    +------+------+------+
    | a    | b    | c    |
    +======+======+======+
    |    1 |    5 |    9 |
    |    2 |    4 |    8 |
    |    3 |    6 |    7 |
    +------+------+------+

    SELECT a,b FROM t WHERE c BETWEEN 7 AND 8;
    +------+------+
    | a    | b    |
    +======+======+
    |    2 |    4 |
    |    3 |    6 |
    +------+------+

Prefixing the query with the keyword plan shows the (logical) relational plan of the query.

    plan SELECT a,b FROM t WHERE c BETWEEN 7 AND 8;

    project (
      select (
        table(sys.t) [ t.a, t.b, t.c, t.%tid% NOT NULL ]
      ) [ int[tinyint "7"] <= t.c <= int[tinyint "8"] ]
    ) [ t.a, t.b ]

The optimized logical plan is then translated into MAL and handed over to the back-end for general MAL optimization and evaluation.

2.5.2 Back-end

The middle layer or back-end consists of the MAL optimizers framework and the MAL interpreter as textual interface to the kernel.
The MAL optimizers framework consists of a collection of optimizer modules that each transform a given MAL program into a more efficient one, possibly adding resource management directives. The modules provide facilities ranging from symbolic processing up to just-in-time data distribution and execution. This approach, which we refer to as tactical optimization, is inspired more by programming language optimization than by classical database query optimization. It breaks with the hitherto omnipresent cost-based optimizers, recognizing that not all decisions can be cast together in a single cost formula. Operating on the common binary-relational back-end algebra, these optimizer modules are shared by all front-end data models and query languages.

Prefixing the query with the keyword explain shows the optimized (physical) MAL plan of the query.

    explain SELECT a,b FROM t WHERE c BETWEEN 7 AND 8;

    X_2 := sql.mvc();
    X_9:bat[:oid,:int] := sql.bind(X_2,"sys","t","c",0);
    X_10 := algebra.uselect(X_9,7:int,8:int,true,true);
    X_11 := algebra.markt(X_10,0@0:oid);
    X_12 := bat.reverse(X_11);
    X_5:bat[:oid,:int] := sql.bind(X_2,"sys","t","a",0);
    X_13 := algebra.leftjoin(X_12,X_5);
    X_15:bat[:oid,:int] := sql.bind(X_2,"sys","t","b",0);
    X_16 := algebra.leftjoin(X_12,X_15);
    X_17 := sql.resultset(2,1,X_13);
    sql.rscolumn(X_17,"sys.t","a","int",32,0,X_13);
    sql.rscolumn(X_17,"sys.t","b","int",32,0,X_16);
    X_27 := io.stdout();
    sql.exportresult(X_27,X_17);

2.5.3 Kernel

The bottom layer or kernel (a.k.a. GDK) provides BATs as MonetDB's bread-and-butter data structure, as well as the library of highly optimized implementations of the binary relational algebra operators. Due to the operator-at-a-time bulk-processing evaluation paradigm, each operator has access to its entire input including known properties. This allows the algebra operators to choose at runtime the actual algorithm and implementation to be used, based on the input's properties. For instance, a select operator can exploit sorted-ness of a BAT by deploying binary search, or (for point-selections) use an existing hash-index, and fall back to a scan otherwise. Likewise, a join can at runtime decide to, e.g., perform a merge-join if the join attributes happen to be sorted, and fall back to a hash-join otherwise.
We refer to this runtime optimization as operational optimization.
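A minimal Python sketch of this kind of runtime choice is shown below, assuming a range selection over a single column. The function name `range_select` and the explicit `tail_sorted` flag are hypothetical; the flag stands in for a property that the kernel is assumed to know about its input, and the sketch says nothing about how GDK actually implements its operators.

```python
from bisect import bisect_left, bisect_right

# Illustrative sketch only: pick the selection algorithm from a known
# property of the input column instead of a cost formula.

def range_select(tail, low, high, tail_sorted):
    """Return (algorithm, OIDs) for all values v with low <= v <= high.

    `tail_sorted` is assumed to be a property already known for the column
    (ascending order); it is not recomputed here.
    """
    if tail_sorted:
        # Sorted input: two binary searches delimit one contiguous OID range.
        first = bisect_left(tail, low)
        last = bisect_right(tail, high)
        return "binary search", list(range(first, last))
    # No usable property: fall back to a full scan.
    return "scan", [oid for oid, v in enumerate(tail) if low <= v <= high]


sorted_case = range_select([1, 3, 5, 7, 9], 3, 7, tail_sorted=True)
unsorted_case = range_select([9, 8, 7], 7, 8, tail_sorted=False)

print(sorted_case)
print(unsorted_case)
```

Both branches return the same kind of result, so the caller is unaffected by which algorithm was chosen.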

3. SciQL

Before describing our implementation of SciQL, we provide a recap of its design intention and major features. A complete description of SciQL is available in [KNZ+11].

3.1 Introduction

The array computational paradigm is prevalent in most sciences and it has drawn attention from the database research community for many years. The object-oriented database systems of the 90s allowed any collection type to be used recursively [B+92] and multi-dimensional database systems took it as the starting point for their design [GL97]. The hooks provided in relational systems for user-defined functions and data types create a stepping stone towards interaction with array-based libraries; RasDaMan [B+98] is one of the few systems in this area that have matured beyond the laboratory stage.

Nevertheless, the array paradigm taken in isolation is insufficient to create a full-fledged scientific information system. Such a system should blend measurements with static and derived meta-data about the instruments and observations. It therefore calls for a strong symbiosis of the relational paradigm and the array paradigm. The SciQL language proposed in TELEIOS deliverable D2.2 "An array data model and query language for EO image databases" [KNZ+11] fills this gap.

The mismatch between application needs and database technology has a long history, e.g., [HTT09, Bro10, SBD+09, GLNS+05, HM05, GL97]. The main problems encountered with relational systems in science can be summed up as i) the impedance mismatch between query language and array manipulation, ii) the difficulty of writing complex array-based expressions in SQL, iii) arrays not being first-class citizens, and iv) ingestion of terabytes of data being too slow. The traditional DBMS simply carries too much overhead. Moreover, much of the science processing involves use of standard libraries, e.g., LINPACK, and statistics tools, e.g., R.
Their interaction with a database is often confined to a simplified data import/export facility. The proposed standard for management of external data (SQL3/MED) [MMJ+02] has not materialized as a component in contemporary system offerings. A query language is needed that achieves a true symbiosis of the TABLE and ARRAY semantics in the context of existing external software libraries.

This led to the design of SciQL, where arrays are made first-class citizens by enhancing the SQL:2003 framework along three innovative lines:

- Seamless integration of array, set, and sequence semantics.
- Named dimensions with constraints as a declarative means for indexed access to array cells.
- Structural grouping to generalize the value-based grouping towards selective access to groups of cells based on positional relationships for aggregation.

A TABLE and an ARRAY differ semantically in a straightforward manner. A TABLE denotes a (multi-)set of tuples, while an ARRAY denotes a (sparsely) indexed collection of tuples called cells. All cells covered by an array's dimensions always exist conceptually and their non-dimensional attributes are initialized to a default value, while in a TABLE tuples only come into existence after an explicit insert operation. Arrays may appear wherever tables are allowed in an SQL expression, producing an array if the column list of a SELECT statement contains dimensional expressions. The SQL iterator semantics associated with TABLEs carry over to ARRAYs, but iteration is confined to cells whose non-dimensional attributes are not NULL.

An important operation is to carve out an array slab for further processing. The windowing scheme in SQL:2003 is a step in this direction. It was primarily introduced to better handle time series in business data warehouses and data mining. In SciQL, we take it a step further by providing an easy-to-use language feature to identify groups of cells based on their positional relationships.
Each group forms a pattern, called a tile, which can be subsequently used to derive all possible incarnations for, e.g., statistical aggregation.

[Figure 3.1: SciQL fixed arrays with different forms. Panels: (a) matrix, (b) stripes, (c) diagonal, (d) sparse; axes x and y.]

[Figure 3.2: Results of updating the four fixed arrays by the first four queries in Section 3.2.2. Panels: (a) matrix, (b) stripes, (c) diagonal, (d) sparse; axes x and y.]

3.2 Language Model

In this section we summarize the features offered in SciQL concerning ARRAY definition, instantiation and modification, as well as coercions between TABLE and ARRAY.

3.2.1 Array Definitions

We purposely stay as close as possible to the syntax and semantics of SQL:2003. An ARRAY object definition reuses the syntax of TABLE with a few minor additions. An array has one or more dimensional attributes (for short: dimensions) and zero or more non-dimensional attributes. A dimension is a measurement of the size of the array in a particular named direction, e.g., x, y, z or time. A dimensional attribute is denoted by the keyword DIMENSION with optional constraints describing the dimension range. The data type of a dimension can be any of the basic scalar data types, including TIMESTAMP, FLOAT and VARCHAR. The non-dimensional attributes of an array can be of any data type allowed for a normal table column and they may use a DEFAULT clause to initialize their values. The default value may be arbitrarily taken from a scalar expression, a cell's dimensional value(s) (i.e., the cell's coordinates on the array dimensions) or a side-effect-free function. Omission of the default or assignment of a NULL value produces a hole, which is ignored by the built-in aggregation functions.

Arrays are either fixed or unbounded. An array is fixed if all its dimensions are fixed, otherwise it is unbounded. The range and size of a fixed dimension are exactly specified using the sequence pattern [<start>:<step>:<stop>], which is composed of expressions each producing one scalar value. The interval between start and stop has an open end-point, i.e., stop is not included.
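The half-open sequence pattern can be illustrated with a small Python sketch, restricted to integer dimensions with a positive step. The helpers `dimension_values` and `fixed_array` are hypothetical names for this illustration, and the dictionary of cells is only a stand-in; it does not reflect how the SciQL implementation stores arrays.

```python
from itertools import product

# Illustrative sketch only: integer dimensions with a positive step.

def dimension_values(start, step, stop):
    """Enumerate the values of [start:step:stop]; stop itself is excluded."""
    if step <= 0:
        raise ValueError("this sketch only handles positive integer steps")
    return list(range(start, stop, step))


def fixed_array(dimensions, default):
    """Create every cell of a fixed array, initialized to the default value.

    `dimensions` maps a dimension name to its (start, step, stop) triple.
    Cells are keyed by their coordinates in the order the dimensions are given.
    """
    ranges = [dimension_values(*triple) for triple in dimensions.values()]
    return {coords: default for coords in product(*ranges)}


x_values = dimension_values(0, 1, 4)
every_second = dimension_values(0, 2, 7)
matrix = fixed_array({"x": (0, 1, 4), "y": (0, 1, 4)}, default=0.0)

print(x_values)
print(every_second)
print(len(matrix))
```

Note that all cells exist as soon as the array is created, which matches the statement that cells covered by the dimensions always exist conceptually.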
For integer dimensions, the traditional syntax using an integer upper bound [<size>] is allowed as a shortcut for the sequence pattern [0:1:<size>].

Figure 3.1 shows four fixed arrays with different forms. In addition to the most common C-style rectangular arrays (Fig. 3.1(a)), stripes (Fig. 3.1(b)) can be defined as an array where the default value of some rows is indistinguishable from out-of-bounds access, i.e., those cells are explicitly excluded by carrying NULL values in their non-dimensional attributes. A diagonal array (Fig. 3.1(c)) is easily formulated using a predicate over the dimensions involved. It is even possible to carve out an array based on its content (Fig. 3.1(d)), thereby effectively nullifying all cells outside the domain of validity and producing a sparse array. This feature is of particular interest to remove outliers using an integrity constraint. Evidently, different array forms can lead to very different considerations with respect to their physical representation, a topic for future research. The following statements show how the four arrays in Figure 3.1 are created in SciQL:

    CREATE ARRAY matrix (
        x INT DIMENSION[4],
        y INT DIMENSION[4],
        v FLOAT DEFAULT 0.0
    );
    CREATE ARRAY stripes (
        x INT DIMENSION[4],
        y INT DIMENSION[4] CHECK(MOD(y,2) = 1),
        v FLOAT DEFAULT 0.0
    );
    CREATE ARRAY diagonal (
        x INT DIMENSION[4],
        y INT DIMENSION[4] CHECK(x = y),
        v FLOAT DEFAULT 0.0
    );
    CREATE ARRAY sparse (
        x INT DIMENSION[4],
        y INT DIMENSION[4],
        v FLOAT DEFAULT 0.0 CHECK(v BETWEEN 0 AND 10)
    );

A dimension is unbounded if any of its start, step, or stop expressions is identified by the pseudo expression *. A DIMENSION clause without a sequence pattern implies the most open pattern [*:*:*]. Cells in an unbounded array can be modified using the INSERT and DELETE statements carried over from the table semantics. An unbounded array has an implicitly defined actual size derived from the minimal bounding rectangle that encloses all cells with an explicitly inserted non-null value in the array. When walking through an array instance, cells outside the minimal bounding rectangle are ignored. However, direct access to any cell within the array's dimension bounds is guaranteed to produce the default value. The effect is that listing an array with unbounded dimensions still produces a finite result, but it may be huge. An unbounded dimension is typically used for an n-dimensional spatial array where only part of the dimension range designates a non-empty array cell. Time series are also prototypical examples of arrays with unbounded dimensions.

3.2.2 Array Modifications

The SQL update semantics is extended towards arrays in a straightforward manner. The array cells are initialized upon creation with the default values. A cell is given a new value through an ordinary SQL UPDATE statement. A dimension can be used as a bound variable, which takes on all its dimension values (i.e., valid values of this dimension) successively. A convenient shortcut is to combine multiple updates into a single guarded statement.
The evaluation order ensures that the first predicate that holds dictates the cell values. The refinement of the array matrix is shown in the first query below. The cells receive a zero only in the case x = y. The remaining queries demonstrate setting cell values in the arrays stripes, diagonal and sparse, respectively. The results are shown in Figure 3.2.

UPDATE matrix SET v = CASE WHEN x > y THEN x + y WHEN x < y THEN x - y ELSE 0 END;
UPDATE stripes SET v = x + y;
UPDATE diagonal SET v = v + 10;
UPDATE sparse SET v = MOD(RAND(),16);

Assignment of a NULL value to an array cell leads to a hole in the array, a place indistinguishable from the out-of-bounds area. Such assignments overrule any predefined DEFAULT clause attached to the array definition. For convenience, the built-in array aggregate operations SUM(), COUNT(), AVG(), MIN() and MAX() are applied to non-null values only.

Arrays can also be updated using INSERT and DELETE statements. Since all cells semantically exist by definition, both operations effectively turn into update statements. The DELETE statement creates holes by assigning a NULL value to all qualified cells. The INSERT statement simply overwrites the cells at the positions specified by the input columns with new values. Note that although the UPDATE, INSERT and DELETE statements do not change the existence of array cells, for unbounded arrays they may grow or shrink the minimal bounding rectangle.

The three queries below together illustrate how to delete a column in the array matrix where x = 2, then shift the remaining columns, and (manually) set the last column of matrix to its default value. In the second and third queries, the x and y dimensions of the array matrix are matched against the projection columns of the SELECT statements. Cells at matching positions are assigned new values (see Figure 3.3).

Figure 3.3: Result of shifting and zero filling the last column of matrix.

DELETE FROM matrix WHERE x = 2;
INSERT INTO matrix SELECT x-1, y, v FROM matrix WHERE x > 2;
INSERT INTO matrix SELECT x, y, 0 FROM matrix WHERE x = 3;

3.2.3 Array and Table Coercions

One of the strong features of SciQL is the ability to switch easily between a TABLE and an ARRAY perspective. Any array is turned into a corresponding table by simply selecting its attributes. The dimensions then form a compound primary key. For example, the matrix defined earlier becomes a table using the expression SELECT x, y, v FROM matrix or using a CAST operation like CAST(matrix AS TABLE). Note that the semantics of an array leads to materialization of all cells within the dimension bounds (or the minimal bounding rectangle for unbounded arrays), even if their values were set to a non-null default.
A selection excluding the user-specified default values may solve this problem. An arbitrary table can be coerced into an array if the column list of the SELECT statement contains the dimension qualifiers [ and ] around a projection column, i.e., [<expr>]. Here, the <expr> is a <column name> or a value expression. For instance, let mtable be the table produced by casting the array matrix to a table. It can be turned into an array by picking the columns forming the primary key in the column list as follows: SELECT [x], [y], v FROM mtable, or using the reverse cast operation CAST(mtable AS ARRAY(x,y)). The result is an unbounded array with actual size derived from the dimension column expressions [x] and [y]. The default values of all non-dimensional attributes are inherited from the default values in the original table.

3.3 Query Model

From a query's perspective, querying a TABLE and an ARRAY are much alike. In both cases elements are selected based on predicates, joins, and groupings. The result of any query expression is a table unless the column list contains the dimension qualifiers ([ and ]). A novel way to use GROUP BY, called tiling, is introduced to improve structure-based querying.
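The switch between the array and table perspectives can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MonetDB implementation: an array is modeled as a dictionary from (x, y) to a value, bounds are half-open integer ranges with step 1, and the helper names `array_to_table` and `table_to_array` are hypothetical.

```python
from itertools import product


def array_to_table(cells, bounds, default=0.0):
    """Materialize every cell inside the half-open bounds as an (x, y, v) row.

    Cells that were never set explicitly appear with the default value,
    which is why the resulting table can be much larger than the data.
    """
    (x_lo, x_hi), (y_lo, y_hi) = bounds
    return [(x, y, cells.get((x, y), default))
            for x, y in product(range(x_lo, x_hi), range(y_lo, y_hi))]


def table_to_array(rows):
    """Coerce (x, y, v) rows into an array: a cell dict plus its actual size.

    The bounds are the minimal bounding rectangle of the dimension columns,
    in the spirit of SELECT [x], [y], v FROM mtable.
    """
    cells = {(x, y): v for x, y, v in rows}
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    bounds = ((min(xs), max(xs) + 1), (min(ys), max(ys) + 1))
    return cells, bounds


# A 4x4 array in which only two cells were set explicitly (assumed data).
sparse_cells = {(0, 0): 7.0, (2, 1): 3.0}
mtable = array_to_table(sparse_cells, ((0, 4), (0, 4)))

# A selection that excludes the default value keeps the table small.
non_default = [row for row in mtable if row[2] != 0.0]

# Coercing the filtered table back yields a smaller bounding rectangle.
cells, bounds = table_to_array(non_default)
```

The full cast produces 16 rows, whereas the filtered table holds only the two explicitly set cells, and the array derived from it spans x in [0, 3) and y in [0, 2).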

3.3.1 Cell Selections

SELECT x, y, v FROM matrix WHERE v > 2;
SELECT [x], [y], v FROM matrix WHERE v > 2;
SELECT [T.k], [y], v FROM matrix JOIN T ON matrix.x = T.i;

The examples above illustrate a few simple array queries. The first query extracts values from the array matrix into a table. The second one constructs a sparse array from the selection, whose dimensional properties are inherited from the result set. The dimension qualifiers introduce a new dimension range, i.e., a minimal bounding box is derived from the result set, such that the answers fall within its bounds. The last query shows how elements of interest can be obtained from both arrays and tables using an ordinary join expression. It assumes a table T with two (or more) columns, where the column i is of a numeric type and the column k may be of any scalar type. The expression extracts the subarray from matrix and sets the bounds to the smallest enclosing bounding box defined by the values of the columns T.k and y. The actual bounds of an array can always be obtained from the built-in functions MIN() and MAX() over the dimensions.

3.3.2 Array Slicing

An ARRAY object can be considered an array of records in programming language terms. Therefore, the language supports positional index access conforming to the order the dimensions are introduced in the array definition. All attributes (dimensional and non-dimensional) of interest should be explicitly identified. A range pattern, borrowed from the programming language arena, supports easy slicing over individual dimensions using the aforementioned sequence pattern [<start>:<step>:<stop>]. The range pattern is allowed in both the FROM and GROUP BY clauses. To illustrate this, we show a few slicing expressions over the arrays defined earlier (results are computed based on Fig. 3.2-a).
SELECT * FROM matrix[3][2]; -- yields: (3, 2, 5.0)
SELECT v FROM matrix[*][1:3]; -- yields: (-1.0), (-2.0), (0.0), (-1.0), (3.0), (0.0), (4.0), (5.0)
SELECT v FROM matrix[0:2:4][0:2:4]; -- yields: (0.0), (-2.0), (2.0), (0.0)

The SQL UPDATE statement is extended to take array expressions directly. This leads to a more convenient and compact notation in many situations. The bounds of the subarray are specified by a sequence pattern of literals. Again, a sequence of updates acts as a guarded function. The array dimensions are used as bound variables that run over all valid dimension values. This is illustrated using the queries below:

UPDATE matrix SET matrix[0:2][*].v = v * 1.19;
UPDATE matrix SET matrix[x][*].v = CASE WHEN v < 0 THEN x WHEN v > 10 THEN 10 * x ELSE 0 END;
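The slicing results listed above can be reproduced with a small Python emulation. It is a sketch under assumptions rather than SciQL itself: the array is a dictionary keyed by (x, y), the stop value of a sequence pattern is treated as exclusive (as the listed results suggest), a two-part pattern such as [1:3] is read as start and stop with step 1, and the helper names are hypothetical.

```python
def seq(start, step, stop):
    """Dimension values of the pattern [start:step:stop]; stop is exclusive."""
    return range(start, stop, step)


def build_matrix(size=4):
    """Matrix after the guarded UPDATE: x+y if x>y, x-y if x<y, else 0."""
    cells = {}
    for x in range(size):
        for y in range(size):
            if x > y:
                cells[(x, y)] = float(x + y)
            elif x < y:
                cells[(x, y)] = float(x - y)
            else:
                cells[(x, y)] = 0.0
    return cells


def slice_values(cells, xs, ys):
    """Values of all cells whose coordinates fall in both dimension ranges."""
    return [cells[(x, y)] for x in xs for y in ys if (x, y) in cells]


def update_slice(cells, xs, ys, fn):
    """Apply fn(x, y, v) to every existing cell in the slice, in place."""
    for x in xs:
        for y in ys:
            if (x, y) in cells:
                cells[(x, y)] = fn(x, y, cells[(x, y)])


matrix = build_matrix()
single = slice_values(matrix, seq(3, 1, 4), seq(2, 1, 3))    # matrix[3][2]
columns = slice_values(matrix, seq(0, 1, 4), seq(1, 1, 3))   # matrix[*][1:3]
strided = slice_values(matrix, seq(0, 2, 4), seq(0, 2, 4))   # matrix[0:2:4][0:2:4]

# Emulates: UPDATE matrix SET matrix[0:2][*].v = v * 1.19;
update_slice(matrix, seq(0, 1, 2), seq(0, 1, 4), lambda x, y, v: v * 1.19)
```

Enumerating x in the outer loop and y in the inner loop matches the order of the values shown in the query comments.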

Figure 3.4: SciQL Array Tiling (panels a-d; axes x and y; the anchor point is marked in each panel).

3.3.3 Array Views

A common case is to embed an array into a larger one, such that a zero-initialized bounding border is created, or to shift a vector before moving averages are calculated. To avoid possibly significant data movements, the array VIEW constructor can be used instead. The first two queries below illustrate such an embedding: they transpose and shift an array, respectively. In the SELECT clause, the x and y columns are used to identify the cells in the vmatrix to be updated. The last example illustrates how the aforementioned example of shift with zero fill of a column (see Section 3.2.2, second query group) can be modeled as a view. Note that the results of all SELECT statements in the examples below are tables; thus, in the third query, the ordinary SQL UNION semantics applies.

CREATE VIEW ARRAY vmatrix ( x INT DIMENSION[-1:1:5], y INT DIMENSION[-1:1:5], w FLOAT DEFAULT 0.0 ) AS
  SELECT y, x, v FROM matrix;
CREATE VIEW ARRAY vector ( x INT DIMENSION[-1:1:5], w FLOAT DEFAULT 0.0 ) AS
  SELECT A.x, (A.v+B.v)/2 FROM matrix AS A JOIN (SELECT x+1 AS x, v FROM matrix) AS B ON A.x = B.x;
CREATE VIEW ARRAY vmatrix2 ( x INT DIMENSION[-1:1:5], y INT DIMENSION[-1:1:5], w FLOAT DEFAULT 0.0 ) AS
  SELECT x, y, v FROM matrix WHERE x < 2
  UNION SELECT x-1, y, v FROM matrix WHERE x > 2
  UNION SELECT x, y, 0.0 FROM matrix WHERE x = 3;

3.3.4 Aggregate Tiling

A key operation in science applications is to perform statistics on groups. Groups are commonly identified by an attribute or expression list in a GROUP BY clause. This value-based grouping can be extended to structural grouping for ARRAYs in a natural way. Large arrays are often broken into smaller pieces before being aggregated or overlaid with a structure to calculate, e.g., a Gaussian kernel function.
SciQL supports fine-grained control over breaking an array into possibly overlapping tiles using a slight variation of the SQL GROUP BY clause semantics. To this end, the attribute list is replaced by a parametrized series of array elements, called tiles. Tiling starts with an anchor point identified by its dimensional value(s), which is extended with a list of cell denotations relative to the anchor point. The value derived from a group aggregation is associated with the dimensional value(s) of the anchor point.

Figure 3.5: Results of computing AVG() over the tiles (panels a-d; axes x and y).

Consider a 4x4 matrix and tiling it with a 2x2 matrix by extending the anchor point matrix[x][y] with structure elements matrix[x+1][y], matrix[x][y+1], and matrix[x+1][y+1]. The tiling operation performs a grouping for every valid anchor point on the actual array dimensions. Figure 3.4-a shows the first four tiles created. The individual elements of a group need not belong to the domain of the array dimensions, but then their values are assumed to be the outer NULL value, which is ignored in the statistical aggregate operations. This way we break the matrix array into 16 overlapping tiles. The number can be reduced by explicitly calling for DISTINCT tiles. This leads to considering each cell for one tile only, leaving a hole behind for the next candidate tile. Furthermore, in this case tiles with holes do not participate in the result set. This means that for irregularly formed tiles there is no guarantee that all array cells take part in the grouping. The dimension range sequence pattern can be used to concisely define all values of interest. The following queries create the tiles on matrix as depicted in Figure 3.4 (in the order from left to right). The query results are shown in Figure 3.5.

SELECT [x], [y], AVG(v) FROM matrix GROUP BY matrix[x:x+2][y:y+2];
SELECT [x], [y], AVG(v) FROM matrix GROUP BY DISTINCT matrix[x:x+2][y:y+2];
SELECT [x], [y], AVG(v) FROM matrix GROUP BY matrix[x-1:x+1][y-1:y+1];
SELECT [x], [y], AVG(v) FROM matrix[1:*][1:*] GROUP BY DISTINCT matrix[x][y], matrix[x-1][y], matrix[x+1][y], matrix[x][y-1], matrix[x][y+1];

A recurring operation is to derive check sums over array slabs. In SciQL this can be achieved with a simple tiling on, e.g., the x dimension. In this case, the anchor point is the value of x. For example:

SELECT [x], SUM(v) FROM matrix GROUP BY matrix[x][*];

A discrete convolution operation is only slightly more complex.
For example, consider replacing each element by the average of its neighboring elements. The extended matrix vmatrix is used to calculate the convolution, because it ensures a zero value for all boundary elements. By using an array slice in the FROM clause, the aggregates outside the bounds [0:4][0:4] are not calculated.

SELECT [x], [y], AVG(v) FROM vmatrix[0:4][0:4] GROUP BY vmatrix[x-1:1:x+2][y-1:1:y+2];
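The non-DISTINCT tiling queries of this section can be emulated in Python to make the anchor-point semantics concrete. The sketch below rests on assumptions: arrays are dictionaries keyed by (x, y), cells outside the array count as NULL and are skipped by the aggregate, range stops are exclusive, and `tile_aggregate` is a hypothetical helper, not part of MonetDB or SciQL. DISTINCT tiling is deliberately left out.

```python
def build_matrix(size=4):
    """Matrix after the guarded UPDATE: x+y if x>y, x-y if x<y, else 0."""
    cells = {}
    for x in range(size):
        for y in range(size):
            if x > y:
                cells[(x, y)] = float(x + y)
            elif x < y:
                cells[(x, y)] = float(x - y)
            else:
                cells[(x, y)] = 0.0
    return cells


def avg(values):
    return sum(values) / len(values)


def tile_aggregate(cells, anchors, offsets, agg):
    """Aggregate one tile per anchor; the result is keyed by the anchor point.

    offsets lists the tile cells relative to the anchor. Cells that are
    missing or None are treated as NULL and ignored by the aggregate.
    """
    results = {}
    for x, y in anchors:
        values = [cells[(x + dx, y + dy)]
                  for dx, dy in offsets
                  if cells.get((x + dx, y + dy)) is not None]
        results[(x, y)] = agg(values) if values else None
    return results


matrix = build_matrix()
anchors = [(x, y) for x in range(4) for y in range(4)]

# Emulates: GROUP BY matrix[x:x+2][y:y+2] (16 overlapping 2x2 tiles)
tile_2x2 = [(dx, dy) for dx in (0, 1) for dy in (0, 1)]
avg_2x2 = tile_aggregate(matrix, anchors, tile_2x2, avg)

# Emulates: SELECT [x], SUM(v) FROM matrix GROUP BY matrix[x][*]
slab_sums = {x: sum(matrix[(x, y)] for y in range(4)) for x in range(4)}

# vmatrix: the transposed matrix embedded in a zero border over [-1:1:5]
vmatrix = {(x, y): 0.0 for x in range(-1, 5) for y in range(-1, 5)}
for (x, y), v in matrix.items():
    vmatrix[(y, x)] = v

# Emulates: FROM vmatrix[0:4][0:4] GROUP BY vmatrix[x-1:1:x+2][y-1:1:y+2]
tile_3x3 = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
convolved = tile_aggregate(vmatrix, anchors, tile_3x3, avg)
```

Because the zero border of vmatrix supplies real cells, every 3x3 tile of the convolution averages over nine values, whereas the 2x2 tiles on matrix shrink at the upper edges, where out-of-bounds cells are ignored.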

Value-based selection and structure-based selection can be combined. An example is the nearest neighbor search, where the structure dictates the context over which a metric function is evaluated. Most systems dealing with feature vectors deploy a default metric, e.g., the Euclidean distance. The example below assumes such a distance function that takes an argument ?v as the reference vector. It generates a listing of all columns with the distance from the reference vector. Ranking the result produces the K-nearest neighbors.

SELECT x, distance(matrix, ?v) AS dist FROM matrix GROUP BY matrix[x][*] ORDER BY dist LIMIT 10;

Using the dimension values in the grouping clause permits complex structures to be defined. It generalizes the SQL:2003 windowing functions, which are limited to aggregations over sliding windows with static bounds and shift count over a sequence. The SciQL approach can be generalized to support the equivalent of mask-based tile selections. For this we simply need a table with dimension values, which are used within the GROUP BY clause as a pattern to search for.
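The nearest neighbor query above can be mimicked in Python to show how structural grouping and a value-based metric interact. This is a sketch under assumptions: each group matrix[x][*] is read as the vector of v values along y, the metric is the Euclidean distance computed with `math.dist`, and both the reference vector and the helper name `knn_slabs` are made up for illustration.

```python
import math


def build_matrix(size=4):
    """Matrix after the guarded UPDATE: x+y if x>y, x-y if x<y, else 0."""
    cells = {}
    for x in range(size):
        for y in range(size):
            if x > y:
                cells[(x, y)] = float(x + y)
            elif x < y:
                cells[(x, y)] = float(x - y)
            else:
                cells[(x, y)] = 0.0
    return cells


def knn_slabs(cells, size, reference, k):
    """Rank the slabs matrix[x][*] by distance to the reference vector.

    Mirrors GROUP BY matrix[x][*] ORDER BY dist LIMIT k: one group per x,
    one distance per group, the smallest k distances returned.
    """
    ranked = []
    for x in range(size):
        slab = [cells[(x, y)] for y in range(size)]
        ranked.append((x, math.dist(slab, reference)))
    ranked.sort(key=lambda pair: pair[1])
    return ranked[:k]


matrix = build_matrix()
reference = [0.0, -1.0, -2.0, -3.0]  # assumed reference vector ?v
nearest = knn_slabs(matrix, 4, reference, k=2)
```

With this reference vector the slab at x = 0 matches exactly, so it ranks first with distance 0, followed by the slab at x = 1.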

4. Implementation

In our implementation of SciQL we aimed at leveraging as much of the existing functionality of MonetDB as possible. In particular, the fact that MonetDB uses BATs, which are physically represented as consecutive C arrays, suggested it as a good basis for implementing SciQL. In practice, though, the administrative part of the software stack, in particular dealing with syntactic similarity but semantic difference, and vice versa, between SQL and SciQL, turned out to be a much bigger challenge than the core data-processing primitives.

Figure 4.1 is a revised version of Figure 2.2 (page 5), highlighting the components of the MonetDB software stack that we needed to modify and extend to implement SciQL as an extension of SQL. With SciQL being designed as an extension of SQL, there was no need to change or extend the general software architecture. Only existing components and data structures had to be adapted and extended to accommodate SciQL. In the following, we discuss the changes top-down, starting with the parser in general, before taking some indicative dives deeper into the software stack to examine the implementation of the most prominent SciQL features supported so far. During the implementation of MonetDB/SciQL we have given priority to the language features required by the TELEIOS project, and in particular the NOA use case.

Figure 4.1: The revised & extended MonetDB Architecture to support SciQL. (Diagram components: Client, SQL/SciQL Query, SQL/SciQL Parser, Syntax Tree, SQL/SciQL Compiler, SQL/SciQL Catalog, Relational Algebra, MAL Generator, MAL Program, MAL Optimizers, MAL Interpreter, GDK Kernel, BATs, Resultset; the legend distinguishes functional components, data structures, and parts modified for SciQL support.)

4.1 Extending the SQL Parser

Though not all features of SciQL are supported in their full variety in this first phase of the implementation, we extended the MonetDB SQL parser to handle the complete SciQL syntax as defined in Appendix A of [KNZ+11]. We take care to emit suitable error messages for features that are not yet fully implemented. Likewise, we also extended the syntax tree to accommodate SciQL-specific operators and to distinguish arrays from tables.

4.2 Extending the SQL Catalog

MonetDB manages data definition information like table schemas, metadata and secondary structures in the SQL Catalog. Naturally, any kind of extension to the handling of such objects must be accompanied by appropriate changes in the catalog. The extension of the SQL catalog to accommodate SciQL arrays next to SQL tables was rather straightforward. We extended the definition of a persistent database object to include arrays in addition to tables. In the _tables metadata, we added two attributes to hold (1) the number of dimensions (nr_dimensions), which also serves to mark a data object as an array, and (2) the type of the array, being fixed or unbounded (fixed). Likewise, all array attributes are generally treated like table columns, and thus added to the _columns metatable. In addition, a new catalog table _dimensions is maintained to store meta information about dimension columns (to distinguish them from ordinary columns). As an example, consider a CREATE ARRAY statement that generates a 3-dimensional 2x3x2 array with 2 payload elements per cell.
It adds one tuple to the _tables catalog table, five tuples to the _columns catalog table, and three tuples to the _dimensions catalog table describing the three dimension ranges:

CREATE ARRAY a ( x INTEGER DIMENSION [2], y INTEGER DIMENSION [3], z INTEGER DIMENSION [2], v REAL DEFAULT 1.2, w DOUBLE DEFAULT 3.4 );

SELECT * FROM _tables T WHERE T.name = 'a';

[Result table: columns id, name, schema_id, query, type, system, commit_action, readonly, fixed, nr_dimensions; surviving row values: a, false, 0, false, true]

SELECT C.* FROM _tables T, _columns C WHERE T.name = 'a' AND T.id = C.table_id;

[Result table: columns id, name, type, type_digits, type_scale, table_id, default, number, storage; surviving row values: (x, int, true), (y, int, true), (z, int, true), (v, float, true), (w, double, true)]

SELECT D.* FROM _tables T, _columns C, _dimensions D WHERE T.name = 'a' AND T.id = C.table_id AND C.id = D.column_id;

[Result table: columns column_id, start, step, stop]
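The bookkeeping described above (one _tables tuple, one _columns tuple per attribute, one _dimensions tuple per dimension) can be modeled with a short Python sketch. It is a hypothetical in-memory model, not MonetDB's catalog code: the id values, the id numbering scheme, and the function name `catalog_entries` are assumptions, and the start/step/stop values simply follow the documented shortcut that DIMENSION[<size>] stands for [0:1:<size>].

```python
def catalog_entries(array_name, attributes, table_id=1):
    """Derive catalog rows for a fixed array.

    attributes: list of (name, type, size), where size is the DIMENSION
    upper bound for dimension columns and None for payload columns.
    Returns (tables, columns, dimensions) as lists of dicts.
    """
    nr_dimensions = sum(1 for _, _, size in attributes if size is not None)
    tables = [{
        "id": table_id,                  # hypothetical id
        "name": array_name,
        "fixed": True,                   # all dimensions are bounded here
        "nr_dimensions": nr_dimensions,  # non-zero marks the object as an array
    }]

    columns, dimensions = [], []
    for number, (name, type_name, size) in enumerate(attributes):
        column_id = table_id * 100 + number  # hypothetical id scheme
        columns.append({
            "id": column_id,
            "name": name,
            "type": type_name,
            "table_id": table_id,
            "number": number,
        })
        if size is not None:
            # DIMENSION[size] is shorthand for the pattern [0:1:size].
            dimensions.append({
                "column_id": column_id,
                "start": 0,
                "step": 1,
                "stop": size,
            })
    return tables, columns, dimensions


tables, columns, dimensions = catalog_entries("a", [
    ("x", "INTEGER", 2),
    ("y", "INTEGER", 3),
    ("z", "INTEGER", 2),
    ("v", "REAL", None),
    ("w", "DOUBLE", None),
])
```

For the array a this yields one table entry, five column entries, and three dimension entries, matching the tuple counts stated in the text.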

SciQL, A Query Language for Science Applications

SciQL, A Query Language for Science Applications SciQL, A Query Language for Science Applications M. Kersten, Y. Zhang, M. Ivanova, N. Nes CWI, Netherlands ABSTRACT Scientific applications are still poorly served by contemporary relational database systems.

More information

SciQL, A Query Language for Science Applications

SciQL, A Query Language for Science Applications SciQL, A Query Language for Science Applications M. Kersten, N. Nes, Y. Zhang, M. Ivanova CWI, Netherlands ABSTRACT Scientific applications are still poorly served by contemporary relational database systems.

More information

Column-Oriented Database Systems. Liliya Rudko University of Helsinki

Column-Oriented Database Systems. Liliya Rudko University of Helsinki Column-Oriented Database Systems Liliya Rudko University of Helsinki 2 Contents 1. Introduction 2. Storage engines 2.1 Evolutionary Column-Oriented Storage (ECOS) 2.2 HYRISE 3. Database management systems

More information

SciQL A Query Language for Science Applications M. Kersten, Y. Zhang, M. Ivanova, N. Nes CWI Amsterdam Array Database Workshop March 25th, 2011

SciQL A Query Language for Science Applications M. Kersten, Y. Zhang, M. Ivanova, N. Nes CWI Amsterdam Array Database Workshop March 25th, 2011 SciQL A Quer Language for Science Applications M. Kersten, Y. Zhang, M. Ivanova, N. Nes CWI Amsterdam Arra Database Workshop March 5th, Who needs arras anwa? Seismolog Astronom Climate simulation Remote

More information

Array QL Syntax. Draft 4 September 10, <

Array QL Syntax. Draft 4 September 10, < Array QL Syntax Draft 4 September 10, 2012 Comments on this draft should be sent to arraydb-l@slac.stanford.edu Contributors: K.-T. Lim, D. Maier, J. Becla for XLDB M. Kersten, Y.

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

Database Technology Introduction. Heiko Paulheim

Database Technology Introduction. Heiko Paulheim Database Technology Introduction Outline The Need for Databases Data Models Relational Databases Database Design Storage Manager Query Processing Transaction Manager Introduction to the Relational Model

More information

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions... Contents Contents...283 Introduction...283 Basic Steps in Query Processing...284 Introduction...285 Transformation of Relational Expressions...287 Equivalence Rules...289 Transformation Example: Pushing

More information

MonetDB: Open-source Columnar Database Technology Beyond Textbooks

MonetDB: Open-source Columnar Database Technology Beyond Textbooks MonetDB: Open-source Columnar Database Technology Beyond Textbooks http://wwwmonetdborg/ Stefan Manegold StefanManegold@cwinl http://homepagescwinl/~manegold/ >5k downloads per month Why? Why? Motivation

More information

Implementation Techniques

Implementation Techniques V Implementation Techniques 34 Efficient Evaluation of the Valid-Time Natural Join 35 Efficient Differential Timeslice Computation 36 R-Tree Based Indexing of Now-Relative Bitemporal Data 37 Light-Weight

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

More information

2.3 Algorithms Using Map-Reduce

2.3 Algorithms Using Map-Reduce 28 CHAPTER 2. MAP-REDUCE AND THE NEW SOFTWARE STACK one becomes available. The Master must also inform each Reduce task that the location of its input from that Map task has changed. Dealing with a failure

More information

Appendix: Generic PbO programming language extension

Appendix: Generic PbO programming language extension Holger H. Hoos: Programming by Optimization Appendix: Generic PbO programming language extension As explained in the main text, we propose three fundamental mechanisms to be covered by a generic PbO programming

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4

More information

IEEE LANGUAGE REFERENCE MANUAL Std P1076a /D3

IEEE LANGUAGE REFERENCE MANUAL Std P1076a /D3 LANGUAGE REFERENCE MANUAL Std P1076a-1999 2000/D3 Clause 10 Scope and visibility The rules defining the scope of declarations and the rules defining which identifiers are visible at various points in the

More information

6.001 Notes: Section 8.1

6.001 Notes: Section 8.1 6.001 Notes: Section 8.1 Slide 8.1.1 In this lecture we are going to introduce a new data type, specifically to deal with symbols. This may sound a bit odd, but if you step back, you may realize that everything

More information

CSE 190D Spring 2017 Final Exam Answers

CSE 190D Spring 2017 Final Exam Answers CSE 190D Spring 2017 Final Exam Answers Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Advanced Data Management Technologies Written Exam

Advanced Data Management Technologies Written Exam Advanced Data Management Technologies Written Exam 02.02.2016 First name Student number Last name Signature Instructions for Students Write your name, student number, and signature on the exam sheet. This

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

UML-Based Conceptual Modeling of Pattern-Bases

UML-Based Conceptual Modeling of Pattern-Bases UML-Based Conceptual Modeling of Pattern-Bases Stefano Rizzi DEIS - University of Bologna Viale Risorgimento, 2 40136 Bologna - Italy srizzi@deis.unibo.it Abstract. The concept of pattern, meant as an

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

Horizontal Aggregations for Mining Relational Databases

Horizontal Aggregations for Mining Relational Databases Horizontal Aggregations for Mining Relational Databases Dontu.Jagannadh, T.Gayathri, M.V.S.S Nagendranadh. Department of CSE Sasi Institute of Technology And Engineering,Tadepalligudem, Andhrapradesh,

More information

Symbol Tables Symbol Table: In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier in a program's source code is

More information

The SPL Programming Language Reference Manual

The SPL Programming Language Reference Manual The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming

More information

The TTC 2014 Movie Database Case: Rascal Solution

The TTC 2014 Movie Database Case: Rascal Solution The TTC 2014 Movie Database Case: Rascal Solution Pablo Inostroza Tijs van der Storm Centrum Wiskunde & Informatica (CWI) Amsterdam, The Netherlands pvaldera@cwi.nl Centrum Wiskunde & Informatica (CWI)

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs

More information

And Parallelism. Parallelism in Prolog. OR Parallelism

And Parallelism. Parallelism in Prolog. OR Parallelism Parallelism in Prolog And Parallelism One reason that Prolog is of interest to computer scientists is that its search mechanism lends itself to parallel evaluation. In fact, it supports two different kinds

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Structural and Syntactic Pattern Recognition

Structural and Syntactic Pattern Recognition Structural and Syntactic Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent

More information

COGNOS (R) 8 GUIDELINES FOR MODELING METADATA FRAMEWORK MANAGER. Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata

COGNOS (R) 8 GUIDELINES FOR MODELING METADATA FRAMEWORK MANAGER. Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata COGNOS (R) 8 FRAMEWORK MANAGER GUIDELINES FOR MODELING METADATA Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata GUIDELINES FOR MODELING METADATA THE NEXT LEVEL OF PERFORMANCE

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

1 Lexical Considerations

1 Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Spring 2013 Handout Decaf Language Thursday, Feb 7 The project for the course is to write a compiler

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

TELEIOS FP Deliverable 3.1. KDD concepts and methods proposal: report & design recommendations

TELEIOS FP Deliverable 3.1. KDD concepts and methods proposal: report & design recommendations TELEIOS FP7-257662 Deliverable 3.1 KDD concepts and methods proposal: report & design recommendations Corneliu Octavian Dumitru, Daniela Espinoza Molina, Shiyong Cui, Jagmal Singh, Marco Quartulli 1, Mihai

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models RCFile: A Fast and Space-efficient Data

More information
