XMLDBMS
Computer Science 764
December 22, 1998
Kevin Beach, Vuk Ercegovac, Michael Henderson, Amy Rea, Suan Yong

Introduction:

XML-QL is a query language for obtaining data from XML documents on the World Wide Web. From a database viewpoint, an XML document serves as a database from which a query will extract results. While the semi-structured nature of XML lends itself to an object data model, the relational data model has been shown to perform well with queries posed over large data sets. Thus, we have designed and implemented a simple database system that executes relational-like queries over XML data sets that have been transformed into the relational model. Specifically, we execute XML-QL queries in a system that dynamically loads and transforms XML data sets into relations. The queries are transformed into intermediate execution plans, from which an optimizer produces a less costly plan that accesses the relations with RDBMS-like operators.

Since we are primarily interested in issues concerning the use of relations to store and query XML data sets, we do not handle issues relating to recovery, concurrency, or the use of secondary and non-volatile storage. This decision is also supported by the expected normal usage of such a system: the intended user is an XML surfer who, given a set of XML documents, poses queries in XML-QL via an applet in a browser that can display the results of the query. In essence, the system serves as an XML document filter that transforms XML data sets into relations to facilitate more efficient processing.

We have initially developed our system to support only a subset of the features provided by XML-QL. Supporting the complete XML-QL specification is not necessary to achieve our goals. With respect to the query language, we have implemented the features that demonstrate most completely the querying aspect of the language and not the data manipulation aspect. As such, the optimizer will only be able to take advantage of operators for which language support

has been added. Similarly, the GUI attempts to provide a clean interface for constructing queries and displaying results in a straightforward way. We do not deal with the problem of displaying XML graphically. Our goal is to build a system with which we can attain some insight into the design considerations that arise when using relations to store and query XML data sets.

Architecture Overview:

Figure 1 is a schematic of the XMLDBMS system, showing the steps involved in processing a query. Initially, the client applet submits to the server an XML-QL query (or a document with an embedded query). The server strips out the query and forwards it to the XML-QL to SQL translator. The translator identifies the URLs of the XML documents that the query needs and passes them to the storage manager. The storage manager loads the DTD document associated with each URL and converts it into an internal schema data structure. The storage manager also gets the catalog associated with the data in the XML document (at present, we load the document and build the catalog from scratch; in the future we envision having precomputed catalog information stored in a separate file. See Future Work). The schema and catalog are returned to the translator, which uses the schema to verify the validity of the XML-QL query. The translator then produces an SQL query and combines the catalogs it has collected into a single catalog. The SQL query and catalog are passed on to the query optimizer, which generates the execution plan. The plan execution component obtains the tables from the storage manager (which fetches the XML document and translates it into an internal table data structure) and produces a resultant table that is returned to the translator. The results are then converted into the desired XML formatting and returned to the server, which passes them along (or embeds them into the document containing the embedded XML-QL query) to the client applet.

[Figure 1 - flowchart of the XMLDBMS system. Components: Client (applet), Server, Translator, Storage Manager, Optimizer, Plan Execution. Labeled flows: XML-QL query, XML-QL formatted results, XML results, URL, DTD schema, (build catalog), schema and catalog, SQL and catalog, plan, table name, table, XML tables, result table. The Storage Manager fetches the DTD document and the XML document.]

The Storage Manager

The XMLDBMS storage manager plays the role of a buffer manager for data that could potentially be scattered throughout the web. Specifically, it is responsible for acquiring, for a given XML document, a schema, a catalog, and a table containing the data in that document. It is also in charge of assigning to each XML document a unique page ID that is prepended to the name of each attribute in that document's table. This ensures, for example, that two XML documents containing tables that happen to have the same name are given different internal names.

When the schema for a given document is needed, the storage manager fetches the DTD for that document, and the DTD parser translates it into an internal schema data structure (which is actually just a table). At present we assume the DTD for a given document is in a separate file in the same directory, and has the filename of the document plus a .dtd suffix (e.g., the DTD for the file is in ). When the

table for a given XML document is needed, the storage manager fetches the document and gives it to the XML parser, which builds the tables associated with that document. When the catalog for a given document is needed, the storage manager gets the tables associated with the document and builds a catalog from scratch. We treat the fetching of the catalog as a separate function of the storage manager because a possible extension of this project is to have the query execution distributed among multiple servers. In this case, it would be desirable to be able to obtain the catalog for a given XML document without having to fetch the document itself (the catalog information would, for example, be stored in a separate file, like the DTD). We describe this extension further in Future Work.

Our current implementation of the storage manager caches the schemas, tables, and catalogs it has built. This is desirable if we assume that a client who queries a given XML document is likely to pose more queries over the same document. It also assumes that tables fit in memory. In the current implementation the cache is never flushed. Possible future work would be to incorporate a more sophisticated buffer management system that could delete stale tables from the cache, or potentially support tables that do not fit in memory.

The XML-QL to SQL Translator

The translator component of XMLDBMS uses an XML-QL parser that was constructed using the ANTLR parser-generating tool [Ant98]. The grammar for XML-QL as presented by Deutsch et al. in the W3C proposal [DFF+98] is incomplete, buggy, and at times confusing. As such, our parser supports a modified subset of XML-QL, the grammar for which is given in

Table 1. In particular, we have excluded support for i) functions; ii) nested queries and query blocks; iii) Skolem functions; iv) regular path expressions. Additionally, we do not at present support the use of tag variables in queries.

    queryblock ::= where ( orderby )? construct
    where      ::= "WHERE" condition ( "," condition )*
    condition  ::= element "IN" datasource | predicate
    element    ::= starttag ( STRING | LITERAL | VAR | ( element )+ ) endtag
                   ( ( "ELEMENT_AS" VAR ) | ( "CONTENT_AS" VAR ) )*
    starttag   ::= "<" ( VAR | ID ) ( attribute )* ">"
    endtag     ::= "</" ( VAR | ID )? ">"
    attribute  ::= ID "=" ( STRING | VAR )
    datasource ::= VAR | STRING
    predicate  ::= expression oprel expression
    expression ::= VAR | STRING | LITERAL
    oprel      ::= "<" | "<=" | ">" | ">=" | "=" | "!="
    orderby    ::= "ORDER-BY" ( VAR ) ( "," VAR )*
    construct  ::= "CONSTRUCT" ( result | VAR )
    result     ::= starttag ( STRING | LITERAL | ( VAR | result )+ ) endtag

Table 1 - subset of XML-QL grammar supported by XMLDBMS.

The parser builds an abstract syntax tree (AST) representing the XML-QL query, which at the root level consists of a "WHERE" clause and a "CONSTRUCT" clause. The translator

walks through the "WHERE" clause to first identify the URLs of the datasources over which the query is searching. It then requests from the storage manager the schemata (from the DTDs) and catalog information for the datasources. Note that the storage manager will first check its cache to see if the information has been previously loaded. The storage manager will also assign to each datasource a unique internal identifier (we use strings of the form pageN) which is prepended to the name of each table in that datasource. This ensures that each table in the storage manager can be uniquely identified (specifically, we will not be confused if two different XML pages contain tables with the same name). The schemata are then used to verify the validity of the query, i.e., the translator checks that the elements described in the query do indeed exist in the schema of their datasource.

After this, the translator can translate the "WHERE" clause of the query into a SQL query. This SQL query, along with the catalogs, is fed into the plan generation and execution components of XMLDBMS. The plan-generation component (the query optimizer) uses the catalogs to generate a plan tree, which is used by the plan-execution component to fetch the appropriate tables (through the storage manager) and perform the required operations to produce a result table. This result table is returned unprojected to the translator.

The final task of the translator is to walk through the "CONSTRUCT" clause of the AST of the query, which describes the desired (XML) output format of the results. The translator converts the result table into a string containing the formatted (and projected) results and returns it to the server front end.
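The caching and naming behaviour of the storage manager described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the XMLDBMS source: the class name StorageManagerSketch, the Table holder, the "." separator between page identifier and table name, and the loader function (which stands in for fetching and parsing a document) are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a caching storage manager; not the XMLDBMS source. */
class StorageManagerSketch {
    /** A table reduced to its internal (prefixed) name and its rows. */
    static final class Table {
        final String internalName;
        final List<String[]> rows;
        Table(String internalName, List<String[]> rows) {
            this.internalName = internalName;
            this.rows = rows;
        }
    }

    private final Map<String, String> pageIds = new HashMap<>();            // URL -> "pageN"
    private final Map<String, Map<String, Table>> cache = new HashMap<>();  // URL -> tables
    private final Function<String, Map<String, List<String[]>>> loader;     // fetch + parse stand-in
    private int loads = 0;

    StorageManagerSketch(Function<String, Map<String, List<String[]>>> loader) {
        this.loader = loader;
    }

    /** Assigns each datasource a stable identifier of the form page0, page1, ... */
    String pageIdFor(String url) {
        String id = pageIds.get(url);
        if (id == null) {
            id = "page" + pageIds.size();
            pageIds.put(url, id);
        }
        return id;
    }

    /** Returns the tables of a document, loading and prefixing them only on first use. */
    Map<String, Table> tablesFor(String url) {
        Map<String, Table> tables = cache.get(url);
        if (tables == null) {
            loads++;
            String prefix = pageIdFor(url);
            tables = new HashMap<>();
            for (Map.Entry<String, List<String[]>> e : loader.apply(url).entrySet()) {
                // Assumed naming scheme: "<pageId>.<original table name>".
                tables.put(e.getKey(), new Table(prefix + "." + e.getKey(), e.getValue()));
            }
            cache.put(url, tables); // never evicted, as in the current implementation
        }
        return tables;
    }

    /** Number of times the loader was actually invoked (cache misses). */
    int loadCount() {
        return loads;
    }
}
```

With this shape, two documents that both define a table named book end up with distinct internal names, and a second request for the same URL is answered from the cache without calling the loader again.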

Parsing the DTD

The DTD was parsed using a third-party open source parser, which can be found at the following URL: ( ). Reading in the DTD and creating the schema of the database simply involves traversing the parse tree that is created by the DTD parser. The first level of children in the tree represents each relation in the database. For each node at this level, a new relation is created and placed at the end of a vector stored by the DBMS. It is possible that a node at this level does not need to generate a new table; however, we leave that decision to future versions of the software. The second level of children in the tree represents the elements and attributes of each relation (the field names). Two vectors are maintained in each relation: one that stores the names of the elements/attributes and one that stores the corresponding type of each element/attribute. At this point the tables are unaware of any links between each other.

    <!ELEMENT book (author+, title, publisher)>
    <!ATTLIST book year CDATA #REQUIRED>
    <!ELEMENT publisher (name, address)>
    <!ELEMENT author (firstname?, lastname)>

    DTD Node
        BOOK:      title, year, AUTHOR, PUBLISHER
        AUTHOR:    firstname, lastname
        PUBLISHER: name, address

Figure 2. Example DTD Parse Tree
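To make the schema structure concrete, the sketch below derives one relation per element declaration and keeps the two parallel vectors (field names and field types) mentioned above. It is only an illustration: it uses a small regular expression in place of the third-party DTD parser, handles only flat comma-separated content models like those in Figure 2, and the names DtdSchemaSketch, Relation, and buildSchema, as well as the default field type "text", are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: derive relation schemas from simple DTD declarations. */
class DtdSchemaSketch {
    /** One relation with parallel lists of field names and field types. */
    static final class Relation {
        final String name;
        final List<String> fieldNames = new ArrayList<>();
        final List<String> fieldTypes = new ArrayList<>();
        Relation(String name) {
            this.name = name;
        }
        void addField(String field, String type) {
            fieldNames.add(field);
            fieldTypes.add(type);
        }
    }

    private static final Pattern ELEMENT =
        Pattern.compile("<!ELEMENT\\s+(\\w+)\\s+\\(([^)]*)\\)\\s*>");
    private static final Pattern ATTLIST =
        Pattern.compile("<!ATTLIST\\s+(\\w+)\\s+(\\w+)\\s+(\\w+)[^>]*>");

    /** Builds one relation per ELEMENT declaration, then adds ATTLIST attributes as fields. */
    static Map<String, Relation> buildSchema(String dtd) {
        Map<String, Relation> relations = new LinkedHashMap<>();
        Matcher e = ELEMENT.matcher(dtd);
        while (e.find()) {
            Relation r = new Relation(e.group(1));
            for (String child : e.group(2).split(",")) {
                // Drop occurrence indicators (+, ?, *); skip keywords such as #PCDATA.
                String field = child.trim().replaceAll("[+?*]$", "");
                if (field.isEmpty() || field.startsWith("#")) {
                    continue;
                }
                r.addField(field, "text"); // every element field starts out as text
            }
            relations.put(r.name, r);
        }
        Matcher a = ATTLIST.matcher(dtd);
        while (a.find()) {
            Relation r = relations.get(a.group(1));
            if (r != null) {
                r.addField(a.group(2), a.group(3)); // attribute name and declared type
            }
        }
        return relations;
    }
}
```

Applied to the DTD in Figure 2, this yields three relations (book, publisher, author); book carries the fields author, title, publisher, and the attribute year. As in the text, nothing yet links the author field of book to the author relation.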

Translating XML to a Relational Database

Reading in the actual data from the XML page follows a similar process. An instance of the XML parser is needed to fetch the data into the tables, and a new parse tree is therefore created. The nodes created by the data parser contain a lot of information, but we needed only a small number of their features. For this project the key features of each node in this tree are Node Type, Node Name, and a possible set of children. A depth-first approach was used to traverse the tree. The parser does have methods to examine sibling nodes very easily, so a breadth-first traversal would also have worked; however, the depth-first traversal was more intuitive to code.

As the parser traverses the tree, if the Node Name matches one of the table names in the schema of the database, a new record is created. All of the fields in a newly created record are initialized to null. (The way that the XML DTD is set up, there is no possibility of duplicate table naming, nor is there any crossover in field/relation names.) At this point, the children of this node are checked to see if their Node Names match any of the column names in this relation. If a node is found that does not match, the document is not consistent with the DTD that it specified. If the Node Name does match one of the column headings and the node has only one child, that child is in fact the text/data associated with this node. All of the text that is stored in the database is found in such leaf nodes. In this case, the text is simply added to the current record. When the tree traversal pointer goes back up to the main parent (the table name), the record is appended to the table.

Using the example in Figure 3, if one of the nodes in level 2 has more than one child, the node is the parent node of a new record. In this case, a new record is created to hold the data in the children. The name and the schema of the relation that the child record belongs to are stored in the parent node. In order to

link the parent field to the nested record, the text for this field is an id value that represents the record that is now stored in the child relation. The type of this field is also changed to lookup. An integer value of type lookup is actually an index to the child records.

The way that the parse tree is set up naturally lends itself to having set-valued attributes. Since our project is designed around the relational model, a second pass through the data to break down the records was needed. The second pass is also the best time to test integrity constraints on the data, since the data parser interprets everything as text.

    Level 1: Book
    Level 2: title, year, AUTHOR, PUBLISHER
    Level 3: data (title), data (year), firstname, lastname, name, address
             data, data, data, data (one under each of firstname, lastname, name, address)

Figure 3. Example XML Parse Tree

Catalogs

Catalogs are stored as a set of three relations in the DBMS. Every XML page that is fetched results in the generation of one such set, describing the XML data that is transformed into relational tables, using the schemas in Figure 4.

    Relations Schema: Relation Name, Number of Tuples, Number of Attributes
    Indices Schema:   Index Name, Relation Name, Key, Num. of Entries, Num. Unique, Max Value, Min Value, Index Type
    Attribute Schema: Attribute Name, Relation Name, Attribute Type

Figure 4. Schemas for the catalogs used by XMLDBMS

As the set of relations is generated, it is appended to a master catalog set in the storage manager. Because our array-based approach to relations automatically keeps track of the number of tuples in a relation, generating the catalogs is trivial given the relation and its schema. This approach would not be satisfactory, however, if we were to allow updates to be made on the relations.

Plan Generation and Execution

For plan generation, we chose to use an existing query optimizer framework and support code, Opt++ [KD95]. Our decision was driven by two factors: development time and usefulness to a prototype system. With the sample optimizer provided with Opt++, we were able to immediately output a plan representation, thus enabling the development of the plan execution infrastructure in parallel with the modifications to the optimizer. Since Opt++ is designed to be extensible, we were able to customize it to handle our operators as we developed them and to modify the cost calculations to more closely reflect our system. More importantly, since Opt++ is designed for flexibility and since many factors contributing to the design of XMLDBMS, such as workloads, data sets, specifications, etc., are either non-existent or in flux, the integration of such a system is of significantly greater importance when prototyping solutions and running

experiments. For example, by specifying the catalog of a hypothetical data set, we can get preliminary numbers when trying different search strategies or indices.

Interface to Opt++

In line with the prototyping argument above, we chose to use Java to implement XMLDBMS. However, Opt++ was written in C++, so we needed an interface between the optimizer and XMLDBMS, which we call the OptClient. The responsibilities of the OptClient are to manage the Opt++ process, feed it queries and catalog data, and translate the optimized plan from Opt++ into a form that can be executed using the services provided by XMLDBMS. Since one may want to use more than one optimizer and, more importantly, since Opt++ is meant to be extended (i.e., its outputs or inputs may change), we designed OptClient to be a generic interface that a developer could use to hook up such an optimizer to a database written in Java such as XMLDBMS. The methods that must be supported are startoptimizer and optimize. The first simply execs the optimizer and sets up communication streams with it; the second returns the root of an optimized plan given a query and a catalog. It is assumed that the query is valid for the instance of the catalog passed in with the query. If no catalog is passed in, the query is assumed to be valid over the previous catalog.

We have implemented a class that supports this interface to manage and communicate with the current version of Opt++. When a result comes back from Opt++, it must be parsed and translated into the operators supported by XMLDBMS. The output from Opt++ is composed of operator names in the first line and the operator's arguments in the second line. The set of such pairs is output as a tree traversal of the plan found in Opt++. Hence, the process used for the plan translation is: 1) bind

the name of the operator or access method to its corresponding operator in XMLDBMS, 2) let the operator parse its own argument, and 3) set the children of the node if it is not a leaf node.

Plan Execution

Previously we described the process of converting the output from Opt++, a description of an optimized plan, into an execution tree that is set to process the given query over the data set. Now we describe in more detail the operators that make up the execution tree and what is required during execution. All elements of the tree are referred to as operators even though logically they can be either implementations of relational operators or access methods. In either case, they conform to an Operator interface that enforces the implementation of the methods open, next, close, and getoutput. Note that next returns the next satisfying tuple, or null if there are none left, and getoutput provides a way for the parent to get the schema of a child.

For an access method such as filescan, the only criterion for passing a tuple to the caller is that not all tuples in the relation have been seen yet. However, for an index such as a B-tree or an internal operator such as a Select or Join, a predicate is required to evaluate whether or not the tuple is passed on to the caller. Such a predicate is implemented as a generic set of OR predicates in an AND predicate, i.e., the predicate is in conjunctive normal form. Each OR expression is composed of standard predicates that take a value or an attribute reference as arguments. The predicates handled are >, >=, <, <=, and =, and they are implemented in such a way as to make extending them to handle new types relatively painless. Thus the top-level AND predicate is composed of a collection of values and attribute references. To this top-level predicate, one or two tuples, depending on whether or not the

operator is unary or binary, will have to be evaluated over the predicate. To do this, the predicate has a left and a right input, which receive tuples from the left or right child of this operator. This is done so that the values used in the outer tuple of a join are not re-referenced as the other child's tuples stream by. Thus, when parsing a predicate, the operator must determine which side of its predicate an attribute reference belongs to. This is done by checking the output schemas of the children to see which side of the tree an attribute originates from. Once the origin is known, the position of an attribute reference in a given operator is found before execution from the schema found in the outputs of the operator's children. If the node is a leaf, the schema is obtained from its base relation and the attributes are rewritten to provide a unique name, composed of the table or variable name and the attribute name. Since the attribute names are unique at the leaves, and given that any transition from child to parent is some composition of fields, every attribute referenced in an operator will correspond to a unique name.

The preceding discussion provides a guideline for how the methods in the interface should be written. The open method is responsible for setting up its output schema, either by opening its children, if it has any, and using their output schemas or, if the operator is a leaf, by using the base relation. In addition, if a predicate exists, the mapping of attribute to position is done at this point. The next method simply returns the next satisfying tuple, after applying a predicate if the node requires one, and null if no more tuples satisfy. Though only nested loops is currently implemented, this framework would also support the implementation of an algorithm such as hash-join or sort-merge where there might be a materialization stage. Furthermore, there is nothing that precludes an implementation that sets up its source to be at a remote node, as long as the local operator adheres to the above interface.
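The interface and the conjunctive-normal-form predicate described above might look roughly like the following. This is a sketch under stated assumptions rather than the XMLDBMS code: tuples are plain String arrays, all values compare as strings, a null value never satisfies a comparison, and the predicate is bound against the concatenated output schema instead of separate left and right inputs. The class names Cmp, Cnf, FileScan, and NestedLoopsJoin and the camel-case getOutput are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical iterator-style interface with the four methods named in the text. */
interface Operator {
    void open();               // prepare output schema, bind predicate positions
    String[] next();           // next satisfying tuple, or null when none remain
    void close();
    List<String> getOutput();  // output schema (unique attribute names)
}

/** One comparison: attribute op (attribute | literal); values compare as strings. */
final class Cmp {
    final String left, op, right;
    final boolean rightIsAttr;
    private int leftPos, rightPos;
    Cmp(String left, String op, String right, boolean rightIsAttr) {
        this.left = left; this.op = op; this.right = right; this.rightIsAttr = rightIsAttr;
    }
    /** Resolves attribute names to tuple positions once, before execution. */
    void bind(List<String> schema) {
        leftPos = schema.indexOf(left);
        rightPos = rightIsAttr ? schema.indexOf(right) : 0;
        if (leftPos < 0 || rightPos < 0) throw new IllegalArgumentException("unknown attribute");
    }
    boolean eval(String[] t) {
        String l = t[leftPos], r = rightIsAttr ? t[rightPos] : right;
        if (l == null || r == null) return false;  // assumption: nulls never match
        int c = l.compareTo(r);
        switch (op) {
            case "=":  return c == 0;
            case "<":  return c < 0;
            case "<=": return c <= 0;
            case ">":  return c > 0;
            case ">=": return c >= 0;
            default:   throw new IllegalArgumentException("unsupported operator " + op);
        }
    }
}

/** Conjunctive normal form: every OR-clause needs at least one true comparison. */
final class Cnf {
    final List<List<Cmp>> clauses;
    Cnf(List<List<Cmp>> clauses) { this.clauses = clauses; }
    void bind(List<String> schema) {
        for (List<Cmp> clause : clauses) for (Cmp c : clause) c.bind(schema);
    }
    boolean holds(String[] t) {
        for (List<Cmp> clause : clauses) {
            boolean any = false;
            for (Cmp c : clause) any |= c.eval(t);
            if (!any) return false;
        }
        return true;
    }
}

/** Access method: passes on every tuple of an in-memory base relation. */
final class FileScan implements Operator {
    private final List<String> schema;
    private final List<String[]> tuples;
    private int pos;
    FileScan(List<String> schema, List<String[]> tuples) { this.schema = schema; this.tuples = tuples; }
    public void open() { pos = 0; }
    public String[] next() { return pos < tuples.size() ? tuples.get(pos++) : null; }
    public void close() { }
    public List<String> getOutput() { return schema; }
}

/** Nested-loops join: rescans the inner child once per outer tuple. */
final class NestedLoopsJoin implements Operator {
    private final Operator outer, inner;
    private final Cnf predicate;
    private List<String> schema;
    private String[] outerTuple;
    NestedLoopsJoin(Operator outer, Operator inner, Cnf predicate) {
        this.outer = outer; this.inner = inner; this.predicate = predicate;
    }
    public void open() {
        outer.open();
        inner.open();
        schema = new ArrayList<>(outer.getOutput());
        schema.addAll(inner.getOutput());
        predicate.bind(schema);
        outerTuple = outer.next();
    }
    public String[] next() {
        while (outerTuple != null) {
            String[] in;
            while ((in = inner.next()) != null) {
                String[] out = new String[outerTuple.length + in.length];
                System.arraycopy(outerTuple, 0, out, 0, outerTuple.length);
                System.arraycopy(in, 0, out, outerTuple.length, in.length);
                if (predicate.holds(out)) return out;
            }
            inner.close();
            inner.open();              // restart the inner scan
            outerTuple = outer.next();
        }
        return null;
    }
    public void close() { outer.close(); inner.close(); }
    public List<String> getOutput() { return schema; }
}
```

A caller opens the root operator, calls next until it returns null, and then closes it. Because every operator exposes the same four methods, a hash-join or a remote scan could replace NestedLoopsJoin or FileScan without changing that loop.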

Given this interface, once the root of such a plan is obtained, execution proceeds by getting the output schema of the root, making a relation from that schema, and filling it with tuples from next calls to the root until there are no more tuples remaining.

Modifications to Opt++

As it came out of the box, Opt++ parsed SQL, could take a catalog of a fixed format, had a number of algorithms to implement operators, and had preset cost parameters. However, these were all for an ORDBMS and reflected different system assumptions than those of XMLDBMS. Thus a number of modifications had to be made to the sample code provided with Opt++ so that it would make sense for use with XMLDBMS. However, since XMLDBMS manages relational data, there were a number of similarities that could be preserved and tweaked. The following details what could be salvaged, what had to be completely rewritten, and what major parts had to be modified to work with a system such as XMLDBMS.

The primary component of Opt++ that remained was the SQL parser. The motivation for this decision was that translating to SQL and translating to relational algebra are equivalent in difficulty for our subset of XML-QL, but SQL is easier to understand when translating. Furthermore, the translation from SQL to relational algebra already existed in Opt++, so we chose to write new code in Java rather than new code in C++. More importantly, we found XML-QL to be a weakly specified and clunky language, so it is foreseeable that XML-QL will not be used in the future. Therefore, for such a prototype, we felt it was more useful to translate to SQL, a language that is widely accepted and already implemented in Opt++, than to hardcode the translation from XML-QL to relational algebra.

While the use of SQL remains, the catalogs over which the optimizer estimates the best plan were rewritten, since the statistics and terminology of the original were often irrelevant in a relational system and difficult to map from one to the other. The goal for the catalog schemas was to start simple, with the possibility of adding statistics when interesting trends in the data become apparent. For example, some of our data sets produced many null values when converted from XML to relations. If this statistic were recorded with a relation and there were no index on the often-null attribute, the selectivity factor would be significantly reduced. Also, we should mark attributes as being foreign keys of another table, as this seems to be a common feature in our modeling of complex objects and sets. This would save the optimizer the trouble of inferring the same from the indices, where we would expect to see an index on the primary key of the relation pointed to. Furthermore, it is currently assumed that all query processing occurs locally, but if this is not the case, information regarding the remote properties of relations, such as round-trip time, may be useful. This might be stored in another relation containing server data and maintained in a catalog proxy.

Another issue that fits well in the ORDBMS version but not in the relational system is that of types. In the original version of Opt++, the type system was driven from the catalog information, as expected, since each relation is a type. In addition, even the primitive types such as integer, float, etc., are not distinguished from relation types; thus, when type checking a query, Opt++ treats the information for all types as originating from the catalog. Since this dependence is everywhere, it sufficed to maintain this catalog-driven type system, place the primitive types in globally known locations, and make relatively minor changes throughout the code.

Yet another issue that had to be dealt with was the set of terms used to calculate the estimated costs of a plan. In the original version of Opt++, the costs assumed that I/O was necessary when processing a query. In the case of XMLDBMS, the database is assumed to be in main memory and, for simplicity, we assume that the OS will not page the process out. In addition, we do not include the overhead incurred by using Java to store the data, as this can be reduced to current DBMS standards in a more realistic system. A more realistic approach might be to replace the original disk I/O terms with network I/O terms; however, we assume the data is in memory at the time of optimization. Thus the modified cost terms remove the costs associated with I/O and leave only the costs due to in-memory processing, such as the expected number of tuples per operator output and the expected number of operations per operator given such sizes. These changes were made wherever the cost per operator implementation was considered in the search space. It should be noted that these implementations existed in Opt++ and remain, so that they can either be used as they are developed in XMLDBMS or be used in hypothetical situations that can be mimicked by supplying Opt++ with a hypothetical catalog.

Because XMLDBMS supports a limited set of operator implementations, the performance impact of using such an optimizer is questionable when the only join is a nested loops join. However, our translation to the relational model provides much room for gains in join reordering and selection pushes. In addition, the choice of using Opt++ was driven by the benefit such a flexible system provides to a prototype such as XMLDBMS.

Experiments

The XMLDBMS system was tested on a 200 MHz Pentium Pro running Solaris 2.6. When querying over small test datafiles (less than 10 kilobytes) we were able to obtain results

almost in real time. When we ran queries over larger files (about 250 kilobytes), the response time varied greatly depending on the nature of the datafile. When querying over an XML document containing data that translates into a dense table, the response time was relatively quick. However, when querying over an XML document which contained data that was very sparse (the document contains the play The Tragedy of King Richard the Third [Sha98]), the time it took to translate the document into a table was overwhelmingly long.

Notable functionality we were able to achieve with XMLDBMS includes:

i) Querying over multiple data sources. For example, the following query performs a join on tables from two XML documents, and constructs a resulting document containing attributes from both sources.

    WHERE <book> <title> $t </> <author><lastname> $l </></> </book> IN "
          <Item> <Title> $t </> <UnitPrice> $p </> </Item> IN "file:///u/k/b/kbeach/764/src/data/book.xml"
    CONSTRUCT <Book> <Title> $t </> <Author> $l </> <Price> $p </> </Book>

ii) DTD Translation. The following example translates XML data that conforms to one DTD into XML that conforms to another.

    WHERE <book year=$y>
            <title> $t </>
            <author> <lastname> $l </> <firstname> $f </> </>
            <publisher> <name> $p </> <address> $a </> </>
          </book> IN "file:///u/k/b/kbeach/764/src/data/bib.xml"
    CONSTRUCT <thebook>
                <theyear> $y </>
                <thetitle> $t </>
                <theauthor firstname=$f> $l </>
                <thepublisher address=$a> $p </>
              </thebook>
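As an illustration of how a CONSTRUCT template such as the one in example i) could be instantiated from a row of the result table, consider the sketch below. It is not the XMLDBMS translator, which walks the AST rather than the query text. The class ConstructSketch and its methods are hypothetical, and the sketch supports only plain tags, $variables in element content, and the abbreviated close tag </>; attribute variables (as in example ii) and literal text are not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of CONSTRUCT-style output formatting. */
class ConstructSketch {
    // A token is an open tag, a close tag (possibly the abbreviated </>), or a $variable.
    private static final Pattern TOKEN = Pattern.compile("<(/?)(\\w*)>|\\$(\\w+)");

    /** Escapes the characters that are unsafe in XML character data. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Instantiates the template once for a single row of variable bindings. */
    static String fill(String template, Map<String, String> row) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>();   // names of currently open elements
        Matcher m = TOKEN.matcher(template);
        while (m.find()) {
            if (m.group(3) != null) {              // $variable: emit its bound value
                String v = row.get(m.group(3));
                out.append(v == null ? "" : escape(v));
            } else if (m.group(1).isEmpty()) {     // <tag>: remember it for closing
                open.push(m.group(2));
                out.append('<').append(m.group(2)).append('>');
            } else {                               // </tag> or </>: close innermost element
                out.append("</").append(open.pop()).append('>');
            }
        }
        return out.toString();
    }

    /** Produces one instantiated template per result row, one per line. */
    static String fillAll(String template, List<Map<String, String>> rows) {
        StringBuilder out = new StringBuilder();
        for (Map<String, String> row : rows) {
            out.append(fill(template, row)).append('\n');
        }
        return out.toString();
    }
}
```

Expanding </> with a stack of open element names keeps the output well-formed XML even though the query text uses the abbreviated form. Projection falls out naturally: columns of the unprojected result table that the template never mentions are simply not emitted.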

Conclusions

The XML-QL query language as described by the W3C proposal [DFF+98] tries to do too much, and so becomes very difficult to understand. The stripped-down subset of the language we have chosen to support, however, appears to lend itself very well to querying XML data, since both use the <tag> syntax of SGML. While this subset of XML-QL is not very powerful, it would be interesting to see if a more powerful, and at the same time more intuitive, query language for XML can be developed.

One significant conflict that arose was the management of set-valued attributes. XML naturally lends itself to specifying set-valued attributes. This results in a time-consuming process of flattening out the tuples that contain sets, and it reduces the ability to efficiently perform more complex queries over those tuples. Furthermore, since we duplicate tuples containing a set of values N times, where N is the cardinality of the set, it drastically increases the memory and disk consumption of the database. While using the relational model is possible, we believe that XML is more naturally adaptable to an object-relational model or a pure object-oriented database. A comparison study of the different approaches has not been done at this point in time.

Future Work

The XMLDBMS system appears to be easily extensible into a distributed database system. At present, the execution plan (which is in the form of a tree) generated by the query optimizer is executed entirely locally, with the only remote action being the fetching of the tables (at the leaf nodes). For example, if it can be detected that a subtree of the execution plan uses only tables from a remote site, and the remainder of the tree does not depend on tables from

that site, it should be possible to migrate the entire subtree to the remote site and initiate execution there. If the resulting table that is sent back is significantly smaller than the original tables on which the subtree depended, there will be a significant gain in performance. In order to fully implement this, however, the process of deciding whether to migrate a subtree must be more involved, and would potentially require extending the query optimizer and having more catalog information available.

Our current implementation assumes that an XML datasource is never altered. Thus, once the document is cached in the storage manager, it is never refreshed. Because XML data is available in the same manner as a web page, we encounter the same problem that the web caching community faces, which is maintaining the consistency of its data. This problem is probably even more significant when these URLs are treated as entries or tables within a database, since it is the database data that could be stale and not just a news article or personal web page. Currently, we cache all data and assume it will never become stale; however, future improvements should maintain strong consistency between the XML document and what is stored in the XMLDBMS.

Bibliography

[Ant98] ANTLR version 2.4.0, Magelang Institute.
[DFF+98] A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Suciu. XML-QL: A Query Language for XML, submission to the World Wide Web Consortium, 19 August.
[KD95] N. Kabra, D. DeWitt. OPT++: An Object-Oriented Implementation for Extensible Database Query Optimization.

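The subtree-migration idea from Future Work can be sketched in the same spirit. The code below is only an assumption about how such a check might look: `PlanNode`, `sites`, `migration_candidates`, and the site labels are hypothetical names, and the sketch ignores the result-size comparison, which would need catalog information.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PlanNode:
    op: str                     # hypothetical operator label, e.g. "join" or "scan"
    site: Optional[str] = None  # set only on leaf nodes: the site holding the table
    children: List["PlanNode"] = field(default_factory=list)


def sites(node: PlanNode) -> Set[Optional[str]]:
    """Sites whose tables are read by the subtree rooted at `node`."""
    if not node.children:
        return {node.site}
    found: Set[Optional[str]] = set()
    for child in node.children:
        found |= sites(child)
    return found


def migration_candidates(root: PlanNode, local_site: str = "local"):
    """Return (subtree, site) pairs that could be shipped to a remote site.

    A subtree qualifies if it reads tables from exactly one remote site and
    no node outside the subtree reads a table from that site.
    """
    maximal = []
    stack = [root]
    while stack:
        node = stack.pop()
        used = sites(node)
        if len(used) == 1 and local_site not in used:
            # Largest subtree confined to one remote site; do not descend further.
            maximal.append((node, next(iter(used))))
        else:
            stack.extend(node.children)
    # Every remote leaf falls in exactly one maximal subtree, so a site that
    # appears once is not used anywhere else in the plan.
    per_site = Counter(site for _, site in maximal)
    # Bare leaves are skipped: shipping a lone scan is just fetching the table.
    return [(node, site) for node, site in maximal
            if per_site[site] == 1 and node.children]


remote_join = PlanNode("join", children=[PlanNode("scan", "siteA"),
                                         PlanNode("scan", "siteA")])
plan = PlanNode("join", children=[remote_join, PlanNode("scan", "local")])
candidates = migration_candidates(plan)
print([(node.op, site) for node, site in candidates])
```

A real decision procedure would also estimate whether the shipped-back result is smaller than the tables it replaces, which is the part that would require extending the optimizer.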

More information

Overview. Structured Data. The Structure of Data. Semi-Structured Data Introduction to XML Querying XML Documents. CMPUT 391: XML and Querying XML

Overview. Structured Data. The Structure of Data. Semi-Structured Data Introduction to XML Querying XML Documents. CMPUT 391: XML and Querying XML Database Management Systems Winter 2004 CMPUT 391: XML and Querying XML Lecture 12 Overview Semi-Structured Data Introduction to XML Querying XML Documents Dr. Osmar R. Zaïane University of Alberta Chapter

More information

CSE 444: Database Internals. Lectures 5-6 Indexing

CSE 444: Database Internals. Lectures 5-6 Indexing CSE 444: Database Internals Lectures 5-6 Indexing 1 Announcements HW1 due tonight by 11pm Turn in an electronic copy (word/pdf) by 11pm, or Turn in a hard copy in my office by 4pm Lab1 is due Friday, 11pm

More information