A MODEL FOR ADVANCED QUERY CAPABILITY DESCRIPTION IN MEDIATOR SYSTEMS

Size: px
Start display at page:

Download "A MODEL FOR ADVANCED QUERY CAPABILITY DESCRIPTION IN MEDIATOR SYSTEMS"

Transcription

1 A MODEL FOR ADVANCED QUERY CAPABILITY DESCRIPTION IN MEDIATOR SYSTEMS Alberto Pan, Paula Montoto and Anastasio Molano Denodo Technologies, Almirante Fco. Moreno 5 B, Madrid, Spain apan@denodo.com, pmontoto@denodo.com, amolano@denodo.com Manuel Álvarez, Juan Raposo and Ángel Viña Information and Communication Technologies Department, University of A Coruña, A Coruña, Spain mad@udc.es, jrs@udc.es, avc@udc.es Keywords: Abstract: Mediator systems, sources query capability, semi-structured data integration. Mediator systems aim to provide an unified global data schema over distributed heterogeneous structured and semi-structured data sources. These systems must deal with limitations on the query capabilities of the sources. This paper introduces a new framework for representing source query capability along with the algorithms needed to compute the query capabilities of the global schema from sources. Our approach for computing query capabilities is able to support a richer capabilities representation framework than the ones previously presented in the literature. We show that those approaches are insufficient to properly represent many real sources, and how our approach can solve those limitations. 1. INTRODUCTION The world today is characterised by the abundance of information sources available through media such as the WWW, databases, semi-structured files, etc. Nevertheless, this information is usually scattered, heterogeneous and weakly structured (semi-structured data). Mediator systems (Wiederhold, 1992), provide users with an unified global schema over distributed heterogeneous structured and semi-structured data sources. In the context in which the mediators operate, sources generally have limitations in the way in which they can be queried. For example, in Web sources (one of the most significant types of mediator sources) query possibilities are often limited to those provided by some type of HTML form. The mediator systems presented up to date, such as (García-Molina, 1995)(Halevy, 1996)(Tomasic, 1998), used simple query capability description frameworks in order to detect when a given query over the mediated global schema cannot be answered due to source limitations. But most of these frameworks fail to properly represent the query capabilities of many real sources. Besides, these systems do not compute the query capabilities of global schema relations from source capabilities. This has the following important disadvantages: Users are forced to write queries by a trial & error process, trying to figure out which queries can in fact be processed by the mediator. It is not possible to use mediators as source for new ones (achieving this can be very convenient in large data integration projects). Nevertheless its importance the issue of computing global schema query capabilities, as far as we are aware, has only been dealt with in (Yerneni, 1999). But this work was based on a simple query capability description framework unable to deal with the features encountered in many real sources.

2 In section 2 of this paper we present a query capability description framework able to properly model a much wider range of real sources. We also present an algorithm for propagating query capability from sources to global relations supported by our framework. In section 3, we provide an overview of a method for executing queries in a mediator using the former techniques. Those ideas have been implemented in a commercial mediator system which has been used for the construction of various industrial data integration applications. In section 4, we include some quantitative data from them, which is relevant for our discussion. Finally, Section 5 discusses related and future work. 2. QUERY CAPABILITY FRAMEWORK 2.1 Relation Model In our model each source exports a combination of relations, called base relations. Each base relation is composed of a combination of attributes, each one belonging to a particular data type. Besides of the usual atomic data-types (strings of characters, integers, money, dates, etc.) structs and arrays are also supported. Each relation will represent its query capability through what we term as search-methods. Each search-method is comprised of three elements: : indicate the restrictions which must be satisfied by the queries sent to the relation through this particular searchmethod. : Indicate the query capabilities offered by the relation, once the negative capabilities has been satisfied. Output of attributes: Set of attributes which appear in the response of the queries sent against the relation. This set is needed since some sources allow being queried by attributes which are not included in query answers. For instance, some web bookshops can be queried by the book subject, but such an attribute does not appear in the output. Both negative and positive capabilities are represented through a set of 4-uples with the format: (attribute, operator, multiplicity, possible_values) where: attribute is an attribute of the relation. operator is an operator valid for the attribute data type. multiplicity indicates how many query conditions for the given pair attribute/operator can execute the relation in the same query. + indicates a number of values above 0 but with no upper limit. Possible_values is the list of values by which the attribute can be consulted. If it contains the value Any it means that the search range is not limited (in the range associated to the attribute data type). EXAMPLE 1: We consider the example of an on-line bookshop whose search form is shown in figure 1. The form forces the user to specify a value for the TITLE attribute (i.e. it is a mandatory field) and optionally the user can specify a value for the AUTHOR attribute, for the FORMAT attribute (restricted to a list of possible values) and a range of values for the PRICE attribute. Queries by title and author are searches by keyword (operator Contains). An arbitrary number of words can be used to fill in the boxes and the source will perform the logical AND operation between them. For instance, filling the title box with the words java and xml would return all the books containing both words in its title. In addition to the former attributes, we assumed that the shop also returns a PUBLISHER attribute. Title Mandatory Author Format (select one) Price All formats Hardcover ebooks Paperback min max Figure 1. Bookshop search form We model this source as the following relation: R={TITLE:String, AUTHOR:String, FORMAT:Enumerated(Hardcover,

3 Paperback, ebooks), PUBLISHER: String, PRICE:Money} Which capabilities can be modelled with the following search-method: { NEGATIVE { (TITLE,Contains,1,ANY) } POSITIVE{ (TITLE,Contains,+,ANY) (AUTHOR,Contains,+,ANY) (FORMAT,=,1,{ Hardcover, Paperback, ebooks }) (PRICE,<=,1,ANY) (PRICE,>=,1,ANY) } OUTPUT {TITLE,AUTHOR,FORMAT,PUBLISHER,PRICE} } Not all the relations can be modelled with only one search-method. When a relation presents mutually excluding options, more than a searchmethod is needed. For instance, it could happen that the source, instead of forcing the user to specify a value for the TITLE attribute, only forced to specify a value either for the TITLE attribute or for the AUTHOR attribute. This could be represented by adding a second search-method identical to the former one but substituting its negative 4-uple into: (AUTHOR,CONTAINS,1,ANY). 2.2 Query Model We define a base query as a query which involves only one relation. A base query in our model is of the form (using SQL syntax): Select * from R where (QueryCondition1) and (QueryCondition2)and and (QueryConditionN) where a query condition is a Boolean expression combining 3-uples of the form (Attribute Operator Value). For instance, a valid base query over R would be: EXAMPLE 2: A base query over R. select * from R where (title contains java )and(title contains xml ) and (price <= 5000) and (publisher contains Addison-Wesley ) It is interesting to point out the differences between our base query model and the one supported in systems like (Yerneni, 1999) (which is the only previous system which computed mediator query capabilities). The main differences are that in those simpler models the only operator permitted in query conditions is the = operator; and only one query condition per attribute is permitted. Those models cannot properly represent the query capabilities of many sources. Let us consider Example 1: It is not possible to represent the query capabilities for the PRICE attribute, since the source allows to be queried using <= and >= operators, instead of =. It is not possible to properly represent the source query capabilities for TITLE and AUTHOR attributes, since they use the CONTAINS operator (which only could be poorly "emulated" by the = operator), and also the source allows to be queried with more than one query condition for those attributes (as in the previous example query, where there was two query conditions for the TITLE attribute). Of course, the system is not restricted to execute only base queries. As will become more evident below, queries can also combine different relations by using joins, unions, aggregation operations, etc. 2.3 Matching queries with searchmethods In order to execute a base query against a relation through a certain search-method, the following conditions are required: 1) All the negative 4-uples must be satisfied by the query conditions of the query. A negative 4-uple (A,OP,MUL,VALUES) is satisfied if the query contains at least a number of MUL query conditions of the form (A OP a1)where a1 VALUES. For instance the first base query in Example 2 would satisfy restrictions on the search-method from Example 1, since there is

4 one query condition involving the TITLE attribute and the CONTAINS operator. 2) The query conditions of the query are permitted by the positive 4-uples. For verifying that the conditions of a given query are permitted by a search-method, we iterate over all those conditions. Each condition of the form (A OP a1)is permitted if exists one positive 4-uple (A,OP2,MUL,VALUES) verifying: 1) OP=OP2, 2) MUL>=1, and 3) a1 VALUES. Each time a positive 4-uple having a multiplicity value different from '+' is used for permitting a query condition, we decrease its multiplicity by 1. When it reaches 0, the 4-uple cannot be used to permit more query conditions during the process. For instance, the first base query in Example 2 would not be directly answerable by the source since there is not 4-uple which permits querying the relation by the PUBLISHER attribute 2.4 Post-processing techniques Query conditions involving those attributes present in the output set can always be performed by the mediator, by applying post-processing techniques. EXAMPLE 3: The first base query showed in the Example 2 can, in fact, be executed over the relation R from the Example 1, by executing the query: select * from R where (title contains java ) and (price<=5000) on the source and then (since the PUBLISHER attribute is present in the output), post-processing in the view the obtained results by filtering out those tuples not verifying the additional condition: (publisher contains Addison-Wesley ) As a consequence of these techniques, we can add a 4-uple of the form (A,ANY,+,ANY)for each attribute A in the output set (and there will be no need of other 4-uples for these attributes, since this 4-uple represents the capability for performing any query condition). Any other boolean condition involving A could also be applied (even using other boolean operators like or, not, etc.) Another post-processing technique can be applied for further improving source query capability. Let us define the containment relation between operators. We will say that an operator OP2 is contained in an operator OP1, if OP1 can emulate OP2 by post-processing techniques. For instance, a query condition using = can be emulated by using the same query condition with the Contains operator, and then filtering out the non exact-match results. So we will say that = operator is contained in the Contains operator. This only can be done if the attribute of the condition is included in the output set. That means that every negative 4-uple (A,OP,MUL,VALUES), where A is included in the output set, can also be satisfied by conditions of the form (A OP2 a), where a VALUES and OP2 is contained in OP. In the same way, a positive 4-uple (A,OP,MUL,VALUES) where A is included in the output set, also permits a query condition (A OP2 a), where a VALUES and OP2 is contained in OP. Since the queries executed against a base relation by post-processing techniques, can be significantly less efficient (because more data must be transmitted over the network), it is useful to differentiate between the native and the extended capabilities of the source. In query execution this should be taken into account in order to use post-processing only when needed. 2.5 Global schema relations Once the base relations have been created, each relation of the global schema is defined by a query involving the base relations, in a similar way to the definition of views in a conventional database. This approach is known in literature as Global As View (Florescu, 1998). A view is defined through a relational algebra expression involving relations. The algebraic operators permitted are: union, join, selection and projection. Orderby and aggregation operations are also supported (though not described here for the sake of shorteness).

5 EXAMPLE 4: Consider the three base relations: A ={TITLE:String,AUTHOR:String, PRICE:Money} B={TITLE:String,AUTHOR:String, PRICE:Money} C={TITLE:String,AUTHOR:String, RELEVANCE:Integer} A and B represent two Internet book merchant sites. C represents a web source in which its users assess the books relevance, and the system enables searches for the average relevance of a certain book. Consider that we want to obtain a global schema relation: R={TITLE,AUTHOR,PRICE,RELEVANCE} which contains all the books from A and B (only one tuple for each book), together with their average relevance according to C, and the minimum price for each book found within both e-shops. The definition of R would be: SELECT TITLE,AUTHOR,RELEVANCE, MIN(PRICE) FROM (I SELECT TITLE,AUTHOR,PRICE FROM A UNION SELECT TITLE,AUTHOR,PRICE FROM B),C WHERE I.TITLE=C.TITLE AND I.AUTHOR=C.AUTHOR GROUPBY TITLE,AUTHOR,RELEVANCE Computing query capability for global schema relations When the global schema relations are defined, the mediator is capable of computing its query capabilities from the sources. The query capability of a relation depends on the definition of the view through which such relation was defined. Given the tree of the relational algebra expression defining the view, we will provide a set of rules for calculating the search-methods of a given node in function of its node-type (that is, union, join, selection or projection), and the searchmethods of its children. Given these rules, the search-methods for the new relation can be easily obtained by traversing the tree from leaf nodes to root node, and calculating the search-methods for each node. The search-methods for the root node will be the ones representing the query capability of the relation. The way these rules should be applied differs depending on the number of child nodes. When the node has only one child (this will happen for projection and selection nodes), for each search-method in the relation represented by the child node, we will create a new search-method in the parent node. So, we will provide the rules for obtaining a new search-method from a given searchmethod from the child node (we will term it as a base search-method) When the node has many children (union and join operations), we only need to consider the case when there are only two children. If there were n, it would suffice to apply (n-1) times the process for two nodes. For instance, if we have an union node with four children (F1,F2,F3,F4), we could compute: R1=(F1 F2) R2=(F3 R1) R3=(F4 R2) and R3 would be the desired result. With two children, a new search-method in the parent node will be generated for each pair in the cross product of the two sets of search-methods from the children. The rules for obtaining a new searchmethod in function of each pair (we will term them as base search-methods) will be given for each operation (join and union). Once the way rules should be used has been introduced, the following sections provide them for each operation Union nodes Output Set An attribute will be in the output set of the new search-method if it is in the output set of both base search-methods. In order to correctly understand the following steps, it is convenient to have in mind the process

6 followed by a mediator for executing a query over an union node: it will simply execute the query over both children; and then will perform the union of both results. A first consequence of this is that the negative capabilities of one base search-method must be permitted by the positive capabilities of the other one. In other case, no query could be executed by the mediator over that node since: both search-methods must execute the same queries and one of the base search-methods forces to be queried by conditions that the other base searchmethod specifically excludes. Given that the former condition is satisfied, then the new search-method can be created by including in it the negative capabilities of both base searchmethods (since queries will have to satisfy the query restrictions of both children). This process can lead to redundant restrictions. We will say that a 4-uple X=(A,OP1,MUL1,VALUES1) is less restrictive than another 4-uple Y=(A,OP2,MUL2,VALUES2) (and thus, X can be removed ) if: OP1=OP2 MUL1>=MUL2. ( '+' is considered as the higher value). VALUES2 VALUES1. ('ANY' is considered as a set containing all possible values). Due to post-processing techniques, we can add a 4-uple (A,ANY,+,ANY)for each attribute A in the output set. On the other side, query conditions by those attributes not present in the output set are only possible if they are permitted by both base searchmethods. Thus, in order to calculate the new 4-uples for those attributes, the following process is needed: For each attribute, we compute the cross product of the 4-uples for the attribute in both base searchmethods. For each pair (X,Y) in the calculated cross product where: X=(A,OP,MUL1,VALUES1) Y=(A,OP,MUL2,VALUES2) (note that in order to generate a new 4-uple, both base 4-uples must share the same operator) we will create a 4-uple in the new search-method with the following form: (A,OP,MIN(MUL1,MUL2),VALUES1 VALUES2) Join Nodes While for executing queries over union nodes the mediator had only one relevant strategy, for join nodes, several ones are possible. As a consequence, the query capabilities computed will differ depending on the execution method used. So, we must compute the query capabilities for each possible case. In execution time, the mediator will be responsible of consider both the query capabilities and the cost of each execution method in order to choose the better one for each particular query. Here we will consider two different join execution strategies: conventional-join and nestedjoin. Intuitively, a query over a join node using conventional strategy, is executed by sending a subquery over both child relations (each sub-query will contain the query conditions concerning its attributes) and then, joining the sub-results. A query using nested strategy is executed by first sending a sub-query against the first child relation, and then for each unique value for the join attribute(s) returned in the sub-result, sending a query against the second relation. EXAMPLE 5: Nested join Let us consider two relations R1(A,B,C) and R2(A,D,E). Let us also suppose a relation R3(A,B,C,D,E)defined by R1 A R2. If we receive the query: Select * from R3 where (A op1 value1) and (B op2 value2) and (D op3 value3) Then the nested join strategy is implemented in the following way: 1) The following query is sent against R1, obtaining the result Q1 :

7 Q1 = Select * from R1 where (A op1 value1) and (B op2 value2) 2) Then, for each unique value a i for the attribute A obtained in Q1, we send against R2 the query: Q2 i = Select * from R2 where (A = a i ) and (D op3 value3) 3) Now we can compute the union of all the partial results obtained from R2: Q2 = Q2 1 Q2 2 Q2 n 4) Now we can compute Q1 A Q2 for obtaining the final result Conventional join strategy Output Set In order to allow this strategy to be executed, it is necessary that the join attribute be in the output set of both base search-methods. In other case, the new search-method could not be created. Given this, the output set is obtained by simply making the union of the two base sets. The new 4-uples for the join attributes are obtained in the same manner which has been already described for the attributes of union nodes. The new 4-uples for non-join attributes are obtained by simply copying the ones from the base search-methods. As in the case of union nodes, we can add a 4- uple of the form (A,ANY,+,ANY)for each attribute A in the output set. About the attributes not present in the output set, the 4-uples for non-join attributes can be obtained by simply copying the ones from the base searchmethods. As we have already seen, join attributes must be present in the output set, so we do not need to consider this case Nested join strategy As we have seen in Example 5, the order of the child nodes is relevant. So, we need to repeat the process twice, considering the two cases where each child node is chosen as first. Output Set In order to allow this strategy to be executed, it is necessary that the join attribute(s) be in the output set of the first relation. In other case, the new searchmethod could not be created. Note that in a nestedjoin, it is not required that the join attributes be in the output of the second relation, since its value is always known in the tuples retrieved from that relation (recall step two from Example 5). Then, the output set is obtained by including the join attributes and the non-join attributes which are in the output set of its base search-method. The difference with the conventional join strategy is that, for join attributes, the 4-uples of the second relation having the = operator, should be not copied into the new search-method, because they will always be satisfied. This is due to the fact that nested strategy always uses the values returned by the first relation to query the second for the join attributes and with the = operator (again, recall step two from Example 5). A first consideration is that this join strategy is only possible if the second base search-method allows querying the relation by the join attributes with the operator =. If this is satisfied, we can add a 4-uple of the form (A,ANY,+,ANY)for each attribute A in the output set. About the attributes not present in the output set, the 4-uples for non-join attributes can be obtained by simply copying the ones from the base searchmethods. As we have already seen, join attributes must be present in the output set, so we do not need to consider this case Projection nodes As is well known, a projection operation over a relation consists in removing some of their attributes (which are called hidden attributes). Output Set

8 The attributes in the output set of the new search-method will be the same as the ones present in the base search-method, excluding the hidden attributes. If there exists in the base search-method a 4-uple for a hidden attribute, then the new search-method could not be created. In other case, the 4-uples of the new searchmethod are obtained by copying the 4-uples of the base search-method. Again, we can add a 4-uple of the form (A,ANY, +,ANY)for each attribute A present in the output set. For those attributes not present in the output set, the 4-uples of the base search-method are copied in the new one. The 4-uples of the hidden attributes are removed Selection Nodes Output attributes The attributes in the output set of the new search-method will be the same as the ones present in the base one. We can also add those attributes not appearing in the output set of the base searchmethod, but which verifies at least one of the two following conditions: to appear in a selection condition with the '=' operator. Let us suppose a relation R=(A,B,C) where the attribute C does not appear in the output. Suppose we define a selection view R' over R where the selection condition is C=c1. We can consider that C is present in the output of R', since it is guaranteed that its value will be always known (it will be c1). to appear with the '=' operator in a 4-uple from the negative capabilities. Suppose a relation R=(A,B,C) having a negative capability of the form (C,=,MUL,VALUES). Since this guarantees that all the valid queries over R will include a query condition of the form (C=c1), we can assume that attribute C is in the output set, having the value c1 for all answer tuples. The 4-uples of the new search-method are obtained by copying the 4-uples of the base searchmethod, and removing those ones which are satisfied by the selection conditions (since these restrictions are already satisfied by the selection view operation, they need not to be satisfied by the query conditions). In order to be able of creating a new search method, the positive capabilities of the base searchmethod must allow the execution of the selection view operation. The algorithm showed in section 2.3 is used to determine it. For those attributes not present in the output set, the new 4-uples will be the ones obtained after applying the mentioned algorithm (multiplicity of 4- uples can be decreased as a result of this process, since the query conditions of the selection operation spend some of the available query capability). For the output attributes, we just add a 4-uple (A,ANY,+,ANY). 3. QUERY EXECUTION Once the global relations have been created and their capabilities generated, the mediator is able to accept queries over them. A query can be considered as a definition of a view over the global schema relations. Therefore, that view can be built by following an analogous process to that used to define the global schema relations, so a set of search-methods are obtained for the query. Those search methods including at least one negative 4-uple will be discarded, since that implies that query do not satisfy all the sources query restrictions (or negative capabilities). At this point, each search-method for the query can be seen as an relational algebra operation tree whose leaves are source relations and the "trace" of the search-method says how to execute it. So, each remaining search-method can, in fact, be considered as a feasible query execution plan. The optimiser module of the mediator should now obtain the cost of each plan in order to select the best one. Then the execution engine travels the chosen plan tree from the roots up to the leaves, obtaining a set of sub-queries over sources. Each one of these sub-queries is sent to the source wrapper and

9 executed by it. The execution engine then combines the returned results in accordance with what is specified by the plan tree, thus obtaining the final response. 4. QUANTITATIVE EXPERIENCE The techniques presented have been implemented in a commercial mediator system, and has been successfully used in several industrial applications for modelling more than 700 sources such as Web sites (aprox. 75% of total), databases, spreadsheets and XML documents. Some relevant for our discussion quantitative data referred to source query capabilities follows: 1) The percentage of sources which use some operator different from = exceeds 80%. 2) The most frequent operators are (in descending order): CONTAINS, =, <= and >=. 3) The percentage of sources which can execute multiple query conditions for some attributes exceeds 62%. 5. RELATED WORK AND CONCLUSIONS In recent years, a significant amount of academic mediator systems has been developed, such as Tsimmis (Garcia-Molina,1995), Information Manifold (Halevy,1996), or Disco (Tomasic, 1998). Some specific aspects in the construction of mediators, which are relevant to our discussion, have also been studied, such as query rewriting (Halevy, 2000), source query capabilities description and capabilities-based query rewriting (Papakonstantinou, 1996), and computation of mediator capabilities (Yerneni,1999). Nevertheless, as has already been discussed, most of these works failed to properly represent some advanced query features which are present in many real sources. (Papakonstantinou, 1996) is an exception for this, since it presents a more powerful capability description framework than the other systems. Nevertheless, it still lacks of some desirable features such as explicit representation of mandatory and optional attributes, which leads to less concise representations. Besides, this work does not address the important issue of computing mediator query capabilities. We claim that computing the query capabilities of the global schema relations is a must-have feature for mediators aiming to gain industrial support: 1) It is needed in order to let users know the queries supported. In our experience, industrial users do not find trial&error query processes acceptable. 2) It makes possible to use mediators as sources for other mediators, easily enabling incremental data integration processes. This feature is very desirable for large data integration projects. This issue had been dealt with in (Yerneni, 1999) but the algorithms presented in that work assumed a very limited source capability representation framework. In turn, the algorithm we present here is able of managing the richer capabilities representation models we have described. 6. REFERENCES Florescu, D., Levy, A., Mendelzon, A., Database Techniques for the World-Wide Web: A Survey. In SIGMOD RECORD, vol. 27 nº3. García-Molina, H., Papakonstantinou, Y., Quass, D., Rajaraman A., Sagiv, Y., Ullman, J., Widow, J., The TSIMMIS approach to mediation: Data models and languages. En Proceedings of NGITS (Next Generation Information Technologies and Systems). Halevy, A., Theory of Answering Queries Using Views. In ACM SIGMOD Record vol. 29, nº 4. Halevy, A., Rajaraman, A., Ordille, J., Query answering algorithms for information agents. In Proceedings of the Thirteenth National Conference on Artificial Intelligence Papakonstantinou, Y., Gupta, A., Hass, L., Capabilities-Based Query Rewriting in Mediator Systems. En Proceedings of the Fourth International Conference on Parallel and Distributed Information Systems Tomasic, A., Raschid, L., Valduriez, P., Scaling access to distributed heterogeneous data sources with Disco. En IEEE Transactions On Knowledge and Data Engineering. Wiederhold, G., Mediators in the architecture of future information systems. Computer, 25(3) Yerneni, R., Li, C., García-Molina, H., and Ullman, J., Computing Capabilities of Mediators. In Proccedings of the ACM SIGMOD Conference.

Integrated Usage of Heterogeneous Databases for Novice Users

Integrated Usage of Heterogeneous Databases for Novice Users International Journal of Networked and Distributed Computing, Vol. 3, No. 2 (April 2015), 109-118 Integrated Usage of Heterogeneous Databases for Novice Users Ayano Terakawa Dept. of Information Science,

More information

CRAWLING THE CLIENT-SIDE HIDDEN WEB

CRAWLING THE CLIENT-SIDE HIDDEN WEB CRAWLING THE CLIENT-SIDE HIDDEN WEB Manuel Álvarez, Alberto Pan, Juan Raposo, Ángel Viña Department of Information and Communications Technologies University of A Coruña.- 15071 A Coruña - Spain e-mail

More information

Mediating and Metasearching on the Internet

Mediating and Metasearching on the Internet Mediating and Metasearching on the Internet Luis Gravano Computer Science Department Columbia University www.cs.columbia.edu/ gravano Yannis Papakonstantinou Computer Science and Engineering Department

More information

Symmetrically Exploiting XML

Symmetrically Exploiting XML Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA The 15 th International World Wide Web Conference

More information

MIWeb: Mediator-based Integration of Web Sources

MIWeb: Mediator-based Integration of Web Sources MIWeb: Mediator-based Integration of Web Sources Susanne Busse and Thomas Kabisch Technical University of Berlin Computation and Information Structures (CIS) sbusse,tkabisch@cs.tu-berlin.de Abstract MIWeb

More information

Element Algebra. 1 Introduction. M. G. Manukyan

Element Algebra. 1 Introduction. M. G. Manukyan Element Algebra M. G. Manukyan Yerevan State University Yerevan, 0025 mgm@ysu.am Abstract. An element algebra supporting the element calculus is proposed. The input and output of our algebra are xdm-elements.

More information

Wrapper 2 Wrapper 3. Information Source 2

Wrapper 2 Wrapper 3. Information Source 2 Integration of Semistructured Data Using Outer Joins Koichi Munakata Industrial Electronics & Systems Laboratory Mitsubishi Electric Corporation 8-1-1, Tsukaguchi Hon-machi, Amagasaki, Hyogo, 661, Japan

More information

The Wargo System: Semi-Automatic Wrapper Generation in Presence of Complex Data Access Modes

The Wargo System: Semi-Automatic Wrapper Generation in Presence of Complex Data Access Modes The Wargo System: Semi-Automatic Wrapper Generation in Presence of Complex Data Access Modes J. Raposo, A. Pan, M. Álvarez, Justo Hidalgo, A. Viña Denodo Technologies {apan, jhidalgo,@denodo.com University

More information

I. Khalil Ibrahim, V. Dignum, W. Winiwarter, E. Weippl, Logic Based Approach to Semantic Query Transformation for Knowledge Management Applications,

I. Khalil Ibrahim, V. Dignum, W. Winiwarter, E. Weippl, Logic Based Approach to Semantic Query Transformation for Knowledge Management Applications, I. Khalil Ibrahim, V. Dignum, W. Winiwarter, E. Weippl, Logic Based Approach to Semantic Query Transformation for Knowledge Management Applications, Proc. of the International Conference on Knowledge Management

More information

Automatically Maintaining Wrappers for Semi- Structured Web Sources

Automatically Maintaining Wrappers for Semi- Structured Web Sources Automatically Maintaining Wrappers for Semi- Structured Web Sources Juan Raposo, Alberto Pan, Manuel Álvarez Department of Information and Communication Technologies. University of A Coruña. {jrs,apan,mad}@udc.es

More information

Query Processing and Optimization on the Web

Query Processing and Optimization on the Web Query Processing and Optimization on the Web Mourad Ouzzani and Athman Bouguettaya Presented By Issam Al-Azzoni 2/22/05 CS 856 1 Outline Part 1 Introduction Web Data Integration Systems Query Optimization

More information

Horizontal Aggregations for Mining Relational Databases

Horizontal Aggregations for Mining Relational Databases Horizontal Aggregations for Mining Relational Databases Dontu.Jagannadh, T.Gayathri, M.V.S.S Nagendranadh. Department of CSE Sasi Institute of Technology And Engineering,Tadepalligudem, Andhrapradesh,

More information

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Rada Chirkova Department of Computer Science, North Carolina State University Raleigh, NC 27695-7535 chirkova@csc.ncsu.edu Foto Afrati

More information

Scaling Access to Heterogeneous Data Sources with DISCO

Scaling Access to Heterogeneous Data Sources with DISCO 808 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 10, NO. 5, SEPTEMBER/OCTOBER 1998 Scaling Access to Heterogeneous Data Sources with DISCO Anthony Tomasic, Louiqa Raschid, Member, IEEE, and

More information

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016 + Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html

More information

Evaluating XPath Queries

Evaluating XPath Queries Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But

More information

Information Discovery, Extraction and Integration for the Hidden Web

Information Discovery, Extraction and Integration for the Hidden Web Information Discovery, Extraction and Integration for the Hidden Web Jiying Wang Department of Computer Science University of Science and Technology Clear Water Bay, Kowloon Hong Kong cswangjy@cs.ust.hk

More information

Stored Relvars 18 th April 2013 (30 th March 2001) David Livingstone. Stored Relvars

Stored Relvars 18 th April 2013 (30 th March 2001) David Livingstone. Stored Relvars Stored Relvars Introduction The purpose of a Stored Relvar (= Stored Relational Variable) is to provide a mechanism by which the value of a real (or base) relvar may be partitioned into fragments and/or

More information

CSE-6490B Final Exam

CSE-6490B Final Exam February 2009 CSE-6490B Final Exam Fall 2008 p 1 CSE-6490B Final Exam In your submitted work for this final exam, please include and sign the following statement: I understand that this final take-home

More information

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS SRIVANI SARIKONDA 1 PG Scholar Department of CSE P.SANDEEP REDDY 2 Associate professor Department of CSE DR.M.V.SIVA PRASAD 3 Principal Abstract:

More information

Distributed Database System. Project. Query Evaluation and Web Recognition in Document Databases

Distributed Database System. Project. Query Evaluation and Web Recognition in Document Databases 74.783 Distributed Database System Project Query Evaluation and Web Recognition in Document Databases Instructor: Dr. Yangjun Chen Student: Kang Shi (6776229) August 1, 2003 1 Abstract A web and document

More information

A Component-based Approach for Engineering Enterprise Mashups

A Component-based Approach for Engineering Enterprise Mashups A Component-based Approach for Engineering Enterprise Mashups Javier López 1, Fernando Bellas 1, Alberto Pan 1, Paula Montoto 2, 1 Information and Communications Technology Department, University of A

More information

Data Integration by Describing Sources with Constraint Databases

Data Integration by Describing Sources with Constraint Databases Data Integration by Describing Sources with Constraint Databases Xun Cheng 1 Guozhu Dong 2 Tzekwan Lau 1 Jianwen Su 1 xun@cs.ucsb.edu gdong@cs.wright.edu tzekwan@cs.ucsb.edu su@cs.ucsb.edu 1 Department

More information

DATA MODELS FOR SEMISTRUCTURED DATA

DATA MODELS FOR SEMISTRUCTURED DATA Chapter 2 DATA MODELS FOR SEMISTRUCTURED DATA Traditionally, real world semantics are captured in a data model, and mapped to the database schema. The real world semantics are modeled as constraints and

More information

Overview of the Integration Wizard Project for Querying and Managing Semistructured Data in Heterogeneous Sources

Overview of the Integration Wizard Project for Querying and Managing Semistructured Data in Heterogeneous Sources In Proceedings of the Fifth National Computer Science and Engineering Conference (NSEC 2001), Chiang Mai University, Chiang Mai, Thailand, November 2001. Overview of the Integration Wizard Project for

More information

Optimized Query Plan Algorithm for the Nested Query

Optimized Query Plan Algorithm for the Nested Query Optimized Query Plan Algorithm for the Nested Query Chittaranjan Pradhan School of Computer Engineering, KIIT University, Bhubaneswar, India Sushree Sangita Jena School of Computer Engineering, KIIT University,

More information

An Approach to Resolve Data Model Heterogeneities in Multiple Data Sources

An Approach to Resolve Data Model Heterogeneities in Multiple Data Sources Edith Cowan University Research Online ECU Publications Pre. 2011 2006 An Approach to Resolve Data Model Heterogeneities in Multiple Data Sources Chaiyaporn Chirathamjaree Edith Cowan University 10.1109/TENCON.2006.343819

More information

WEB DATA EXTRACTION METHOD BASED ON FEATURED TERNARY TREE

WEB DATA EXTRACTION METHOD BASED ON FEATURED TERNARY TREE WEB DATA EXTRACTION METHOD BASED ON FEATURED TERNARY TREE *Vidya.V.L, **Aarathy Gandhi *PG Scholar, Department of Computer Science, Mohandas College of Engineering and Technology, Anad **Assistant Professor,

More information

Inference in Hierarchical Multidimensional Space

Inference in Hierarchical Multidimensional Space Proc. International Conference on Data Technologies and Applications (DATA 2012), Rome, Italy, 25-27 July 2012, 70-76 Related papers: http://conceptoriented.org/ Inference in Hierarchical Multidimensional

More information

Query Transformation of SQL into XQuery within Federated Environments

Query Transformation of SQL into XQuery within Federated Environments Query Transformation of SQL into XQuery within Federated Environments Heiko Jahnkuhn, Ilvio Bruder, Ammar Balouch, Manja Nelius, and Andreas Heuer Department of Computer Science, University of Rostock,

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1 Slide 27-1 Chapter 27 XML: Extensible Markup Language Chapter Outline Introduction Structured, Semi structured, and Unstructured Data. XML Hierarchical (Tree) Data Model. XML Documents, DTD, and XML Schema.

More information

Information Mediation Across Heterogeneous Government Spatial Data Sources

Information Mediation Across Heterogeneous Government Spatial Data Sources Information Mediation Across Heterogeneous Government Spatial Data Sources Amarnath Gupta Ashraf Memon Joshua Tran Rajiv P. Bharadwaja Ilya Zaslavsky San Diego Supercomputer Center University of California

More information

Flat (Draft) Pasqualino Titto Assini 27 th of May 2016

Flat (Draft) Pasqualino Titto Assini 27 th of May 2016 Flat (Draft) Pasqualino Titto Assini (tittoassini@gmail.com) 27 th of May 206 Contents What is Flat?...................................... Design Goals...................................... Design Non-Goals...................................

More information

Semantic Optimization of Preference Queries

Semantic Optimization of Preference Queries Semantic Optimization of Preference Queries Jan Chomicki University at Buffalo http://www.cse.buffalo.edu/ chomicki 1 Querying with Preferences Find the best answers to a query, instead of all the answers.

More information

Semistructured Data Store Mapping with XML and Its Reconstruction

Semistructured Data Store Mapping with XML and Its Reconstruction Semistructured Data Store Mapping with XML and Its Reconstruction Enhong CHEN 1 Gongqing WU 1 Gabriela Lindemann 2 Mirjam Minor 2 1 Department of Computer Science University of Science and Technology of

More information

Working with Mediator Framework

Working with Mediator Framework CHAPTER 2 This chapter describes the Mediator framework and includes the following sections: Framework Overview, page 2-1 Configurable Nodes, page 2-2 Composite Nodes, page 2-4 Getting and Setting Node

More information

SEMI-AUTOMATIC WRAPPER GENERATION AND ADAPTION Living with heterogeneity in a market environment

SEMI-AUTOMATIC WRAPPER GENERATION AND ADAPTION Living with heterogeneity in a market environment SEMI-AUTOMATIC WRAPPER GENERATION AND ADAPTION Living with heterogeneity in a market environment Michael Christoffel, Bethina Schmitt, Jürgen Schneider Institute for Program Structures and Data Organization,

More information

Oracle Database: SQL and PL/SQL Fundamentals NEW

Oracle Database: SQL and PL/SQL Fundamentals NEW Oracle Database: SQL and PL/SQL Fundamentals NEW Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training delivers the fundamentals of SQL and PL/SQL along with the

More information

Relational Algebra and SQL

Relational Algebra and SQL Relational Algebra and SQL Relational Algebra. This algebra is an important form of query language for the relational model. The operators of the relational algebra: divided into the following classes:

More information

Oracle Database: SQL and PL/SQL Fundamentals Ed 2

Oracle Database: SQL and PL/SQL Fundamentals Ed 2 Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Oracle Database: SQL and PL/SQL Fundamentals Ed 2 Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals

More information

Oracle Database 11g: SQL and PL/SQL Fundamentals

Oracle Database 11g: SQL and PL/SQL Fundamentals Oracle University Contact Us: +33 (0) 1 57 60 20 81 Oracle Database 11g: SQL and PL/SQL Fundamentals Duration: 5 Days What you will learn In this course, students learn the fundamentals of SQL and PL/SQL

More information

Object-Oriented Mediator Queries to Internet Search Engines

Object-Oriented Mediator Queries to Internet Search Engines Presented at International Workshop on Efficient Web-based Information Systems (EWIS-2002), Montpellier, France, September 2nd, 2002. Object-Oriented Mediator Queries to Internet Search Engines Timour

More information

An Object-Oriented Multi-Mediator Browser

An Object-Oriented Multi-Mediator Browser Presented at 2 nd International Workshop on User Interfaces to Data Intensive Systems, Zurich, Switzerland, May 31. - June 1. 2001 An Object-Oriented Multi-Mediator Browser Kristofer Cassel and Tore Risch

More information

CS W Introduction to Databases Spring Computer Science Department Columbia University

CS W Introduction to Databases Spring Computer Science Department Columbia University CS W4111.001 Introduction to Databases Spring 2018 Computer Science Department Columbia University 1 in SQL 1. Key constraints (PRIMARY KEY and UNIQUE) 2. Referential integrity constraints (FOREIGN KEY

More information

Answering Queries with Useful Bindings

Answering Queries with Useful Bindings Answering Queries with Useful Bindings CHEN LI University of California at Irvine and EDWARD CHANG University of California, Santa Barbara In information-integration systems, sources may have diverse and

More information

EXTRACTION AND ALIGNMENT OF DATA FROM WEB PAGES

EXTRACTION AND ALIGNMENT OF DATA FROM WEB PAGES EXTRACTION AND ALIGNMENT OF DATA FROM WEB PAGES Praveen Kumar Malapati 1, M. Harathi 2, Shaik Garib Nawaz 2 1 M.Tech, Computer Science Engineering, 2 M.Tech, Associate Professor, Computer Science Engineering,

More information

RAQUEL s Relational Operators

RAQUEL s Relational Operators Contents RAQUEL s Relational Operators Introduction 2 General Principles 2 Operator Parameters 3 Ordinary & High-Level Operators 3 Operator Valency 4 Default Tuples 5 The Relational Algebra Operators in

More information

An Efficient Technique for Tag Extraction and Content Retrieval from Web Pages

An Efficient Technique for Tag Extraction and Content Retrieval from Web Pages An Efficient Technique for Tag Extraction and Content Retrieval from Web Pages S.Sathya M.Sc 1, Dr. B.Srinivasan M.C.A., M.Phil, M.B.A., Ph.D., 2 1 Mphil Scholar, Department of Computer Science, Gobi Arts

More information

Query Containment for XML-QL

Query Containment for XML-QL Query Containment for XML-QL Deepak Jindal, Sambavi Muthukrishnan, Omer Zaki {jindal, sambavi, ozaki}@cs.wisc.edu University of Wisconsin-Madison April 28, 2000 Abstract The ability to answer an incoming

More information

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while 1 One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while leaving the engine to choose the best way of fulfilling

More information

Who won the Universal Relation wars? Alberto Mendelzon University of Toronto

Who won the Universal Relation wars? Alberto Mendelzon University of Toronto Who won the Universal Relation wars? Alberto Mendelzon University of Toronto Who won the Universal Relation wars? 1 1-a Who won the Universal Relation wars? The Good Guys. The Good Guys 2 3 Outline The

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

XQuery Optimization Based on Rewriting

XQuery Optimization Based on Rewriting XQuery Optimization Based on Rewriting Maxim Grinev Moscow State University Vorob evy Gory, Moscow 119992, Russia maxim@grinev.net Abstract This paper briefly describes major results of the author s dissertation

More information

Arbori Starter Manual Eugene Perkov

Arbori Starter Manual Eugene Perkov Arbori Starter Manual Eugene Perkov What is Arbori? Arbori is a query language that takes a parse tree as an input and builds a result set 1 per specifications defined in a query. What is Parse Tree? A

More information

Overcoming Schematic Discrepancies in Interoperable Databases *

Overcoming Schematic Discrepancies in Interoperable Databases * Overcoming Schematic Discrepancies in Interoperable Databases * F. Saltor, M. G. Castellanos and M. García-Solaco Dept. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Pau

More information

The Benefits of Utilizing Closeness in XML. Curtis Dyreson Utah State University Shuohao Zhang Microsoft, Marvell Hao Jin Expedia.

The Benefits of Utilizing Closeness in XML. Curtis Dyreson Utah State University Shuohao Zhang Microsoft, Marvell Hao Jin Expedia. The Benefits of Utilizing Closeness in XML Curtis Dyreson Utah State University Shuohao Zhang Microsoft, Marvell Hao Jin Expedia.com Bozen-Bolzano - September 5, 2008 Outline Motivation Tools Polymorphic

More information

User-defined Functions. Conditional Expressions in Scheme

User-defined Functions. Conditional Expressions in Scheme User-defined Functions The list (lambda (args (body s to a function with (args as its argument list and (body as the function body. No quotes are needed for (args or (body. (lambda (x (+ x 1 s to the increment

More information

Indexing XML Data with ToXin

Indexing XML Data with ToXin Indexing XML Data with ToXin Flavio Rizzolo, Alberto Mendelzon University of Toronto Department of Computer Science {flavio,mendel}@cs.toronto.edu Abstract Indexing schemes for semistructured data have

More information

Data Integration Systems

Data Integration Systems Data Integration Systems Haas et al. 98 Garcia-Molina et al. 97 Levy et al. 96 Chandrasekaran et al. 2003 Zachary G. Ives University of Pennsylvania January 13, 2003 CIS 650 Data Sharing and the Web Administrivia

More information

Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems

Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems Chen Li Information and Computer Science University of California, Irvine, CA 92697 chenli@ics.uci.edu Abstract In data-integration

More information

On the Role of Integrity Constraints in Data Integration

On the Role of Integrity Constraints in Data Integration On the Role of Integrity Constraints in Data Integration Andrea Calì, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza

More information

Data Warehousing Alternatives for Mobile Environments

Data Warehousing Alternatives for Mobile Environments Data Warehousing Alternatives for Mobile Environments I. Stanoi D. Agrawal A. El Abbadi Department of Computer Science University of California Santa Barbara, CA 93106 S. H. Phatak B. R. Badrinath Department

More information

Let the dynamic table support the operations TABLE-INSERT and TABLE-DELETE It is convenient to use the load factor ( )

Let the dynamic table support the operations TABLE-INSERT and TABLE-DELETE It is convenient to use the load factor ( ) 17.4 Dynamic tables Let us now study the problem of dynamically expanding and contracting a table We show that the amortized cost of insertion/ deletion is only (1) Though the actual cost of an operation

More information

A Data warehouse within a Federated database architecture

A Data warehouse within a Federated database architecture Association for Information Systems AIS Electronic Library (AISeL) AMCIS 1997 Proceedings Americas Conference on Information Systems (AMCIS) 8-15-1997 A Data warehouse within a Federated database architecture

More information

Browsing in the tsimmis System. Stanford University. into requests the source can execute. The data returned by the source is converted back into the

Browsing in the tsimmis System. Stanford University. into requests the source can execute. The data returned by the source is converted back into the Information Translation, Mediation, and Mosaic-Based Browsing in the tsimmis System SIGMOD Demo Proposal (nal version) Joachim Hammer, Hector Garcia-Molina, Kelly Ireland, Yannis Papakonstantinou, Jerey

More information

birds fly S NP N VP V Graphs and trees

birds fly S NP N VP V Graphs and trees birds fly S NP VP N birds V fly S NP NP N VP V VP S NP VP birds a fly b ab = string S A B a ab b S A B A a B b S NP VP birds a fly b ab = string Grammar 1: Grammar 2: A a A a A a B A B a B b A B A b Grammar

More information

DENODO VIRTUAL DATAPORT 4.5 ADMINISTRATOR GUIDE

DENODO VIRTUAL DATAPORT 4.5 ADMINISTRATOR GUIDE DENODO VIRTUAL DATAPORT 4.5 ADMINISTRATOR GUIDE Update Dec 09 th, 2009 NOTE This document is confidential and is the property of denodo technologies (hereinafter denodo). No part of the document may be

More information

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of

More information

An ODBC CORBA-Based Data Mediation Service

An ODBC CORBA-Based Data Mediation Service An ODBC CORBA-Based Data Mediation Service Paul L. Bergstein Dept. of Computer and Information Science University of Massachusetts Dartmouth, Dartmouth MA pbergstein@umassd.edu Keywords: Data mediation,

More information

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9) Technology & Information Management Instructor: Michael Kremer, Ph.D. Class 6 Professional Program: Data Administration and Management MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9) AGENDA

More information

SQLfX Live Online Demo

SQLfX Live Online Demo SQLfX Live Online Demo SQLfX Overview SQLfX (SQL for XML) is the first and only transparent total integration of native XML by ANSI SQL. It requires no use of XML centric syntax or functions, XML training,

More information

Chapter S:II. II. Search Space Representation

Chapter S:II. II. Search Space Representation Chapter S:II II. Search Space Representation Systematic Search Encoding of Problems State-Space Representation Problem-Reduction Representation Choosing a Representation S:II-1 Search Space Representation

More information

CSCD43: Database Systems Technology. Lecture 4

CSCD43: Database Systems Technology. Lecture 4 CSCD43: Database Systems Technology Lecture 4 Wael Aboulsaadat Acknowledgment: these slides are based on Prof. Garcia-Molina & Prof. Ullman slides accompanying the course s textbook. Steps in Database

More information

2.2 Syntax Definition

2.2 Syntax Definition 42 CHAPTER 2. A SIMPLE SYNTAX-DIRECTED TRANSLATOR sequence of "three-address" instructions; a more complete example appears in Fig. 2.2. This form of intermediate code takes its name from instructions

More information

Trees : Part 1. Section 4.1. Theory and Terminology. A Tree? A Tree? Theory and Terminology. Theory and Terminology

Trees : Part 1. Section 4.1. Theory and Terminology. A Tree? A Tree? Theory and Terminology. Theory and Terminology Trees : Part Section. () (2) Preorder, Postorder and Levelorder Traversals Definition: A tree is a connected graph with no cycles Consequences: Between any two vertices, there is exactly one unique path

More information

Using Knowledge of Redundancy for Query Optimization in Mediators

Using Knowledge of Redundancy for Query Optimization in Mediators Using Knowledge of Redundancy for Query Optimization in Mediators From: AAAI Technical Report WS-98-14. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Vasilis Vassalos* Stanford

More information

Interrogation System Architecture of Heterogeneous Data for Decision Making

Interrogation System Architecture of Heterogeneous Data for Decision Making Interrogation System Architecture of Heterogeneous Data for Decision Making Cécile Nicolle, Youssef Amghar, Jean-Marie Pinon Laboratoire d'ingénierie des Systèmes d'information INSA de Lyon Abstract Decision

More information

Oracle NoSQL Database Parent-Child Joins and Aggregation O R A C L E W H I T E P A P E R A P R I L,

Oracle NoSQL Database Parent-Child Joins and Aggregation O R A C L E W H I T E P A P E R A P R I L, Oracle NoSQL Database Parent-Child Joins and Aggregation O R A C L E W H I T E P A P E R A P R I L, 2 0 1 8 Table of Contents Introduction 1 Parent Table Child Table Joins 2 Comparison to RDBMS LEFT OUTER

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Foundations of Databases

Foundations of Databases Foundations of Databases Free University of Bozen Bolzano, 2004 2005 Thomas Eiter Institut für Informationssysteme Arbeitsbereich Wissensbasierte Systeme (184/3) Technische Universität Wien http://www.kr.tuwien.ac.at/staff/eiter

More information

Systems. Ramana Yerneni, Chen Li. fyerneni, chenli, ullman, Stanford University, USA

Systems. Ramana Yerneni, Chen Li. fyerneni, chenli, ullman, Stanford University, USA Optimizing Large Join Queries in Mediation Systems Ramana Yerneni, Chen Li Jerey Ullman, Hector Garcia-Molina fyerneni, chenli, ullman, hectorg@cs.stanford.edu, Stanford University, USA Abstract. In data

More information

Storing and Maintaining Semistructured Data Efficiently in an Object-Relational Database

Storing and Maintaining Semistructured Data Efficiently in an Object-Relational Database Storing and Maintaining Semistructured Data Efficiently in an Object-Relational Database Yuanying Mo National University of Singapore moyuanyi@comp.nus.edu.sg Tok Wang Ling National University of Singapore

More information

CHAPTER 3 LITERATURE REVIEW

CHAPTER 3 LITERATURE REVIEW 20 CHAPTER 3 LITERATURE REVIEW This chapter presents query processing with XML documents, indexing techniques and current algorithms for generating labels. Here, each labeling algorithm and its limitations

More information

Compression of the Stream Array Data Structure

Compression of the Stream Array Data Structure Compression of the Stream Array Data Structure Radim Bača and Martin Pawlas Department of Computer Science, Technical University of Ostrava Czech Republic {radim.baca,martin.pawlas}@vsb.cz Abstract. In

More information

NULLs & Outer Joins. Objectives of the Lecture :

NULLs & Outer Joins. Objectives of the Lecture : Slide 1 NULLs & Outer Joins Objectives of the Lecture : To consider the use of NULLs in SQL. To consider Outer Join Operations, and their implementation in SQL. Slide 2 Missing Values : Possible Strategies

More information

Chapter 13 XML: Extensible Markup Language

Chapter 13 XML: Extensible Markup Language Chapter 13 XML: Extensible Markup Language - Internet applications provide Web interfaces to databases (data sources) - Three-tier architecture Client V Application Programs Webserver V Database Server

More information

A multiple join index for data warehouses

A multiple join index for data warehouses Proceedings of the 5th WSEAS Int. Conf. on DATA NETWORKS, COMMUNICATIONS & COMPUTERS, Bucharest, Romania, October 16-17, 26 7 A multiple join index for data warehouses ADI-CRISTINA MITEA Computer Science

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Query Containment for Data Integration Systems

Query Containment for Data Integration Systems Query Containment for Data Integration Systems Todd Millstein University of Washington Seattle, Washington todd@cs.washington.edu Alon Levy University of Washington Seattle, Washington alon@cs.washington.edu

More information

ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES

ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES Fidel Cacheda, Alberto Pan, Lucía Ardao, Angel Viña Department of Tecnoloxías da Información e as Comunicacións, Facultad

More information

An approach to the model-based fragmentation and relational storage of XML-documents

An approach to the model-based fragmentation and relational storage of XML-documents An approach to the model-based fragmentation and relational storage of XML-documents Christian Süß Fakultät für Mathematik und Informatik, Universität Passau, D-94030 Passau, Germany Abstract A flexible

More information

Utilizing a Common Language as a Generative Software Reuse Tool

Utilizing a Common Language as a Generative Software Reuse Tool Utilizing a Common Language as a Generative Software Reuse Tool Chris Henry and Stanislaw Jarzabek Department of Computer Science School of Computing, National University of Singapore 3 Science Drive,

More information

Lecture2: Database Environment

Lecture2: Database Environment College of Computer and Information Sciences - Information Systems Dept. Lecture2: Database Environment 1 IS220 : D a t a b a s e F u n d a m e n t a l s Topics Covered Data abstraction Schemas and Instances

More information

A survey: Web mining via Tag and Value

A survey: Web mining via Tag and Value A survey: Web mining via Tag and Value Khirade Rajratna Rajaram. Information Technology Department SGGS IE&T, Nanded, India Balaji Shetty Information Technology Department SGGS IE&T, Nanded, India Abstract

More information

Deep Web Content Mining

Deep Web Content Mining Deep Web Content Mining Shohreh Ajoudanian, and Mohammad Davarpanah Jazi Abstract The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased

More information

Improving Resource Management And Solving Scheduling Problem In Dataware House Using OLAP AND OLTP Authors Seenu Kohar 1, Surender Singh 2

Improving Resource Management And Solving Scheduling Problem In Dataware House Using OLAP AND OLTP Authors Seenu Kohar 1, Surender Singh 2 Improving Resource Management And Solving Scheduling Problem In Dataware House Using OLAP AND OLTP Authors Seenu Kohar 1, Surender Singh 2 1 M.tech Computer Engineering OITM Hissar, GJU Univesity Hissar

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Multiplex: Integrating. Autonomous, Heterogeneous and Inconsistent Information Sources

Multiplex: Integrating. Autonomous, Heterogeneous and Inconsistent Information Sources Multiplex: Integrating Autonomous, Heterogeneous and Inconsistent Information Sources Amihai Motro Proceedings of NGITS 99 Fourth International Workshop on Next Generation Information Technologies and

More information

Essay Question: Explain 4 different means by which constrains are represented in the Conceptual Data Model (CDM).

Essay Question: Explain 4 different means by which constrains are represented in the Conceptual Data Model (CDM). Question 1 Essay Question: Explain 4 different means by which constrains are represented in the Conceptual Data Model (CDM). By specifying participation conditions By specifying the degree of relationship

More information