On Adaptive and Online Data Integration


University of Wollongong Research Online
Faculty of Informatics - Papers (Archive), Faculty of Engineering and Information Sciences, 2005

On Adaptive and Online Data Integration
J. R. Getta, University of Wollongong, jrg@uow.edu.au

Publication Details: This paper was originally published as: Getta, JR, On Adaptive and Online Data Integration, 21st International Conference on Data Engineering Workshops, 5-8 April 2005, Copyright 2005 IEEE.

Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: research-pubs@uow.edu.au

Abstract

Recent work on the integration of large database systems distributed over wide-area networks concentrates on adaptive and online techniques. The online property of data integration means continuous integration of transmitted data with the already available results. Adaptivity materializes in the form of dynamic adjustments to the data integration plans in response to the recent characteristics of data transmission. Implementation of adaptive and online data integration needs specialized systems of operations and transformations of integration plans. This paper describes a new class of elementary operations on increments and/or decrements of data and shows how to express data integration plans as sequences of elementary operations. We demonstrate that the class of operations proposed in the paper is sufficient for the implementation of online and adaptive data integration systems and we discuss the operational properties of such systems.

Disciplines: Physical Sciences and Mathematics

Janusz R. Getta
School of Information Technology and Computer Science
University of Wollongong, Wollongong, NSW 2522, Australia

1. Introduction

Advances in persistent storage and wide-area network technologies allow relatively inexpensive implementations of unified and integrated views of data located at remote and heterogeneous database systems. A central problem in the development of such systems is the ad hoc integration of data transmitted over the networks. The efficiency of data integration depends on advanced algorithms for merging the partial results of queries computed at the remote database sites. Recent trends in data integration lead towards online and adaptive algorithms. Online algorithms [7] process incomplete sets of input data and continuously improve their solutions as new data items become available for processing and old data items are discarded. A typical example of an online algorithm is a virtual memory manager that operates on a window over a theoretically unlimited sequence of tasks. Adaptive algorithms adjust their integration strategies to external events, e.g. the arrival of a new packet of data or the completion of a transmission from a particular site. It is anticipated that data integration will soon emerge as an autonomous research area from distributed computing and financial data processing, triggered by freely available distributed data sets and fast wide-area networks [8], [23].

Data integration has its roots in the processing of queries in distributed and heterogeneous database systems, often called multidatabase or federated database systems [25, 22]. The unpredictable behavior of data transmission systems and the strong autonomy of remote database systems make precise estimation of subquery processing time hard. This is where reactive query processing techniques show superiority over the classical proactive techniques commonly used for query processing in distributed database systems [1]. The early data integration systems looked for solutions in partitioning [6, 19] and dynamic modification of query processing plans [5, 10, 9]. Partitioning means that a query execution plan is divided into subplans at a point where further computations are no longer possible due to lack of data. The dynamic modification technique finds a plan that is equivalent to the original one and that can be partially computed with the available sets of data. Another group of ideas addresses the optimization of the individual elementary operations used for data integration. The specialized operations include the pipelined join operator XJoin [26], ripple join [14], double pipelined join [16], and hash-merge join [21]. The approaches based on scheduling change the order in which the operations are executed while preserving the semantics of the data integration plan. The scheduling based techniques include query scrambling [28, 1] and dynamic scheduling of operators [27]. The techniques based on redundant computations simultaneously execute a number of data integration plans and keep the plan that provides the most advanced results [2].

The solutions based on data partitioning integrate different components of the integrated arguments according to different plans. Eddies are able to process each tuple according to a different plan [3]. The concept of state modules described in [24] allows for concurrent processing of tuples; it dynamically divides a data integration task among different plans and executes the plans sequentially or in parallel. The adaptive data partitioning technique [17] processes different partitions of the same argument using different data integration plans. The recently developed data stream processing techniques [20, 11] also contribute to online data integration. The works [4, 13, 15, 18] review the major solutions proposed so far. A more up-to-date and more detailed overview of past work on adaptive data integration can be found in [12].

The approaches listed above adopt the relational model as the target data integration model and express the integration plan in the language of relational algebra. The majority of the works are limited to plans formed exclusively from join operations and use dynamic query transformation and query scrambling techniques to migrate from one integration plan to another. The works on adaptive data partitioning [17] and on optimization of data stream processing [11] are the first attempts to use the associativity of the join operation to integrate different partitions of the same arguments according to different integration plans. It seems to us that relational algebra in its standard form is not the best language to describe the processes of online and adaptive data integration and that we need a new system of more elementary operations.

The basic idea behind online and adaptive computations is to restart the computations each time the processing of recently arrived data is possible and to reformulate an integration plan each time it is blocked by missing data. A data integrator processes a bit, waits, again processes a bit, again waits, and from time to time adjusts the plan to the available data. A typical feature of online integration is that it never operates on a complete set of data. When the relational model is applied as the target integration model, a data integrator must operate on the increments and decrements of relational tables and on the already integrated contents of the remaining relational tables. An increment is a collection of the most recently arrived and not yet processed packets of data. Decrements are created by non-monotonic operations such as set difference, where an increment of the right hand side argument of the operation produces a decrement of the previous result of the set difference. As a consequence, the elementary operations of an online data integrator should process the increments and/or decrements against fixed size relational tables. A data integration plan is then a sequence of elementary operations whose arguments are the modifications of data containers and other data containers. The results of one elementary operation are passed to the next operation in the sequence. Adaptability of the system is achieved through a collection of rules that transform the plans blocked by unavailable data into equivalent ones whose further execution is possible.

The main objective of this work is to propose a system of elementary operations for online and adaptive integration of data and to show how such a system can be applied in practice. In particular, we show that it is possible to derive such a system from a given collection of base operations, i.e. operations on data containers such as relational algebra operations or aggregation operations. Then, we define a data integration plan as a collection of local integration plans formed from sequences of elementary operations and we discuss the plan transformation rules needed for the implementation of the adaptive features of a sample data integration system.

The paper is organized in the following way. Section 2 describes the data integration model used throughout the paper. The system of elementary operations and data flow expressions are defined in Sections 3 and 4. Section 5 shows how the formal data integration model proposed in the previous sections can be used in the implementation of a sample data integration system. Section 6 summarizes and concludes the paper.

2. Data integration model

Consider a distributed multidatabase system that integrates a number of remote and heterogeneous database systems such that the remote database sites are entirely transparent at a central site. A middleware that integrates the databases provides the users with a single view of a homogeneous database. A query $q(r_1, \ldots, r_k)$ on a subset $r_1, \ldots, r_k$ of the view is decomposed into $k$ subqueries $q_{r_1}, \ldots, q_{r_k}$ that encapsulate the computations performed at the remote systems. Two generic strategies of distributed query processing optimize either the overall amount of time spent on the computations or the total amount of data transmitted over the network. Query processing time is minimized when the queries $q_{r_1}, \ldots, q_{r_k}$ are submitted and processed simultaneously at the remote sites. Processing the subqueries one at a time and applying the results of one subquery to modify the remaining subqueries minimizes the amount of transmitted data. An entire continuum of hybrid strategies lies between these two extremes. Selection of the best strategy is a hard problem and it is beyond the scope of this paper. We adopt a strategy that minimizes query processing time through simultaneous computations at the remote database sites.
The results obtained from the remote sites are transmitted back to the central site. Next, the results are transformed into the containers $r_1, \ldots, r_k$ structurally consistent with the data model at the central site, i.e. into relational tables. Finally, the results are integrated into the final answer according to a global data integration plan $P(r_1, \ldots, r_k)$ derived from the original query $q$ and built from the base operations on the data containers, e.g. the relational algebra operations on relational tables.

A simple and rather ineffective approach would be to delay the integration until all partial results are fully transmitted to the central site. Conversely, an impatient approach that wakes up the data integrator each time a new packet of data arrives would spend too much time on the organizational aspects of the process. In this work we consider a strategy where the data integrator wakes up at fixed intervals of time and starts integration only if enough data has been transmitted since the last integration cycle. If so, the recently arrived packets of data are integrated with the already available results. Such an approach invalidates the idea of a single global data integration plan because it may happen that the partial results required to follow the plan are unavailable at the moment. On the other hand, a global plan cannot be completely rejected because it represents the semantics of a database application. A solution is to transform the global plan into a set of local plans describing the actions performed when a new increment of data should be integrated with the already available partial results. The actions are expressed as elementary operations on the increments and/or decrements of data containers and other static data containers. The local integration plans are expressed as sequences of elementary operations.

3. Elementary operations

Let $r$ and $s$ be data containers, e.g. relational tables. A base operation $A(r, s)$ is an operation whose arguments are data containers and whose result is a data container as well.
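To make the notion of a base operation concrete, the following is a minimal sketch, assuming that a relational table is modelled as a frozen set of rows and that a row is a frozen set of (attribute, value) pairs. The helper names (`row`, `relation`, `union`, `join`, `antijoin`) are illustrative assumptions and do not come from the paper; the join is a natural join on the shared attribute names.

```python
from typing import Any, FrozenSet, Tuple

# Assumed representation: a row is a frozenset of (attribute, value) pairs,
# a relation (data container) is a frozenset of rows.
Row = FrozenSet[Tuple[str, Any]]
Relation = FrozenSet[Row]


def row(**attrs: Any) -> Row:
    """Build a row from keyword arguments (hypothetical helper)."""
    return frozenset(attrs.items())


def relation(*rows: Row) -> Relation:
    """Build a relation from rows (hypothetical helper)."""
    return frozenset(rows)


def compatible(a: Row, b: Row) -> bool:
    """True when two rows agree on every attribute they share."""
    da, db = dict(a), dict(b)
    return all(da[k] == db[k] for k in da.keys() & db.keys())


def union(r: Relation, s: Relation) -> Relation:
    """Base operation: set union of two containers."""
    return r | s


def join(r: Relation, s: Relation) -> Relation:
    """Base operation: natural join on the shared attributes."""
    return frozenset(a | b for a in r for b in s if compatible(a, b))


def antijoin(r: Relation, s: Relation) -> Relation:
    """Base operation: rows of r that match no row of s."""
    return frozenset(a for a in r if not any(compatible(a, b) for b in s))
```

Each function takes two containers and returns a container, which is the only property the definition of a base operation requires.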
A modification $\delta_r$ of a data container $r$ is a pair of containers $\langle \delta_r^-, \delta_r^+ \rangle$ such that both elements of the pair have the same structure (schema) as $r$. The first element $\delta_r^-$ of the pair represents the data items that should be removed from $r$ to implement the first stage of the modification. The second element $\delta_r^+$ of the pair represents the data items that should be added to $r$ to implement the second stage of the modification. An operation that integrates a container $r$ with a modification $\delta_r = \langle \delta_r^-, \delta_r^+ \rangle$ is denoted by $r \oplus \delta_r$ and is called a data integration operation. In the relational model a data integration operation is defined by the expression $(r - \delta_r^-) \cup \delta_r^+$.

An incremental/decremental operation (id-operation) for the first argument $r$ of a base operation $A(r, s)$ is denoted by $\alpha_A(\delta_r, s)$ and its result is a pair of the smallest and disjoint sets $\langle \delta_\alpha^-, \delta_\alpha^+ \rangle$ that should be integrated with the result of $A(r, s)$ to obtain the result of $A((r \oplus \delta_r), s)$, i.e.

$A(r, s) \oplus \alpha_A(\delta_r, s) = A((r \oplus \delta_r), s)$    (1)

An incremental/decremental operation (id-operation) for the second argument $s$ of a base operation $A(r, s)$ is denoted by $\beta_A(r, \delta_s)$ and its result is a pair of the smallest and disjoint sets $\langle \delta_\beta^-, \delta_\beta^+ \rangle$ that should be integrated with the result of $A(r, s)$ to obtain the result of $A(r, (s \oplus \delta_s))$, i.e.

$A(r, s) \oplus \beta_A(r, \delta_s) = A(r, (s \oplus \delta_s))$    (2)

A base operation $A(r, s)$ always has two id-operations $\alpha_A(\delta_r, s)$ and $\beta_A(r, \delta_s)$, one for processing $\delta_r$ and the other for processing $\delta_s$. If a base operation is commutative then its id-operations are the same. If a base operation $A(r, s)$ is monotonic for an argument $r$, i.e. $A(r, s) \subseteq A(r \oplus \delta_r, s)$, then the negative component of the modification computed by $\alpha_A(\delta_r, s)$ is always empty. Id-operations process the modifications of data containers and produce modifications that can be integrated with the previous results of the respective base operation to obtain the new results of the base operation without its full re-computation.
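The definitions above can be checked on small sets. The sketch below is an illustration under stated assumptions, not the paper's derivation: containers are plain Python frozensets, the names `Modification`, `integrate`, `alpha_by_definition` and `beta_by_definition` are hypothetical, and the id-operations are obtained by brute force (recompute the base operation and take the difference), which by construction yields the smallest disjoint pair satisfying equations (1) and (2). Set difference is used as the base operation because the introduction mentions it as a non-monotonic example.

```python
from typing import Callable, FrozenSet, Hashable, NamedTuple

Container = FrozenSet[Hashable]
BaseOp = Callable[[Container, Container], Container]


class Modification(NamedTuple):
    """A modification <minus, plus> of a container (names are assumptions)."""
    minus: Container  # items removed in the first stage
    plus: Container   # items added in the second stage


def integrate(r: Container, d: Modification) -> Container:
    """Data integration operation: (r - minus) union plus."""
    return (r - d.minus) | d.plus


def alpha_by_definition(A: BaseOp, r: Container, d_r: Modification,
                        s: Container) -> Modification:
    """Smallest disjoint pair satisfying equation (1), found by recomputation."""
    old = A(r, s)
    new = A(integrate(r, d_r), s)
    return Modification(minus=old - new, plus=new - old)


def beta_by_definition(A: BaseOp, r: Container, s: Container,
                       d_s: Modification) -> Modification:
    """Smallest disjoint pair satisfying equation (2), found by recomputation."""
    old = A(r, s)
    new = A(r, integrate(s, d_s))
    return Modification(minus=old - new, plus=new - old)


def difference(r: Container, s: Container) -> Container:
    """Set difference, used here only as a simple non-monotonic base operation."""
    return r - s


if __name__ == "__main__":
    r = frozenset({1, 2, 3})
    s = frozenset({2})
    # An increment of the right hand side argument of a set difference...
    d_s = Modification(minus=frozenset(), plus=frozenset({3}))
    d = beta_by_definition(difference, r, s, d_s)
    # ...produces a decrement of the previous result.
    print(d)
    print(integrate(difference(r, s), d) == difference(r, integrate(s, d_s)))
```

Recomputing the base operation defeats the purpose of an id-operation, so this sketch only serves as a reference against which closed-form id-operations can be compared.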
This is precisely what is needed for data integration. A modification of an argument in a global data integration plan is processed by an appropriate id-operation. The id-operation produces a modification which is processed by the next id-operation and so on, until the final modification is integrated with the previous partial answer to provide a new partial answer.

An interesting problem is how to find the id-operations for a given base operation. If for a particular system of base operations and data integration operation it is possible to express $A((r \oplus \delta_r), s)$ as a combination of the old result of the base operation $A(r, s)$ and the modification $\delta_r$, then it is possible to find the respective id-operations as the smallest solutions of the equations (1) and (2). In this paper we consider the relational model with the base operations of union ($\cup$), join ($\bowtie$), and antijoin ($\triangleright$) and the data integration operation defined as $r \oplus \delta_r = (r - \delta_r^-) \cup \delta_r^+$. We ignore the unary operations of selection ($\sigma$) and projection ($\pi$) as they can always be attached to the inputs or outputs of the binary operations. To solve the equation (1) we have to separately consider the negative and positive components of $\delta_r$ and of the data integration operation. It leads to the equations:

$A(r, s) - \alpha(\delta_r^-, s) = A(r - \delta_r^-, s)$    (3)

$A(r, s) \cup \alpha(\delta_r^+, s) = A(r \cup \delta_r^+, s)$    (4)

We are looking for the smallest solutions of the equations (3) and (4). The first equation is of the type $A - x = A - B$ where $A$, $B$, $x$ are sets. To find the smallest solution we transform the equation into an equivalent fixed point equation $x = x \cup ((A - x) - (A - B)) \cup ((A - B) - (A - x))$. The solution of the fixed point equation is obtained through a sequence of iterations starting from $x = \emptyset$. In the second iteration the fixed point is reached and it is equal to $x = A \cap B$. Hence, the solution of equation (3) is $\alpha(\delta_r^-, s) = A(r, s) \cap \delta_r^-$. For example, if $A(r, s) = r \cup s$ then $\alpha(\delta_r^-, s) = (r \cup s) \cap \delta_r^-$. Note that if $\delta_r^-$ denotes the rows removed from $r$ then $\delta_r^- \subseteq r$. Rows of $\delta_r^-$ that also belong to $s$ remain in the union after the removal, so finally we get $\alpha(\delta_r^-, s) = \delta_r^- - s$. It is possible to derive in the same way all id-operations for the remaining base operations.

Id-operations for the arguments of join ($\bowtie$) are defined as follows:

$\alpha_{\bowtie}(\delta_r, s) = \langle (\delta_r^- \bowtie s), (\delta_r^+ \bowtie s) \rangle$    (5)

$\beta_{\bowtie}(r, \delta_s) = \langle (\delta_s^- \bowtie r), (\delta_s^+ \bowtie r) \rangle$    (6)

Id-operations for the arguments of antijoin ($\triangleright$) are defined as follows:

$\alpha_{\triangleright}(\delta_r, s) = \langle (\delta_r^- \triangleright s), (\delta_r^+ \triangleright s) \rangle$    (7)

$\beta_{\triangleright}(r, \delta_s) = \langle (r \ltimes \delta_s^+), (r \ltimes \delta_s^-) \rangle$    (8)

where $r \ltimes \delta$ denotes the rows of $r$ that match at least one row of $\delta$ (a semijoin, expressible as $r \triangleright (r \triangleright \delta)$). Finally, id-operations for the arguments of union ($\cup$) are defined as follows:

$\alpha_{\cup}(\delta_r, s) = \langle (\delta_r^- - s), (\delta_r^+ - s) \rangle$    (9)

$\beta_{\cup}(r, \delta_s) = \langle (\delta_s^- - r), (\delta_s^+ - r) \rangle$    (10)

As a sample application of id-operations, consider a global data integration plan $q(r, s, t) = t \bowtie (r \triangleright s)$ and a modification $\delta_s = \langle \emptyset, \delta_s^+ \rangle$ of an argument $s$. Then, (8) and (5) contribute to a formula for processing $\delta_s$. Application of $\beta_{\triangleright}$ to $\langle \emptyset, \delta_s^+ \rangle$ provides $\langle r \ltimes \delta_s^+, \emptyset \rangle$. Next, application of $\alpha_{\bowtie}$ to the previous result provides $\langle (r \ltimes \delta_s^+) \bowtie t, \emptyset \rangle$. Finally, the modification should be integrated with the partial result of $q$ as follows: $q := q - ((r \ltimes \delta_s^+) \bowtie t)$. A formula for processing the modifications of $r$ can be derived in a similar way using (7) and (5): $q := q \cup ((\delta_r^+ \triangleright s) \bowtie t)$. Processing the modifications of argument $t$ requires the transformation of $q(r, s, t)$ into the equivalent expression $(t \bowtie r) \triangleright s$. Then, application of (5) and (7) provides $q := q \cup ((\delta_t^+ \bowtie r) \triangleright s)$. The problem of what to do when the transformation performed above is impossible is discussed in the next sections.

As another example, consider a system of operations $F = \{agg, \cup\}$ where $\cup$ is the set union operation and $agg$ is defined as follows.
The operation $agg_{x,a}(r, s)$ replaces the second argument $s$ with the result of the SQL statement:

    SELECT x, sum(a) FROM r GROUP BY x;

The id-operations of $\cup$ are the same as in the previous system. An id-operation $\alpha_{agg}(\delta_r, s)$ combines $\delta_r$ with $s$ in the following way, where $t$ ranges over the rows of the modification, $t'$ denotes a row of $s$, and $\delta_{agg}^-$, $\delta_{agg}^+$ are the components of the result:

    for all t in δ_r^+
      if there exists t' in s such that t'.x = t.x then
        replace t' with t'.a := t'.a + t.a;
        insert old t' into δ_agg^- and insert new t' into δ_agg^+;
      else
        add t to s and add t to δ_agg^+;
      end if;
    for all t in δ_r^-
      if there exists t' in s such that t'.x = t.x then
        replace t' with t'.a := t'.a - t.a;
        insert old t' into δ_agg^- and insert new t' into δ_agg^+;
      end if;

Finally, an id-operation $\beta_{agg}(r, \delta_s) = \langle \delta_s^-, \delta_s^+ \rangle$.

4. Data flow expressions

A data flow expression is a sequence $r_0 : \alpha_1(r_1) \ldots \alpha_n(r_n)$ where $r_0$ is a data container and each $\alpha_i(r_i)$, $i = 1, \ldots, n$, is either an abbreviation of an id-operation $\alpha(\delta_{r_j}, r_i)$ or an abbreviation of a data integration operation $\delta_{r_j} \oplus r_i$. The adjacent id-operations in a data flow expression are connected such that the modification generated by $\alpha_i$ is used as the argument $\delta_{\alpha_i}$ of its successor $\alpha_{i+1}$. The evaluation of an expression starts from the first id-operation $\alpha_1(\delta_{r_0}, r_1)$. The modification $\delta_{\alpha_1}$ produced by the first id-operation becomes an argument of the next id-operation $\alpha_2(\delta_{\alpha_1}, r_2)$. For example, $r : \alpha_{\bowtie}(s)\ \alpha_{\triangleright}(t)\ \oplus(w)$ is a data flow expression where a modification $\delta_r$ of argument $r$ is joined with $s$. Then, $t$ is deducted from the results of the join, and the results of the difference are integrated with $w$.

A data flow expression related to an argument $r_i$ of an expression $E(r_1, \ldots, r_i, \ldots, r_n)$ is constructed through a traversal of the syntax tree of $E$ from the leaf node labeled with $r_i$ to the root node. Initially, at the leaf node $r_i$, we start from the empty expression $r_i :$. Next, we move one level up to a base operation $A(E_1, E_2)$ where $E_1$ and $E_2$ are subexpressions (subtrees in the syntax tree) bound with a base operation $A$.
If the subexpression $E_1$ is on the path being traversed then we append the id-operation $\alpha_A(w_{E_2})$ to the data flow expression. Otherwise, if $E_2$ is on the path being traversed then we append $\beta_A(w_{E_1})$ to the expression. Next, we move one level up to the next base operation and we repeat the actions listed above. At the end, when all paths from the leaf nodes to the root node have been traversed and the data flow expressions generated, we insert into the expressions the data integration operations that produce the intermediate results. For example, application of the procedure described above to a relational algebra expression $r \bowtie (s \bowtie_b t)$ provides the following data flow expressions:

$r : \alpha_{\bowtie}(w_{st})\ \oplus(w)$
$s : \alpha_{\bowtie_b}(t)\ \oplus(w_{st})\ \beta_{\bowtie}(r)\ \oplus(w)$
$t : \beta_{\bowtie_b}(s)\ \oplus(w_{st})\ \beta_{\bowtie}(r)\ \oplus(w)$

Data flow expressions represent the sequences of operations performed on the recently arrived modifications at a data integration stage. As in traditional query processing, optimization of data integration expressions is performed through transformations of data flow expressions. One group of transformations moves the most restrictive id-operations towards the left hand side of an expression in order to eliminate as many data items as possible at the early stages of data integration. The other group removes the intermediate data containers created and modified during the integration in order to reduce the total number of operations on persistent storage.

Consider a data flow expression $p$ which contains two adjacent id-operations $\alpha_A(r_i)\ \alpha_B(r_j)$. A data flow expression $p'$ obtained from $p$ by changing the order of the id-operations to $\alpha_B(r_j)\ \alpha_A(r_i)$ is equivalent to $p$ if the respective base operations are associative, i.e. $B(A(r, s), t) = A(B(r, t), s)$.

Associativity of adjacent operations allows for the elimination of intermediate data containers. As an example consider the following system of data flow expressions.

$r : \alpha_A(s)\ \oplus(w_{rs})\ \alpha_B(t)\ \oplus(w)$
$s : \beta_A(r)\ \oplus(w_{rs})\ \alpha_B(t)\ \oplus(w)$
$t : \beta_B(w_{rs})\ \oplus(w)$

where $w_{rs}$ is always equal to the result of $A(r, s)$. Hence, the third data flow expression can be expressed as $t : \beta_B(A(r, s))$. It is equivalent to two relational algebra expressions $\beta_B^-(A(r, s), \delta_t^-)$ and $\beta_B^+(A(r, s), \delta_t^+)$. If the base operations $A$ and $B$ are associative then the expressions can be transformed into $A(\beta_B^-(r, \delta_t^-), s)$ and $A(\beta_B^+(r, \delta_t^+), s)$. Taking the expressions together and replacing the base operation $A$ with the id-operation $\alpha_A$ we obtain $\alpha_A(\beta_B(r, \delta_t), s)$ and, in consequence, a data flow expression $t : \beta_B(r)\ \alpha_A(s)\ \oplus(w)$. Now, the temporary container $w_{rs}$ can be removed from the remaining data flow expressions:

$r : \alpha_A(s)\ \alpha_B(t)\ \oplus(w)$
$s : \beta_A(r)\ \alpha_B(t)\ \oplus(w)$

5. Data integration

Let $r_1, \ldots, r_k$ be the results of $k$ subqueries $q_1, \ldots, q_k$ computed at the remote database sites and transmitted to the central site. A global data integration plan $P(r_1, \ldots, r_k)$ is an expression built over the data containers $r_1, \ldots, r_k$ and the base operations, e.g. relational algebra operations. In the traditional approaches data integration is delayed until the arguments bound by the base operations in $P$ are available at the central site. Adaptive and incremental strategies allow for data integration while the arguments are still being transmitted over the network. Implementation of an incremental strategy needs the translation of a global integration plan into a set of local integration plans. A set of local integration plans for $P(r_1, \ldots, r_k)$ is equivalent to a set of data flow expressions $\{p_1, \ldots, p_k\}$ where each $p_i$ represents the way in which the increments of an argument $r_i$ are integrated with the intermediate results. An individual data integration plan $p_i$ is a sequence of id-operations performed by the system in order to process an increment $\delta_{r_i}$. Consider a logical data integration plan $(r \bowtie s) \bowtie t$. An incremental integration strategy transforms the plan into the following individual data integration plans:

$r : \alpha_{\bowtie}(s)\ \alpha_{\bowtie}(t)\ \oplus(w)$
$s : \alpha_{\bowtie}(r)\ \alpha_{\bowtie}(t)\ \oplus(w)$
$t : \alpha_{\bowtie}(s)\ \alpha_{\bowtie}(r)\ \oplus(w)$

In another example, elimination of the union operation from a logical data integration expression $r(ab) \triangleright (s(ab) \cup t(ab))$ leads to an expression with two occurrences of the argument $r$, i.e. $(r(ab) \triangleright s(ab)) \bowtie (r(ab) \triangleright t(ab))$. Then the individual integration plan for the argument $r$ consists of two data flow expressions:

$r : \alpha_{\triangleright}(t)\ \oplus(v_{rt})$
$r : \alpha_{\triangleright}(s)\ \oplus(v_{rs})\ \alpha_{\bowtie}(v_{rt})\ \oplus(w)$

The remaining individual integration plans are as follows:

$s : \beta_{\triangleright}(r)\ \oplus(v_{rs})\ \alpha_{\bowtie}(v_{rt})\ \oplus(w)$
$t : \beta_{\triangleright}(r)\ \oplus(v_{rt})\ \alpha_{\bowtie}(v_{rs})\ \oplus(w)$

A global data integration plan $P$ implemented as a set of local data integration plans allows for a correct and adaptive integration of the partial results.
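The local plans for the all-join plan can be exercised on toy data. The following is a minimal sketch under stated assumptions: rows are frozen sets of (attribute, value) pairs, the join is a natural join, the join id-operation follows formula (5), and a local plan is run by chaining that id-operation over the listed containers before integrating the final modification with the partial answer $w$. All function and class names (`row`, `join`, `Modification`, `integrate`, `alpha_join`, `run_local_plan`) are hypothetical and chosen only for this illustration.

```python
from typing import Any, FrozenSet, Iterable, NamedTuple, Tuple

Row = FrozenSet[Tuple[str, Any]]
Relation = FrozenSet[Row]


def row(**attrs: Any) -> Row:
    """Build a row from keyword arguments (hypothetical helper)."""
    return frozenset(attrs.items())


def _compatible(a: Row, b: Row) -> bool:
    """True when two rows agree on every attribute they share."""
    da, db = dict(a), dict(b)
    return all(da[k] == db[k] for k in da.keys() & db.keys())


def join(r: Relation, s: Relation) -> Relation:
    """Natural join of two relations."""
    return frozenset(a | b for a in r for b in s if _compatible(a, b))


class Modification(NamedTuple):
    minus: Relation  # rows to remove first
    plus: Relation   # rows to add second


def integrate(w: Relation, d: Modification) -> Relation:
    """Data integration operation: (w - minus) union plus."""
    return (w - d.minus) | d.plus


def alpha_join(d: Modification, other: Relation) -> Modification:
    """Join id-operation in the spirit of formula (5): join both components."""
    return Modification(join(d.minus, other), join(d.plus, other))


def run_local_plan(d: Modification, operands: Iterable[Relation],
                   w: Relation) -> Relation:
    """Evaluate a plan of the form alpha(o1) alpha(o2) ... integrate(w)."""
    for operand in operands:
        d = alpha_join(d, operand)
    return integrate(w, d)


if __name__ == "__main__":
    r = frozenset({row(a=1, b=1)})
    s = frozenset({row(b=1, c=1), row(b=2, c=2)})
    t = frozenset({row(c=1, d=1), row(c=2, d=2)})
    w = join(join(r, s), t)  # partial answer for (r join s) join t

    # Local plan for r: alpha(s) alpha(t) integrate(w)
    d_r = Modification(frozenset(), frozenset({row(a=2, b=2)}))
    w = run_local_plan(d_r, [s, t], w)
    r = integrate(r, d_r)
    print(w == join(join(r, s), t))
```

In this sketch each local plan reads the current contents of the other arguments, so an increment must be integrated into its own container (here `r = integrate(r, d_r)`) before the plan of another argument is run.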
The local data integration plans are created such that each argument of the respective logical data integration expression gets its local plan. If, as in the example above, the same argument is used more than once then we get more than one plan as well. All plans associated with a given argument are activated when an increment of the argument has to be processed. Each local plan is a data flow expression constructed and optimized in the way described in the previous section.

A process of incremental and adaptive data integration wakes up at regular intervals of time, verifies the amounts of data transmitted since the last integration, and if there is enough data, prepares and implements the local integration plans. The algorithm that constructs the data flow expressions from a global data integration plan $P$ is used to formulate a set of initial local integration plans. Next, the optimizations of the data flow expressions described in the previous section move the most selective operations towards the beginning of each local plan and try to eliminate the integrations with the intermediate results. The optimization of the local plans assumes the most optimistic case of initial availability and continuous transmission of all arguments. In reality the initializations of transmissions are frequently delayed or the transmissions cannot be completed for a longer period of time. This is why some of the local plans have to be either suspended or reduced to the id-operations that can be executed at a given moment of time, followed by integrations with temporary data containers. The first run of the data integrator transforms the local plans obtained from the optimizer in a way that takes into consideration the availability of the arguments and the optimal integration of the available data. Each subsequent invocation adjusts the plans used in the previous run to reflect the availability of new arguments. When all arguments are partially available at the central site the local plans return to their optimized form.

The run time transformations of local plans include the addition and elimination of integrations with temporary data containers, the elimination of subexpressions that can be totally evaluated and replaced with a constant data container, and changing the order and elimination of the local plans. Addition of an integration with a temporary data container is needed when the computations of a plan $r : \alpha_1(r_1), \ldots, \alpha_{i-1}(r_{i-1})\ \alpha_i(r_i), \ldots$ cannot be completed because a container $r_i$ is not available at the moment. Then, the plan is computed partially and an integration with an intermediate container $v_i$ is inserted in front of $\alpha_i$ in the following way: $r : \alpha_1(r_1), \ldots, \oplus(v_i)\ \alpha_i(r_i), \ldots$ Moreover, the sequence of id-operations $\alpha_1(r_1), \ldots, \alpha_{i-1}(r_{i-1})$ is replaced with $\beta_i(v_i)$ in all other local plans. A temporary data container is removed from the local plan $r$ when the argument $r_i$ is not empty. Then, $\oplus(v_i)$ is removed from the plan and $\beta_i(v_i)$ is replaced with the original sequence of operations in all other plans wherever it occurs.

When a data integrator is invoked for the first time, some of the transmissions from the remote sites may already be completed. If both arguments of a base operation in a global integration plan are available then such an operation can be computed in the traditional way and its results can be incorporated as a constant argument into the plan. Consider the local plans $r_i : \alpha_A(r_j), \alpha_B(r_k) \ldots$
and $r_j : \beta_A(r_i), \alpha_B(r_k) \ldots$ and assume that both $r_i$ and $r_j$ are available for integration. Then, the respective base operation $A(r_i, r_j)$ is computed, its result $r_{ij}$ obtains a new local integration plan $r_{ij} : \alpha_B(r_k) \ldots$, and the plans $r_i$ and $r_j$ are removed. In all other plans a sequence $\alpha_A(r_j)\ \alpha_B(r_k)$ is replaced with $\beta_B(r_{ij})$. Elimination of a subexpression in the way described above is possible only if an argument that is completely unavailable at one stage of integration becomes totally available. What if, in the same situation, the transmission of some of the arguments is completed but no base operations can be computed? Consider a plan $r_i : \alpha_A(r_j)\ \alpha_B(r_k) \ldots$ and assume that the transmission of the data container $r_i$ is completed. Then, the status of $r_i$ is changed to ready and its plan $r_i$ is removed from the set of local plans.

Each of the arguments involved in data integration has its status recorded and maintained by the system. At the very beginning of data integration all arguments obtain the status missing. Next, when an argument arrives and its transmission is completed, the status changes to ready. If only a part of an argument arrives, its status is active, and after the part is integrated the status changes to idle. The state transitions given in Figure 1 occur when the data integrator completes an integration cycle.

Figure 1. The transitions of argument states (missing, active, ready, idle).

When the data integrator wakes up for the first time the only local integration plans are those directly constructed and optimized from the global plan. First, the data integrator considers the arguments that changed their status from missing to ready. The subexpressions of the global integration plan are computed in the way described above. The local plans for the arguments that have the status ready are removed from the set of local plans. Next, the data integrator considers the arguments that changed their status from missing to active, i.e.
only some of the components of these arguments have arrived. The local plans related to these arguments are computed as far as possible. Whenever the computations do not reach integration with the final results, integration with a temporary relational table is performed, the table is inserted into the plan, and the related local plans are modified in the way described above. No other state transitions are possible at the first integration stage.

When the data integrator wakes up at any time other than the first, any transition of the argument states is possible. First, the data integrator considers the arguments that changed their status from missing to ready. The local integration plans for these arguments are removed from the set of local plans and the related plans are modified in the way described above. Next, the data integrator considers the arguments whose status has changed from active to ready. The local plans for these arguments are computed as far as possible and then the plans are removed from the set of local plans. Next, the data integrator considers the arguments that changed their status from missing to active. The local plans for these arguments are computed as far as possible. Whenever the computations do not reach integration with the final results, integration with a temporary table is performed, the table is inserted into the plan, and the related plans are updated in the way described above. Whenever an argument is used in the computations, its plan is made inactive for this cycle. If the computation of a local plan uses a temporary relational table created earlier, then the temporary table is removed from the plan and all other plans are updated in the way described above. Next, the data integrator considers the arguments whose status remained active and whose local plans have not been deactivated in this cycle. These arguments are processed in the same way as the arguments whose status changed from missing to active. In all other cases, the integrator remains idle.

6. Summary and future work

This paper considers the online and adaptive integration of large data sets distributed over wide-area networks. We argue that the traditional approach, in which global integration plans are expressed as relational algebra expressions, is not appropriate for precisely describing the integration processes at the level where individual packets of data are assembled into the final results. In contrast, we define the concept of an id-operation as an elementary operation on the modifications (increments and/or decrements) of data containers and the partial results. Next, we show how to construct a data integration plan as a collection of data flow expressions composed of id-operations and data integration operations. Finally, we describe the operational principles of a sample system capable of online and adaptive data integration.

A number of interesting problems remain to be solved. These include a wider system of id-operations, investigation of the properties of the dataflow algebra, and further investigation of more advanced data integration algorithms.
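The order in which the data integrator examines argument state transitions within one integration cycle, as described in the operational principles above, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Status`, `CYCLE_ORDER`, `FIRST_CYCLE_ORDER` and `schedule_cycle` are hypothetical, the actual plan computations are omitted, and the sketch assumes that the status of every argument is known both before and after the cycle.

```python
from enum import Enum


class Status(Enum):
    """Argument states named in the text."""
    MISSING = "missing"
    ACTIVE = "active"
    READY = "ready"
    IDLE = "idle"


# Transitions in the order the text lists them for a regular cycle
# (assumption: the order of the prose is the processing order).
CYCLE_ORDER = [
    (Status.MISSING, Status.READY),
    (Status.ACTIVE, Status.READY),
    (Status.MISSING, Status.ACTIVE),
    (Status.ACTIVE, Status.ACTIVE),  # status remained active
]

# At the first wake-up only transitions out of "missing" are possible.
FIRST_CYCLE_ORDER = [t for t in CYCLE_ORDER if t[0] is Status.MISSING]


def schedule_cycle(previous, current, first_cycle=False, deactivated=frozenset()):
    """Return (argument, old status, new status) triples in processing order.

    previous, current: dicts mapping argument names to Status values.
    deactivated: arguments whose plans were already used in this cycle;
    they are skipped in the "remained active" step.
    """
    order = FIRST_CYCLE_ORDER if first_cycle else CYCLE_ORDER
    scheduled = []
    for before, after in order:
        remained_active = (before, after) == (Status.ACTIVE, Status.ACTIVE)
        for name in sorted(current):
            if remained_active and name in deactivated:
                continue
            if previous[name] is before and current[name] is after:
                scheduled.append((name, before.value, after.value))
    return scheduled
```

Calling `schedule_cycle` with the statuses recorded before and after a wake-up yields the arguments grouped by transition, so a caller could apply the matching plan update to each group in turn. Arguments that are idle or unchanged do not appear in the result, which corresponds to the cases in which the integrator does nothing.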


More information

3. Relational Data Model 3.5 The Tuple Relational Calculus

3. Relational Data Model 3.5 The Tuple Relational Calculus 3. Relational Data Model 3.5 The Tuple Relational Calculus forall quantification Syntax: t R(P(t)) semantics: for all tuples t in relation R, P(t) has to be fulfilled example query: Determine all students

More information

On some heuristic method for optimal database workload reconstruction

On some heuristic method for optimal database workload reconstruction On some heuristic method for optimal database workload reconstruction Marcin Zimniak 1, Marta Burzańska 2, and Bogdan Franczyk 1 1 Information Systems Institute Leipzig University, Germany {zimniak,franczyk}@wifa.uni-leipzig.de

More information
