On Adaptive and Online Data Integration
University of Wollongong Research Online
Faculty of Informatics - Papers (Archive), Faculty of Engineering and Information Sciences, 2005

On Adaptive and Online Data Integration
J. R. Getta, University of Wollongong, jrg@uow.edu.au

Publication Details
This paper was originally published as: Getta, JR, On Adaptive and Online Data Integration, 21st International Conference on Data Engineering Workshops, 5-8 April 2005, Copyright 2005 IEEE.

Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: research-pubs@uow.edu.au
On Adaptive and Online Data Integration

Janusz R. Getta
School of Information Technology and Computer Science
University of Wollongong
Wollongong, NSW 2522, Australia

Abstract

Recent work on the integration of large database systems distributed over wide-area networks concentrates on adaptive and online techniques. The online property of data integration means continuous integration of newly transmitted data with the results that are already available. Adaptivity takes the form of dynamic adjustments to data integration plans in response to the recent characteristics of data transmission. Implementing adaptive and online data integration requires specialized systems of operations and of transformations of integration plans. This paper describes a new class of elementary operations on increments and/or decrements of data and shows how to express data integration plans as sequences of elementary operations. We demonstrate that the class of operations proposed in the paper is sufficient for the implementation of online and adaptive data integration systems, and we discuss the operational properties of such systems.

1. Introduction

Advances in persistent storage and wide-area network technologies allow relatively inexpensive implementations of unified and integrated views of data located at remote and heterogeneous database systems. A central problem in the development of such systems is the ad hoc integration of data transmitted over the networks. The efficiency of data integration depends on advanced algorithms for merging the partial results of queries computed at the remote database sites. Recent trends in data integration lead towards online and adaptive algorithms. Online algorithms [7] process incomplete sets of input data and continuously improve their solutions as new data items become available for processing and old data items are discarded.
A typical example of an online algorithm is a virtual memory manager that operates on a window of a theoretically unlimited sequence of tasks. Adaptive algorithms adjust their integration strategies to external events, e.g. the arrival of a new packet of data or the completion of a transmission from a particular site. It is anticipated that data integration will soon emerge from distributed computing and financial data processing as an autonomous research area, driven by freely available distributed data sets and fast wide-area networks [8], [23]. Data integration has its roots in query processing in distributed and heterogeneous database systems, often called multidatabase or federated database systems [25, 22]. The unpredictable behavior of data transmission systems and the strong autonomy of remote database systems make precise estimation of subquery processing time difficult. This is where reactive query processing techniques show their superiority over the classical proactive techniques commonly used for query processing in distributed database systems [1]. Early data integration systems looked for solutions in partitioning [6, 19] and in dynamic modification of query processing plans [5, 10, 9]. Partitioning means that a query execution plan is divided into subplans at the point where further computation is no longer possible due to a lack of data. Dynamic modification finds a plan that is equivalent to the original one and that can be partially computed with the available data. Another group of ideas addresses the optimization of the individual elementary operations used for data integration. These specialized operations include the pipelined join operator XJoin [26], ripple join [14], double pipelined join [16], and hash-merge join [21]. Approaches based on scheduling change the order in which operations are executed while preserving the semantics of the data integration plan.
Scheduling-based techniques include query scrambling [28, 1] and dynamic scheduling of operators [27]. Techniques based on redundant computation simultaneously execute a number of data integration plans and retain the plan that provides the most advanced results [2].
Solutions based on data partitioning integrate different components of the integrated arguments according to different plans. Eddies are able to process each tuple according to a different plan [3]. The concept of state modules described in [24] allows concurrent processing of tuples, dynamically divides the data integration task among different plans, and executes the plans sequentially or in parallel. The adaptive data partitioning technique [17] processes different partitions of the same argument using different data integration plans. Recently developed data stream processing techniques [20, 11] also contribute to online data integration. The works [4, 13, 15, 18] review the major solutions proposed so far. A more up-to-date and more detailed overview of past work on adaptive data integration can be found in [12].

The approaches listed above adopt the relational model as the target data integration model and express the integration plan in the language of relational algebra. The majority of these works are limited to plans formed exclusively from join operations and use dynamic query transformation and query scrambling techniques to migrate from one integration plan to another. The works on adaptive data partitioning [17] and on optimization of data stream processing [11] are the first attempts to use the associativity of the join operation to integrate different partitions of the same arguments according to different integration plans. It seems to us that relational algebra in its standard form is not the best language for describing the processes of online and adaptive data integration, and that a new system of more elementary operations is needed. The basic idea behind online and adaptive computation is to restart the computation each time the processing of recently arrived data becomes possible, and to reformulate the integration plan each time it is blocked by missing data.
A data integrator processes a bit, waits, processes a bit again, waits again, and from time to time adjusts its plan to the available data. A typical feature of online integration is that it never operates on a complete set of data. When the relational model is applied as the target integration model, a data integrator must operate on the increments and decrements of relational tables and on the already integrated contents of the remaining tables. An increment is a collection of the most recently arrived and not yet processed packets of data. Decrements are created by non-monotonic operations such as set difference, where an increment of the right-hand argument produces a decrement of the previous result of the difference. As a consequence, the elementary operations of an online data integrator should process increments and/or decrements against fixed-size relational tables. A data integration plan is then a sequence of elementary operations whose arguments are modifications of data containers and other data containers. The results of one elementary operation are passed to the next operation in the sequence. Adaptability of the system is achieved through a collection of rules that transform plans blocked by unavailable data into equivalent plans whose further execution is possible.

The main objective of this work is to propose a system of elementary operations for online and adaptive integration of data and to show how such a system can be applied in practice. In particular, we show that it is possible to derive such a system from a given collection of base operations, i.e. operations on data containers such as relational algebra operations or aggregation operations.
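The way a non-monotonic operation turns an increment into a decrement can be illustrated with a minimal Python sketch. It is only an illustration that assumes containers are plain Python sets; the function name `difference_decrement` is hypothetical. When an increment reaches the right-hand argument $s$ of $r - s$, exactly those rows of the old result that occur in the increment have to be removed.

```python
def difference_decrement(r: set, s: set, s_increment: set) -> set:
    """Rows that disappear from r - s when s_increment is added to s.

    Old result: r - s.  New result: r - (s | s_increment).
    The difference between them is exactly the old rows found in the increment.
    """
    return (r - s) & s_increment


r = {1, 2, 3, 4}
s = {4}
increment = {2, 5}            # a new packet of data for the right-hand argument

old_result = r - s            # {1, 2, 3}
decrement = difference_decrement(r, s, increment)
new_result = old_result - decrement

# Removing the decrement gives the same result as a full re-computation.
assert new_result == r - (s | increment)
print(decrement)              # {2}
```

The decrement is computed from the old result and the increment alone, so the integrator never has to re-evaluate $r - (s \cup \delta_s^{+})$ from scratch.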
We then define a data integration plan as a collection of local integration plans formed from sequences of elementary operations, and we discuss the plan transformation rules needed to implement the adaptive features of a sample data integration system.

The paper is organized as follows. Section 2 describes the data integration model used throughout the paper. The system of elementary operations and data flow expressions are defined in Sections 3 and 4. Section 5 shows how the formal data integration model proposed in the previous sections can be used in the implementation of a sample data integration system. Section 6 summarizes and concludes the paper.

2. Data integration model

Consider a distributed multidatabase system that integrates a number of remote and heterogeneous database systems such that the remote database sites are entirely transparent at a central site. Middleware that integrates the databases provides the users with a single view of a homogeneous database. A query $q(r_1,\dots,r_k)$ on a subset $r_1,\dots,r_k$ of the view is decomposed into $k$ subqueries $q_{r_1},\dots,q_{r_k}$ that encapsulate the computations performed at the remote systems. Two generic strategies of distributed query processing optimize either the overall time spent on computation or the total amount of data transmitted over the network. Query processing time is minimized when the subqueries $q_{r_1},\dots,q_{r_k}$ are submitted and processed simultaneously at the remote sites. Processing the subqueries one at a time, and applying the results of one subquery to modify the remaining ones, minimizes the amount of transmitted data. The entire continuum of hybrid strategies lies between these two extremes. Selecting the best strategy is a hard problem and is beyond the scope of this paper. We adopt a strategy that minimizes query processing time through simultaneous computation at the remote database sites.
The results obtained from the remote sites are transmitted back to the central site. Next, the results are transformed into containers $r_1,\dots,r_k$ structurally consistent with the data model at the central site, i.e. into relational tables. Finally, the results are integrated into the final answer according to a global data integration plan $P(r_1,\dots,r_k)$ derived from the original query $q$ and built from base operations on the data containers, e.g. relational algebra operations on relational tables.

A simple but rather ineffective approach would be to delay the integration until all partial results are fully transmitted to the central site. By contrast, an impatient approach that wakes up a data integrator each time a new packet of data arrives would spend too much time on the organizational aspects of the process. In this work we consider a strategy where the data integrator wakes up at fixed intervals of time and starts integration only if enough data has been transmitted since the last integration cycle. If so, the recently arrived packets of data are integrated with the results already available. Such an approach invalidates the idea of a single global data integration plan, because the partial results required to follow the plan may be unavailable at the moment. On the other hand, a global plan cannot be completely rejected because it represents the semantics of a database application. A solution is to transform the global plan into a set of local plans describing the actions performed when a new increment of data has to be integrated with the partial results already available. The actions are expressed as elementary operations on the increments and/or decrements of data containers and on other, static data containers. The local integration plans are expressed as sequences of elementary operations.

3. Elementary operations

Let $r$ and $s$ be data containers, e.g. relational tables. A base operation $A(r, s)$ is an operation whose arguments are data containers and whose result is a data container as well.
A modification $\delta_r$ of a data container $r$ is a pair of containers $\langle \delta_r^{-}, \delta_r^{+} \rangle$ such that both elements of the pair have the same structure (schema) as $r$. The first element $\delta_r^{-}$ represents the data items that should be removed from $r$ in the first stage of the modification. The second element $\delta_r^{+}$ represents the data items that should be added to $r$ in the second stage of the modification. An operation that integrates a container $r$ with a modification $\delta_r = \langle \delta_r^{-}, \delta_r^{+} \rangle$ is denoted by $r \oplus \delta_r$ and is called a data integration operation. In the relational model a data integration operation is defined by the expression $(r - \delta_r^{-}) \cup \delta_r^{+}$.

An incremental/decremental operation (id-operation) for the first argument $r$ of a base operation $A(r, s)$ is denoted by $\alpha_A(\delta_r, s)$. Its result is a pair of the smallest disjoint sets $\langle \delta_\alpha^{-}, \delta_\alpha^{+} \rangle$ that should be integrated with the result of $A(r, s)$ to obtain the result of $A(r \oplus \delta_r, s)$, i.e.

$$A(r, s) \oplus \alpha_A(\delta_r, s) = A(r \oplus \delta_r, s) \qquad (1)$$

An id-operation for the second argument $s$ of a base operation $A(r, s)$ is denoted by $\beta_A(r, \delta_s)$. Its result is a pair of the smallest disjoint sets $\langle \delta_\beta^{-}, \delta_\beta^{+} \rangle$ that should be integrated with the result of $A(r, s)$ to obtain the result of $A(r, s \oplus \delta_s)$, i.e.

$$A(r, s) \oplus \beta_A(r, \delta_s) = A(r, s \oplus \delta_s) \qquad (2)$$

A base operation $A(r, s)$ always has two id-operations, $\alpha_A(\delta_r, s)$ for processing $\delta_r$ and $\beta_A(r, \delta_s)$ for processing $\delta_s$. If a base operation is commutative then its id-operations are the same. If a base operation $A(r, s)$ is monotonic in the argument $r$, i.e. $A(r, s) \subseteq A(r \cup \delta_r^{+}, s)$, then the negative component of the modification computed by $\alpha_A(\delta_r, s)$ for an increment $\delta_r = \langle \emptyset, \delta_r^{+} \rangle$ is always empty. Id-operations process the modifications of data containers and produce modifications that can be integrated with the previous results of the respective base operation to obtain its new results without full re-computation.
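To make definitions (1) and (2) concrete, the following Python sketch implements the data integration operation $\oplus$ together with id-operations of the form given for join, antijoin and union in (5), (6), (7) and (9), and checks equations (1) and (2) against full re-computation on random data. The representation is an assumption made for illustration only: containers are sets of `(key, value)` tuples, joins and antijoins match on the key alone, and all function names are hypothetical.

```python
import random

# Sketch assumptions (ours, not the paper's): a container is a Python set of
# (key, value) tuples, joins and antijoins match on the key only, and a
# modification is a (minus, plus) pair of sets.

def integrate(r, delta):
    """Data integration operation: r (+) <minus, plus> = (r - minus) | plus."""
    minus, plus = delta
    return (r - minus) | plus

def join(r, s):
    """Join on the key: (k, a) and (k, b) give (k, a, b)."""
    return {(k, a, b) for (k, a) in r for (k2, b) in s if k == k2}

def antijoin(r, s):
    """Rows of r whose key matches no row of s."""
    s_keys = {k for (k, _) in s}
    return {row for row in r if row[0] not in s_keys}

# Id-operations return <negative, positive> pairs.
def alpha_join(delta_r, s):          # form of (5)
    minus, plus = delta_r
    return join(minus, s), join(plus, s)

def beta_join(r, delta_s):           # form of (6), keeping r on the left
    minus, plus = delta_s
    return join(r, minus), join(r, plus)

def alpha_antijoin(delta_r, s):      # form of (7)
    minus, plus = delta_r
    return antijoin(minus, s), antijoin(plus, s)

def alpha_union(delta_r, s):         # form of (9)
    minus, plus = delta_r
    return minus - s, plus - s

def random_container(rng, n):
    return {(rng.randrange(4), rng.randrange(3)) for _ in range(n)}

def random_modification(rng, r):
    """Remove a random subset of r and add a few new rows."""
    minus = {row for row in r if rng.random() < 0.3}
    return minus, random_container(rng, 3)

def check_equations(trials=500, seed=1):
    """Test equations (1) and (2) against full re-computation on random data."""
    rng = random.Random(seed)
    for _ in range(trials):
        r, s = random_container(rng, 6), random_container(rng, 6)
        d_r, d_s = random_modification(rng, r), random_modification(rng, s)
        new_r, new_s = integrate(r, d_r), integrate(s, d_s)
        assert integrate(join(r, s), alpha_join(d_r, s)) == join(new_r, s)
        assert integrate(join(r, s), beta_join(r, d_s)) == join(r, new_s)
        assert integrate(antijoin(r, s), alpha_antijoin(d_r, s)) == antijoin(new_r, s)
        assert integrate(r | s, alpha_union(d_r, s)) == new_r | s
    return True

print(check_equations())  # True
```

Because every incremental update is compared with a full re-computation of the base operation, a failing assertion would point to an id-operation that does not satisfy (1) or (2).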
Computing the new results without full re-computation is precisely what data integration needs. A modification of an argument in a global data integration plan is processed by an appropriate id-operation. The id-operation produces a modification which is processed by the next id-operation, and so on, until the final modification is integrated with the previous partial answer to provide a new partial answer.

An interesting problem is how to find the id-operations for a given base operation. If, for a particular system of base operations and data integration operation, it is possible to express $A(r \oplus \delta_r, s)$ as a combination of the old result $A(r, s)$ and the modification $\delta_r$, then the respective id-operations can be found as the smallest solutions of equations (1) and (2). In this paper we consider the relational model with the base operations of union ($\cup$), join ($\bowtie$), and antijoin ($\triangleright$), and the data integration operation defined as $r \oplus \delta_r = (r - \delta_r^{-}) \cup \delta_r^{+}$. We ignore the unary operations of selection ($\sigma$) and projection ($\pi$) as they can always be attached to the inputs or outputs of the binary operations. To solve equation (1) we have to consider the negative and positive components of $\delta_r$ and of the data integration operation separately. This leads to the equations

$$A(r, s) - \alpha(\delta_r^{-}, s) = A(r - \delta_r^{-}, s) \qquad (3)$$

$$A(r, s) \cup \alpha(\delta_r^{+}, s) = A(r \cup \delta_r^{+}, s) \qquad (4)$$

We are looking for the smallest solutions of equations (3) and (4). The first equation is of the type $A - x = A - B$, where $A$, $B$, $x$ are sets. To find the smallest solution we transform the equation into the equivalent fixed point equation $x = x \cup ((A - x) - (A - B)) \cup ((A - B) - (A - x))$. The solution of the fixed point equation is obtained
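through a sequence of iterations starting from $x = \emptyset$. In the second iteration the fixed point is reached and it is equal to $x = A \cap B$. Hence, taking $A = A(r, s)$ and $A - B = A(r - \delta_r^{-}, s)$, the solution of equation (3) is $\alpha(\delta_r^{-}, s) = A(r, s) - A(r - \delta_r^{-}, s)$. For example, if $A(r, s) = r \cup s$ then $\alpha(\delta_r^{-}, s) = (r \cup s) - ((r - \delta_r^{-}) \cup s)$. Note that if $\delta_r^{-}$ denotes the rows removed from $r$ then $\delta_r^{-} \subseteq r$. Finally, we get $\alpha(\delta_r^{-}, s) = \delta_r^{-} - s$. All id-operations for the remaining base operations can be derived in the same way.

Id-operations for the arguments of join ($\bowtie$) are defined as follows:

$$\alpha_{\bowtie}(\delta_r, s) = \langle \delta_r^{-} \bowtie s,\ \delta_r^{+} \bowtie s \rangle \qquad (5)$$

$$\beta_{\bowtie}(r, \delta_s) = \langle \delta_s^{-} \bowtie r,\ \delta_s^{+} \bowtie r \rangle \qquad (6)$$

Id-operations for the arguments of antijoin ($\triangleright$) are defined as follows:

$$\alpha_{\triangleright}(\delta_r, s) = \langle \delta_r^{-} \triangleright s,\ \delta_r^{+} \triangleright s \rangle \qquad (7)$$

$$\beta_{\triangleright}(r, \delta_s) = \langle r \ltimes \delta_s^{+},\ r \ltimes \delta_s^{-} \rangle \qquad (8)$$

Finally, id-operations for the arguments of union ($\cup$) are defined as follows:

$$\alpha_{\cup}(\delta_r, s) = \langle \delta_r^{-} - s,\ \delta_r^{+} - s \rangle \qquad (9)$$

$$\beta_{\cup}(r, \delta_s) = \langle \delta_s^{-} - r,\ \delta_s^{+} - r \rangle \qquad (10)$$

As a sample application of id-operations, consider the global data integration plan $q(r, s, t) = t \bowtie (r \triangleright s)$ and a modification $\delta_s = \langle \emptyset, \delta_s^{+} \rangle$ of the argument $s$. Then (8) and (5) contribute to a formula for processing $\delta_s$. Application of $\beta_{\triangleright}$ to $\langle \emptyset, \delta_s^{+} \rangle$ provides $\langle r \ltimes \delta_s^{+}, \emptyset \rangle$. Next, application of $\alpha_{\bowtie}$ to the previous result provides $\langle (r \ltimes \delta_s^{+}) \bowtie t, \emptyset \rangle$. Finally, the modification is integrated with the partial result of $q$ as follows: $q := q - (r \ltimes \delta_s^{+}) \bowtie t$. A formula for processing the modifications of $r$ can be derived in a similar way using (7) and (5): $q := q \cup (\delta_r^{+} \triangleright s) \bowtie t$. Processing the modifications of the argument $t$ requires the transformation of $q(r, s, t)$ into the equivalent expression $(t \bowtie r) \triangleright s$. Then, application of (5) and (7) provides $q := q \cup (\delta_t^{+} \bowtie r) \triangleright s$. The problem of what to do when such a transformation is impossible is discussed in the next sections.

As another example, consider the system of operations $F = \{agg, \cup\}$, where $\cup$ is set union and $agg$ is defined as follows.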
The operation $agg_{x,a}(r, s)$ replaces the second argument $s$ with the result of the SQL statement

    SELECT x, sum(a) FROM r GROUP BY x;

The id-operations of $\cup$ are the same as in the previous system. An id-operation $\alpha_{agg}(\delta_r, s)$ combines $\delta_r$ with $s$ in the following way:

    for all t in δr+ do
        if there exists t' in s such that t'.x = t.x then
            insert old t' into δagg-;
            replace t' with t' where t'.a := t'.a + t.a;
            insert new t' into δagg+;
        else
            add t to s and add t to δagg+;
        end if;
    end for;
    for all t in δr- do
        if there exists t' in s such that t'.x = t.x then
            insert old t' into δagg-;
            replace t' with t' where t'.a := t'.a - t.a;
            insert new t' into δagg+;
        end if;
    end for;

Finally, the id-operation for the second argument is $\beta_{agg}(r, \delta_s) = \langle \delta_s^{-}, \delta_s^{+} \rangle$.

4. Data flow expressions

A data flow expression is a sequence $r_0 : \alpha_1(r_1) \dots \alpha_n(r_n)$, where $r_0$ is a data container and each $\alpha_i(r_i)$, $i = 1,\dots,n$, is either an abbreviation of an id-operation $\alpha(\delta_{r_j}, r_i)$ or an abbreviation $\oplus(r_i)$ of the data integration operation $r_i \oplus \delta_{r_j}$. Adjacent id-operations in a data flow expression are connected such that the modification generated by $\alpha_i$ is used as the argument $\delta_{\alpha_i}$ of its successor $\alpha_{i+1}$. The evaluation of an expression starts from the first id-operation $\alpha_1(\delta_{r_0}, r_1)$. The modification $\delta_{\alpha_1}$ produced by the first id-operation becomes an argument of the next id-operation $\alpha_2(\delta_{\alpha_1}, r_2)$. For example, $r : \alpha_{\bowtie}(s)\ \alpha_{-}(t)\ \oplus(w)$ is a data flow expression where a modification $\delta_r$ of the argument $r$ is joined with $s$, then $t$ is subtracted from the result of the join, and the result of the difference is integrated with $w$.

A data flow expression related to an argument $r_i$ of an expression $E(r_1,\dots,r_i,\dots,r_n)$ is constructed through a traversal of the syntax tree of $E$ from the leaf node labeled with $r_i$ to the root node. Initially, at the leaf node $r_i$, we start from the empty expression $r_i :$. Next, we move one level up to a base operation $A(E_1, E_2)$, where $E_1$ and $E_2$ are subexpressions (subtrees in the syntax tree) bound by the base operation $A$.
If the subexpression $E_1$ is on the path being traversed, we append the id-operation $\alpha_A(w_{E_2})$ to the data flow expression. Otherwise, if $E_2$ is on the path being traversed, we append $\beta_A(w_{E_1})$ to the expression. Next, we move one level up to the next base operation and repeat the actions listed above. At the end, when all paths from the leaf nodes to the root node have been traversed and the data flow expressions generated, we insert into the expressions the data integration operations that produce the intermediate results. For example, application of the procedure described above to the relational algebra expression $r \bowtie (s \bowtie_b t)$ provides the following data flow expressions:

$r : \alpha_{\bowtie}(w_{st})\ \oplus(w)$
$s : \alpha_{\bowtie}(t)\ \oplus(w_{st})\ \beta_{\bowtie}(r)\ \oplus(w)$
$t : \beta_{\bowtie}(s)\ \oplus(w_{st})\ \beta_{\bowtie}(r)\ \oplus(w)$

Data flow expressions represent the sequences of operations performed on recently arrived modifications at a data integration stage. As in traditional query processing, optimization of data integration is performed through transformations of data flow expressions. One group of transformations moves the most restrictive id-operations towards the left-hand side of an expression in order to eliminate as many data items as possible at the early stages of data integration. The other group removes the intermediate data containers created and modified during integration in order to reduce the total number of operations on persistent storage.

Consider a data flow expression $p$ which contains two adjacent id-operations $\alpha_A(r_i)\ \alpha_B(r_j)$. A data flow expression $p'$ obtained from $p$ by swapping the order of these id-operations into $\alpha_B(r_j)\ \alpha_A(r_i)$ is equivalent to $p$ if the respective base operations are associative in the sense that $B(A(r, s), t) = A(B(r, t), s)$. Associativity of adjacent operations allows for the elimination of intermediate data containers. As an example, consider the following system of data flow expressions:

$r : \alpha_A(s)\ \oplus(w_{rs})\ \alpha_B(t)\ \oplus(w)$
$s : \beta_A(r)\ \oplus(w_{rs})\ \alpha_B(t)\ \oplus(w)$
$t : \beta_B(w_{rs})\ \oplus(w)$

where $w_{rs}$ is always equal to the result of $A(r, s)$. Hence, the third data flow expression can be written as $t : \beta_B(A(r, s))$. It is equivalent to the two relational algebra expressions $\beta_B^{-}(A(r, s), \delta_t^{-})$ and $\beta_B^{+}(A(r, s), \delta_t^{+})$. If the base operations $A$ and $B$ are associative, the expressions can be transformed into $A(\beta_B^{-}(r, \delta_t^{-}), s)$ and $A(\beta_B^{+}(r, \delta_t^{+}), s)$. Taking the expressions together and replacing the base operation $A$ with the id-operation $\alpha_A$, we obtain $\alpha_A(\beta_B(r, \delta_t), s)$ and, in consequence, the data flow expression $t : \beta_B(r)\ \alpha_A(s)\ \oplus(w)$. Now the temporary container $w_{rs}$ can be removed from the remaining data flow expressions:

$r : \alpha_A(s)\ \alpha_B(t)\ \oplus(w)$
$s : \beta_A(r)\ \alpha_B(t)\ \oplus(w)$

5. Data integration

Let $r_1,\dots,r_k$ be the results of $k$ subqueries $q_1,\dots,q_k$ computed at the remote database sites and transmitted to the central site. A global data integration plan $P(r_1,\dots,r_k)$ is an expression built over the data containers $r_1,\dots,r_k$ and base operations, e.g. relational algebra operations. In traditional approaches, data integration is delayed until the arguments bound by the base operations in $P$ are available at the central site. Adaptive and incremental strategies allow data integration while the arguments are still being transmitted over the network. Implementation of an incremental strategy requires the translation of a global integration plan into a set of local integration plans. A set of local integration plans for $P(r_1,\dots,r_k)$ is a set of data flow expressions $\{p_1,\dots,p_k\}$, where each $p_i$ describes how the increments of the argument $r_i$ are integrated with the intermediate results. An individual data integration plan $p_i$ is a sequence of id-operations performed by the system in order to process an increment $\delta_{r_i}$. Consider the logical data integration plan $(r \bowtie s) \bowtie t$. An incremental integration strategy transforms the plan into the following individual data integration plans:

$r : \alpha_{\bowtie}(s)\ \alpha_{\bowtie}(t)\ \oplus(w)$
$s : \alpha_{\bowtie}(r)\ \alpha_{\bowtie}(t)\ \oplus(w)$
$t : \alpha_{\bowtie}(s)\ \alpha_{\bowtie}(r)\ \oplus(w)$

In another example, elimination of the union operation from the logical data integration expression $r(ab) \bowtie (s(ab) \cup t(ab))$ leads to an expression with two occurrences of the argument $r$, i.e. $(r(ab) \bowtie s(ab)) \cup (r(ab) \bowtie t(ab))$. Then the individual integration plan for the argument $r$ consists of two data flow expressions:

$r : \alpha_{\bowtie}(t)\ \oplus(v_{rt})$
$r : \alpha_{\bowtie}(s)\ \oplus(v_{rs})\ \alpha_{\cup}(v_{rt})\ \oplus(w)$

The remaining individual integration plans are as follows:

$s : \beta_{\bowtie}(r)\ \oplus(v_{rs})\ \alpha_{\cup}(v_{rt})\ \oplus(w)$
$t : \beta_{\bowtie}(r)\ \oplus(v_{rt})\ \alpha_{\cup}(v_{rs})\ \oplus(w)$

A global data integration plan $P$ implemented as a set of local data integration plans allows for a correct and adaptive integration of the partial results.
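The leaf-to-root construction of data flow expressions and their incremental execution can be exercised together in a small Python simulation. This is a sketch under assumptions of our own, not the paper's implementation: the plan is the join-only plan $(r \bowtie s) \bowtie t$ with hypothetical schemas r(a, b), s(b, c) and t(c, d); only increments arrive; each row is a frozenset of attribute/value pairs; and every argument occurs once in the plan. `local_plans` records an alpha step when the path comes from the left operand and a beta step when it comes from the right one, keeping the intermediate container w_rs (the unoptimized form). Since join is commutative, both kinds of step are executed as joins. After random packets arrive, `check` compares the incrementally maintained w with a full evaluation of the global plan.

```python
import random

# Sketch assumptions (ours): only increments arrive, each row is a frozenset of
# (attribute, value) pairs, and every leaf of the plan occurs exactly once.

def natural_join(left, right):
    """Natural join of two sets of rows on their common attributes."""
    out = set()
    for lrow in left:
        ld = dict(lrow)
        for rrow in right:
            rd = dict(rrow)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out

def leaves(tree):
    return tree if isinstance(tree, str) else leaves(tree[1]) + leaves(tree[2])

def name(node, root):
    """Container name: the argument itself, w for the root, w_<leaves> otherwise."""
    if isinstance(node, str):
        return node
    return "w" if node is root else "w_" + leaves(node)

def local_plans(tree):
    """Walk from every leaf to the root; each step is (kind, sibling, node)."""
    plans = {}
    def walk(node, path):
        if isinstance(node, str):
            plans[node] = list(reversed(path))   # parent first, root last
            return
        _, left, right = node
        walk(left, path + [("alpha", right, node)])
        walk(right, path + [("beta", left, node)])
    walk(tree, [])
    return plans

def show(tree, plans, leaf):
    steps = [f"{kind}({name(sib, tree)}) +({name(node, tree)})"
             for kind, sib, node in plans[leaf]]
    return f"{leaf}: " + " ".join(steps)

def process_increment(tree, plans, containers, leaf, increment):
    """Push an increment through its local plan; every step is a join id-operation."""
    delta = set(increment)
    for _kind, sibling, node in plans[leaf]:
        delta = natural_join(delta, containers[name(sibling, tree)])
        containers[name(node, tree)] |= delta    # integrate with w_... or w
    containers[leaf] |= increment

def evaluate(tree, containers):
    """Full (non-incremental) evaluation of the global plan."""
    if isinstance(tree, str):
        return containers[tree]
    return natural_join(evaluate(tree[1], containers), evaluate(tree[2], containers))

TREE = ("join", ("join", "r", "s"), "t")                     # (r JOIN s) JOIN t
SCHEMAS = {"r": ("a", "b"), "s": ("b", "c"), "t": ("c", "d")}  # hypothetical

def demo(seed=7, packets=40):
    rng = random.Random(seed)
    plans = local_plans(TREE)
    containers = {"r": set(), "s": set(), "t": set(), "w_rs": set(), "w": set()}
    for _ in range(packets):
        leaf = rng.choice(sorted(SCHEMAS))
        x, y = SCHEMAS[leaf]
        packet = {frozenset({x: rng.randrange(3), y: rng.randrange(3)}.items())
                  for _ in range(2)}
        process_increment(TREE, plans, containers, leaf, packet)
    return containers

def check(seed):
    c = demo(seed)
    return c["w"] == evaluate(TREE, c)

plans = local_plans(TREE)
for leaf in plans:
    print(show(TREE, plans, leaf))
print(check(7))
```

Running the block prints `r: alpha(s) +(w_rs) alpha(t) +(w)`, `s: beta(r) +(w_rs) alpha(t) +(w)`, `t: beta(w_rs) +(w)` and `True`, i.e. the unoptimized system of Section 4 with both base operations being joins. Eliminating `w_rs` as described there would shorten the plans without changing the final result.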
The local data integration plans are created such that each argument of the respective logical data integration expression gets its own local plan. If, as in the example above, the same argument is used more than once, then we get more than one plan as well. All plans associated with a given argument are activated when an increment of the argument has to be processed. Each local plan is a data flow expression constructed and optimized in the way described in the previous section. A process of incremental and adaptive data integration wakes up at regular intervals of time, verifies the amount of data transmitted since the last integration, and, if there is enough data, prepares and implements the local integration plans. An algorithm that constructs the data flow expressions from a global data integration plan $P$ is used to formulate a set of initial local integration plans. Next, the optimizations of data flow expressions described in the previous section move the most selective operations towards the beginning of each local plan and try to eliminate the integrations with intermediate results. The optimization of the local plans assumes the most optimistic case of initial availability and continuous transmission of all arguments. In reality, transmissions frequently start late or cannot be completed for a long period of time. This is why some of the local plans have to be either suspended or reduced to the id-operations that can be
executed at a given moment of time, followed by integrations with temporary data containers. The first run of the data integrator transforms the local plans obtained from the optimizer in a way that takes into consideration the availability of the arguments and the optimal integration of the available data. Each subsequent invocation adjusts the plans used in the previous run to reflect the availability of new arguments. When all arguments are partially available at the central site, the local plans return to their optimized form.

The run-time transformations of local plans include the addition and elimination of integrations with temporary data containers, the elimination of subexpressions that can be totally evaluated and replaced with a constant data container, and the reordering and elimination of local plans. Adding an integration with a temporary data container is needed when the computation of a plan $r : \alpha_1(r_1) \dots \alpha_{i-1}(r_{i-1})\ \alpha_i(r_i) \dots$ cannot be completed because the container $r_i$ is not available at the moment. Then the plan is computed partially, and an integration with an intermediate container $v_i$ is inserted in front of $\alpha_i$ in the following way: $r : \alpha_1(r_1) \dots \oplus(v_i)\ \alpha_i(r_i) \dots$. Moreover, the sequence of id-operations $\alpha_1(r_1) \dots \alpha_{i-1}(r_{i-1})$ is replaced with $\beta_i(v_i)$ in all other local plans. A temporary data container is removed from the local plan $r$ when the argument $r_i$ is no longer empty. Then $\oplus(v_i)$ is removed from the plan and $\beta_i(v_i)$ is replaced with the original sequence of operations in all other plans wherever it occurs.

When a data integrator is invoked for the first time, some of the transmissions from the remote sites may already be completed. If both arguments of a base operation in a global integration plan are available, then such an operation can be computed in the traditional way and its result can be incorporated into the plan as a constant argument. Consider the local plans $r_i : \alpha_A(r_j)\ \alpha_B(r_k) \dots$
and $r_j : \beta_A(r_i)\ \alpha_B(r_k) \dots$, and assume that both $r_i$ and $r_j$ are available for integration. Then the respective base operation $A(r_i, r_j)$ is computed, its result $r_{ij}$ obtains a new local integration plan $r_{ij} : \alpha_B(r_k) \dots$, and the plans for $r_i$ and $r_j$ are removed. In all other plans the sequence $\alpha_A(r_j)\ \alpha_B(r_k)$ is replaced with $\beta_B(r_{ij})$. Elimination of a subexpression in the way described above is possible only if an argument that was completely unavailable at one stage of integration becomes totally available. What if, in the same situation, the transmission of some of the arguments is completed but no base operation can be computed? Consider a plan $r_i : \alpha_A(r_j)\ \alpha_B(r_k) \dots$ and assume that the transmission of the data container $r_i$ is completed. Then the status of $r_i$ is changed to ready and its plan is removed from the set of local plans.

Each of the arguments involved in data integration has its status recorded and maintained by the system. At the very beginning of data integration all arguments obtain the status missing.

[Figure 1. The transitions of argument states (missing, active, ready, idle).]

Next, when an argument arrives and its transmission is completed, its status changes to ready. If only a part of an argument arrives, its status is active, and after that part is integrated the status changes to idle. The state transitions given in Figure 1 occur when the data integrator completes an integration cycle.

When the data integrator wakes up for the first time, the only local integration plans are those directly constructed and optimized from the global plan. First, the data integrator considers the arguments that changed their status from missing to ready. The subexpressions of the global integration plan are computed in the way described above. The local plans for the arguments that have status ready are removed from the set of local plans. Next, the data integrator considers the arguments that changed their status from missing to active, i.e.
only some of the components of these arguments have arrived. The local plans related to these arguments are computed as far as possible and, whenever the computation does not reach the integration with the final result, an integration with a temporary relational table is performed and inserted into the plan, and the related local plans are modified in the way described above. No other state transitions are possible at the first integration stage.

When the data integrator wakes up at any time other than the first, any transition of the argument states is possible. First, the data integrator considers the arguments that changed their status from missing to ready. The local integration plans for these arguments are removed from the set of local plans and the related plans are modified in the way described above. Next, the data integrator considers the arguments whose status has changed from active to ready. The local plans for these arguments are computed as far as possible and then removed from the set of local plans. Next, the data integrator considers the arguments that changed their status from missing to active. The local plans for these arguments are computed as far as possible and, whenever the computation does not reach the integration with the final result, an integration with a temporary table is performed and inserted into the plan, and the related plans are updated in the way described above. Whenever an argument is used in the computations, its plan
is made inactive for this cycle. If the computation of a local plan uses a temporary relational table created earlier, the temporary table is removed from the plan and all other plans are updated as described above. Next, the data integrator considers the arguments whose status remained active and whose local plans have not been deactivated in this cycle. These arguments are processed in the same way as the arguments whose status changed from missing to active. In all other cases, the integrator remains idle.

6. Summary and future work

This paper considers the online and adaptive integration of large data sets distributed over wide-area networks. We argue that the traditional approach, where global integration plans are expressed as relational algebra expressions, is not appropriate for precisely describing integration processes at the level where individual packets of data are assembled into the final results. In contrast, we define the concept of an id-operation as an elementary operation on the modifications (increments and/or decrements) of data containers and partial results. Next, we show how to construct a data integration plan as a collection of data flow expressions composed of id-operations and data integration operations. Finally, we describe the operational principles of a sample system capable of online and adaptive data integration.

A number of interesting problems remain to be solved. These include a wider system of id-operations, investigation of the properties of dataflow algebra, and further work on more advanced data integration algorithms.

References

[1] L. Amsaleg, J. Franklin, and A. Tomasic. Dynamic query operator scheduling for wide-area remote access. Journal of Distributed and Parallel Databases, 6.
[2] G. Antoshenkov and M. Ziauddin. Query processing and optimization in Oracle Rdb. VLDB Journal, 5(4).
[3] R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.
[4] L. Bouganim, F. Fabret, and C. Mohan. A dynamic query processing architecture for data integration systems. Bulletin of the Technical Committee on Data Engineering, 23(2):42-48, June.
[5] J. Chudziak and J. R. Getta. On efficient query evaluation in multidatabase systems. In Second International Workshop on Advances in Database and Information Systems, ADBIS 95, pages 46-54.
[6] R. L. Cole and G. Graefe. Optimization of dynamic query evaluation plans. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data.
[7] A. Fiat and G. J. Woeginger. On Line Algorithms, The State of the Art. Springer Verlag.
[8] I. Foster and R. L. Grossman. Data integration in a bandwidth-rich world. Communications of the ACM, 46(11):51-57, November.
[9] J. R. Getta. Query scrambling in distributed multidatabase systems. In 11th Intl. Workshop on Database and Expert Systems Applications, DEXA 2000.
[10] J. R. Getta and S. Sedighi. Optimizing global query processing plans in heterogeneous and distributed multidatabase systems. In 10th Intl. Workshop on Database and Expert Systems Applications, DEXA 1999, pages 12-16.
[11] J. R. Getta and E. Vossough. Optimization of data stream processing. SIGMOD Record, 33(3):34-39.
[12] A. Gounaris, N. W. Paton, A. A. Fernandes, and R. Sakellariou. Adaptive query processing: A survey. In Proceedings of 19th British National Conference on Databases, pages 11-25.
[13] G. Graefe. Dynamic query evaluation plans: Some course corrections? Bulletin of the Technical Committee on Data Engineering, 23(2):3-6, June.
[14] P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In SIGMOD 1999, Proceedings ACM SIGMOD Intl. Conf. on Management of Data.
[15] J. M. Hellerstein, M. J. Franklin, S. Chandrasekaran, A. Deshpande, K. Hildrum, S. Madden, V. Raman, and M. A. Shah. Adaptive query processing: Technology in evolution. Bulletin of the Technical Committee on Data Engineering, 23(2):7-18, June.
[16] Z. G. Ives, D. Florescu, M. Friedman, A. Y. Levy, and D. S. Weld. An adaptive query execution system for data integration. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data.
[17] Z. G. Ives, A. Y. Halevy, and D. S. Weld. Adapting to source properties in processing data integration queries. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data.
[18] Z. G. Ives, A. Y. Levy, D. S. Weld, D. Florescu, and M. Friedman. Adaptive query processing for internet applications. Bulletin of the Technical Committee on Data Engineering, 23(2):19-26, June.
[19] N. Kabra and D. J. DeWitt. Efficient mid-query reoptimization of sub-optimal query execution plans. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data.
[20] S. Madden, M. A. Shah, J. M. Hellerstein, and V. Raman. Continuously adaptive continuous queries over streams. In Proceedings of the 2002 ACM SIGMOD Intl. Conf. on Management of Data.
[21] M. F. Mokbel, M. Lu, and W. G. Aref. Hash-merge join: A non-blocking join algorithm for producing fast and early join results.
[22] F. Ozcan, S. Nural, P. Koksal, C. Evrendilek, and A. Dogac. Dynamic query optimization in multidatabases. Bulletin of the Technical Committee on Data Engineering, 20:38-45, March 1997.
[23] A. Pan and A. Vina. An alternative architecture for financial data integration. Communications of the ACM, 47(5):37-40, May.
[24] V. Raman, A. Deshpande, and J. M. Hellerstein. Using state modules for adaptive query processing. In Proceedings of International Conference on Management of Data.
[25] V. Srinivasan and M. J. Carey. Compensation-based on-line query processing. In Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data.
[26] T. Urhan and M. J. Franklin. XJoin: A reactively-scheduled pipelined join operator. IEEE Data Engineering Bulletin 23(2), pages 27-33.
[27] T. Urhan and M. J. Franklin. Dynamic pipeline scheduling for improving interactive performance of online queries. In Proceedings of International Conference on Very Large Databases, VLDB 2001.
[28] T. Urhan, M. J. Franklin, and L. Amsaleg. Cost based query scrambling for initial delays. In SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA, 1998.
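The subexpression elimination and the integration-cycle ordering described above, before Section 6, can be sketched in a few lines of code. This is a minimal Python sketch under stated assumptions, not the author's implementation. Local plans use a hypothetical encoding as lists of (kind, operation, argument) steps, where α and β are treated as opaque labels because their definitions are not reproduced in this excerpt. `compute` is a hypothetical stand-in for a base operation. The set of plans deactivated during a cycle is passed in as a parameter rather than tracked. The names `Status`, `eliminate` and `processing_order` are illustrative only.

```python
from enum import Enum


class Status(Enum):
    """Argument statuses shown in Figure 1."""
    MISSING = "missing"
    ACTIVE = "active"
    IDLE = "idle"
    READY = "ready"


def eliminate(plans, r_i, r_j, compute):
    """Subexpression elimination once r_i and r_j are both available.

    Hypothetical encoding: plans maps an argument name to a list of steps
    (kind, operation, argument), where kind is the opaque label "alpha" or "beta".
    Assumes plans[r_i] starts with alpha_A(r_j) alpha_B(r_k).
    compute(A, r_i, r_j) stands in for the base operation and returns the name r_ij.
    """
    kind_a, op_a, arg_a = plans[r_i][0]
    kind_b, op_b, r_k = plans[r_i][1]
    if (kind_a, arg_a, kind_b) != ("alpha", r_j, "alpha"):
        raise ValueError("plan of r_i does not start with alpha_A(r_j) alpha_B(r_k)")
    r_ij = compute(op_a, r_i, r_j)
    new_plans = {r_ij: plans[r_i][1:]}  # new local plan r_ij: alpha_B(r_k) ...
    pattern = [("alpha", op_a, r_j), ("alpha", op_b, r_k)]
    for owner, steps in plans.items():
        if owner in (r_i, r_j):
            continue  # the plans of r_i and r_j are removed
        rewritten, n = [], 0
        while n < len(steps):
            if steps[n:n + 2] == pattern:
                rewritten.append(("beta", op_b, r_ij))  # replaced with beta_B(r_ij)
                n += 2
            else:
                rewritten.append(steps[n])
                n += 1
        new_plans[owner] = rewritten
    return new_plans


# Status changes handled in the first cycle; no other transitions occur then.
FIRST_CYCLE = [(Status.MISSING, Status.READY), (Status.MISSING, Status.ACTIVE)]
# Status changes handled in every later cycle, in this order.
LATER_CYCLES = [
    (Status.MISSING, Status.READY),
    (Status.ACTIVE, Status.READY),
    (Status.MISSING, Status.ACTIVE),
    (Status.ACTIVE, Status.ACTIVE),  # status remained active
]


def processing_order(changes, first_cycle, deactivated=frozenset()):
    """Order in which one integration cycle handles the arguments.

    changes maps an argument name to (old_status, new_status); deactivated holds
    arguments whose plans were already used in this cycle (passed in here as a
    simplification). Arguments in no group are skipped: the integrator idles.
    """
    groups = FIRST_CYCLE if first_cycle else LATER_CYCLES
    order = []
    for group in groups:
        for name in sorted(changes):
            if changes[name] != group:
                continue
            if group == (Status.ACTIVE, Status.ACTIVE) and name in deactivated:
                continue  # plan was made inactive earlier in this cycle
            order.append(name)
    return order
```

For example, consider the plans r1: α_A(r2)α_B(r3) and r4: α_C(r5)α_A(r2)α_B(r3). Eliminating r1 and r2 into r12 yields the new plan r12: α_B(r3) and rewrites r4 to α_C(r5)β_B(r12). The ordering function only fixes the sequence of the groups named in the text. The actual computation of each local plan "as far as possible" is left out.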
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs
More informationAn Optimization of Disjunctive Queries : Union-Pushdown *
An Optimization of Disjunctive Queries : Union-Pushdown * Jae-young hang Sang-goo Lee Department of omputer Science Seoul National University Shilim-dong, San 56-1, Seoul, Korea 151-742 {jychang, sglee}@mercury.snu.ac.kr
More information