Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach +
Abdullah Al-Hamdani, Gultekin Ozsoyoglu
Electrical Engineering and Computer Science Dept, Case Western Reserve University, Cleveland, Ohio
(abd, tekin)@eecs.cwru.edu

Abstract. This paper discusses algorithms for topic selection queries, which query a database containing metadata about web information resources. The metadata database contains topics and relationships among topics, called metalinks. Topics in the database have associated importance scores. The topic selection operator TSelection selects, within time T, the topics that satisfy a given selection formula and whose output importance scores are above a given threshold value or in the top-k. The selection formula contains expensive predicates in the form of user-defined functions. To minimize the number of expensive predicate evaluations (probes) in the TSelection algorithm, we introduce and evaluate three heuristics. Also, due to the time constraint T, the TSelection algorithm may terminate without locating all output tuples. To maximize the number of output tuples found, we introduce and evaluate three heuristics for choosing the tuple to evaluate at a given time.

1 Introduction

Search engines such as Yahoo! use topics and topic hierarchies, extracted as metadata from the web, to allow keyword-based searches over the entire web. We propose (i) restricting the scope of metadata extraction to specific web information resources such as the ACM Digital Library [1], and (ii) extending the metadata extraction process to include automatic extraction of topics and relationships among topics (called metalinks in this paper). Such metadata is stored in a database and employed for ad hoc querying of web information resources [4, 5].

Data Model.
To model the metadata extracted from a web information resource, we have recently used [3, 4, 5] a topic maps-based [9] data model with topic entities (a keyword or a phrase) and metalinks. Examples of topics for the DBLP Bibliography [6] and the ACM SIGMOD Anthology [7] are T1: query optimization (a phrase), T2: database dependency theory (a phrase), and T3: The Interaction between Functional Dependencies and Template Dependencies (the title of a paper by Sadri and Ullman [8] in the ACM SIGMOD Anthology). Topics and metalinks have associated importance scores, which may be obtained using data mining techniques (e.g., association rule-based mining), information retrieval techniques (e.g., the vector space model [10] and cosine similarities) [14], or other techniques (e.g., [12]). Each topic has one or more topic sources; for example, the pdf file of the paper with title T3 in the ACM SIGMOD Anthology constitutes a topic source for both topics T2 and T3. Note that topics and metalinks are metadata (i.e., information about the web resource), whereas topic sources constitute data.

+ This research is supported by the National Science Foundation grants INT and DBI.

Maintaining topics and metalinks as metadata in a database allows for ad hoc queries to locate relevant topic sources. Normally, the number of topics satisfying a user request is quite large. Therefore, an algebra-based way of ranking topics and returning only a small number of highly important topics (and their topic sources) is needed. Towards this goal, we have proposed [4] a sideway value algebra (SVA) for object-relational databases. This paper investigates the evaluation of one SVA operator for web computing, namely, the SVA selection operator that implements topic selection queries.

Topic Selection (TS) Queries. Given a database of topics, metalinks, and topic source URLs, a TS query takes (i) a single topic relation $R$ in which each tuple $t$ contains information about topics and has an importance score $Imp(R(t))$, (ii) a (propositional calculus) selection formula $C$ with predicates on topic similarity, (iii) an output importance score computation function $f_{out}(t)$, (iv) a query stopping condition $\beta$, expressed either as the top-k importance-scored output tuples or as the output tuples whose importance scores are above a given threshold, and (v) a query response time limit $T$. The TS query returns, within time $T$, those tuples $t$ of $R$ that satisfy the selection formula $C$ and whose output importance scores, computed as $f_{out}(t)$, satisfy $\beta$.

Example 1. (TS query). Consider the web resources DBLP Bibliography and the ACM SIGMOD Anthology, and the associated metadata database. Assume that the relation RelatedToPapers, extracted from DBLP and the Anthology, has the schema RelatedToPapers(Pid1, Title1, Abstract1, Pid2, Title2, Abstract2, ...), where the importance score of each RelatedToPapers tuple $t$ can be obtained through the function $Imp(RelatedToPapers(t))$. Topic, metalink, and topic source importance scores are reals in the range [0, 1].
The paper type is a specialization of the topic type, and a paper (instance) is an object/entity with an object-id (i.e., Pid). A user is interested in selecting, within 2 minutes, the top 10 papers in DBLP and the Anthology that are related to the paper [8] by Sadri and Ullman (say, with Pid p23) with a RelatedToPapers importance score of 0.8 or above, where the selected papers have either a title similarity to p23 of 0.9 or above, or an abstract similarity to p23 of 0.95 or above. Such a query can be expressed using an SQL-like syntax as:

    select *
    from RelatedToPapers RT
    where RT.Pid1 = p23 and Imp(RT) ≥ 0.8 and
          (Sim(RT.Title1, RT.Title2) ≥ 0.9 or Sim(RT.Abstract1, RT.Abstract2) ≥ 0.95)
    propagate importance within selection as min for conjunctions and avg for disjunctions
    stop after 10 most important and within time 2 min

where Sim() denotes a similarity function. The propagate importance clause defines the output tuple importance score computation function $f_{out}$ which, in this case, is defined as

$$f_{out}(t) = Imp(RT(t)) \cdot AVG[\,Sim(RT.Abstract1, RT.Abstract2),\ Sim(RT.Title1, RT.Title2)\,]$$

Issues addressed in this paper. We investigate two query processing issues for TS queries:

(a) Expensive metalink importance score computation. Some metalink types, such as RelatedTo, are expensive [13] in that their importance score computations are time-consuming. We call such functions expensive functions, and any predicate that contains an expensive function an expensive predicate. As an example, the importance score of the metalink type RelatedToPapers, given papers $pid_1$ and $pid_2$, can be computed [14] by the function

$$
\begin{aligned}
Imp(RelatedToPapers(pid_1, pid_2)) ={}& w_{Title} \cdot Sim_{Title}(pid_1.\text{title},\ pid_2.\text{title}) \\
&+ w_{Authors} \cdot Sim_{Authors}(pid_1.\text{authors},\ pid_2.\text{authors}) \\
&+ w_{Abstract} \cdot Sim_{Abstract}(pid_1.\text{abstract},\ pid_2.\text{abstract}) \\
&+ w_{IndexTerms} \cdot Sim_{IndexTerms}(pid_1.\text{index-terms},\ pid_2.\text{index-terms}) \\
&+ w_{Body} \cdot Sim_{Body}(pid_1.\text{body},\ pid_2.\text{body}) \\
&+ w_{References} \cdot Sim_{References}(pid_1.\text{references},\ pid_2.\text{references})
\end{aligned}
\qquad (1)
$$

where $Sim_{Title}$, $Sim_{Authors}$, $Sim_{Abstract}$, $Sim_{IndexTerms}$, $Sim_{Body}$, and $Sim_{References}$ denote similarity functions for pairs of paper titles, authors, abstracts, index terms, paper bodies, and references, respectively, and the $w$ terms are weights subject to the constraint $w_{Title} + w_{IndexTerms} + w_{Authors} + w_{Body} + w_{References} + w_{Abstract} = 1$. We refer to a metalink importance score evaluation as a probe. Clearly, a RelatedToPapers probe (i.e., the computation of $Imp(RelatedToPapers(pid_1, pid_2))$ in equation (1)) is expensive, and the system must attempt to minimize the number of such probes. We assume that (i) all probes of a given metalink type have the same cost, and (ii) there is a total ordering of the probe costs of different metalink types. We also assume that topic importance scores are not expensive to compute: they are computed a priori (using pre-collected topic source data) and maintained in the metadata database.

(b) Time constraints. TS query computation times can be too high. Therefore, such queries may have a time constraint clause of the form, say, Time = 2 minutes. Time constraints can be transformed into constraints on the number of expensive predicate evaluations (i.e., probes). Time-constrained query evaluation algorithms must be correct; i.e., given a top-k query with a time constraint, all output tuples must be in the top-k, and, for a threshold query with a threshold $\tau$ and a time constraint, the importance scores of output tuples must be greater than $\tau$.
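As an illustration of why a RelatedToPapers probe is costly, the weighted sum of equation (1) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the field names, the dictionary representation of a paper, and the word-overlap `jaccard` function are hypothetical stand-ins, not the similarity functions of [14]; only the weighted-sum structure and the sum-to-one constraint come from the text.

```python
from typing import Callable, Dict

# Hypothetical field names mirroring the six components of equation (1).
FIELDS = ("title", "authors", "abstract", "index_terms", "body", "references")


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of lowercase word sets.

    This is an assumption for illustration, not the paper's Sim functions.
    """
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def related_to_papers_imp(
    p1: Dict[str, str],
    p2: Dict[str, str],
    weights: Dict[str, float],
    sim: Callable[[str, str], float] = jaccard,
) -> float:
    """One probe: the weighted sum of per-field similarities, as in equation (1)."""
    if abs(sum(weights[f] for f in FIELDS) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Six similarity computations per probe, including one over the full body.
    return sum(weights[f] * sim(p1[f], p2[f]) for f in FIELDS)


if __name__ == "__main__":
    equal_weights = {f: 1.0 / len(FIELDS) for f in FIELDS}
    paper = {f: "functional dependencies and template dependencies" for f in FIELDS}
    other = {f: "query optimization" for f in FIELDS}
    print(related_to_papers_imp(paper, paper, equal_weights))
    print(related_to_papers_imp(paper, other, equal_weights))
```

Each call touches every field of both papers, which is why the algorithms in the following sections try to avoid probes whose result cannot change the output.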
In the ideal case, a TS query evaluation performs only the probes needed for the output tuples, i.e., the positive probes. In general, there will be some probes that do not contribute towards an output tuple, resulting in wasted time. In such cases, there can be multiple goals, such as maximizing the number of highest importance-scored output tuples, or maximizing the number of output tuples satisfying an importance score threshold. We present different heuristics and their evaluations for different goals.

Contributions. We discuss TS algorithms for evaluating the SVA Topic Selection operator. The algorithms are pipelined: they continuously generate output, and attempt to maximize the number of positive probes they make. In section 2, we present the top-k and threshold-based TS algorithms. Section 3 briefly presents the experimental evaluations. Section 4 concludes.

2 Top-k and Threshold-based Topic Selection Algorithms

In this section, we use an operator-based specification of the topic selection query (instead of the SQL-like syntax), written as $\sigma^{*}_{C,\, f_{out},\, \beta,\, T}(R)$, where each tuple $r$ of the relation $R$ has an input importance score $Imp_{in}(r)$, $C$ is the selection condition with only expensive predicates, $f_{out}$ is the output importance score function, $\beta$ is the output threshold, which is either a positive integer $k$ as the ranking threshold, a real-valued importance score threshold $V_t$ in the range [0, 1], or the two-tuple $(k, V_t)$, and $T$ is the time constraint. The operator $\sigma^{*}$ returns, in decreasing order of output importance scores and within time $T$, either (i) the top $k$ $f_{out}$-ranking output tuples that satisfy the selection condition $C$ (when $\beta$ is $k$), or (ii) all tuples of $R$ that satisfy the selection condition $C$ and have an $f_{out}$-importance score greater than $V_t$ (when $\beta$ is $V_t$), or (iii) the top $k$ $f_{out}$-ranking output tuples that satisfy the selection condition $C$ and have an $f_{out}$-importance score greater than $V_t$ (when $\beta$ is the two-tuple $(k, V_t)$). When the time constraint $T$ is not sufficient to get the answer, the selection query evaluation becomes a best-effort evaluation. We assume that the input relation $R$ is sorted in decreasing order of the input importance scores $Imp_{in}$ of its tuples.

Example 2. In the selection query of example 1, after eliminating inexpensive predicates, we have

$$
\begin{aligned}
C &= Imp(RT) \ge 0.8 \ \wedge\ [\,Sim(RT.Title1, RT.Title2) \ge 0.9 \ \vee\ Sim(RT.Abstract1, RT.Abstract2) \ge 0.95\,] \\
  &= C_1 \wedge (C_2 \vee C_3) = (C_1 \wedge C_2) \vee (C_1 \wedge C_3) = term_1 \vee term_2 .
\end{aligned}
$$

Using the importance score functions Min and Avg as specified in the query, we have, for a given tuple $r$,

$$Imp(C, r) = AVG(Imp(term_1, r),\ Imp(term_2, r)) = AVG(MIN(Imp(C_1, r), Imp(C_2, r)),\ MIN(Imp(C_1, r), Imp(C_3, r))).$$

If a predicate $C_i$ in a given term $t$ is false (i.e., $Imp(C_i, r) = 0$) then term $t$ is false (i.e., $Imp(t, r) = 0$). Therefore, we do not need to evaluate the remaining unevaluated predicates in term $t$.

Note that, in our environment, the expensive predicates are either importance score computation functions (e.g., Imp(RelatedToPapers())) or similarity computation predicates (e.g., Sim() > 0.9). Without loss of generality, we assume that the selection condition $C$ is in disjunctive normal form, $C = \bigvee_i term_i$, where $term_i = \bigwedge_j C_j$ and each $C_j$ is an atomic expensive predicate. The output importance score $f_{out}(r)$ for a given tuple $r$ is computed as

$$f_{out}(r) = Imp_{in}(r) \cdot Imp(C, r) = Imp_{in}(r) \cdot g_1(g_2(term_1, r),\ g_2(term_2, r),\ \ldots)$$
where $g_1$ and $g_2$ are monotone functions that incorporate the effects of the disjunctions and the conjunctions, respectively, on the output importance score computation. When $g_1$ and $g_2$ are monotone, the decision that a given tuple is not in the top-k or does not satisfy the threshold value $V_t$ can be made without evaluating all of its atomic expensive predicates. The monotonicity of a given function is defined below.

Definition 1 (Monotone Function). A given function $g$ with $n$ parameters is monotone iff it satisfies the following two conditions: (a) if $a_i \le b_i$ for all $1 \le i \le n$ then $g(a_1, \ldots, a_n) \le g(b_1, \ldots, b_n)$, and (b) $g(a_1, \ldots, a_n) = 1.0$ iff $a_i = 1.0$ for all $1 \le i \le n$.

Some examples of monotone functions are $PRODUCT(a_1, \ldots, a_n) = \prod_{i=1}^{n} a_i$; $MIN(a_1, \ldots, a_n) = a_i$ where $a_i \le a_j$ for all $1 \le j \le n$; and $AVG(a_1, \ldots, a_n) = (a_1 + a_2 + \cdots + a_n)/n$. In this paper, we assume that the functions $g_1$ and $g_2$ are monotone.

2.1 Fixed Order Probe-Optimal Topic Selection Algorithm

Chang and Hwang [2] have proposed the MPro algorithm to evaluate the top-k selection operator with only conjunctive expensive predicates, using only one monotone function $F(x, p_1, p_2, \ldots, p_n)$, where $x$ is a pre-computed inexpensive predicate (i.e., with zero cost) and $p_1, p_2, \ldots, p_n$ are expensive predicates. They have proven that the MPro algorithm is probe-optimal assuming that there is a pre-defined and fixed evaluation ordering of the predicates. The problem with the MPro algorithm is that usually there is no fixed optimal predicate evaluation order, and the best predicate evaluation order changes dynamically with respect to the tuple being evaluated.

In the rest of this section, assuming a fixed pre-defined predicate evaluation ordering, we adapt and extend the minimal probing algorithm MPro in order to evaluate the top-k or threshold-based topic selection operator with two evaluation functions $g_1$ and $g_2$. Then, in section 2.3, we eliminate the fixed pre-defined predicate ordering assumption, and revise the algorithms so that the predicates to evaluate are chosen dynamically. First, we define the evaluation cost of a given expensive predicate.

Definition 2 (Expected Evaluation Cost). Let $Cost(C_i)$ be the expected evaluation (time) cost of $C_i$, where $C_i$, $1 \le i \le n$, is a conjunct in a selection formula $C$ that is in disjunctive normal form. Then the expected evaluation cost of $C$ is defined as

$$Cost(C) = \sum_{i=1}^{n} Cost(C_i).$$

Assume that, at a given time, the predicates $C_1$ to $C_{current}$, $1 \le current \le n$, have been evaluated using $Imp(\,)$, here referred to as $EvaluatedImp(\,)$.

Definition 3 (Unevaluated Predicate Cost). Let $C_1, C_2, C_3, \ldots, C_n$ be the pre-defined evaluation ordering for computing $f_{out}$ for a topic selection operator on relation $R$, and let $Cost(C_i)$ be the expected evaluation cost of a given predicate $C_i$. The unevaluated predicate cost $UCost(r)$ for a given tuple $r$ in relation $R$, after computing the predicate $C_{current}$ with $r$, is defined as

$$UCost(r) = \sum_{i=current+1}^{n} Cost(C_i).$$

Definition 4 ($Imp(C_j, r)$). Let $C_1, C_2, C_3, \ldots, C_n$ be the pre-defined evaluation ordering for computing $f_{out}$ for a topic selection operator on relation $R$. The importance score of the $j$-th expensive predicate $C_j$, after computing the predicate $C_{current}$ with tuple $r$ in relation $R$, is defined as

$$
Imp(C_j, r) =
\begin{cases}
EvaluatedImp(C_j, r) & \text{if } j \le current \\
1 & \text{otherwise}
\end{cases}
$$

When $j > current$, we refer to $C_j$ as an unevaluated expensive predicate; otherwise it is an evaluated expensive predicate.
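Definitions 3 and 4 amount to a small amount of bookkeeping per tuple, which the following Python sketch makes concrete. The function names, the list-based evaluation order, and the dictionary of evaluated scores are assumptions made for illustration; `current` is 1-based, as in the definitions.

```python
from typing import Dict, List


def ucost(order: List[str], cost: Dict[str, float], current: int) -> float:
    """Definition 3: summed expected cost of C_{current+1} .. C_n."""
    return sum(cost[name] for name in order[current:])


def imp(order: List[str], evaluated: Dict[str, float], j: int, current: int) -> float:
    """Definition 4: the evaluated score if j <= current, else the optimistic 1.0."""
    if j <= current:
        return evaluated[order[j - 1]]
    return 1.0


if __name__ == "__main__":
    # Illustrative costs in seconds for five predicates evaluated in a fixed order.
    order = ["C1", "C2", "C3", "C4", "C5"]
    cost = {"C1": 0.5, "C2": 0.3, "C3": 1.0, "C4": 0.4, "C5": 0.1}
    evaluated = {"C1": 0.9, "C2": 0.6}  # C1 and C2 have been probed, so current = 2
    print(ucost(order, cost, current=2))          # cost still to be paid for this tuple
    print(imp(order, evaluated, j=2, current=2))  # evaluated predicate
    print(imp(order, evaluated, j=4, current=2))  # unevaluated predicate
```

Treating every unevaluated predicate as 1.0 is what makes the partially evaluated score an upper bound under monotone combining functions.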
Definition 5 (Current Output Importance Score). Consider a topic selection operator with the selection condition $C = \bigvee_{i=1}^{m} term_i$, where $term_i = \bigwedge_j C_j$. The current output importance score of a tuple $r$ of a relation $R$ using the topic selection operator, after evaluating the expensive predicate $C_{current}$, is

$$Imp_{current}(r) = Imp_{in}(r) \cdot g_1(g_2(term_1, r),\ g_2(term_2, r),\ \ldots,\ g_2(term_m, r)),$$

where $g_1$ and $g_2$ are monotone functions, $n$ is the number of atomic selection predicates, $1 \le current \le n$, $m$ is the number of terms, $g_2(term_i, r) = g_2(Imp(C_j, r), Imp(C_{j+1}, r), \ldots)$, and $Imp(C_j, r)$ is as defined in definition 4.

Proofs of all lemmas and theorems are presented in [11].

Lemma 1. (a) For $1 \le current < n$, $f_{out}(r) \le Imp_{current}(r)$. (b) When $current = n$, $f_{out}(r) = Imp_{current}(r)$.

For the threshold-based TSelection algorithm, lemma 2 states the early termination criterion for a tuple to be dropped from the output.

Lemma 2. Assume that the threshold-based topic selection operator is to be applied to tuple $r$ in relation $R$ with atomic expensive predicates $C_1, C_2, \ldots, C_n$. During the evaluation of expensive predicates, if $Imp_{current}(r)$ becomes less than the threshold value $V_t$ then the tuple $r$ cannot be in the output.

Note that, in comparison, the top-k based TSelection algorithm does not have this early termination criterion. A tuple $r$ can be dropped from the output only if $Imp_{current}(r)$ is less than the importance scores $f_{out}$ of $k$ fully evaluated tuples.

Fig. 1 illustrates the threshold-based topic selection algorithm Threshold-TSelection. In each iteration, the algorithm finds a tuple $r$ for evaluation using the LocateTuple( ) function, and finds an unevaluated predicate $C_j$ of tuple $r$ using the LocatePredicate( ) function. It evaluates the predicate $C_j$ for tuple $r$ and computes $Imp_{current}(r)$ using the functions $g_1$ and $g_2$. If $Imp_{current}(r)$ is less than the threshold value $V_t$ then tuple $r$ is discarded. If all predicates are evaluated for tuple $r$ and $Imp_{current}(r) \ge V_t$ then tuple $r$ is added to the output. The algorithm stops when all output tuples are found or when the time $T$ runs out. We assume in this section that the LocatePredicate( ) function locates the next predicate to evaluate by using the same pre-defined predicate evaluation ordering for all tuples.

    Algorithm Threshold-TSelection(V_t, R, C, g_1, g_2, T)
      For each tuple r in R do {
        Imp_current(r) = Imp_in(r);
        Set Imp(C_i, r) = 1.0 for all expensive predicates C_i;
        if (Imp_in(r) ≥ V_t) then add r into PossibleOutput; }
      While (PossibleOutput is not empty) and (time T is sufficient) do {
        r = LocateTuple(PossibleOutput);
        C_j = LocatePredicate(r);   // C_j is an unevaluated expensive predicate
        Imp(C_j, r) = Probe(C_j);
        Compute Imp_current(r) using g_1 and g_2;
        if (Imp_current(r) < V_t) then remove r from PossibleOutput
        else if (all predicates of r are evaluated) then add r into Output; }

Fig. 1: Threshold-TSelection Algorithm

Theorem 1. The Threshold-TSelection algorithm has no false drops; i.e., it does not output tuples that are not in the output of the TSelection operator. And, when $T$ is sufficiently large to evaluate all tuples in PossibleOutput, the algorithm has no false dismissals; i.e., it outputs all tuples that are in the output of the TSelection operator.

Fig. 2 illustrates the top-k topic selection algorithm Top-k-TSelection.

    Algorithm Top-k-TSelection(k, R, C, g_1, g_2, T)
      For each tuple r in R do {
        Set Imp(C_i, r) = 1.0 for all expensive predicates C_i;
        Imp_current(r) = Imp_in(r);
        Add r into PossibleOutput; }
      While (|Output| < k) and (time T is sufficient) do {
        Let CurrentTopK be the (k - |Output|) tuples in PossibleOutput with the highest Imp_current;
        r = LocateTuple(CurrentTopK);
        if (there exists an unevaluated expensive predicate in tuple r) then {
          C_j = LocatePredicate(r);   // C_j is an unevaluated expensive predicate
          Imp(C_j, r) = Probe(C_j);
          Compute Imp_current(r) using g_1 and g_2; }
        if (all of r's predicates are evaluated) and (r is among the current top-k tuples) then add r into Output; }

Fig. 2: Top-k-TSelection Algorithm
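The threshold-based loop of Fig. 1 can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the time limit is modelled as a budget on summed probe costs (the text notes that time constraints can be turned into probe constraints), LocateTuple is taken to be "highest current score", LocatePredicate follows a fixed order, and a tuple is removed from the candidate set once it is output, a step the figure leaves implicit. All names (`rows`, `probe`, `budget`) are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def imp_current(imp_in, scores, terms, g1=mean, g2=min):
    """Definition 5: unevaluated predicates count as 1.0 (Definition 4)."""
    return imp_in * g1([g2([scores.get(c, 1.0) for c in term]) for term in terms])


def threshold_tselection(
    rows: Dict[str, float],               # tuple id -> Imp_in
    terms: Sequence[Sequence[str]],       # DNF: a list of conjunctive terms
    order: List[str],                     # fixed predicate evaluation order
    probe: Callable[[str, str], float],   # (predicate, tuple id) -> score in [0, 1]
    cost: Dict[str, float],               # expected cost per predicate
    v_t: float,
    budget: float,                        # stand-in for the time limit T
) -> Tuple[List[Tuple[str, float]], float]:
    scores: Dict[str, Dict[str, float]] = {rid: {} for rid in rows}
    possible = {rid for rid, s in rows.items() if s >= v_t}
    output: List[Tuple[str, float]] = []
    spent = 0.0
    while possible:
        # LocateTuple (assumed): candidate with the highest current score.
        rid = max(sorted(possible),
                  key=lambda t: imp_current(rows[t], scores[t], terms))
        # LocatePredicate: first unevaluated predicate in the fixed order.
        pred = next(c for c in order if c not in scores[rid])
        if spent + cost[pred] > budget:
            break  # not enough time left for this probe
        spent += cost[pred]
        scores[rid][pred] = probe(pred, rid)
        cur = imp_current(rows[rid], scores[rid], terms)
        if cur < v_t:
            possible.discard(rid)      # Lemma 2: r cannot be in the output
        elif len(scores[rid]) == len(order):
            output.append((rid, cur))  # fully evaluated, score is now f_out(r)
            possible.discard(rid)      # implicit in Fig. 1
    return output, spent


if __name__ == "__main__":
    table = {"a": {"C1": 1.0, "C2": 0.9, "C3": 0.8},
             "c": {"C1": 0.1, "C2": 0.5, "C3": 0.2}}
    out, used = threshold_tselection(
        rows={"a": 0.9, "b": 0.4, "c": 0.8},
        terms=[("C1", "C2"), ("C3",)],
        order=["C1", "C2", "C3"],
        probe=lambda p, r: table[r][p],
        cost={"C1": 1.0, "C2": 1.0, "C3": 1.0},
        v_t=0.5,
        budget=10.0,
    )
    print(out, used)
```

In this toy run, tuple b is never probed because its input score is already below the threshold, and tuple c is dropped after a single probe. The sketch does not implement the term-level shortcut of Example 2 (skipping the rest of a term once one of its predicates is false).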
Theorem 2. The Top-k-TSelection algorithm has no false drops. And, when $T$ is sufficiently large to evaluate all tuples in PossibleOutput, the algorithm has no false dismissals.

2.2 Time-Constrained Query Evaluation Heuristics

Due to the time constraint $T$, the TSelection algorithm is terminated at time $T$, possibly before locating all output tuples that satisfy the given threshold value $V_t$. There are multiple possible query evaluation goals:

(1) Maximize the number of highest importance-scored output tuples. MaxImpLT (Locate Tuple) Heuristic: Locate the tuple $r$ with the highest $Imp_{current}(r)$ in PossibleOutput.

(2) Maximize the number of higher (but not the highest) importance-scored output tuples with lower unevaluated predicate costs. HigherImpLT Heuristic: Locate the tuple $r$ with the highest $Imp_{current}(r) / UCost(r)$ in PossibleOutput.

(3) Maximize the size of the query output. MaxSizeLT Heuristic: Locate the tuple $r$ with the lowest $UCost(r)$ in PossibleOutput. If two or more tuples have the lowest $UCost$, then locate among them the tuple with the highest $Imp_{current}$.

In section 3.2, we present comparative evaluations of the three heuristics.

2.3 Dynamic Predicate Evaluation Order Heuristics

Next, we discuss heuristics for the LocatePredicate( ) function. The best predicate evaluation order may change with respect to the tuple chosen with LocateTuple( ) while evaluating the expensive predicates. For example, assume that the importance score of a given evaluated expensive predicate $C_i$ is very small (say, $Imp(C_i) = 0.05$) for a given tuple $r$. Then the remaining unevaluated predicates in the same conjunctive term as $C_i$ can have only a very small effect on the overall output importance score $Imp_{current}(r)$, so, after computing $Imp(C_i) = 0.05$ for $r$, it is better to evaluate an expensive predicate in another term. Thus, the best order of the expensive predicates for a given tuple should be computed dynamically.
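The three LocateTuple heuristics of section 2.2 differ only in their selection key, as the following Python sketch shows. The `Candidate` record and the function names are assumptions introduced for illustration; every candidate is assumed to have at least one unevaluated predicate, so its UCost is positive.

```python
from typing import Dict, NamedTuple


class Candidate(NamedTuple):
    imp_current: float  # current output importance score (Definition 5)
    ucost: float        # unevaluated predicate cost (Definition 3), assumed > 0


def max_imp_lt(cands: Dict[str, Candidate]) -> str:
    """MaxImpLT: the tuple with the highest current importance score."""
    return max(sorted(cands), key=lambda t: cands[t].imp_current)


def higher_imp_lt(cands: Dict[str, Candidate]) -> str:
    """HigherImpLT: the tuple with the highest score per unit of remaining cost."""
    return max(sorted(cands), key=lambda t: cands[t].imp_current / cands[t].ucost)


def max_size_lt(cands: Dict[str, Candidate]) -> str:
    """MaxSizeLT: lowest remaining cost; ties go to the higher current score."""
    return min(sorted(cands), key=lambda t: (cands[t].ucost, -cands[t].imp_current))


if __name__ == "__main__":
    possible_output = {
        "a": Candidate(imp_current=0.9, ucost=2.0),
        "b": Candidate(imp_current=0.8, ucost=1.0),
        "c": Candidate(imp_current=0.3, ucost=0.5),
        "d": Candidate(imp_current=0.2, ucost=0.5),
    }
    print(max_imp_lt(possible_output))
    print(higher_imp_lt(possible_output))
    print(max_size_lt(possible_output))
```

On this toy candidate set each heuristic picks a different tuple, which is the behaviour the comparison in section 3.2 measures.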
Next, we define the heuristic (Predicate-Selection-by-)D(ynamic-)E(ffect-On-)C(ost)1, which locates the next expensive predicate to be evaluated in a dynamic manner for a given tuple $r$.

DEC1-LP (LocatePredicate) Heuristic. Assume that, at a given time, the predicates $C_1, C_2, \ldots, C_{current}$ in the selection condition $C$ are evaluated for a given tuple $r$ in a given relation $R$. We choose the unevaluated expensive predicate $C_i$ with the highest Max-Effect-On-Cost as the next predicate in the dynamic evaluation order. The Max-Effect-On-Cost of each unevaluated predicate $C_i$ is computed as follows. Let $Imp_{min}(C_i, r)$ be computed as $Imp_{current}(r)$ after assigning $Imp(C_i, r) = 0$ and $Imp(C_j, r) = 1$ for the unevaluated predicates $C_j$, $j \ne i$. Then

$$\text{Max-Effect-On-Cost}(C_i, r) = \frac{Imp_{current}(r) - Imp_{min}(C_i, r)}{Cost(C_i)}.$$

It is easy to prove that if the importance scores of the predicates of a given selection condition $C$ are uniformly distributed and independent of each other, then the DEC1 heuristic gives the optimum order of predicate evaluation for a given tuple $r$. If the importance scores of predicates are not uniformly distributed, or their distribution is not known, then we may take a small sample of the tuples from relation $R$.

DEC2-LP Heuristic. At a given time, let the predicates $C_1, C_2, \ldots, C_{current}$ be evaluated for a given tuple $r$ in a given relation $R$. The heuristic chooses the unevaluated expensive predicate $C_i$ with the highest Max-Effect-On-Cost2 as the next predicate in the dynamic evaluation order. The Max-Effect-On-Cost2 of each unevaluated predicate $C_i$ is computed as follows. Let $Imp_{min}(C_i, r)$ be computed as $Imp_{current}(r)$ after assigning $Imp(C_i, r) = 0$ and $Imp(C_j, r) = 1$ for the unevaluated predicates $C_j$, $j \ne i$. Also, let $SampleSel(C_i)$ be the selectivity of the importance scores of predicate $C_i$, computed using a sample from relation $R$. Then

$$\text{Max-Effect-On-Cost2}(C_i, r) = \frac{(Imp_{current}(r) - Imp_{min}(C_i, r)) + (1 - SampleSel(C_i))}{Cost(C_i)}.$$

The following heuristic, (Predicate-Selection-by-)D(ynamic-)A(vg-)E(ffect-On-)C(ost), uses the average expected Effect-On-Cost of each unevaluated predicate, and chooses the predicate with the highest Avg-Effect-On-Cost as the next predicate to evaluate.

DAEC-LP Heuristic. Assume that, at a given time, the predicates $C_1, C_2, \ldots, C_{current}$ in the selection condition $C$ are evaluated for a given tuple $r$ in a given relation $R$. The heuristic chooses as the next predicate to evaluate the predicate $C_i$ with the highest Avg-Effect-On-Cost. The Avg-Effect-On-Cost of each unevaluated predicate $C_i$ is computed as follows. Let $Imp_{avg}(C_i, r)$ be computed as $Imp_{current}(r)$ after assigning to $Imp(C_i, r)$ the expected average of $Imp(C_i)$, computed using sampling, and $Imp(C_j, r) = 1$ for the unevaluated predicates $C_j$, $j \ne i$. Then

$$\text{Avg-Effect-On-Cost}(C_i, r) = \frac{Imp_{current}(r) - Imp_{avg}(C_i, r)}{Cost(C_i)}.$$

3 Experimental Evaluations of TSelection Algorithms

We have implemented the Top-k-TSelection and Threshold-TSelection algorithms and evaluated them using synthetic data. The importance scores for the expensive predicates are generated using uniform and normal distributions. We compute the selectivity of predicates by evaluating the predicates for a random 1% sample of the tuples of a relation $R$.
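A minimal Python sketch of the DEC1 selection rule follows, assuming AVG for $g_1$ and MIN for $g_2$ and a dictionary of already evaluated scores. The function names are hypothetical; the example reuses the selection condition $(C_1 \wedge C_2) \vee (C_3 \wedge C_4 \wedge C_5)$ and the costs that the paper uses in its experiments.

```python
from statistics import mean
from typing import Dict, Optional, Sequence


def imp_current(imp_in, scores, terms, g1=mean, g2=min):
    """Definition 5: unevaluated predicates count as 1.0."""
    return imp_in * g1([g2([scores.get(c, 1.0) for c in term]) for term in terms])


def dec1_locate_predicate(
    imp_in: float,
    scores: Dict[str, float],          # evaluated predicates of this tuple
    terms: Sequence[Sequence[str]],    # DNF selection condition
    cost: Dict[str, float],
) -> Optional[str]:
    """Return the unevaluated predicate with the highest Max-Effect-On-Cost."""
    cur = imp_current(imp_in, scores, terms)
    best, best_effect = None, float("-inf")
    for term in terms:
        for c in term:
            if c in scores:
                continue
            # Imp_min: pretend C_i evaluates to 0, other unevaluated ones stay at 1.
            imp_min = imp_current(imp_in, {**scores, c: 0.0}, terms)
            effect = (cur - imp_min) / cost[c]
            if effect > best_effect:
                best, best_effect = c, effect
    return best


if __name__ == "__main__":
    terms = [("C1", "C2"), ("C3", "C4", "C5")]
    cost = {"C1": 0.5, "C2": 0.3, "C3": 1.0, "C4": 0.4, "C5": 0.1}
    # Nothing evaluated yet: every predicate can halve the score, so the cheapest wins.
    print(dec1_locate_predicate(1.0, {}, terms, cost))
    # C5 came back very low: the rest of its term barely matters any more.
    print(dec1_locate_predicate(1.0, {"C5": 0.05}, terms, cost))
```

In the second call the heuristic switches to the other conjunctive term, which matches the motivating case where a predicate with score 0.05 makes the remaining predicates of its own term nearly irrelevant.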
For each tuple $t$, we compute its derived importance score $Imp(t)$ using the functions $g_1$ and $g_2$. Let $Imp_{max}$ be the highest derived importance score $Imp(t)$ over the sample $S$. The selectivity $Sel(P)$ of a predicate $P$ is established as the number of tuples in $S$ with $Imp(t, P) > Imp_{max}$ divided by the total number of tuples in $S$.

3.1 Locate-Predicate Heuristics

We compare the performances of the MPro, DEC1, DEC2 and DAEC heuristics in terms of the differences between their evaluation times and the evaluation time of the Best heuristic. For the Best heuristic, we assume that at a given time we know the actual truth values and importance scores of all predicates for a given tuple $t$. As in [2], the fixed predicate ordering for the MPro heuristic is the descending ordering of the predicates by their ranks, where the rank of a predicate $P$ is $(1 - Sel(P)) / Cost(P)$. We use the selection condition $(C_1 \wedge C_2) \vee (C_3 \wedge C_4 \wedge C_5)$ with the expected evaluation costs $Cost(C_i) = \{0.5, 0.3, 1.0, 0.4, 0.1\}$ seconds, respectively. That is, $C_1$ takes 0.5 seconds to evaluate, $C_2$ takes 0.3 seconds to evaluate, and so on. The size of the input relation $R$ is 1000 tuples. We compute the derived importance score $Imp(t)$ of a given tuple $t$ using the average function as $g_1$ and the minimum function as $g_2$.

a) Uniform distribution: The importance scores of all expensive predicates have been generated using the uniform distribution. We have observed that the dynamic predicate evaluation order heuristics improve the performances of the top-k and threshold-based TSelection algorithms by 8% to 20%. The difference between the evaluation time using the MPro heuristic and that using the dynamic heuristics increases as the threshold $V_t$ decreases or $k$ increases. As expected, the DEC1 heuristic has a lower total cost (i.e., is faster) than the other heuristics for both the top-k and threshold-based TSelection algorithms. The DEC1 heuristic is 14% to 20% better than the MPro heuristic. The evaluation time using the DAEC heuristic is very close to that using the DEC1 heuristic, whereas the DEC2 heuristic has a higher evaluation time. The increase in the evaluation time for threshold-based TSelection is almost linear with respect to the decrease in the threshold $V_t$.

b) Combinations of distributions: The distributions of the importance scores for the expensive predicates $C_1$ and $C_3$ are chosen as uniform, and those for $C_2$, $C_4$, and $C_5$ are chosen as normal with a mean of 0.7 and a standard deviation of 0.2. We have observed that the evaluation time differences between the MPro heuristic and the dynamic heuristics are small compared to those for other distributions. The difference decreases when $k$ increases or the threshold $V_t$ decreases. The dynamic heuristics are 0% to 9% better than the MPro heuristic for the threshold-based and top-k algorithms. Also, the DEC2 heuristic has the best performance for both the top-k and threshold-based TSelection algorithms.

In conclusion, whichever distribution is used to generate the importance scores of the expensive predicates, the dynamic predicate evaluation order heuristics improve the performances of the top-k and threshold-based TSelection algorithms. If the importance scores of all expensive predicates have the same distribution then the best heuristic is the DEC1 heuristic. If the importance scores have different distributions then the DEC2 heuristic has the best performance.
3.2 Time Constraints

We have evaluated the top-k and threshold-based TSelection algorithms with a time constraint $T$, using different distributions to generate importance scores for expensive predicates. We used the MaxImpLT, HigherImpLT, and MaxSizeLT heuristics to locate the tuple for which an unevaluated expensive predicate is to be evaluated at a given time. For a given time constraint $T$, we compare the performances of the three heuristics in terms of precision and wasted time ratio.

Definition 6 (Wasted Time Ratio, Precision). Let $T$ be a query evaluation time limit, and $t_{useful}$ be the time spent, out of time $T$, to completely evaluate those tuples that are verified to be in the output. Then

$$\text{Wasted time ratio} = 1 - \frac{t_{useful}}{T}, \qquad \text{Precision} = \frac{\text{No. of output tuples found within time } T}{\text{No. of tuples in the fully evaluated output}}.$$

We used an input relation $R$ of 500 tuples. We used the selection condition $(C_1 \wedge C_2) \vee (C_3 \wedge C_4 \wedge C_5)$ with the expected evaluation costs $Cost(C_i) = \{0.5, 0.3, 1.0, 0.4, 0.1\}$ seconds, respectively. The derived importance score $Imp(t)$ of a given tuple $t$ is computed using the average function as $g_1$ and the minimum function as $g_2$. For the threshold-based TSelection algorithm, we located the tuples that satisfy the threshold value $V_t$ of 0.5. For the top-k TSelection algorithm, we computed the top-50 tuples. Let $T_{Required}$ be the time required to fully evaluate a given query. We used 0.5, 0.6, 0.7, 0.8 and 0.9 of $T_{Required}$ as the time constraint $T$, and computed the precision and wasted time ratio.
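The two measures of Definition 6 are simple ratios; the short Python sketch below spells them out. The function and parameter names, and the numbers in the usage example, are illustrative assumptions rather than values from the experiments.

```python
def wasted_time_ratio(t_useful: float, t_limit: float) -> float:
    """1 - t_useful / T: the share of the time limit not spent on output tuples."""
    if t_limit <= 0:
        raise ValueError("time limit must be positive")
    return 1.0 - t_useful / t_limit


def precision(found_within_limit: int, full_output_size: int) -> float:
    """Output tuples found within T over the size of the fully evaluated output."""
    if full_output_size <= 0:
        raise ValueError("fully evaluated output must be non-empty")
    return found_within_limit / full_output_size


if __name__ == "__main__":
    # Hypothetical run: 45 of 60 seconds went into tuples that reached the output,
    # and 30 of the 50 tuples of the full answer were found in time.
    print(wasted_time_ratio(t_useful=45.0, t_limit=60.0))
    print(precision(found_within_limit=30, full_output_size=50))
```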
First, the importance scores for all expensive predicates are generated using the normal distribution with a mean of 0.7 and a standard deviation of 0.2. For the threshold-based TSelection algorithm, the MaxImpLT heuristic has the worst performance compared to the other heuristics: it has 14% to 54% lower precision and a 20% to 63% higher wasted time ratio. The MaxSizeLT heuristic has the best performance for all values of the time constraint $T$: it has 0% to 23% higher precision and a 0% to 42% lower wasted time ratio than the HigherImpLT heuristic.

For the top-k TSelection algorithm using the normal distribution, the MaxImpLT heuristic again has the worst performance, but the difference between its performance and that of the other heuristics is small: it has 0% to 20% lower precision and a 0% to 4% higher wasted time ratio. The HigherImpLT heuristic has the best performance: it has 0% to 6% higher precision and a 0% to 2% lower wasted time ratio than the MaxSizeLT heuristic.

In conclusion, whichever distribution is used to generate the importance scores for expensive predicates, the MaxImpLT heuristic has the worst performance for the top-k and threshold-based TSelection algorithms. In the threshold-based algorithm, there is a large difference between the performance of the MaxImpLT heuristic and that of the other heuristics, and, for any given time constraint $T$, MaxSizeLT has the best performance. In the top-k based algorithm, there is no large difference between the performance of the MaxImpLT heuristic and that of the other heuristics, and HigherImpLT has the best performance.

4 Conclusions

We have presented algorithms to evaluate the topic selection operator TSelection for information resource discovery. We have proposed and evaluated heuristics to locate tuples and to choose the expensive predicates to evaluate.

References

1. ACM Digital Library.
2. Chang, K. C.-C., Hwang, S.-W., "Minimal Probing: Supporting Expensive Predicates for Top-k Queries", ACM SIGMOD.
3. Altingovde, I.S., et al., "Topic-Centric Querying of Web Information Resources", Proc. DEXA.
4. Ozsoyoglu, G., Al-Hamdani, A., Altıngovde, I.S., Ozel, S.A., Ulusoy, O., Ozsoyoglu, Z.M., "Sideway Value Algebra for Object-Relational Databases", VLDB Conf.
5. Ozsoyoglu, G., Altingovde, I.S., Al-Hamdani, A., Ozel, S.A., Ulusoy, O., Ozsoyoglu, M., "Extending SQL for Metadata-based Querying", submitted for journal publication.
6. DBLP Bibliography, by Michael Ley.
7. ACM SIGMOD Anthology.
8. Sadri, F., Ullman, J., "The Interaction between Functional Dependencies and Template Dependencies", SIGMOD Conf.
9. Biezunski, M., Bryan, M., Newcomb, S., editors, ISO/IEC 13250, Topic Maps.
10. Salton, G., Automatic Text Processing, Addison-Wesley.
11. Al-Hamdani, A., Ozsoyoglu, G., "Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach", technical report, EECS, CWRU.
12. Agichtein, E., Gravano, L., "Snowball: Extracting Relations from Large Plain-Text Collections", Proc. of the 5th ACM International Conf. on Digital Libraries.
13. Hellerstein, J.M., Stonebraker, M., "Predicate Migration: Optimizing Queries with Expensive Predicates", ACM SIGMOD.
14. Li, Li, "Finding Related Papers in a Digital Library", MS Thesis, CWRU, June 2003.