A NOVEL METHOD FOR THE EVALUATION OF BOOLEAN QUERY EFFECTIVENESS ACROSS A WIDE OPERATIONAL RANGE

Eero Sormunen
Department of Information Studies, University of Tampere
P.O. Box 607, FIN 330 Tampere, Finland
E-mail: eero.sormunen@uta.fi

ABSTRACT

Traditional methods for the system-oriented evaluation of Boolean IR systems suffer from validity and reliability problems. Laboratory-based research neglects the searcher and studies suboptimal queries. Research on operational systems fails to distinguish between searcher performance and system performance, nor is it capable of measuring performance at standard points of operation (e.g. across R0.0-R1.0). A new laboratory-based evaluation method for Boolean IR systems is proposed. It is based on a controlled formulation of inclusive query plans, on an automatic conversion of query plans into elementary queries, and on combining elementary queries into optimal queries at standard points of operation. Major results of a large case experiment are reported. The validity, reliability, and efficiency of the method are considered in the light of empirical and analytical test data.

Keywords: evaluation (general), structured queries, testing methodology, test collections.

1. INTRODUCTION

The mainstream of evaluative IR research has followed the Cranfield paradigm. The major focus has been on best match IR models (see e.g. [2, 23]). The low interest in studying the Boolean IR model can be seen in the low volume of research output (see e.g. [8] and other TREC reports), and also in the slow development of system-oriented evaluation methods for the Boolean IR model. Research on operational systems has focused on Boolean IR systems, but its contribution to the development of methods has been very slight [3, 28]. Research within the Cranfield paradigm has shared a very critical attitude towards the Boolean IR model [7].
The studies of Salton [2] and Turtle [30] are examples of attempts to show empirically the overall superiority of best match IR models over the Boolean IR model. The results of some recent comparisons have suggested that studying the overall superiority of one model over the other may be a naive approach [, 20]. Boolean queries seem to perform better in some situations, and best match queries in others. It may be more reasonable to focus on studying the performance of different IR models under changing operational constraints. New methods are needed to draw a more detailed picture of query effectiveness in different IR models.

1.1 Methodological Problems in Boolean IR Experiments

The Boolean IR model has three features that cause methodological problems for experimental research [3]:

1. The formulation of Boolean queries requires a trained person to translate the user request into a query.
2. The searcher has very little control over the size of the output produced by a particular query.
3. The Boolean IR model does not support ranking of documents in order of decreasing probability of relevance.

The necessity to use a human expert in query formulation is a potential source of validity and reliability problems. It is very difficult to separate the effects of a technical IR system from those of a human searcher. For instance, in the well-known STAIRS study, the searchers had a predefined goal to locate at least 75 per cent of all relevant documents. It turned out that fewer than 20 per cent of the relevant documents were found. On the other hand, the average precision of the test queries was as high as 79 per cent [3]. The searchers were obviously formulating high-precision queries although they were asked to work towards high recall.

The latter two features (no ranking, little control over the output size) of the Boolean IR model cause problems in measuring the performance at the standard point of operation (SPO, e.g.
at fixed recall levels or document cut-off values). Typically, only one query (from an arbitrary operational level) is formulated per search request. Performance is measured using single recall/precision values. Recall and precision are averaged separately over all requests. As Lancaster has shown [5], the distribution of recall and precision values for a large set of requests is very wide. It is very difficult to see how the averaged recall and precision values should or could be interpreted, since averaging mixes queries from different operational levels.

The coordination level method developed for the Cranfield 2 project is a traditional approach to omitting the trained searcher from query formulation, ranking the output, and measuring the wide-range performance of a Boolean system [5]. Unfortunately, replacing the cognitive effort of a searcher by a mechanical query term selection procedure leads to a

fundamental validity problem. Queries exploit the Boolean IR model in a suboptimal way.

Facet A [Information retrieval]: (information retrieval OR online systems OR online(w)search?)
AND
Facet B [Search process]: (tactic? OR heuristic? OR trial(w)error OR expert systems OR artificial intelligence OR attitudes/de OR behavior?/de,id,ti OR cognitive/de)

Figure 1. An example of a high recall oriented query used by Harter [0] to illustrate the facet-based query planning approach.

1.2 Harter's Idea: the Most Rational Path

Harter [0] introduced an idea for an evaluation method based on the notion of elementary queries (EQ). Harter used a single search topic to illustrate how the method could be applied. He designed a high recall oriented query plan (see Fig. 1). Harter applied the building block search strategy, which is quite commonly used by professional searchers [6, 9, 2, 6]. The major steps of the building blocks strategy are:

1) Identify major facets and their logical relationships with one another.
2) Identify query terms that represent each facet: words, phrases, etc.
3) Combine the query terms of a facet by disjunction (OR operation).
4) Combine the facets by conjunction or negation (AND or ANDNOT operation) [9].

The notion of facet is important in query planning. A facet is a concept that is identified from, and defines one exclusive aspect of, a search topic. In step 2, a typical goal is to discover all plausible query terms appropriate for representing the selected facet.

Next, Harter retrieved all documents matching the conjunction of facets A and B, each represented by the disjunction of all selected query terms, and assessed the relevance of the resulting 37 documents. In addition, all conjunctions of two query terms (called elementary queries), one from facet A and one from facet B of the query plan in Fig. 1, were composed and executed. A sample of the 24 elementary queries and a summary of their retrieval results are presented in Table 1.
Harter [0] demonstrated the procedure of constructing optimal queries (called the most rational path). An estimate of maximum precision across the whole relative recall range was determined by applying a simple incremental algorithm:

1. To create the initial optimal query, choose the EQ that achieves the highest precision.
2. Create in turn the disjunction of each of the remaining EQs with the current optimal query. Select the disjunction with the EQ that maximizes precision. The disjunction of the current optimal query and the selected EQ creates a new optimal query.
3. Repeat step 2 until all elementary queries have been exhausted.

Table 1. Retrieval results for the 24 elementary queries in the case search by Harter (1990).

EQ #   Elementary query                           # of Docs   # of Rel Docs   Precision
s1     information retrieval AND tactic?          8
s2     information retrieval AND heuristic?       7
s3     information retrieval AND trial(w)error
s22    online(w)search? AND attitudes/de          9
s23    online(w)search? AND behavior?/de,id,ti    8
s24    online(w)search? AND cognitive/de          0
s25    s1-s24/or                                  37

Precision and recall values for the 24 elementary queries and the respective curve for the optimal queries are presented in Fig. 2. Harter never reported full-scale evaluation results based on the idea of the most rational path beyond this single example. Nor did he develop operational guidelines for a fluent use of the method in practice.

Figure 2. Recall and precision of the 24 elementary queries and the most rational path in the case search presented by Harter [0]. [Plot: precision (0.00-1.00) against recall (0.00-1.00); series: most rational path, elementary queries; labelled points: s3, s3 or s8, s3 or s8 or s24, s1, s2.]

Footnote 1: Harter actually spoke of elementary postings sets. This is confusing, since it applies set-based terminology to queries, which are logical statements.
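The incremental algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Harter's implementation: the function names, the representation of each EQ as a set of document identifiers, and the toy data are all hypothetical, and ties are broken by input order because the source does not say how Harter handled them.

```python
def precision(docs, relevant):
    """Share of retrieved documents that are relevant (0.0 for an empty set)."""
    return len(docs & relevant) / len(docs) if docs else 0.0


def most_rational_path(eq_results, relevant):
    """Greedy path: repeatedly OR in the EQ that keeps precision highest.

    eq_results maps a (hypothetical) EQ label to the set of documents it
    retrieves; relevant is the set of known relevant documents.
    Returns a list of (label, relative recall, precision) tuples.
    """
    remaining = dict(eq_results)
    retrieved = set()
    path = []
    while remaining:
        # Steps 1 and 2: on the first pass `retrieved` is empty, so this
        # simply picks the EQ with the highest precision of its own.
        best = max(
            remaining,
            key=lambda name: precision(retrieved | remaining[name], relevant),
        )
        retrieved = retrieved | remaining.pop(best)
        recall = len(retrieved & relevant) / len(relevant)
        path.append((best, recall, precision(retrieved, relevant)))
    return path  # step 3: loop ends when all EQs are exhausted


# Toy data (invented for illustration, not Harter's results).
toy_eqs = {"s1": {1, 2, 3, 4}, "s2": {3, 5}, "s3": {5, 6, 7}}
toy_relevant = {1, 3, 5, 6}
path = most_rational_path(toy_eqs, toy_relevant)
for label, recall, prec in path:
    print(f"{label}: recall={recall:.2f} precision={prec:.2f}")
```

Each tuple in the returned path corresponds to one point on the most rational path curve: the recall and precision of the disjunction of all EQs selected so far.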

1.3 Research Goals

The main goal of the study was to create an evaluation method for measuring the performance of Boolean queries across a wide operational range by elaborating the ideas introduced by Harter [0]. The method is presented and argued using the framework suggested by Newell, M = {domain, procedure, justification} [9]:

1. The domain of the method specifies the appropriate application area for the method.
2. The procedure of the method consists of the ordered set of operations required in the proper use of the method. In particular, two major operations unique to the procedure need to be elaborated:
   a) Query formulation. How is the set of elementary queries composed from a search topic?
   b) Query optimization. What algorithm should be used for combining the elementary queries to find the optimal query for different operational levels?
3. The justification of the method. The appropriateness, validity, reliability and efficiency of the method within the specified domain must be justified.

The structure of this paper is as follows. First, some basic concepts and the procedure of the method are introduced. Second, a case experiment is briefly reported to illustrate the domain and the use of the proposed method in a concrete experimental setting. Third, the other justification issues of the method (validity, reliability and efficiency) are discussed. Several empirical tests were carried out to assess the potential validity and reliability problems in applying the method.

2. OUTLINE FOR THE METHOD

The aim of this section is to introduce a sound theoretical framework for the procedure of the method and to formulate operational guidelines for exercising it.

2.1 Query Structures and Query Tuning Spaces

IR models address the issue of comparing a query, as a representation of a request for information, with representations of texts.
The Boolean IR model supports rich query structures, a (simple) binary representation of texts, and an exact match technique for comparing queries and text representations [2]. A Boolean query consists of query terms and operators. Query terms are usually words, phrases, or other character strings typical of natural language texts. The Boolean query structures are based on three logic connectives, conjunction (∧), disjunction (∨) and negation (¬), and on the use of parentheses. A query expresses the combination of terms that retrieved documents have to contain. If we want to generate all possible Boolean queries for a particular request, we have to identify all query terms that might be useful, and to generate all logically reasonable query structures.

Facet, as defined in Section 1.2, is a very useful notion in representing relationships between Boolean query structures and the search topic. Terms within a facet are naturally combined by disjunctions. Facets themselves present the exclusive aspects of desired documents, and are naturally combined by Boolean conjunction or negation [9]. Expert searchers tend to formulate query plans applying the notion of facet [9, 6]. The resulting query plans are usually in a standard form, the conjunctive normal form (CNF) (for a formal definition, see []).

The structure of a Boolean query can be easily characterized in CNF queries. Query exhaustivity (Exh) is the number of facets that are exploited. Query extent (QE) characterizes the broadness of a query, and can be measured, e.g., as the average number of query terms per facet. For instance, in the query plan designed by Harter, Exh = 2 and QE = 5.5 (see Fig. 1). The changes made in query exhaustivity and extent to achieve appropriate retrieval goals are called here query tuning. The range within which query exhaustivity and query extent can change sets the boundaries for query tuning.
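The two structural measures can be checked against Harter's plan with a short Python sketch. The representation of a CNF query plan as a list of facets, each a list of query terms, and the function names are assumptions made here for illustration; the paper does not prescribe a data structure.

```python
# A CNF query plan as a list of facets; each facet is a list of query terms.
# The terms are those of Harter's plan in Figure 1.
harter_plan = [
    ["information retrieval", "online systems", "online(w)search?"],
    [
        "tactic?", "heuristic?", "trial(w)error", "expert systems",
        "artificial intelligence", "attitudes/de", "behavior?/de,id,ti",
        "cognitive/de",
    ],
]


def exhaustivity(plan):
    """Query exhaustivity (Exh): the number of facets exploited."""
    return len(plan)


def extent(plan):
    """Query extent (QE): the average number of query terms per facet."""
    return sum(len(facet) for facet in plan) / len(plan)


def to_cnf(plan):
    """Render the plan as a Boolean query string in conjunctive normal form."""
    return " AND ".join("(" + " OR ".join(facet) + ")" for facet in plan)


print(exhaustivity(harter_plan), extent(harter_plan))
print(to_cnf(harter_plan))
```

With three terms in facet A and eight in facet B, the sketch reproduces Exh = 2 and QE = (3 + 8) / 2 = 5.5.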
The set of all elementary queries and their feasible combinations composed at all available exhaustivity and extent levels form the query tuning space. In the example by Harter (Fig. 1), seven different disjunctions of query terms can be generated from facet A (= 2^3 - 1) and 255 from facet B (= 2^8 - 1). The total number of possible EQ combinations is then 7 x 255 = 1,785 at Exh = 2. In addition, 7 and 255 EQ combinations can be formed at Exh = 1 from facets A and B, respectively. Thus, the total number of EQ combinations creating the query tuning space across exhaustivity levels 1 and 2 for the sample query plan is 2,047.

2.2 The Procedure of the Method

The procedure of the proposed method consists of eight operations at three stages:

STAGE I. INCLUSIVE QUERY PLANNING
1. Design inclusive query plans. Experienced searchers formulate inclusive query plans for each given search topic. This yields a comprehensive representation of the query tuning space available for a search topic.
2. Execute extensive queries. The goal of extensive queries is to gain reliable recall base estimates.
3. Determine the order of facets. The facet order of inclusive query plans is determined by ranking the facets according to their measured recall power, i.e. their capability to retrieve relevant documents.

STAGE II. QUERY OPTIMISATION
4. Generate the set of elementary queries (EQ). Inclusive query plans in the conjunctive normal form (CNF) at different exhaustivity levels are transformed into the disjunctive normal form (DNF), where the elementary conjunctions create the set of elementary queries. All elementary queries are executed to find the set of relevant and non-relevant documents associated with each EQ.
5. Select standard points of operation (SPO). Both fixed recall levels R0.1, ..., R1.0 and fixed document cut-off values, e.g. DCV2, DCV5, ..., DCV500, may be used as SPOs.
6. Optimise the queries. An optimisation algorithm is used to compose the combinations of EQs performing optimally at each selected SPO.
STAGE III. EVALUATION OF RESULTS
7. Measure precision at each SPO. Precision can be used as a performance measure. Precision is averaged over all search topics at each SPO.

8. Analyse the characteristics of optimal queries. The optimal queries are analysed to explain the changes in the performance of an IR system.

The above steps describe the ordered set of operations constituting the procedure of the proposed method. Inclusive query planning (steps 1-3) and the search for the optimal set of elementary queries (steps 4-6) are the focus of this study.

2.3 Inclusive Query Planning

The techniques of query planning are routinely taught to novice searchers [9, 6]. A common feature of different query planning techniques is that they emphasize the analysis and identification of searchable facets, and the representation of each facet as an exhaustive disjunction of query terms. The goal of inclusive query planning is similar, but the thoroughness of the identification task is stressed even more. In inclusive query planning, the goal is to identify

1. all searchable facets of a search topic, and
2. all plausible query terms for each facet.

A major doubt about using human experts to design queries is probably associated with the reliability of experimental designs. For instance, the average inter-searcher overlap in the selection of query terms (measured character-by-character) is usually around 30 per cent [25]. Fortunately, the situation is not so bad when facets are considered. For instance, in a study by Iivonen [2], the average concept-consistency rose to 88 per cent, and experienced searchers were even more consistent. This indicates that expert searchers are able to identify the facets of a topic consistently, although the overlap of queries at string level may be low.

The identification of all plausible query terms for each identified facet is another task requiring searching expertise. Basically, the comprehensiveness of facet representations is mostly a question of how much effort is used to identify potential query terms.
The query designer is freed from the need to make the compromised query term selections typical of practical search situations. The optimization operation will automatically reject ill-behaving query terms. The process can be improved by appropriate tools (dictionaries, thesauri, browsing tools for database indexes, etc.).

The final step is to decide the order of facets in the query plan. In the case of a laboratory test collection, full relevance data (or at least a justified estimate of it) is available. The facets of an inclusive query plan can be ranked in descending order of recall. The disjunction of all query terms identified for a facet is used to measure recall values.

2.4 Search for the Optimal Set of EQs

The size of the query tuning space increases exponentially as a function of the number of EQs. We are obviously facing the risk of combinatorial explosion, since we do not know the upper limit of query exhaustivity and, especially, query extent in inclusive query plans. Solving the optimization problem by blind search algorithms could lead to unmanageably long running times. The search for the optimal set of EQs is an NP-hard problem. Harter [0] introduced a simple heuristic algorithm but he did not define it formally.

Query optimization resembles a traditional integer programming case called the Knapsack Problem. The problem is to fill a container with a set of items so that the value of the cargo is maximized, and the weight limit for the cargo is not exceeded [4]. The special case where each item is selected once only (like EQs) is called the 0-1 Knapsack Problem. Efficient approximation algorithms have been developed to find a feasible lower bound for the optimum [7].
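To make the growth of the query tuning space concrete, the following Python sketch generates the elementary queries of a CNF plan (the CNF to DNF conversion of step 4) and counts the combinations of term disjunctions across exhaustivity levels, using Harter's two-facet plan from Figure 1. The function names and the list-of-facets representation are assumptions made for this illustration, and the counting rule is the one used for the Harter example in Section 2.1; facet ordering is ignored here.

```python
from itertools import combinations, product
from math import prod

harter_plan = [
    ["information retrieval", "online systems", "online(w)search?"],
    [
        "tactic?", "heuristic?", "trial(w)error", "expert systems",
        "artificial intelligence", "attitudes/de", "behavior?/de,id,ti",
        "cognitive/de",
    ],
]


def elementary_queries(plan):
    """CNF -> DNF: one conjunction per choice of a single term from each facet."""
    return [" AND ".join(terms) for terms in product(*plan)]


def tuning_space_size(plan):
    """Count the combinations of non-empty term disjunctions.

    A facet with k terms offers 2**k - 1 non-empty disjunctions. For every
    exhaustivity level and every set of facets of that size, the per-facet
    counts are multiplied; the products are then summed.
    """
    total = 0
    for exh in range(1, len(plan) + 1):
        for facets in combinations(plan, exh):
            total += prod(2 ** len(facet) - 1 for facet in facets)
    return total


eqs = elementary_queries(harter_plan)
print(len(eqs), eqs[0])
print(tuning_space_size(harter_plan))
```

For the sample plan this gives 24 elementary queries at Exh = 2 and 7 x 255 + 7 + 255 = 2,047 combinations in total. Adding a single term to a facet roughly doubles that facet's contribution, which is why exhaustive enumeration quickly becomes impractical for inclusive query plans.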
The problem of finding the optimal query from the query tuning space can be formally defined by applying the definitions of the 0-1 Knapsack Problem as follows. Select a set of EQs so as to

\[
\text{maximise } z = \sum_{i=1}^{n} r_i x_i
\quad \text{subject to} \quad
\sum_{i=1}^{n} n_i x_i \le DCV_j
\quad \text{and} \quad
x_i =
\begin{cases}
1, & \text{if } eq_i \text{ is selected} \\
0, & \text{otherwise}
\end{cases}
\]

where
r_i = number of relevant documents retrieved by eq_i,
n_i = number of documents retrieved by eq_i,
DCV_j = selected document cut-off value.

The above definition of the optimization problem is in its maximization version. The number of relevant documents is maximized while the total number of retrieved documents is restricted by the given DCV_j. In the minimization version of the problem, the goal is to minimize the total number of documents while requiring that the number of relevant documents exceeds some minimum value (a fixed recall level).

Unfortunately, standard algorithms designed for physical objects would not work properly with EQs. Different EQs tend to overlap and retrieve at least some documents in common. This means that, in a disjunction of elementary queries, the profit r_i and the weight n_i of the elementary query eq_i have dynamically changing effective values that depend on the EQs selected earlier. The effect of overlap in a combination of several query sets is hard to predict.

A simple heuristic procedure for an incremental construction of the optimal queries was designed applying the notion of efficiency list [7]. The maximization version of the algorithm contains seven steps:

1. Remove all elementary queries eq_i a) retrieving more documents than the upper limit for the number of documents (i.e. n_i > residual document cut-off value DCV', starting from DCV' = DCV_j), or b) retrieving no relevant documents (r_i = 0).
2. Stop, if no elementary queries eq_i are available.
3. Calculate the efficiency list using precision values r_i/n_i for the remaining m elementary queries and sort the elementary queries in order of descending efficiency.
In the case of equal values, use the number of relevant documents (r_i) retrieved as the second sorting criterion.
4. Move eq_1, at the top of the efficiency list, to the optimal query.

5. Remove all documents retrieved by eq_1 from the result sets of the remaining elementary queries eq_2, ..., eq_m.
6. Calculate the new value for free space DCV'.
7. Continue from step 1.

The basic algorithm favors narrowly formulated EQs retrieving a few relevant documents with high precision at the expense of broader queries retrieving many relevant documents with medium precision. The problem can be reduced by running the optimization in an alternative mode differing only in step 4 of the first iteration round: the eq_i retrieving the largest set of relevant documents is selected from the efficiency list instead of eq_1. The alternative mode is called the largest first optimization and the basic mode the precision first optimization.

3. A CASE EXPERIMENT

The goal of the case experiment was to elucidate the potential uses of the proposed method, to clarify the types of research questions that can be effectively solved by the method, and to explicate the operational pragmatics of the method.

3.1 Research Questions

The case experiment focused on the mechanism of falling effectiveness of Boolean queries in free-text searching of large full-text databases. The work was inspired by the debate concerning the results of the STAIRS study [3, 22]. The goal was to draw a more detailed picture of system performance and optimal query structures in search situations typical of large databases. Assuming an ideally performing searcher, the main question was: What is the difference in maximum performance of Boolean queries between a small database and two types of large databases? The large & dense database contained a larger volume of documents than the small database, but the density of relevant documents (generality) was the same. In the large & sparse database, the volume of documents was higher and the density of relevant documents was lower than in the small database.
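For reference, the maximization version of the heuristic described in Section 2.4, with both the precision first and the largest first mode, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the program used in the study: the function name, the representation of EQ results as sets of document identifiers, and the toy data are hypothetical, and the treatment of remaining ties (input order) is a choice made here.

```python
def optimise_for_dcv(eq_results, relevant, dcv, largest_first=False):
    """Greedy selection of EQs under a document cut-off value (DCV).

    eq_results maps a (hypothetical) EQ label to the set of documents it
    retrieves; relevant is the set of known relevant documents.
    Returns the labels of the selected EQs and the documents retrieved.
    """
    remaining = {name: set(docs) for name, docs in eq_results.items()}
    selected, retrieved = [], set()
    free = dcv  # residual cut-off value DCV'
    first_round = True
    while True:
        # Step 1: drop EQs that exceed the free space or add nothing relevant.
        remaining = {
            name: docs for name, docs in remaining.items()
            if len(docs) <= free and docs & relevant
        }
        # Step 2: stop when no candidates are left.
        if not remaining:
            break

        # Step 3: efficiency list, sorted by precision, then by r_i.
        def efficiency(name):
            rel = len(remaining[name] & relevant)
            return (rel / len(remaining[name]), rel)

        ranked = sorted(remaining, key=efficiency, reverse=True)
        # Step 4: take the top EQ; in largest first mode the first round
        # takes the EQ with the most relevant documents instead.
        if largest_first and first_round:
            best = max(ranked, key=lambda name: len(remaining[name] & relevant))
        else:
            best = ranked[0]
        first_round = False
        chosen = remaining.pop(best)
        selected.append(best)
        retrieved |= chosen
        # Step 5: remove the chosen documents from the other result sets.
        remaining = {name: docs - chosen for name, docs in remaining.items()}
        # Step 6: update the free space; step 7: loop back to step 1.
        free = dcv - len(retrieved)
    return selected, retrieved


# Toy data (invented for illustration).
toy_eqs = {"eq1": {1, 2}, "eq2": {2, 3, 4, 5}, "eq3": {6, 7, 8, 9}}
toy_relevant = {1, 2, 3, 4, 6, 7}
precision_first = optimise_for_dcv(toy_eqs, toy_relevant, dcv=6)
largest = optimise_for_dcv(toy_eqs, toy_relevant, dcv=6, largest_first=True)
print(precision_first)
print(largest)
```

Because step 5 subtracts the already retrieved documents, the effective r_i and n_i of each remaining EQ are recomputed in every round, which is how the sketch accounts for the overlap between EQs that makes standard knapsack algorithms unsuitable.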
Twelve hypotheses were formulated concerning effectiveness, exhaustivity and proportional query extent of queries in large databases. For details, see [26].

3.2 Data and Methods

3.2.1 Optimization Algorithm

The optimization algorithm described in Section 2.4 was programmed in C for Unix. Both a maximization version exploiting a standard set of document cut-off values (DCV2, DCV5, ..., DCV500) and a minimization version exploiting fixed recall levels (R0.1-R1.0) were implemented. At each SPO, the iteration round (called an optimization lap) was executed ten times, starting each round by selecting a different top EQ from the efficiency list: five laps in the largest first mode, and five in the precision first mode. The alternative results achieved by the algorithm in different optimization laps at a particular SPO were sorted to find the best queries for further analysis.

3.2.2 Test Collection

The Finnish Full-Text Test Collection developed at the University of Tampere was used in the case experiment [4]. The test database contains about 54,000 newspaper articles from three Finnish newspapers. A set of 35 search topics is available, including verbal topic descriptions and relevance assessments. The test database is implemented for the TRIP retrieval system (footnote 2). The test database played the role of the large & dense database. The other databases, the small database and the large & sparse database, were created through sampling from EQ result sets. The large & sparse database was created by deleting about 80 % of the relevant documents, and the small database by deleting about 80 % of all documents of the EQ result sets. Thus, the EQ result sets for the small database contained the same relevant documents as those for the large & sparse database. Query optimization was done separately on these three EQ data sets.

3.2.3 Inclusive Query Plans

The initial versions of inclusive query plans were designed by an experienced search analyst working for three months on the project.
Query planning was an interactive process based on thorough test queries and on the use of vocabulary sources. Later parallel experiments (probabilistic queries) revealed that the initial query plans failed to retrieve some relevant documents. These documents were analyzed, and some new query terms were added to represent the facets comprehensively. The final inclusive query plans were capable of retrieving 270 (99.3 %) out of the 278 known relevant documents at exhaustivity level one.

In total, inclusive query plans contained 34 facets. The average exhaustivity of query plans was 3.8, ranging from 2 to 5. The total number of query terms identified was 2,330 (67 per query plan and 8 per facet). The number of terms ranged from 23 to 69 per query plan, and from to 74 per facet. The wide variation in the number of query terms per facet characterizes the difference between specific concepts (e.g. named persons or organizations) and general concepts (e.g. domains or processes).

3.2.4 Data Collection and Analysis

Precision, query exhaustivity and query extent data were collected for the optimal queries at SPOs. The sensitivity of results to changes in search topic characteristics, such as the size of the recall base and the number of facets identified, was analyzed. In addition, the searchable expressions referring to query plan facets were identified in all relevant documents of a sample of 8 test topics to find explanations for the observed performance differences. Statistical tests were applied to all major results.

3.3 Sample Results

Figures 3-5 summarize the comparisons between the small, large & dense, and large & sparse databases: average precision, exhaustivity and proportional extent of optimal queries at recall levels R0.1-R1.0 (footnote 3). The case experiment was able to reveal interesting performance characteristics of Boolean queries in large databases. The average precision across R0.1-R1.0 was about 3 % lower in the

Footnote 2: TRIP by TietoEnator, Inc.
Footnote 3: Proportional query extent (PQE) was measured only for high recall and high precision searching, for reasons of research economy. PQE is the share of the query terms available in the inclusive query plans that is actually used (averaged over facets).

large & dense database (database size effect), and about 40 % lower in the large & sparse database (database size + density effect) than in the small database (see Fig. 3). The average exhaustivity of optimal queries was higher in the large databases than in the small one, but the level of precision could not be maintained. Proportional query extent was highest in the large & dense database, suggesting that more query terms are needed per facet when a larger number of documents have to be retrieved.

Figure 3. Average precision at fixed recall levels in optimal queries for small, large & dense and large & sparse databases. [Plot: precision against recall level; series: Small db, L&d db, L&s db.]

Figure 4. Exhaustivity of high recall queries optimised for small, large & dense and large & sparse databases. [Plot: exhaustivity against recall level; series: Small db, L&d db, L&s db.]

Figure 5. Proportional query extent (PQE) of optimal queries in the small, large & dense, and large & sparse databases. [Plot: proportional query extent against recall level; series: Small db, L&d db, L&s db.]

Figure 6. The number of search topics where full recall can be achieved as a function of query exhaustivity in the small and large recall bases (8 topics in total). [Plot: the number of topics against query exhaustivity; x-axis tick labels as transcribed: (8 topics), 2 (8), 3 (7), 4 (2), 5 (5).]

A very interesting deviation was identified in the precision and exhaustivity curves at the highest recall levels. In the large & dense database, the precision and exhaustivity of optimal queries fell dramatically between R0.9 and R1.0. The results of the facet analysis of all relevant documents in a sample of 8 test topics clarified the role of the recall base size in falling effectiveness at R1.0. The more documents that must be retrieved to achieve full recall, the more often relevant documents occur in which some query plan facets are expressed only implicitly. The results are presented in Fig. 6.
For Exh = 1, full recall was possible in all but one test topic for both recall bases. At higher exhaustivity levels, the number of test topics where full recall is possible fell much faster in the large recall base.

The above results are just examples from the case study findings to illustrate the potential uses of the proposed method. High precision searching was also studied by applying DCVs as standard points of operation. It turned out, for instance, that database size alone does not induce efficiency problems at low DCVs. On the contrary, the highest precision was achieved in the large & dense database. It was also shown that earlier results indicating the superiority of proximity operators over the AND operator in high precision searching are invalid. Queries optimized separately for both operators show similar average performance. For details, see [26].

4. JUSTIFICATION OF THE METHOD

Evaluation methods should themselves be evaluated in regard to appropriateness, validity, reliability, and efficiency [24, 29]. The appropriateness of the method was verified in the case study by showing that new results could be gained. Validity, reliability, and efficiency are more complex issues to evaluate. The main concerns were directed at the unique operations: inclusive query planning and query optimization.

4.1 Facet Selection Test

Three subjects having good knowledge of text retrieval and indexing were asked to make a facet identification test using a sample of 4 test topics. The results showed that the exhaustivity of the inclusive query plans used in the case experiment was not biased downwards (there was enough exhaustivity tuning space). The test also verified earlier results that the consistency in the selection of query facets is high between search experts.

4.2 Facet Representation Test

The facet analysis of all relevant documents in the sample of 8 search topics showed that the original query designer had

missed or neglected about one third of the available expressions in the relevant documents. However, the effect of missed query terms was regarded as marginal, since their occurrences in documents mostly overlapped with other expressions already covered by the query plan. The effect was shown to be much smaller than the effect of implicit expressions. In the interactive query optimization test (see next section), precision was observed to drop less than 4 %.

4.3 Interactive Query Optimization Test

The idea of the interactive query optimization test was to replace the automatic optimization operation by an expert searcher, and to compare the achieved performance levels as well as the query structures. A special WWW-based tool, the IR Game [27], designed for rapid analysis of query results, was used in this test. When interfaced to a laboratory test collection, the tool offers immediate performance feedback at the level of individual queries in the form of recall-precision curves, and a visualization of actual query results. The searcher is able to study the effects of query changes in a convenient and effortless way.

An experienced searcher was recruited to run the interactive query optimization test. A group of three control searchers was used to test the overall capability of the test searcher. The test searcher worked for a period of .5 months trying to find optimal queries for the sample of 8 test topics for which the full data of the facet analysis was available. In practice, the test searcher did not face any time constraints. The results showed that the algorithm performed better than or as well as the test searcher in 98 % out of the 98 test cases. This can be regarded as an encouraging result for a first version of a heuristic algorithm.

4.4 Efficiency of the Method

The investment in inclusive query planning was judged to be reasonable in the context of a test collection.
It was also shown that the growth of the running time of the optimization algorithm can be characterized by O(n log n), and that it is manageable for all EQ sets of finite size.

5. CONCLUSIONS AND DISCUSSION

The main goal of this study was to design, demonstrate and evaluate a new evaluation method for measuring the performance of Boolean queries across a wide operational range. Three unique characteristics of the method help to comprehend its potential:

1. Performance can be measured at any selected point across the whole operational range, and different standard points of operation (SPO) may be applied.
2. The queries under consideration estimate optimal performance at each SPO, and query structures are free to change within the defined query tuning space in search of the optimum.
3. The expertise of professional searchers can be brought into a system-oriented evaluation framework in a controlled way.

The domain of the method can be characterized by the kinds of research variables that can be appropriately studied by applying it. Query precision, exhaustivity and extent are used as dependent variables, and the standard points of operation as the control variable. Independent variables may relate to:

1. documents (e.g. type, length, degree of relevance)
2. databases (e.g. size, density)
3. database indexes (e.g. type of indexing, linguistic normalization of words)
4. search topics (e.g. complexity, broadness, type)
5. matching operations (e.g. different operators).

The proposed method offers clear advantages over traditional evaluation methods. Because it is more accurate (averaging at defined SPOs), it helps to acquire new information about the phenomena observed and to challenge present findings. The method is also economical in experiments where a complex query tuning space is studied. The query tuning space contains all potential candidates for optimal queries, but data are collected only on those queries that turn out to be optimal at a particular SPO.
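The idea of combining elementary queries (EQs) at a standard point of operation can be illustrated with a minimal sketch. It assumes that each EQ is represented by the set of document IDs it retrieves and that the relevant documents are known. It uses a simple greedy rule: rank the EQs by their own precision, then OR them together until a target recall level is reached. This greedy rule is only a stand-in for illustration, not the heuristic algorithm evaluated in the paper, and all function names and the toy data are hypothetical. In this sketch the sort is the step that costs O(n log n) comparisons in the number of EQs.

```python
from typing import Dict, List, Set, Tuple


def precision(retrieved: Set[int], relevant: Set[int]) -> float:
    """Share of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: Set[int], relevant: Set[int]) -> float:
    """Share of relevant documents that are retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def greedy_query_at_recall(
    eqs: Dict[str, Set[int]], relevant: Set[int], target_recall: float
) -> Tuple[List[str], Set[int]]:
    """OR together EQs, best precision first, until target recall is met.

    Illustrative greedy heuristic (an assumption of this sketch).
    Returns the chosen EQ names and the union of their result sets.
    If the target cannot be reached, all EQs are returned.
    """
    # Rank EQs by their individual precision, highest first.
    ranked = sorted(
        eqs, key=lambda name: precision(eqs[name], relevant), reverse=True
    )
    chosen: List[str] = []
    union: Set[int] = set()
    for name in ranked:
        if recall(union, relevant) >= target_recall:
            break
        chosen.append(name)
        union |= eqs[name]  # Boolean OR corresponds to set union
    return chosen, union


if __name__ == "__main__":
    # Toy data: three hypothetical EQs and four relevant documents.
    toy_eqs = {
        "eq_a": {1, 2},
        "eq_b": {2, 3, 9},
        "eq_c": {4, 10, 11, 12},
    }
    toy_relevant = {1, 2, 3, 4}
    for level in (0.5, 0.75, 1.0):
        names, docs = greedy_query_at_recall(toy_eqs, toy_relevant, level)
        print(level, names, round(precision(docs, toy_relevant), 2))
```

Running the sketch for several recall levels gives one precision figure per level, which is the kind of per-SPO measurement that can then be averaged over test topics.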
The proposed method yielded two major innovations: inclusive query planning and query optimization. The former is more universal since it can be used in both Boolean and best-match experiments, see [4]. The query optimization operation in the proposed form is restricted to the Boolean IR model since it presumes that the query results are distinct sets. The inclusive query planning idea is easier to exploit since its outcome, the representation of the available query tuning space, can also be exploited in experiments on best-match IR systems.

Traditional test collections were provided with complete relevance data. Inclusive query plans are a similar data set that can be used in measuring the ultimate performance limits of different matching algorithms. Inclusive query plans also help in categorizing test topics according to their properties, e.g. complex vs. simple (exhaustivity tuning dimension) and broad vs. narrow (extent tuning dimension). This opens a way to create experimental settings that are more sensitive to situational factors, an issue that has been raised in the Boolean/best-match comparisons [, 20].

6. ACKNOWLEDGMENTS

I am grateful to my supervisor Kalervo Järvelin, and to the FIRE group: Heikki Keskustalo, Jaana Kekäläinen, and others.

7. REFERENCES

[1] Arnold, B.H. (1962). Logic and Boolean algebra. Englewood Cliffs: Prentice-Hall.
[2] Belkin, N.J. & Croft, W.B. (1987). Retrieval Techniques. In: Williams, M.E., Annual Review of Information Science and Technology 22(), 09-45, New York: Elsevier & ASIS.
[3] Blair, D.C. & Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text document retrieval system. Comm. of the ACM (28)3,
[4] Chvátal, V. (1983). Linear Programming. New York: W.H. Freeman.
[5] Cleverdon, C.W. (1967). The Cranfield tests on index language devices. Aslib Proceedings 9(6),
[6] Fidel, R. (1991). Searchers' Selection of Search Keys. Journal of the American Society for Information Science 42(7), , 50-54,
[7] Frants, V.I., Shapiro, J., et al. (1999). Boolean Search: Current State and Perspectives. Journal of the American Society for Information Science 50(),
[8] Harman, D. (1993). The First Text Retrieval Conference (TREC-1). Gaithersburg: National Institute of Standards and Technology. (NIST Spec. Publ ).
[9] Harter, S.P. (1986). Online Information retrieval. Orlando: Academic Press.
[10] Harter, S.P. (1990). Search Term Combinations and Retrieval Overlap: A Proposed Methodology and Case Study. Journal of the American Society for Information Science 4(2),
[11] Hersh, W.R. & Hickam, D.H. (1995). An Evaluation of Interactive Boolean and Natural Language Searching with Online Medical Textbook. Journal of the American Society for Information Science 48(7),
[12] Iivonen, M. (1995). Consistency in the selection of search concepts and search terms. Information Processing & Management 3(2),
[13] Ingwersen, P. & Willett, P. (1995). An Introduction to Algorithmic and Cognitive Approaches for Information Retrieval. Libri 45(),
[14] Järvelin, K., Kristensen, J., et al. (1996). A Deductive Data Model for Query Expansion. In: Proceedings of the 19th International ACM SIGIR Conference, Zürich, Switzerland, August 18-22, 1996.
[15] Lancaster, F.W. (1968). Information Retrieval Systems: Characteristics, Testing, and Evaluation. New York: John Wiley.
[16] Lancaster, F.W. & Warner, A.J. (1993). Information Retrieval Today. Arlington: Information Resources Press.
[17] Martello, S. & Toth, P. (1990). Knapsack Problems. Algorithms and Computer Implementations. Guildford: John Wiley & Sons.
[18] McKinin, E.J., Sievert, M.E., et al. (1991). The Medline Full-Text Project. Journal of the American Society for Information Science 42(4),
[19] Newell, A. (1968). Heuristic programming: Ill-structured problems. In: Aronofsky, J. (Ed.). Progress in Operations Research, Vol III, New York.
[20] Paris, L.A.H. & Tibbo, H.R. (1998). Freestyle vs. Boolean: A comparison of partial and exact match retrieval systems. Information Processing & Management 34(2/3),
[21] Salton, G. (1972). A new comparison between conventional indexing (MEDLARS) and automatic text processing (SMART). Journal of the American Society for Information Science 23(March-April),
[22] Salton, G. (1986). Another look at automatic text-retrieval systems. Communications of the ACM 29(7),
[23] Salton, G. & McGill, M.J. (1983). Introduction to Modern Information Retrieval. Singapore: McGraw-Hill.
[24] Saracevic, T. (1995). Evaluation of evaluation in information retrieval. In: Fox, E.A. et al. (Eds.), SIGIR '95 - Proceedings of the 18th Annual International ACM SIGIR Conference. Washington July 9-13, 1995, p
[25] Saracevic, T., Kantor, P. et al. (1988). A Study of Information Seeking and Retrieving. Journal of the American Society for Information Science 39(3), pp. 6-76, 77-96, and
[26] Sormunen, E. (2000). A Method for Measuring Wide Range Performance of Boolean Queries in Full-Text Databases. Doctoral Thesis. Tampere: University of Tampere. Acta Electronica Universitatis Tamperensis, ISBN: , 23 p. URL:
[27] Sormunen, E., Laaksonen, J., et al. (1998). The IR Game - A Tool for Rapid Query Analysis in Cross-Language IR Experiments. PRICAI '98 Workshop on Cross Language Issues in Artificial Intelligence. Singapore, Nov 22-24, 1998, p
[28] Sparck-Jones, K. (1981). Information retrieval experiment. London: Butterworths.
[29] Tague-Sutcliffe, J. (1992). The pragmatics of information retrieval experimentation, revisited. Information Processing & Management 28(4),
[30] Turtle, H. (1994). Natural Language vs. Boolean Query Evaluation: A Comparison of Retrieval Performance. In: Proceedings of the Seventeenth Annual International ACM-SIGIR Conference. London: Springer-Verlag. p


Informativeness for Adhoc IR Evaluation: Informativeness for Adhoc IR Evaluation: A measure that prevents assessing individual documents Romain Deveaud 1, Véronique Moriceau 2, Josiane Mothe 3, and Eric SanJuan 1 1 LIA, Univ. Avignon, France,

More information

H. W. Kuhn. Bryn Mawr College

H. W. Kuhn. Bryn Mawr College VARIANTS OF THE HUNGARIAN METHOD FOR ASSIGNMENT PROBLEMS' H. W. Kuhn Bryn Mawr College The author presents a geometrical modelwhich illuminates variants of the Hungarian method for the solution of the

More information

International Journal of Scientific & Engineering Research Volume 8, Issue 5, May ISSN

International Journal of Scientific & Engineering Research Volume 8, Issue 5, May ISSN International Journal of Scientific & Engineering Research Volume 8, Issue 5, May-2017 106 Self-organizing behavior of Wireless Ad Hoc Networks T. Raghu Trivedi, S. Giri Nath Abstract Self-organization

More information

Category Theory in Ontology Research: Concrete Gain from an Abstract Approach

Category Theory in Ontology Research: Concrete Gain from an Abstract Approach Category Theory in Ontology Research: Concrete Gain from an Abstract Approach Markus Krötzsch Pascal Hitzler Marc Ehrig York Sure Institute AIFB, University of Karlsruhe, Germany; {mak,hitzler,ehrig,sure}@aifb.uni-karlsruhe.de

More information

Automatic Generation of Query Sessions using Text Segmentation

Automatic Generation of Query Sessions using Text Segmentation Automatic Generation of Query Sessions using Text Segmentation Debasis Ganguly, Johannes Leveling, and Gareth J.F. Jones CNGL, School of Computing, Dublin City University, Dublin-9, Ireland {dganguly,

More information

Evaluating a Visual Information Retrieval Interface: AspInquery at TREC-6

Evaluating a Visual Information Retrieval Interface: AspInquery at TREC-6 Evaluating a Visual Information Retrieval Interface: AspInquery at TREC-6 Russell Swan James Allan Don Byrd Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

Iteration vs Recursion in Introduction to Programming Classes: An Empirical Study

Iteration vs Recursion in Introduction to Programming Classes: An Empirical Study BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 4 Sofia 2016 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2016-0068 Iteration vs Recursion in Introduction

More information

Information Retrieval

Information Retrieval Information Retrieval Test Collections Gintarė Grigonytė gintare@ling.su.se Department of Linguistics and Philology Uppsala University Slides based on previous IR course given by K.F. Heppin 2013-15 and

More information

