A NOVEL METHOD FOR THE EVALUATION OF BOOLEAN QUERY EFFECTIVENESS ACROSS A WIDE OPERATIONAL RANGE
Eero Sormunen
Department of Information Studies, University of Tampere
P.O. Box 607, FIN 330 Tampere, Finland
eero.sormunen@uta.fi

ABSTRACT

Traditional methods for the system-oriented evaluation of Boolean IR systems suffer from validity and reliability problems. Laboratory-based research neglects the searcher and studies suboptimal queries. Research on operational systems fails to make a distinction between searcher performance and system performance. Nor is this approach capable of measuring performance at standard points of operation (e.g. across R0.1-R1.0). A new laboratory-based evaluation method for Boolean IR systems is proposed. It is based on a controlled formulation of inclusive query plans, on an automatic conversion of query plans into elementary queries, and on combining elementary queries into optimal queries at standard points of operation. Major results of a large case experiment are reported. The validity, reliability, and efficiency of the method are considered in the light of empirical and analytical test data.

Keywords: evaluation (general), structured queries, testing methodology, test collections.

1. INTRODUCTION

The mainstream of evaluative IR research has followed the Cranfield paradigm. The major focus has been on the best match IR models, see e.g. [2, 23]. The low interest in studying the Boolean IR model can be seen in the low volume of research output (see e.g. [8] and other TREC reports), and also in the slow development of system-oriented evaluation methods for the Boolean IR model. Research on operational systems has focused on Boolean IR systems, but its contribution to the development of methods has been very slight [3, 28]. Research within the Cranfield paradigm has shared a very critical attitude towards the Boolean IR model [7].
The studies of Salton [2] and Turtle [30] are examples of attempts to show empirically the overall superiority of the best match IR models over the Boolean IR model. The results of some recent comparisons have suggested that studying the overall superiority of one model over the other may be a naive approach [, 20]. Boolean queries seem to perform better in some situations, and best match queries in others. It may be more reasonable to focus on studying the performance of different IR models under changing operational constraints. New methods are needed to draw a more detailed picture of query effectiveness in different IR models.

1.1 Methodological Problems in Boolean IR Experiments

The Boolean IR model has three features that cause methodological problems for experimental research [3]:

1. The formulation of Boolean queries requires a trained person to translate the user request into a query.
2. The searcher has very little control over the size of the output produced by a particular query.
3. The Boolean IR model does not support ranking of documents in order of decreasing probability of relevance.

The necessity of using a human expert in query formulation is a potential source of validity and reliability problems. It is very difficult to separate the effects of a technical IR system from those of a human searcher. For instance, in the well-known STAIRS study, the searchers had a predefined goal to locate at least 75 per cent of all relevant documents. It turned out that fewer than 20 per cent of the relevant documents were found. On the other hand, the average precision of the test queries was as high as 79 per cent [3]. The searchers were obviously formulating high-precision queries although they were asked to work towards high recall.

The latter two features (no ranking, little control over the output size) of the Boolean IR model cause problems in measuring performance at standard points of operation (SPO, e.g. at fixed recall levels or document cut-off values). Typically, only one query (from an arbitrary operational level) is formulated per search request. Performance is measured using single recall/precision values. Recall and precision are averaged separately over all requests. As Lancaster has shown [5], the distribution of recall and precision values for a large set of requests is very wide. It is very difficult to see how the averaged recall and precision values should or could be interpreted, since averaging mixes queries from different operational levels.

The coordination level method developed for the Cranfield 2 project is a traditional approach to omit the trained searcher from the query formulation, to rank output, and to measure the wide-range performance of a Boolean system [5]. Unfortunately, replacing the cognitive effort of a searcher by a mechanical query term selection procedure leads to a
fundamental validity problem: queries exploit the Boolean IR model in a suboptimal way.

Facet A [Information retrieval]:
    (information retrieval OR online systems OR online(w)search?)
AND
Facet B [Search process]:
    (tactic? OR heuristic? OR trial(w)error OR expert systems OR artificial intelligence OR attitudes/de OR behavior?/de,id,ti OR cognitive/de)

Figure 1. An example of a high-recall oriented query used by Harter [0] to illustrate the facet-based query planning approach.

1.2 Harter's Idea: The Most Rational Path

Harter [0] introduced an idea for an evaluation method based on the notion of elementary queries (EQ). Harter used a single search topic to illustrate how the method could be applied. He designed a high-recall oriented query plan (see Fig. 1), applying the building blocks search strategy, which is quite commonly used by professional searchers [6, 9, 2, 6]. The major steps of the building blocks strategy are:

1) Identify major facets and their logical relationships with one another.
2) Identify query terms that represent each facet: words, phrases, etc.
3) Combine the query terms of a facet by disjunction (OR operation).
4) Combine the facets by conjunction or negation (AND or AND NOT operation) [9].

The notion of facet is important in query planning. A facet is a concept that is identified from, and defines one exclusive aspect of, a search topic. In step 2, a typical goal is to discover all plausible query terms appropriate for representing the selected facet. Next, Harter retrieved all documents matching the conjunction of facets A and B, each represented by the disjunction of all selected query terms, and assessed the relevance of the resulting 37 documents. In addition, all conjunctions of two query terms (called elementary queries) from the query plan representing facets A and B in Fig. 1 were composed and executed. A sample of the 24 elementary queries and a summary of their retrieval results are presented in Table 1.
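The expansion of a faceted query plan into its elementary queries can be sketched in a few lines. The following is a minimal illustration, not code from the paper; the term lists follow Fig. 1, and the function name is our own:

```python
from itertools import product

# Facets of Harter's query plan (Fig. 1), as term lists.
facet_a = ["information retrieval", "online systems", "online(w)search?"]
facet_b = ["tactic?", "heuristic?", "trial(w)error", "expert systems",
           "artificial intelligence", "attitudes/de",
           "behavior?/de,id,ti", "cognitive/de"]

def elementary_queries(*facets):
    """Expand a CNF plan (facets ANDed, terms within a facet ORed)
    into its elementary conjunctions, one term per facet."""
    return [" AND ".join(terms) for terms in product(*facets)]

eqs = elementary_queries(facet_a, facet_b)
print(len(eqs))   # 3 x 8 = 24 elementary queries
print(eqs[0])     # information retrieval AND tactic?
```

With three terms in facet A and eight in facet B, the Cartesian product yields exactly the 24 elementary queries of Harter's example.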
Harter [0] demonstrated the procedure of constructing optimal queries (called the most rational path). An estimate for maximum precision across the whole relative recall range was determined by applying a simple incremental algorithm:

1. To create the initial optimal query, choose the EQ that achieves the highest precision.
2. Create in turn the disjunction of each of the remaining EQs with the current optimal query. Select the disjunction with the EQ that maximizes precision. The disjunction of the current optimal query and the selected EQ creates a new optimal query.
3. Repeat step 2 until all elementary queries have been exhausted.

Eq #  Elementary query                           # of Docs
s1    information retrieval AND tactic?                  8
s2    information retrieval AND heuristic?               7
s3    information retrieval AND trial(w)error
...
s22   online(w)search? AND attitudes/de                  9
s23   online(w)search? AND behavior?/de,id,ti            8
s24   online(w)search? AND cognitive/de                  0
s25   s1-s24/OR                                         37

Table 1. Retrieval results for the 24 elementary queries in the case search by Harter (1990).

Precision and recall values for the 24 elementary queries and the respective curve for the optimal queries are presented in Fig. 2. Harter never reported full-scale evaluation results based on the idea of the most rational path beyond this single example. Neither did he develop operational guidelines for fluent use of the method in practice.

Figure 2. Recall and precision of the 24 elementary queries and the most rational path in the case search presented by Harter [0].

1 Actually, Harter talked about elementary postings sets. This is very confusing, since it applies set-based terminology to address queries as logical statements.
1.3 Research Goals

The main goal of the study was to create an evaluation method for measuring the performance of Boolean queries across a wide operational range by elaborating the ideas introduced by Harter [0]. The method is presented and argued using the framework suggested by Newell, M = {domain, procedure, justification} [9]:

1. The domain of the method specifies the appropriate application area for the method.
2. The procedure of the method consists of the ordered set of operations required in the proper use of the method. In particular, two major operations unique to the procedure need to be elaborated:
   a) Query formulation. How is the set of elementary queries composed from a search topic?
   b) Query optimization. What algorithm should be used for combining the elementary queries to find the optimal query for different operational levels?
3. The justification of the method. The appropriateness, validity, reliability and efficiency of the method within the specified domain must be justified.

The structure of this paper is the following. First, some basic concepts and the procedure of the method are introduced. Second, a case experiment is briefly reported to illustrate the domain and the use of the proposed method in a concrete experimental setting. Third, the other justification issues of the method (validity, reliability and efficiency) are discussed. Several empirical tests were carried out to assess the potential validity and reliability problems in applying the method.

2. OUTLINE FOR THE METHOD

The aim of this section is to introduce a sound theoretical framework for the procedure of the method and to formulate operational guidelines for exercising it.

2.1 Query Structures and Query Tuning Spaces

IR models address the issue of comparing a query as a representation of a request for information with representations of texts.
The Boolean IR model supports rich query structures, a (simple) binary representation of texts, and an exact match technique for comparing queries and text representations [2]. A Boolean query consists of query terms and operators. Query terms are usually words, phrases, or other character strings typical of natural language texts. Boolean query structures are based on three logical connectives, conjunction (AND), disjunction (OR) and negation (NOT), and on the use of parentheses. A query expresses the combination of terms that retrieved documents have to contain. If we want to generate all possible Boolean queries for a particular request, we have to identify all query terms that might be useful, and to generate all logically reasonable query structures.

The facet, as defined in Section 1.2, is a very useful notion in representing the relationships between Boolean query structures and the search topic. Terms within a facet are naturally combined by disjunction. Facets themselves represent the exclusive aspects of the desired documents, and are naturally combined by Boolean conjunction or negation [9]. Expert searchers tend to formulate query plans applying the notion of facet [9, 6]. The resulting query plans are usually in a standard form, the conjunctive normal form (CNF) (for a formal definition, see []). The structure of a Boolean query can be easily characterized in CNF queries: query exhaustivity (Exh) is the number of facets that are exploited, and query extent (QE) characterizes the broadness of a query and can be measured, e.g., as the average number of query terms per facet. For instance, in the query plan designed by Harter, Exh = 2 and QE = 5.5 (see Fig. 1). The changes made in query exhaustivity and extent to achieve appropriate retrieval goals are here called query tuning. The range within which query exhaustivity and query extent can change sets the boundaries for query tuning.
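The two structure measures, and the size of the space they span, can be computed mechanically. A minimal sketch, assuming the counting scheme generalises the two-facet example (each facet contributes the non-empty subsets of its terms, and any non-empty subset of facets may be conjoined); the facet contents are placeholders standing in for the 3 and 8 terms of Harter's plan:

```python
def exhaustivity(plan):
    """Exh: the number of facets in a CNF query plan."""
    return len(plan)

def query_extent(plan):
    """QE: the average number of query terms per facet."""
    return sum(len(facet) for facet in plan) / len(plan)

def tuning_space_size(plan):
    """Number of EQ combinations over all exhaustivity levels:
    each facet offers (2^k - 1) non-empty term disjunctions,
    or may be left out entirely (the +1 below); subtract the
    one case where every facet is left out."""
    total = 1
    for facet in plan:
        total *= (2 ** len(facet) - 1) + 1
    return total - 1

harter = [3 * ["a-term"], 8 * ["b-term"]]   # 3 and 8 terms, as in Fig. 1
print(exhaustivity(harter))        # 2
print(query_extent(harter))        # 5.5
print(tuning_space_size(harter))   # 7*255 + 7 + 255 = 2,047
```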
The set of all elementary queries and their feasible combinations composed at all available exhaustivity and extent levels form the query tuning space. In the example by Harter (Fig. 1), seven different disjunctions of query terms can be generated from facet A (= 2^3 - 1) and 255 from facet B (= 2^8 - 1). The total number of possible EQ combinations is then 7 x 255 = 1,785 at Exh = 2. In addition, 7 and 255 EQ combinations can be formed at Exh = 1 from facets A and B, respectively. Thus, the total number of EQ combinations creating the query tuning space across exhaustivity levels 1 and 2 for the sample query plan is 2,047.

2.2 The Procedure of the Method

The procedure of the proposed method consists of eight operations at three stages:

STAGE I. INCLUSIVE QUERY PLANNING

1. Design inclusive query plans. Experienced searchers formulate inclusive query plans for each given search topic. This yields a comprehensive representation of the query tuning space available for a search topic.
2. Execute extensive queries. The goal of extensive queries is to gain reliable recall base estimates.
3. Determine the order of facets. The facet order of inclusive query plans is determined by ranking the facets according to their measured recall power, i.e. their capability to retrieve relevant documents.

STAGE II. QUERY OPTIMISATION

4. Generate the set of elementary queries (EQ). Inclusive query plans in the conjunctive normal form (CNF) at different exhaustivity levels are transformed into the disjunctive normal form (DNF), where the elementary conjunctions create the set of elementary queries. All elementary queries are executed to find the set of relevant and non-relevant documents associated with each EQ.
5. Select standard points of operation (SPO). Both fixed recall levels R0.1, ..., R1.0 and fixed document cut-off values, e.g. DCV2, DCV5, ..., DCV500, may be used as SPOs.
6. Optimize queries. An optimisation algorithm is used to compose the combinations of EQs performing optimally at each selected SPO.
STAGE III. EVALUATION OF RESULTS

7. Measure precision at each SPO. Precision can be used as the performance measure; it is averaged over all search topics at each SPO.
8. Analyse the characteristics of optimal queries. The optimal queries are analysed to explain the changes in the performance of an IR system.

The above steps describe the ordered set of operations constituting the procedure of the proposed method. Inclusive query planning (steps 1-3) and the search for the optimal set of elementary queries (steps 4-6) are the focus of this study.

2.3 Inclusive Query Planning

The techniques of query planning are routinely taught to novice searchers [9, 6]. A common feature of different query planning techniques is that they emphasize the analysis and identification of searchable facets, and the representation of each facet as an exhaustive disjunction of query terms. The goal of inclusive query planning is similar, but the thoroughness of the identification task is stressed even more. In inclusive query planning, the goal is to identify

1. all searchable facets of a search topic, and
2. all plausible query terms for each facet.

A major doubt in using human experts to design queries is probably associated with the reliability of experimental designs. For instance, the average inter-searcher overlap in the selection of query terms (measured character by character) is usually around 30 per cent [25]. Fortunately, the situation is not so bad when facets are considered. For instance, in a study by Iivonen [2], the average concept-consistency rose to 88 per cent, and experienced searchers were even more consistent. This indicates that expert searchers are able to identify the facets of a topic consistently although the overlap of queries at the string level may be low.

The identification of all plausible query terms for each identified facet is another task requiring searching expertise. Basically, the comprehensiveness of facet representations is mostly a question of how much effort is used to identify potential query terms.
The query designer is freed from the need to make the compromises in query term selection typical of practical search situations. The optimization operation will automatically reject ill-behaving query terms. The process can be improved by appropriate tools (dictionaries, thesauri, browsing tools for database indexes, etc.).

The final step is to decide the order of facets in the query plan. In the case of a laboratory test collection, full relevance data (or at least a justified estimate of it) is available. The facets of an inclusive query plan can be ranked in descending order of recall. The disjunction of all query terms identified for a facet is used to measure the recall values.

2.4 Search for the Optimal Set of EQs

The size of the query tuning space increases exponentially as a function of the number of EQs. We are obviously facing the risk of combinatorial explosion, since we do not know the upper limit of query exhaustivity and, especially, query extent in inclusive query plans. Solving the optimization problem by blind search algorithms could lead to unmanageably long running times: the search for the optimal set of EQs is an NP-hard problem. Harter [0] introduced a simple heuristic algorithm but did not define it formally.

Query optimization resembles a traditional integer programming case called the Knapsack Problem. The problem is to fill a container with a set of items so that the value of the cargo is maximized, and the weight limit for the cargo is not exceeded [4]. The special case where each item is selected at most once (like EQs) is called the 0-1 Knapsack Problem. Efficient approximation algorithms have been developed to find a feasible lower bound for the optimum [7].
The problem of finding the optimal query from the query tuning space can be formally defined by applying the definitions of the 0-1 Knapsack Problem as follows. Select a set of EQs so as to maximise

    z = sum_{i=1..n} r_i * x_i

subject to

    sum_{i=1..n} n_i * x_i <= DCV_j

where

    x_i = 1 if eq_i is selected, and 0 otherwise
    r_i = number of relevant documents retrieved by eq_i
    n_i = number of documents retrieved by eq_i
    DCV_j = selected document cut-off value.

The above definition of the optimization problem is its maximization version: the number of relevant documents is maximized while the total number of retrieved documents is restricted by the given DCV_j. In the minimization version of the problem, the goal is to minimize the total number of documents retrieved while requiring that the number of relevant documents exceeds some minimum value (a fixed recall level).

Unfortunately, standard algorithms designed for physical objects do not work properly with EQs. Different EQs tend to overlap and retrieve at least some joint documents. This means that, in a disjunction of elementary queries, the profit r_i and the weight n_i of the elementary query eq_i have dynamically changing effective values that depend on the EQs selected earlier. The effect of overlap in a combination of several query sets is hard to predict. A simple heuristic procedure for an incremental construction of the optimal queries was designed applying the notion of an efficiency list [7]. The maximization version of the algorithm contains seven steps:

1. Remove all elementary queries eq_i (a) retrieving more documents than the upper limit for the number of documents (i.e. n_i > residual document cut-off value DCV', starting from DCV' = DCV_j), or (b) retrieving no relevant documents (r_i = 0).
2. Stop if no elementary queries eq_i are available.
3. Calculate the efficiency list using precision values r_i/n_i for the remaining m elementary queries, and sort the elementary queries in order of descending efficiency. In the case of equal values, use the number of relevant documents (r_i) retrieved as the second sorting criterion.
4. Move eq_1, at the top of the efficiency list, to the optimal query.
5. Remove all documents retrieved by eq_1 from the result sets of the remaining elementary queries eq_2, ..., eq_m.
6. Calculate the new value for the free space DCV'.
7. Continue from step one.

The basic algorithm favors narrowly formulated EQs retrieving a few relevant documents with high precision at the expense of broader queries retrieving many relevant documents with medium precision. The problem can be reduced by running the optimization in an alternative mode differing only in step four of the first iteration round: the eq_i retrieving the largest set of relevant documents is selected from the efficiency list instead of eq_1. The alternative mode is called the largest first optimization and the basic mode the precision first optimization.

3. A CASE EXPERIMENT

The goal of the case experiment was to elucidate the potential uses of the proposed method, to clarify the types of research questions that can be effectively solved by the method, and to explicate the operational pragmatics of the method.

3.1 Research Questions

The case experiment focused on the mechanism of the falling effectiveness of Boolean queries in free-text searching of large full-text databases. The work was inspired by the debate concerning the results of the STAIRS study [3, 22]. The goal was to draw a more detailed picture of system performance and optimal query structures in search situations typical of large databases. Assuming an ideally performing searcher, the main question was: what is the difference in the maximum performance of Boolean queries between a small database and two types of large databases? The large & dense database contained a larger volume of documents than the small database, but the density of relevant documents (generality) was the same. In the large & sparse database, the volume of documents was higher and the density of relevant documents was lower than in the small database.
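The seven-step optimisation heuristic described above can be sketched as follows. This is a simplified illustration with hypothetical EQ result sets, not the paper's implementation; it assumes the result set of each EQ is known as a set of document identifiers.

```python
def optimize(eqs, dcv, largest_first=False):
    """Greedy DCV-constrained composition of an optimal query
    (maximisation version of the seven-step heuristic).
    eqs maps an EQ id to (retrieved_ids, relevant_ids)."""
    remaining = {k: (set(r), set(v)) for k, (r, v) in eqs.items()}
    ret, rel = set(), set()       # documents of the optimal query so far
    chosen, first = [], True
    while True:
        free = dcv - len(ret)     # step 6: residual cut-off value DCV'
        # Step 1: drop EQs that exceed DCV' or retrieve no relevant docs.
        remaining = {k: rv for k, rv in remaining.items()
                     if rv[1] and len(rv[0]) <= free}
        if not remaining:         # step 2
            break
        # Step 3: efficiency = precision, ties broken by relevant count.
        def efficiency(k):
            r, v = remaining[k]
            return (len(v) / len(r), len(v))
        if first and largest_first:
            # Alternative first move: take the EQ with most relevant docs.
            best = max(remaining, key=lambda k: len(remaining[k][1]))
        else:
            best = max(remaining, key=efficiency)
        first = False
        r, v = remaining.pop(best)            # step 4
        ret |= r
        rel |= v
        chosen.append(best)
        # Step 5: remove already-retrieved documents from the rest.
        remaining = {k: (rr - ret, vv - rel)
                     for k, (rr, vv) in remaining.items()}
    return chosen, len(rel), len(ret)

# Hypothetical EQ result sets: (retrieved doc ids, relevant doc ids).
eqs = {"e1": ({1, 2}, {1, 2}),
       "e2": ({2, 3, 4}, {3}),
       "e3": ({5, 6, 7, 8, 9}, {5, 6, 7})}
print(optimize(eqs, dcv=5))                      # precision first
print(optimize(eqs, dcv=5, largest_first=True))  # largest first
```

On this toy data the two modes diverge exactly as the text describes: precision first picks the narrow high-precision EQs (3 relevant of 4 retrieved), while largest first spends the whole cut-off on one broad EQ (3 relevant of 5 retrieved).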
Twelve hypotheses were formulated concerning the effectiveness, exhaustivity and proportional query extent of queries in large databases. For details, see [26].

3.2 Data and Methods

3.2.1 Optimization Algorithm

The optimization algorithm described in Section 2.4 was programmed in C for Unix. Both a maximization version exploiting a standard set of document cut-off values (DCV2, DCV5, ..., DCV500) and a minimization version exploiting fixed recall levels (R0.1, ..., R1.0) were implemented. At each SPO, the iteration round (called an optimization lap) was executed ten times, starting each round by selecting a different top EQ from the efficiency list: five laps in the largest first mode, and five in the precision first mode. The alternative results at a particular SPO achieved by the algorithm in different optimization laps were sorted to find the optimal queries for further analysis.

3.2.2 Test Collection

The Finnish Full-Text Test Collection developed at the University of Tampere was used in the case experiment [4]. The test database contains about 54,000 newspaper articles from three Finnish newspapers. A set of 35 search topics is available, including verbal topic descriptions and relevance assessments. The test database is implemented for the TRIP retrieval system.2 The test database played the role of the large & dense database. The other databases, the small database and the large & sparse database, were created through sampling from the EQ result sets. The large & sparse database was created by deleting about 80% of the relevant documents, and the small database by deleting about 80% of all documents of the EQ result sets. Thus, the EQ result sets for the small database contained the same relevant documents as those for the large & sparse database. Query optimization was done separately on these three EQ data sets.

3.2.3 Inclusive Query Plans

The initial versions of inclusive query plans were designed by an experienced search analyst working for three months on the project.
Query planning was an interactive process based on thorough test queries and on the use of vocabulary sources. Later parallel experiments (probabilistic queries) revealed that the initial query plans failed to retrieve some relevant documents. These documents were analyzed, and some new query terms were added to represent the facets comprehensively. The final inclusive query plans were capable of retrieving 270 (99.3%) of the 278 known relevant documents at exhaustivity level one.

In total, the inclusive query plans contained 134 facets. The average exhaustivity of the query plans was 3.8, ranging from 2 to 5. The total number of query terms identified was 2,330 (67 per query plan and 18 per facet). The number of terms ranged from 23 to 69 per query plan, and from 1 to 74 per facet. The wide variation in the number of query terms per facet characterizes the difference between specific concepts (e.g. named persons or organizations) and general concepts (e.g. domains or processes).

3.2.4 Data Collection and Analysis

Precision, query exhaustivity and query extent data were collected for the optimal queries at the SPOs. The sensitivity of the results to changes in search topic characteristics, like the size of a recall base, the number of facets identified, etc., was analyzed. Also, the searchable expressions referring to query plan facets were identified in all relevant documents of a sample of 8 test topics to find explanations for the observed performance differences. Statistical tests were applied to all major results.

3.3 Sample Results

Figures 3-5 summarize the comparisons between the small, large & dense, and large & sparse databases: average precision, exhaustivity and proportional extent of optimal queries at recall levels R0.1-R1.0.3

Figure 3. Average precision at fixed recall levels in optimal queries for the small, large & dense and large & sparse databases.

Figure 4. Exhaustivity of high-recall queries optimised for the small, large & dense and large & sparse databases.

Figure 5. Proportional query extent (PQE) of optimal queries in the small, large & dense, and large & sparse databases.

The case experiment revealed interesting performance characteristics of Boolean queries in large databases. The average precision across R0.1-R1.0 was about 3% lower in the large & dense database (database size effect), and about 40% lower in the large & sparse database (database size + density effect) than in the small database (see Fig. 3). The average exhaustivity of optimal queries was higher in the large databases than in the small one, but the level of precision could not be maintained. Proportional query extent was highest in the large & dense database, suggesting that more query terms are needed per facet when a larger number of documents has to be retrieved.

A very interesting deviation was identified in the precision and exhaustivity curves at the highest recall levels. In the large & dense database, the precision and exhaustivity of optimal queries fell dramatically between R0.9 and R1.0. The results of the facet analysis of all relevant documents in a sample of 8 test topics clarified the role of the recall base size in the falling effectiveness at R1.0. The more documents that need to be retrieved to achieve full recall, the more there occur relevant documents in which some query plan facets are expressed implicitly. The results are presented in Fig. 6.

Figure 6. The number of search topics where full recall can be achieved as a function of query exhaustivity in the small and large recall bases (8 topics in total).

2 TRIP by TietoEnator, Inc.

3 Proportional query extent (PQE) was measured only for high-recall and high-precision searching for reasons of research economy. PQE is the share of query terms actually used out of the terms available in the inclusive query plans (averaged over facets).
For Exh = 1, full recall was possible in all but one test topic for both recall bases. At higher exhaustivity levels, the number of test topics where full recall is possible fell much faster in the large recall base.

The above results are just examples of the case study findings to illustrate the potential uses of the proposed method. High-precision searching was also studied by applying DCVs as standard points of operation. It turned out, for instance, that database size alone does not induce efficiency problems at low DCVs. On the contrary, the highest precision was achieved in the large & dense database. It was also shown that earlier results indicating the superiority of proximity operators over the AND operator in high-precision searching are invalid: queries optimized separately for both operators show similar average performance. For details, see [26].

4. JUSTIFICATION OF THE METHOD

Evaluation methods should themselves be evaluated in regard to appropriateness, validity, reliability, and efficiency [24, 29]. The appropriateness of the method was verified in the case study by showing that new results could be gained. Validity, reliability, and efficiency are more complex issues to evaluate. The main concerns were directed at the unique operations: inclusive query planning and query optimization.

4.1 Facet Selection Test

Three subjects having good knowledge of text retrieval and indexing were asked to perform a facet identification test using a sample of 4 test topics. The results showed that the exhaustivity of the inclusive query plans used in the case experiment was not biased downwards (enough exhaustivity tuning space). The test also verified earlier results that the consistency in the selection of query facets is high among search experts.

4.2 Facet Representation Test

The facet analysis of all relevant documents in the sample of 8 search topics showed that the original query designer had
missed or neglected about one third of the available expressions in the relevant documents. However, the effect of the missed query terms was regarded as marginal, since their occurrences in documents mostly overlapped with other expressions already covered by the query plan. The effect was shown to be much smaller than the effect of implicit expressions. In the interactive query optimization test (see the next section), precision was observed to drop by less than 4%.

4.3 Interactive Query Optimization Test

The idea of the interactive query optimization test was to replace the automatic optimization operation by an expert searcher, and to compare the achieved performance levels as well as the query structures. A special WWW-based tool, the IR Game [27], designed for rapid analysis of query results, was used in this test. When interfaced to a laboratory test collection, the tool offers immediate performance feedback at the level of individual queries in the form of recall-precision curves, and a visualization of actual query results. The searcher is able to study, in a convenient and effortless way, the effects of query changes.

An experienced searcher was recruited to run the interactive query optimization test. A group of three control searchers was used to test the overall capability of the test searcher. The test searcher worked for a period of 1.5 months trying to find optimal queries for the sample of 8 test topics for which the full data of the facet analysis was available. In practice, the test searcher did not face any time constraints. The results showed that the algorithm performed better than or equally to the test searcher in 98% of the 98 test cases. This can be regarded as an advantageous result for a first version of a heuristic algorithm.

4.4 Efficiency of the Method

The investment in inclusive query planning was argued to be reasonable in the context of a test collection.
It was also shown that the growth of the running time of the optimization algorithm can be characterized by O(n log n), and that it is manageable for all EQ sets of finite size.

5. CONCLUSIONS AND DISCUSSION

The main goal of this study was to design, demonstrate and evaluate a new evaluation method for measuring the performance of Boolean queries across a wide operational range. Three unique characteristics of the method help to convey its potential:
1. Performance can be measured at any selected point across the whole operational range, and different standard points of operation (SPO) may be applied.
2. The queries under consideration estimate optimal performance at each SPO, and query structures are free to change within the defined query tuning space in search of the optimum.
3. The expertise of professional searchers can be brought into a system-oriented evaluation framework in a controlled way.
The domain of the method can be characterized by illustrating the kinds of research variables that can be appropriately studied with it. Query precision, exhaustivity and extent are used as dependent variables, and the standard points of operation as the control variable. Independent variables may relate to:
1. documents (e.g. type, length, degree of relevance)
2. databases (e.g. size, density)
3. database indexes (e.g. type of indexing, linguistic normalization of words)
4. search topics (e.g. complexity, broadness, type)
5. matching operations (e.g. different operators).
The proposed method offers clear advantages over traditional evaluation methods. It helps to acquire new information about the phenomena observed and to challenge present findings, because it is more accurate (averaging at defined SPOs). The method is also economical in experiments where a complex query tuning space is studied: the query tuning space contains all potential candidates for optimal queries, but data are collected only on those queries that turn out to be optimal at a particular SPO.
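The two operations at the heart of the method — converting an inclusive query plan into elementary queries (EQs), and combining EQs into an optimal query at a standard point of operation — can be sketched in a few lines of code. The sketch below is illustrative only, not the thesis implementation: the toy query plan, the five-document "database" and the relevance judgements are invented, substring matching stands in for a real Boolean index, and the exhaustive optimizer stands in for the heuristic O(n log n) algorithm.

```python
# Illustrative sketch of the evaluation method's two unique operations.
# All data (terms, documents, relevance judgements) is invented.
from itertools import combinations, product

# An inclusive query plan: ranked facets, each an OR-list of expressions.
plan = [["heart", "cardiac"],        # facet 1 (most important)
        ["surgery", "operation"]]    # facet 2

# Toy "database" and recall base (relevant document ids).
db = {1: "cardiac surgery report", 2: "heart operation notes",
      3: "heart rate study", 4: "surgery waiting lists",
      5: "take heart courage essay"}
relevant = {1, 2, 3}

def eq_results(plan, exhaustivity):
    """One EQ per combination of one term from each of the first
    `exhaustivity` facets (a conjunction); returns each EQ's result set."""
    return [{d for d, text in db.items() if all(t in text for t in terms)}
            for terms in product(*plan[:exhaustivity])]

def optimal_query(eqs, relevant, recall_spo):
    """Exhaustively search OR-combinations of EQs for the best precision
    subject to reaching the recall SPO (the thesis uses a heuristic)."""
    best_p, best_combo = 0.0, ()
    for r in range(1, len(eqs) + 1):
        for combo in combinations(range(len(eqs)), r):
            res = set().union(*(eqs[i] for i in combo))
            hits = len(res & relevant)
            rec = hits / len(relevant)
            prec = hits / len(res) if res else 0.0
            if rec >= recall_spo and prec > best_p:
                best_p, best_combo = prec, combo
    return best_p, best_combo
```

On this toy data, full recall (SPO R1.0) is achievable at exhaustivity 1 with precision 0.75, while at exhaustivity 2 no EQ combination reaches full recall at all — a miniature of the recall ceilings at higher exhaustivity levels discussed above.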
The proposed method yielded two major innovations: inclusive query planning and query optimization. The former innovation is the more universal one, since it can be used in Boolean as well as in best-match experiments, see [14]. The query optimization operation in the proposed form is restricted to the Boolean IR model, since it presumes that query results are distinct sets. The inclusive query planning idea is easier to exploit, since its outcome, the representation of the available query tuning space, can also be exploited in experiments on best-match IR systems. Traditional test collections were provided with complete relevance data; inclusive query plans are a similar data set that can be used in measuring the ultimate performance limits of different matching algorithms. Inclusive query plans also help in categorizing test topics according to their properties, e.g. complex vs. simple (the exhaustivity tuning dimension) and broad vs. narrow (the extent tuning dimension). This opens a way to create experimental settings that are more sensitive to situational factors, an issue that has been raised in the Boolean/best-match comparisons [11, 20].

6. ACKNOWLEDGMENTS

I am grateful to my supervisor Kalervo Järvelin, and to the FIRE group: Heikki Keskustalo, Jaana Kekäläinen, and others.

7. REFERENCES

[1] Arnold, B.H. (1962). Logic and Boolean Algebra. Englewood Cliffs: Prentice-Hall.
[2] Belkin, N.J. & Croft, W.B. (1987). Retrieval Techniques. In: Williams, M.E. (Ed.), Annual Review of Information Science and Technology 22(1), 109-145. New York: Elsevier & ASIS.
[3] Blair, D.C. & Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text document retrieval system. Communications of the ACM 28(3).
[4] Chvátal, V. (1983). Linear Programming. New York: W.H. Freeman.
[5] Cleverdon, C.W. (1967). The Cranfield tests on index language devices. Aslib Proceedings 19(6).
[6] Fidel, R. (1991). Searchers' Selection of Search Keys. Journal of the American Society for Information Science 42(7), 490-500, 501-514, 515-527.
[7] Frants, V.I., Shapiro, J., et al. (1999). Boolean Search: Current State and Perspectives. Journal of the American Society for Information Science 50(1).
[8] Harman, D. (1993). The First Text Retrieval Conference (TREC-1). Gaithersburg: National Institute of Standards and Technology. (NIST Spec. Publ. 500-207).
[9] Harter, S.P. (1986). Online Information Retrieval. Orlando: Academic Press.
[10] Harter, S.P. (1990). Search Term Combinations and Retrieval Overlap: A Proposed Methodology and Case Study. Journal of the American Society for Information Science 41(2).
[11] Hersh, W.R. & Hickam, D.H. (1995). An Evaluation of Interactive Boolean and Natural Language Searching with an Online Medical Textbook. Journal of the American Society for Information Science 48(7).
[12] Iivonen, M. (1995). Consistency in the selection of search concepts and search terms. Information Processing & Management 31(2).
[13] Ingwersen, P. & Willett, P. (1995). An Introduction to Algorithmic and Cognitive Approaches for Information Retrieval. Libri 45(1).
[14] Järvelin, K., Kristensen, J., et al. (1996). A Deductive Data Model for Query Expansion. In: Proceedings of the 19th International ACM SIGIR Conference, Zürich, Switzerland, August 18-22, 1996.
[15] Lancaster, F.W. (1968). Information Retrieval Systems: Characteristics, Testing, and Evaluation. New York: John Wiley.
[16] Lancaster, F.W. & Warner, A.J. (1993). Information Retrieval Today. Arlington: Information Resources Press.
[17] Martello, S. & Toth, P. (1990). Knapsack Problems: Algorithms and Computer Implementations. Guildford: John Wiley & Sons.
[18] McKinin, E.J., Sievert, M.E., et al. (1991). The Medline Full-Text Project. Journal of the American Society for Information Science 42(4).
[19] Newell, A. (1968). Heuristic programming: Ill-structured problems. In: Aronofsky, J. (Ed.), Progress in Operations Research, Vol. III. New York.
[20] Paris, L.A.H. & Tibbo, H.R. (1998). Freestyle vs. Boolean: A comparison of partial and exact match retrieval systems. Information Processing & Management 34(2/3).
[21] Salton, G. (1972). A new comparison between conventional indexing (MEDLARS) and automatic text processing (SMART). Journal of the American Society for Information Science 23(March-April).
[22] Salton, G. (1986). Another look at automatic text-retrieval systems. Communications of the ACM 29(7).
[23] Salton, G. & McGill, M.J. (1983). Introduction to Modern Information Retrieval. Singapore: McGraw-Hill.
[24] Saracevic, T. (1995). Evaluation of evaluation in information retrieval. In: Fox, E.A. et al. (Eds.), SIGIR '95 - Proceedings of the 18th Annual International ACM SIGIR Conference, Seattle, Washington, July 9-13, 1995.
[25] Saracevic, T., Kantor, P., et al. (1988). A Study of Information Seeking and Retrieving. Journal of the American Society for Information Science 39(3), 161-176, 177-196, and 197-216.
[26] Sormunen, E. (2000). A Method for Measuring Wide Range Performance of Boolean Queries in Full-Text Databases. Doctoral Thesis. Tampere: University of Tampere. Acta Electronica Universitatis Tamperensis, 23 p.
[27] Sormunen, E., Laaksonen, J., et al. (1998). The IR Game - A Tool for Rapid Query Analysis in Cross-Language IR Experiments. PRICAI '98 Workshop on Cross Language Issues in Artificial Intelligence, Singapore, Nov 22-24, 1998.
[28] Sparck Jones, K. (1981). Information Retrieval Experiment. London: Butterworths.
[29] Tague-Sutcliffe, J. (1992). The pragmatics of information retrieval experimentation, revisited. Information Processing & Management 28(4).
[30] Turtle, H. (1994). Natural Language vs. Boolean Query Evaluation: A Comparison of Retrieval Performance. In: Proceedings of the Seventeenth Annual International ACM-SIGIR Conference. London: Springer-Verlag.
More informationABSTRACT 1. INTRODUCTION
ABSTRACT A Framework for Multi-Agent Multimedia Indexing Bernard Merialdo Multimedia Communications Department Institut Eurecom BP 193, 06904 Sophia-Antipolis, France merialdo@eurecom.fr March 31st, 1995
More informationMobile Query Interfaces
Mobile Query Interfaces Matthew Krog Abstract There are numerous alternatives to the application-oriented mobile interfaces. Since users use their mobile devices to manage personal information, a PIM interface
More information2 Approaches to worldwide web information retrieval
The WEBFIND tool for finding scientific papers over the worldwide web. Alvaro E. Monge and Charles P. Elkan Department of Computer Science and Engineering University of California, San Diego La Jolla,
More informationMultimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency
Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency Ralf Moeller Hamburg Univ. of Technology Acknowledgement Slides taken from presentation material for the following
More informationFORMALIZED SOFTWARE DEVELOPMENT IN AN INDUSTRIAL ENVIRONMENT
FORMALIZED SOFTWARE DEVELOPMENT IN AN INDUSTRIAL ENVIRONMENT Otthein Herzog IBM Germany, Dept. 3100 P.O.Box 80 0880 D-7000 STUTTGART, F. R. G. ABSTRACT tn the IBM Boeblingen Laboratory some software was
More informationTaxonomies and controlled vocabularies best practices for metadata
Original Article Taxonomies and controlled vocabularies best practices for metadata Heather Hedden is the taxonomy manager at First Wind Energy LLC. Previously, she was a taxonomy consultant with Earley
More informationCHAPTER 5 Querying of the Information Retrieval System
5.1 Introduction CHAPTER 5 Querying of the Information Retrieval System Information search and retrieval involves finding out useful documents from a store of information. In any information search and
More informationSubjective Relevance: Implications on Interface Design for Information Retrieval Systems
Subjective : Implications on interface design for information retrieval systems Lee, S.S., Theng, Y.L, Goh, H.L.D., & Foo, S (2005). Proc. 8th International Conference of Asian Digital Libraries (ICADL2005),
More informationA RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH
A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements
More informationJoho, H. and Jose, J.M. (2006) A comparative study of the effectiveness of search result presentation on the web. Lecture Notes in Computer Science 3936:pp. 302-313. http://eprints.gla.ac.uk/3523/ A Comparative
More information