The impact of query structure and query expansion on retrieval performance

Size: px
Start display at page:

Download "The impact of query structure and query expansion on retrieval performance"

Transcription

1 The impact of query structure and query expansion on retrieval performance Jaana Kekäläinen & Kalervo Järvelin Department of Information Studies University of Tampere Published in Croft, W.B. & Moffat, A. & van Rijsbergen, C.J. & Wilkinson, R. & Zobel, J. (Eds.), Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '98), Melbourne, Australia, August 24-28, New York, NY: ACM Press, pp

2 The impact of query structure and query expansion on retrieval performance Jaana Kekäläinen Dept. of Information Studies University of Tampere Abstract The effects of query structures and query expansion (QE) on retrieval performance were tested with a best match retrieval system (INQUERY 1 ). Query structure means the use of operators to express the relations between search keys. Eight different structures were tested, representing weak structures (averages and weighted averages of the weights of the keys) and strong structures (e.g., queries with more elaborated search key relations). QE was based on concepts, which were first selected from a conceptual model, and then expanded by semantic relationships given in the model. The expansion levels were (a) no expansion, (b) a synonym expansion, (c) a narrower concept expansion, (d) an associative concept expansion, and (e) a cumulative expansion of all other expansions. With weak structures and Boolean structured queries, QE was not very effective. The best performance was achieved with one of the strong structures at the largest expansion level. 1 Introduction Query formulation and expansion have been studied extensively because the selection of good search keys is difficult but crucial for good search results. Query formulation is typically based on search keys given by the user. The original query does not always give satisfactory results, thus it may be reformulated by adding search keys with or without reweighting, which process is known as query expansion (QE, for short). QE may be based either on search results or knowledge structures [10]. Relevance feedback is a QE method based on search results, which has often proved to be beneficial, but its effectiveness depends on queries, ranking function and number of relevant items known, i.e., on the quality of the search results [2], [3], [6], [11], [33]. QE based on knowledge structures (e.g., thesauri) does not depend on search output, but it has not been found unambiguously useful. Statistical, collection based QE methods have been prevalent in research. Jing and Croft [17] report that an automatically constructed association thesaurus improves retrieval performance. Voorhees [32] and Jones and others [18] tested general, intellectually constructed thesauri in QE. They found no notable improvement in retrieval performance. QE with a domain specific thesaurus was tested by Kristensen [21] and Järvelin and others [19]. They report that thesaurus based QE enhances retrieval performance both in Boolean and best match environments. Nevertheless, these studies do not consider the interaction of query structure, Permission to make digital/hard copy of all part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or fee. SIGIR 98, Melbourne, Australia 1998 ACM /98 $5.00. Kalervo Järvelin Dept. of Information Studies University of Tampere query expansion, and the number of search concepts and search keys. Thus, we suspect that the usefulness of different types of thesauri (i.e., collection dependent or domain specific, automatically or intellectually constructed) in QE cannot be settled without studying the structural aspects of queries. Characteristics of requests and queries are usually not reported in detail. Requests seem to be of the type 'conscious topical' [16], and often rather verbose, like early TREC topics. The effect of this is that queries have usually been broad, i.e., they include many search keys, since they are automatically produced from written requests. They are not similar to real users' short queries, which typically include only a few words. Lu and Keefer [22] tested the effect of query size on retrieval effectiveness by reducing and expanding TREC queries. They report that reduction was detrimental but QE based on automatically constructed associative thesaurus enhanced retrieval performance. In addition, they found some evidence that the effectiveness of different indexing/retrieval methods was dependent on the number of search keys in queries. In other tests the broadness of unexpanded queries has been varied and, quite logically, shorter queries gain most from QE [17], [32]. In a best match retrieval system the retrieved documents are scored by combining (e.g., by product or sum) the weights of the keys appearing in a document and a query [20], [28]. The obvious ways to change the scoring are changing the index key weighting, changing the calculation of the scores, or weighting the search keys. Croft and Das [9] tested tf.idf weighting modifications and search key weighting with phrase information and QE. They found that search key weighting and QE improve performance. Their approach has similarities to ours, but they tested different types of weighting schemes, not different query structures, and the source of the QE vocabulary was different. In the literature a difference is typically made between natural language and Boolean queries, the former treating all search keys equally, the latter showing relations between the search keys. Because these relationships as the structure of a query are thought to be useful, Boolean operators are given interpretations in best match systems (e.g., [25], [29]). With query structure we refer to the calculation of scores from individual weights of the keys, which is guided by the query, explicitly shown by operators. The structure of queries may be described as weak (queries with a single operator, no differentiated relations between search keys) or strong (queries with several operators, different relationships between search keys). More precisely, strong query structures are based on facets. Each facet indicates one aspect of the request, and is represented by a set of concepts, which, in turn, are expressed 1 The INQUERY software was provided by the Information Retrieval Laboratory, University of Massachusetts Computer Science Department, Amherst, MA, USA.

3 by a set of search keys. Some evidence of better performance for strong structured queries exists (e.g., [5], [15], [29]), though not in QE studies. Queries with different types of operators, e.g., 'soft' interpretations of the Boolean operators and types of 'probabilistic' operators, have been combined [4], [5], [24], [26]. This method leads to rather complicated structures, but improves performance. Because neither the number nor the overlap of search keys in different queries is reported, it is hard to judge the possible effects of the structure and broadness on performance. In this study we analyse retrieval performance of different query formulation methods in a probabilistic retrieval system. The interaction and effects of the following variables are tested: the number of search concepts the number of search strings representing concepts QE with different semantic relationships the combination of weights in scoring, or query structures. Query formulation is based on concepts, and QE is based on a thesaurus. First, concepts representing requests are selected from the thesaurus, second, queries are formulated in which the number of concepts and the number of search strings representing the concepts are varied by QE, and the query structure of strings for scoring is varied. This study, however, is not about thesauri but, instead, about query structures and expansion extent the thesaurus is just a source for expansion keys. Our attempt is not to compare this approach to unaugmented human performance, nor do we present claims for a conceptual approach. In this study it is used to provide various kinds of expansions in a controlled way. 2 Method 2.1 Test environment The test environment was a text database containing Finnish newspaper articles operated under the INQUERY retrieval system (version 3.1). The database contained articles published in three Finnish newspapers. The average article length was 233 words, and typical paragraphs were two or three sentences in length. The database index contained all keys in their morphological basic forms. In the basic word form analysis all compound words were split into their component words in their morphological basic forms. For the database there is a collection of 35 requests, which are 1-2 sentences long, in the form of written information need statements. For these requests there is a recall base of articles. It is collected by pooling the result sets of hundreds of different queries formulated of the requests in different studies, using both exact and partial match retrieval. For a set of tests concerning query structures 30 requests were selected on the basis of their expandability, i.e., they provided possibilities for studying the interaction of the test parameters. For each request 78 different queries were formulated and retrieved with a result set size of 50 documents (document cut-off value [DCV] 50). The pool of these result sets contained 7010 articles that were not in the recall base. Being able to find through 2340 queries of varying structure and expansion including also various conceptual approaches 7010 novel articles which did not yield more than 42 new relevant, we strongly believe that practically all relevant documents in the database have been identified. We thus believe that our recall estimates are valid. The number of known relevant documents for the 30 queries is The INQUERY system was chosen for the test, because it has a wide range of operators, including probabilistic interpretations of the Boolean operators, and it allows search key weighting. Moreover, INQUERY has shown good performance in several tests (e.g., [1], [12], [33]). INQUERY is based on Bayesian inference networks (see, [1], [29]). All keys are attached with a belief value, which is approximated by the following tf.idf modification: log N tfij * tfij * dlj * dfi log(n +1. 0) adl where tfij = the frequency of the key i in the document j dlj = the length of document j (as a number of keys) adl = average document length in the collection N = collection size (as a number of documents) dfi = number of documents containing key i. The INQUERY query language provides a set of operators to specify relations between search keys. As with Boolean operators it is possible to construct facets, and mark relationships between concepts. The interpretations for the operators used in this study are given below: P and (Q 1, Q 2,..., Q n ) = p 1 *p 2 *... *p n P or (Q 1, Q 2,..., Q n ) = 1 (1 p 1 )*(1 p 2 )*... *(1 p n ) P sum (Q 1, Q 2,..., Q n ) = (p 1 +p p n )/ n P wsum (w s, w 1 Q 1, w 2 Q 2,..., w n Q n ) = w s (w 1 p 1 +w 2 p w n p n ) / (w 1 +w w n ) where P denotes probability, Q i is either a string or an INQUERY expression, p i, i = 1...n, is the belief value of Q i, w i, i = 1...n, is the weight of Q i, and w s is a weight given for a clause [24], [29]. The probability for operands connected by SYN operator is calculated by modifying the tf.idf function as follows: log N +0.5 tfij * i S tfij * dlj * dfs log(n +1.0) i S adl where tfij = the frequency of the key i in the document j S = a set of search keys within the SYN operator dlj = the length of document j (as a number of keys) adl = average document length in the collection N = collection size (as a number of documents) dfs = number of documents containing at least on key of the set S. 2.2 Concept based query formulation and expansion Three levels a conceptual, linguistic and string level can be differentiated in query formulation [30]. Differentiating between conceptual and linguistic levels has several advantages: it becomes obvious that a concept and a facet may be represented by several expressions; these expressions may be varied according to the search environment; the expressions may be selected from any language; the searcher may just select concepts without having to concern about search keys, which may be attached to each concept through a conceptual model and further automatically formed to a query. This idea is developed by Järvelin and others [19] to a thesaurus data model for QE. In the model concepts and their relations are represented at the conceptual level. Each concept is

4 represented by one or more natural language expressions at the linguistic level. The expressions for one concept are synonyms. At string level each expression is represented by one or more search strings, which also model, when needed, truncation and phrase component proximity. Matching in IR is done at the string, not at the linguistic level. We adopted the thesaurus data model for query formulation and expansion. The test thesaurus was aimed for QE in a database of Finnish newspaper articles, thus, its language was Finnish. The vocabulary was collected from exhaustive Boolean queries constructed for the test requests in another study [27], and from dictionaries. The thesaurus included 832 concepts, 1345 expressions for the concepts, and 1558 search strings for the expressions. Concepts have hierarchic (generic, partitive and instance) and association relationships. Expressions representing concepts are each other's synonyms or quasi-synonyms, i.e., equivalence exists between expressions at the linguistic level. The most typical or obvious of the synonyms, in linguistic sense, was chosen as a principal expression, term, of the concept. The relations between concepts and between expressions are valid for the whole subject area, i.e., they are standard thesaurus or semantic relations. Each expression may be represented by several strings, which are spelling variants, or, in case of phrases, formed with different proximity operations. There is no variation caused by word truncation because the words are in their basic forms in the database index. The thesaurus is a database managed with the ExpansionTool, which is a tool for concept based query construction and expansion (see, [19]). In query formulation the researchers recognised facets representing different aspects of the request and selected concepts for the facets from the thesaurus in accordance with the request. Concepts were organised in facets according to a Boolean block search strategy, i.e. concepts representing the same aspect form an OR block, which blocks are combined with the AND operator. Real searchers were not involved in the query formulation because this study seeks to test the effect of structural and representation parameters on the search performance, not to find out how real searchers would have used the thesaurus, nor to evaluate the performance of their queries. In this study the number of facets in a query is called complexity, and the total number of concepts in facets is coverage. The number of facets was fixed in each request, i.e., the queries included all aspects of the request. The coverage of an unexpanded query was also determined by the request. Coverage was varied in QE by adding semantically related concepts of the original concepts to the query, first narrower and then associative concepts (conceptual expansion). At the linguistic level concepts were represented by expressions. In the unexpanded query each concept was represented by its term. In synonym expansion all synonymous expressions of the term were added to the query. Expressions were then represented by search strings. All strings corresponding to an expression were always added to the query, thus, there is no expansion at the string level. The total number of search strings in a query is called broadness. Although the relation between expressions and strings was not one to one but one to many (1.2 strings per expression, on the average), we do not consider the number of expressions separately. Broadness was counted at the string level because matching is based on search strings. We refer to search strings as (search) keys. The different levels of query formulation and expansion are demonstrated with an example 1. Let us assume the following request: Storage of radioactive waste produced in nuclear power plants. Examples of risks and accidents. At the conceptual level four facets with altogether seven concepts are recognised from the request. But conceptual level is an abstraction, because concepts cannot be discussed without names, thus when the query is represented as a Boolean block search, it is at the linguistic level, and concepts are represented by terms. A sample conceptual query plan follows: nuclear power plants AND radioactive waste AND storage AND (risk OR accident) At the string level terms are replaced by search keys. The query is expressed with syntax of the query language. Our sample query is first formulated into the Boolean query structure using the query language of INQUERY. An unexpanded query, (E0), contains the search concepts selected on the basis of the request. The concepts are represented by terms. The sample unexpanded query is following: #and(#3nuclear power plant) #3(radioactive waste) storage #or(risk accident)) 2 With the synonym expansion (Es) synonyms of the terms are added to the query. The sample query with the synonym expansion is the following: #and(#or(#3(nuclear power plant) #3(nuclear station) #3(atomic power plant)) #or(#3(radioactive waste) #3(nuclear waste)) #or(storage store) #or(risk accident danger hazard)) In the narrower concept expansion (En) the expressions terms and their synonyms of the narrower concepts of the original search concept are added to the query. Thus, synonyms are also included in this expansion. The sample query is following: #and(#or(#3(nuclear power plant) #3(atomic power plant) #3(nuclear station) #3(atomic reactor) #3(nuclear reactor)) #or(#3(radioactive waste) #3(nuclear waste) #3(high active waste) #3(low active waste)) #or(storage store) #or(risk accident danger hazard disaster catastrophe)) In the next expansion (Ea) the expressions of the associative concepts of the original search concepts are added to the original unexpanded query, that is expressions representing the narrower concepts are not included in this query. The sample query with the associative concept expansion is as follows: #and(#or(#3(nuclear power plant) #3(atomic power plant) #3(nuclear station) #3(nuclear energy) #3(nuclear power)) #or(#3(radioactive waste) #3(nuclear waste) #3(spent fuel) #3(fission product)) #or(storage store) #or(risk accident danger hazard)) All the previous expansions are accumulated in the largest expansion (Et). Practically, Et expansion is a bag of En and Ea expansions, because these include all search keys. The sample query with the largest expansion is following: 1 Test queries were in Finnish. The translation of the example may not be exact, but should clarify the idea of the test. The example is shortened, i.e., the expanded sample queries are not complete with all search keys. 2 NB. #3 is a proximity operator in INQUERY query language. The keys within an #N operator must be found within N words of each other in the text in order to contribute to the document's belief value.

5 #and(#or(#3(nuclear power plant) #3(atomic power plant) #3(nuclear station) #3(atomic reactor) #3(nuclear reactor) #3(nuclear energy) #3(nuclear power)) #or(#3(radioactive waste) #3(nuclear waste) #3(high active waste) #3(low active waste) #3(spent fuel) #3(fission product)) #or(storage store) #or(risk accident danger hazard disaster catastrophe)) In highly agglutinative languages, like Finnish and German, compound words are often spelled together (e.g., ydinvoimalaitos, atomkraftwerk, which mean nuclear power plant). Compound word splitting makes all parts of the word searchable. In the test setting the database index contains both the whole compound words and their parts. Compound word splitting has also a query expansion effect. Many hierarchical relations in Finnish are based on compound words (e.g., tehdas -> paperitehdas -> hienopaperitehdas which means factory -> paper mill -> paper mill producing fine paper). When compound words are split, the use of any component word as a search key will match all compound words that include the component. Typically searchers might use heads (like tehdas/mill) to get all the narrower concepts. Usually there is no way to control whether the search key matches only heads, or any other compound part, thus, false matches occur. However, as the compound words contain the search key, many of them have a right kind of association to the search key. To some extent compound word splitting causes automatically narrower concept expansion and associative concept expansion to queries. This reduces the effects of these expansion types, but on the other hand the test setting resembles more other, less agglutinative languages. 2.3 Query structures The structures used to combine the search keys are explained in the following. Examples of the different structures based on the sample request are given with the narrower concept expansion. SUM (average of the weights of keys) and WSUM1 (weighted average of the weights of keys) queries represented weak structures. An unexpanded SUM query was constructed of the original concepts of a request, and each concept was represented by a single key or a set of keys corresponding to the term but without phrases. In the expansions all expressions were added as single words, i.e., no phrases were included. (Other structures included phrases expressed with proximity operators.) Weighting original keys higher than expansion keys is a usual method in QE. In WSUM queries expansion keys and the keys of equivalent expressions (i.e., synonyms for the terms of the original concepts) were given smaller weights (1) than the keys the original concepts (2). In WSUM2 queries keys were weighted according to the type of expansion they belonged to: the weight of term keys was 10, of synonym keys 9, of narrower concept keys 7, and of associative concept keys 5. Examples of these structures are given below: SUM #sum(nuclear power plant nuclear station atomic power plant atomic reactor nuclear reactor radioactive waste nuclear waste storage store risk accident danger hazard disaster catastrophe) WSUM1 #wsum(1 2 #3(nuclear power plant) 1 #3(nuclear station) 1 #3(atomic power plant) 1 #3(atomic reactor) 1 #3(nuclear reactor) 2 #3(radioactive waste) 1 #3(nuclear waste) 2 storage 1 store 2 risk 2 accident 1 danger 1 hazard 1 disaster 1 catastrophe) WSUM2 #wsum(1 10 #3(nuclear power plant) 9 #3(nuclear station) 9 #3(atomic power plant) 7 #3(atomic reactor) 7 #3(nuclear reactor) 10 #3(radioactive waste) 9 #3(nuclear waste) 10 storage 9 store 10 risk 10 accident 9 danger 9 hazard 7 disaster 7 catastrophe) A query with Boolean operators (BOOL ) was constructed using the block search strategy. It is a typical facet based query structure. The interpretations of Boolean operators are given above, i.e., queries were not strict Boolean queries. Earlier studies by Turtle [29], Belkin and others [5] and Hull [15] provide evidence that queries with Boolean operators are more effective than weak structured queries in best match retrieval systems. An example of Boolean query structure is given above (see, Chapter 2.2). In a SUM-of-synonym-groups-query (SSYN) each facet formed a clause with the SYN operator. SYN clauses were combined with the SUM operator. All keys within the SYN operator are treated as instances of one key. ASYN queries were similar to SSYN queries, but SYN groups were combined with the probabilistic AND operator, i.e., a product of facets was calculated instead of an average. SSYN #sum( #syn (#3(nuclear power plant) #3(atomic power plant) #3(nuclear station) #3(atomic reactor) #3(nuclear reactor)) #syn(#3(radioactive waste) #3(nuclear waste)) #syn(storage store) #syn(risk accident danger hazard disaster catastrophe)) ASYN #and(#syn(#3(nuclear power plant) #3(atomic power plant) #3(nuclear station) #3(atomic reactor) #3(nuclear reactor)) #syn(#3(radioactive waste) #3(nuclear waste)) #syn(storage store) #syn(risk accident danger hazard disaster catastrophe)) The XSUM query structure was a modification of the Boolean structure: facets were combined with SUM operator instead of AND operator. An average is less discriminating than product, thus, a failure or a lack in expansion of one facet does not affect the total score dramatically. In OR clauses extra weight was given to term keys through structure. All expansion keys for one facet formed a SUM clause which was combined to an OR clause with the term key(s) of the facet. XSUM #sum(#or (#3(nuclear power plant) #sum(#3(atomic power plant) #3(nuclear station) #3(atomic reactor) #3(nuclear reactor))) #or(#3(radioactive waste) #sum(#3(nuclear waste))) #or(storage #sum(store)) #or(risk accident #sum(danger hazard disaster catastrophe))) An OSUM query was a combination of Boolean queries consisting of term keys of different facets and SYN clauses consisting of expansion keys for each facet. Boolean queries and SYN clauses were combined with the SUM operator. This structure gave extra weight for search strings representing the original, unexpanded query. This approach was similar to query combination explored in some earlier studies (e.g., [4], [5], [26]). The unexpanded OSUM query was identical to the unexpanded BOOL query. OSUM #sum(#and(#3(nuclear power plant) #3(radioactive waste) storage #or(risk accident)) #syn(#3(atomic power plant) #3(nuclear station) #3(atomic reactor) #3(nuclear reactor))

6 #syn(#3(nuclear waste)) #syn(store) #syn(danger hazard disaster catastrophe)) In all there were eight query structure types, three of them representing weak structures (SUM, WSUM1-2) and five representing strong structures which were also facet structures (BOOL, SSYN, ASYN, OSUM, XSUM). Queries with different structures were constructed without expansion (E0) and expanded at all four expansion levels (Es, En, Ea, Et), except WSUM2 queries, which were either unexpanded or fully expanded (E0/Et) because of the weighting scheme. This setting generated 35 structurally different queries for each of the 30 requests, i.e., in all 1050 queries were run and evaluated for the present study. 3 Results The complexity of the queries ranged from 3 to 5, the average being 3.7. The average coverage of the unexpanded queries 4.9. The broadness of unexpanded queries when no phrases were marked (i.e., SUM structure) was 6.1 on the average, and for expanded queries without phrases, on the average, as follows: Es 18.5, En 30.6, Ea 49.5, Et The broadness of queries with phrases was E0 5.4, Es 14.1, En 24.4, Ea 42.1, Et 52.4, on the average. The number of documents in the result set, or document cut-off value (DCV), was fixed to 50. Average precision was then calculated over 11 cut-off points (1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50). Besides, average precision at 10 recall points (10-100) and ranks of query structures and expansion type combinations were calculated. DCV precision With three query structures QE proved beneficial at all expansion levels (SSYN, ASYN, XSUM), and the largest expansion (Et) was the best (Table 1). The precision scores of both SSYN or ASYN queries were very similar, and with QE notably better than precision scores of the other structures. The effect of QE on OSUM and WSUM1 queries was mixed. The Es, En and Ea expansions decreased the average precision of OSUM queries compared to the E0 level, but the Et expansion increased precision. The En expansion decreased the precision of WSUM1 queries, otherwise QE was beneficial. QE at any level was detrimental for three query structures (SUM, WSUM2, BOOL). Weighting the original search keys (terms) higher than expansion keys gave middle range results. Giving equal weights to all expansion keys (WSUM1) was more effective than giving weights according to the semantic relations (WSUM2). Without expansion the SUM queries were the most effective (Table 1). This is somewhat unexpected: queries without phrase information gave better precision than the queries with phrases. An explanation might be that a phrase as the only expression for a concept was a too strict condition, single search keys were more effective. The splitting of compound words for the database index might also have an effect because single words match to the components of compound words. Another explanation might be that the repetitive search keys, which resulted from relaxing the phrases, were not removed from queries. In INQUERY this kind of repetition within SUM or WSUM structures gives extra weight to the repeated keys. However, the advantage was lost with expansions, because the expansion without phrases increases the possibility of arbitrary search key combinations. All expansions were unbalanced since SUM structure does not take into account the relative importance of the search keys, nor does it recognise concepts or facets. In the WSUM queries the balance in expansions was sought by weighting original search keys higher than expansion keys, as suggested in literature. The ranking is based on weighted averages of the search key weights. The weighting supported somewhat QE, but the overall performance was not markedly better than the performance of the SUM queries. Thus, the problem of arbitrary search key combinations remained. For Boolean structured queries the result can be explained with the interpretation given to the AND and OR operators. If a search key is not present in a document it gets a default weight of 0.4, if the key is present it is given the tf.idf weight. The weight for all keys within the OR operator is calculated as follows: 1 (1 p 1 )*(1 p 2 )*... *(1 p n ), where p i is the weight of the search key i. The more search keys the OR clause includes, the higher weight it gets, whether the search keys are present or not. The weights of OR clauses are combined as a product. This leads to ranking where the OR clause or clauses with fewest keys are decisive, i.e., in the shortest OR clauses the presence or absence of search keys becomes decisive irrespective of the importance of the keys for the query. Query structures E-levels SUM WSUM1 WSUM2 BOOL SSYN ASYN OSUM XSUM E Es En Ea Et Table 1. Average precision over DCVs 1-50 for the query structures at varying QE levels. (N=30) Query structures QE-levels SUM WSUM1 WSUM2 BOOL SSYN ASYN OSUM XSUM E Es En Ea Et Table 2. Average ranks based on average precision over DCVs The OSUM and XSUM structures give extra weight for original search keys through structure. In the OSUM queries the Boolean structured clauses consisting of the original search keys ruled the performance over the expansion keys

7 formed to SYN clauses by facets. This held until the largest expansion, but even then the performance did not improve very much. In the XSUM structure the effects of QE remain rather modest. The SSYN and ASYN structures combine weights of search keys of a facet in a SYN clause, in which all search keys are treated as instances of one key. An average or a product is then calculated of the weights of SYN clauses. The facet structure formed into a SYN clause is the decisive component, the operation used to combine the weights of SYN clauses does not seem so important. Ranks and statistical significance Hull (1996) points out that when the number of relevant documents is very different for different requests, the variability in precision may be somewhat misleading. In this test the number of relevant documents ranged from 6 to 98. Hull proposes that the unequal variance in evaluation measure can be normalised by replacing the score for each method by ranking the scores of different methods with respect to each request. Table 2 gives the average ranks based on average precision scores over DCVs (NB. The best rank is 35, because the number of comparisons was five expansion levels times eight structures minus four missing cases, while three expansion levels are missing for WSUM2, and unexpanded queries for BOOL and OSUM, and WSUM1 and WSUM2 are identical.) To decide whether the differences between query structures with different coverage are statistically significant, we ran the Friedman test, which is based on ranks, i.e., a nonparametric alternative for comparing more than two related samples [8], [13]. The Friedman test showed that there was a significant difference between different structure and expansion combinations (p < 0.001). Pairwise comparisons were conducted to reveal the significant differences between combinations. Table 3 shows the differences when the best expansion was compared against others within each query structure. In all other structures, except XSUM, the difference between the best and the worst expansion was significant. Within the SUM and BOOL structures expansion led to several significantly worse combinations. SUM BOOL SSYN ASYN OSUM E0 > En Ea E0 >> En; Et >> E0 Et >> E0 Et > En Et E0 > Ea Et Table 3. Results of the Friedman test pairwise comparisons (the expression x > y refers to p < 0.01; x >> y refers to p < 0.001). The best expansions were then compared over the structures. Significant differences were found between the best SYN expansions and the following other groups: SSYN >> SUM WSUM1-2 BOOL OSUM XSUM; ASYN > XSUM; ASYN >> SUM WSUM1-2 BOOL OSUM. Precision at 10 recall levels Figure 1 visualises the results. It is based on average precision at 10 recall points and shows the worst query structure and expansion combination, and the best expansion of each query structure type. The worst case is the query with Boolean structure with the narrower concepts expansion (BOOL/En). In the middle there is a group of very similarly behaving structure and expansion combinations, the best Boolean and OSUM being slightly worse than SUM, WSUM1, and XSUM. Best SSYN and ASYN queries are high above others all the way, the difference being largest between % recall. Precision SSYN/Et WSUM/En ASYN/Et BOOL/E0 SUM/E0 WSUM/Et XSUM/Et BOOL/En Recall Figure 1. Precision-recall graph of a sample of query structure and QE combinations. Reading the results one should bear in mind that none of the structures or expansions was optimal for all requests. The smallest difference between best and worst query for one request was 14.4 percentage units, the largest was 89 percentage units. There were three requests for which the performance of unexpanded queries was better than the performance of any expanded query. The number of known relevant documents for these queries was rather low (8, 13, 14) which may partly explain the phenomenon: the variation in expressions discussing the topic might not have been great. The synonym expansion worked best for five queries, narrower concept expansion for five, associative concept expansion for nine, and largest expansion for 15 queries. (NB. Here the total number of queries is larger than 30 because some expansions gave the same result.) Among query structures, the SSYN and ASYN queries gave the highest precision for 18 requests, but at different expansion levels. SUM and WSUM1 were best structures for four requests, OSUM and BOOL for three requests, and XSUM for one request. Expanded Boolean queries were the worst performers for 19 requests. For three requests the ASYN or SSYN structured query was the worst. 4 Discussion and Conclusions We have tested concept based QE with different query structures in a best match retrieval system. The results show that the effects of QE on retrieval performance depend on the structure of the query. For weakly structured SUM and WSUM2 queries, and for Boolean structured queries QE was detrimental. The expanded Boolean queries gave the worst precision scores overall. The effect of QE on WSUM1 and OSUM queries varied at different expansion levels, but the largest expansion, including narrower and associative concepts, and synonyms, yielded the best precision score. With other query structures QE improved performance, and the

8 largest expansion was the best. The combination of the SSYN or ASYN structures with the largest expansion (Et) achieved the highest average precision scores. These were also significantly better than performance of any other query structure and expansion combination. In general, QE interacts with query structure: with a large expansion strong query structures seem necessary, but with a slight or no expansion weak structures perform well. Our results are supported in these Proceedings by Pirkola [23]. He shows in cross-language IR environment that query structures have effects on retrieval performance. Our results question earlier findings (e.g., [31], [32]) suggesting lower weighting of expansion keys. Our findings show that query structure is an essential factor in determining the effects of expansion. They also allow the view that expansion keys are not necessarily worse sources of evidence than the original request keys. This, again, is compatible with earlier findings ([18], [21]) about the small overlap of relevant document sets retrieved through different key combinations: relevant documents may have different vocabularies. QE within weak structures punishes relevant documents in scoring for the missing expansion keys. With strong (SYN) structures this does not happen and relevant documents with different vocabularies score high. It seems that the semantic division of relationships typical for thesauri was not particularly useful in QE. In most cases the best performance was obtained by the largest expansion including all semantic relationships. When the largest expansion did not work the explanation might be either that the thesaurus did not provide accurate relations and expressions for the concepts, or that the request was very precise, or the vocabulary of the topic was not very variable. Since the largest expansion performed generally best, it seems that any keys associated with the original keys, disregarding their relationship type, should be considered for QE. In statistical thesaurus construction statistically associated keys are identified but the type of the relationship cannot be ascertained. Statistical thesauri, combined with proper query structures, might supply us with a good query strategy. This is suggested by the results, though not shown by the present study, because no statistical thesaurus was compared. In a further study we will compare the expansion keys provided by a statistical and an intellectual thesaurus, as well as the effectiveness of QE based on both types of thesauri. Another theme to test is whether the effectiveness of relevance feedback could be enhanced by concept based QE before feedback. Acknowledgements. This study was funded by Emil Aaltosen Säätiö, Pirkanmaan Kulttuurirahasto, and National Multimedia Project, KaMu-Tila. References [1] J. Allan, J. Callan, B. Croft, L. Ballestros, J. Broglio, J. Xu & H. Shu. INQUERY at TREC 5. In: D.K. Harman & E.M. Voorhees (eds.) Information technology: The Fifth Text REtrieval Conference (TREC-5). Gaithersburg, MD: National Institute of Standards and Technology, 1997, [2] M.M. Beaulieu, M. Gatford, X. Huang, S.E. Robertson, S. Walker & P. Williams. Okapi at TREC 5. In: D.K. Harman & E.M. Voorhees (eds.) Information technology: The Fifth Text REtrieval Conference (TREC-5). Gaithersburg, MD: National Institute of Standards and Technology, 1997, [3] M. Beaulieu, S. Robertson & E. Rasmussen. Evaluating Interactive Systems in TREC. Journal of the American Society for Information Science, 47(1): 1996, [4] N. Belkin, C. Cool, W.B. Croft & J.P. Callan. The effect of multiple query representations of information retrieval performance. In: R. Korfhage, E.M. Rasmussen, & P. Willett, P. (eds.) Proc. of the 16th International Conference on Research and Development in Information Retrieval. New York, NY: ACM, 1993, [5] N. Belkin, P. Kantor, E.A. Fox & J.A. Shaw. Combining evidence of multiple query representations for information retrieval. Information Processing & Management, 31(3): 1995, [6] C. Buckley, G. Salton & J. Allan. The effect of adding relevance information in a relevance feedback environment. In: W.B. Croft & C.J. van Rijsbergen (eds.) Proc. of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY: ACM, 1994, [7] J.P. Callan, W.B. Croft & S.M. Harding. The INQUERY retrieval system. In: Proceedings of the Third International Conference on Databases and Expert Systems Applications. Berlin: Springer Verlag, 1992, [8] W.J. Conover. Practical nonparametric statistics. 2nd ed. New York: John Wiley & Sons, [9] W.B. Croft & R. Das. Experiments with query acquisition and use in document retrieval systems. In: J-L. Vidick (ed.) Proc. of the 13th International Conference on Research and Development in Information Retrieval. Bruxelles: ACM, 1990, [10] E.N. Efthimiadis. Query expansion. In: M.E. Williams (ed.) Annual Review of Information Science and Technology, vol. 31. Medford, NJ: Information Today, 1996, [11] D.K. Harman. Relevance feedback revisited. In N. Belkin, P. Ingwersen & A. Mark Pejtersen (eds.) Proc. of the 15th Aannual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY: ACM, 1992, [12] D.K. Harman (s.a.). Overview of the fourth text retrieval conference (TREC-4) [online]. [Cited ] Available from: <URL: [13] D. Hull. Using statistical testing in the evaluation of retrieval experiments. In: Korfhage, R., Rasmussen, E.M. & Willett, P. (eds.) Proc. of the 16th International Conference on Research and Development in Information Retrieval. New York, NY: ACM, 1993, [14] D. Hull. Stemming algorithms: a case study for detailed evaluation. Journal of the American Society for Information Science, 47(1): 1996, [15] D. Hull. Using structured queries for disambiguation in cross-language information retrieval. In: AAAI Spring Symposium on Cross-Language Text and Speech Retrieval Electronic Working Notes [online]. Stanford University, March 24-26, [Cited ] Available from: <URL: papers/ hull3.ps> [16] P. Ingwersen & P. Willett. An introduction to algorithmic and cognitive approaches for information retrieval. Libri, 45: 1995, [17] Y. Jing & W.B. Croft. An association thesaurus for information retrieval [online] [Cited ] Available from: <URL: biblo.html#1> [18] S. Jones, M. Gatford, S. Robertson, M. Hancock- Beaulieu & J. Secker. Interactive thesaurus navigation:

9 Intelligence rules ok? Journal of the American Society for Information Science 46(1): 1995, [19] K. Järvelin, J. Kristensen, T. Niemi, E. Sormunen & H. Keskustalo. A deductive data model for query expansion. In: H.-P. Frei, D. Harman, P. Schäuble & R. Wilkinson (eds.) Proc. of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY: ACM, 1996, [20] P. Kantor. Information retrieval techniques. In M.E. Williams (ed.) Annual Review of Information Science and Technology. Vol. 29. Medford, NJ: Learned Information, 1994, [21] J. Kristensen. Expanding end-users query statements for free text searching with a search-aid thesaurus. Information Processing & Management, 29(6): 1993, [22] X.A. Lu & R.B. Keefer. Query expansion/reduction and its impact on retrieval effectiveness. In: D.K. Harman (ed.) The Third Text REtrieval Conference (TREC 3). Gaithersburg, MD: National Institute of Standards and Technology, 1995, [23] A. Pirkola. The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval. In: Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Melbourne, Australia, August , [24] T.B. Rajashekar & W.B. Croft. Combining automatic and manual index representations in probabilistic retrieval. Journal of the American Society for Information Science, 46(4): 1995, [25] G. Salton, E. Fox & H. Wu. Extended boolean information retrieval. Communications of the ACM, 26(11): 1983, [26] J.A. Shaw & E.A. Fox. Combination of multiple searches. In: D.K. Harman (ed.) The Third Text REtrieval Conference (TREC 3). Gaithersburg, MD: National Institute of Standards and Technology, 1995, [27] E. Sormunen. Vapaatekstihaun tehokkuus ja siihen vaikuttavat tekijät sanomalehtiaineistoa sisältävässä tekstikannassa [Free-text searching efficiency and factors affecting it in a newspaper article database]. Espoo, Finland: Valtion Teknillinen Tutkimuskeskus, VTT Julkaisuja 790, [In Finnish] [28] K. Sparck Jones. Reflection on TREC. Information Processing & Management, 31(3): 1995, [29] H.R. Turtle. Inference networks for document retrieval. Ph.D. Thesis. Computer and information Science Department, University of Massachusetts. COINS Technical Report 90 92, [30] UMLS. UMLS Knowledge Sources. 5th Experimental Edition. Bethesda, MD: National Library of Medicine, [31] Y.-C. Wang, J. Vandendorpe & M. Evans. Relational thesauri in information retrieval. Journal of the American Society for Information Science, 36(1): 1985, [32] E. Voorhees. Query expansion using lexical-semantic relations. In: W. Bruce Croft & C.J. van Rijsbergen (eds.) Proc. of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY:ACM, 1994, [33] J. Xu & W.B. Croft. Query expansion using local and global document analysis. In: H.-P. Frei, D. Harman, P. Schäuble, and R. Wilkinson (eds.) Proc. of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY:ACM, 1996, 4 11.

Inter and Intra-Document Contexts Applied in Polyrepresentation

Inter and Intra-Document Contexts Applied in Polyrepresentation Inter and Intra-Document Contexts Applied in Polyrepresentation Mette Skov, Birger Larsen and Peter Ingwersen Department of Information Studies, Royal School of Library and Information Science Birketinget

More information

IR evaluation methods for retrieving highly relevant documents

IR evaluation methods for retrieving highly relevant documents IR evaluation methods for retrieving highly relevant documents Kalervo J~irvelin & Jaana Kekiil~iinen University of Tampere Department of Information Studies FIN-33014 University of Tampere FINLAND Emaih

More information

TREC-3 Ad Hoc Retrieval and Routing. Experiments using the WIN System. Paul Thompson. Howard Turtle. Bokyung Yang. James Flood

TREC-3 Ad Hoc Retrieval and Routing. Experiments using the WIN System. Paul Thompson. Howard Turtle. Bokyung Yang. James Flood TREC-3 Ad Hoc Retrieval and Routing Experiments using the WIN System Paul Thompson Howard Turtle Bokyung Yang James Flood West Publishing Company Eagan, MN 55123 1 Introduction The WIN retrieval engine

More information

RMIT University at TREC 2006: Terabyte Track

RMIT University at TREC 2006: Terabyte Track RMIT University at TREC 2006: Terabyte Track Steven Garcia Falk Scholer Nicholas Lester Milad Shokouhi School of Computer Science and IT RMIT University, GPO Box 2476V Melbourne 3001, Australia 1 Introduction

More information

A New Measure of the Cluster Hypothesis

A New Measure of the Cluster Hypothesis A New Measure of the Cluster Hypothesis Mark D. Smucker 1 and James Allan 2 1 Department of Management Sciences University of Waterloo 2 Center for Intelligent Information Retrieval Department of Computer

More information

ExpansionTool: CONCEPT-BASED QUERY EXPANSION AND CONSTRUCTION

ExpansionTool: CONCEPT-BASED QUERY EXPANSION AND CONSTRUCTION ExpansionTool: CONCEPT-BASED QUERY EXPANSION AND CONSTRUCTION Kalervo Järvelin, Jaana Kekäläinen, Timo Niemi # Dept. of Information Studies, # Dept. of Computer and Information Sciences University of Tampere

More information

A Practical Passage-based Approach for Chinese Document Retrieval

A Practical Passage-based Approach for Chinese Document Retrieval A Practical Passage-based Approach for Chinese Document Retrieval Szu-Yuan Chi 1, Chung-Li Hsiao 1, Lee-Feng Chien 1,2 1. Department of Information Management, National Taiwan University 2. Institute of

More information

City, University of London Institutional Repository

City, University of London Institutional Repository City Research Online City, University of London Institutional Repository Citation: Vakkari, P., Jones, S., MacFarlane, A. & Sormunen, E. (2004). Query exhaustivity, relevance feedback and search success

More information

An Attempt to Identify Weakest and Strongest Queries

An Attempt to Identify Weakest and Strongest Queries An Attempt to Identify Weakest and Strongest Queries K. L. Kwok Queens College, City University of NY 65-30 Kissena Boulevard Flushing, NY 11367, USA kwok@ir.cs.qc.edu ABSTRACT We explore some term statistics

More information

ResPubliQA 2010

ResPubliQA 2010 SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first

More information

From Passages into Elements in XML Retrieval

From Passages into Elements in XML Retrieval From Passages into Elements in XML Retrieval Kelly Y. Itakura David R. Cheriton School of Computer Science, University of Waterloo 200 Univ. Ave. W. Waterloo, ON, Canada yitakura@cs.uwaterloo.ca Charles

More information

QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL

QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL David Parapar, Álvaro Barreiro AILab, Department of Computer Science, University of A Coruña, Spain dparapar@udc.es, barreiro@udc.es

More information

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND 41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia

More information

Term Frequency Normalisation Tuning for BM25 and DFR Models

Term Frequency Normalisation Tuning for BM25 and DFR Models Term Frequency Normalisation Tuning for BM25 and DFR Models Ben He and Iadh Ounis Department of Computing Science University of Glasgow United Kingdom Abstract. The term frequency normalisation parameter

More information

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst An Evaluation of Information Retrieval Accuracy with Simulated OCR Output W.B. Croft y, S.M. Harding y, K. Taghva z, and J. Borsack z y Computer Science Department University of Massachusetts, Amherst

More information

Information Retrieval Research

Information Retrieval Research ELECTRONIC WORKSHOPS IN COMPUTING Series edited by Professor C.J. van Rijsbergen Jonathan Furner, School of Information and Media Studies, and David Harper, School of Computer and Mathematical Studies,

More information

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! (301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation

More information

Improving the Effectiveness of Information Retrieval with Local Context Analysis

Improving the Effectiveness of Information Retrieval with Local Context Analysis Improving the Effectiveness of Information Retrieval with Local Context Analysis JINXI XU BBN Technologies and W. BRUCE CROFT University of Massachusetts Amherst Techniques for automatic query expansion

More information

Percent Perfect Performance (PPP)

Percent Perfect Performance (PPP) Percent Perfect Performance (PPP) Information Processing & Management, 43 (4), 2007, 1020-1029 Robert M. Losee CB#3360 University of North Carolina Chapel Hill, NC 27599-3360 email: losee at unc period

More information

A Model for Information Retrieval Agent System Based on Keywords Distribution

A Model for Information Retrieval Agent System Based on Keywords Distribution A Model for Information Retrieval Agent System Based on Keywords Distribution Jae-Woo LEE Dept of Computer Science, Kyungbok College, 3, Sinpyeong-ri, Pocheon-si, 487-77, Gyeonggi-do, Korea It2c@koreaackr

More information

A NOVEL METHOD FOR THE EVALUATION OF BOOLEAN QUERY EFFECTIVENESS ACROSS A WIDE OPERATIONAL RANGE

A NOVEL METHOD FOR THE EVALUATION OF BOOLEAN QUERY EFFECTIVENESS ACROSS A WIDE OPERATIONAL RANGE A NOVEL METHOD FOR THE EVALUATION OF BOOLEAN QUERY EFFECTIVENESS ACROSS A WIDE OPERATIONAL RANGE Eero Sormunen Department of Information Studies, University of Tampere P.O. Box 607, FIN 330 Tampere, Finland

More information

Discounted Cumulated Gain based Evaluation of Multiple Query IR Sessions

Discounted Cumulated Gain based Evaluation of Multiple Query IR Sessions Preprint from: Järvelin, K. & Price, S. & Delcambre, L. & Nielsen, M. (2008). Discounted Cumulated Gain based Evaluation of Multiple Query IR Sessions. In: Ruthven, I. & al. (Eds.), Proc. of the 30th European

More information

Developing a Test Collection for the Evaluation of Integrated Search Lykke, Marianne; Larsen, Birger; Lund, Haakon; Ingwersen, Peter

Developing a Test Collection for the Evaluation of Integrated Search Lykke, Marianne; Larsen, Birger; Lund, Haakon; Ingwersen, Peter university of copenhagen Københavns Universitet Developing a Test Collection for the Evaluation of Integrated Search Lykke, Marianne; Larsen, Birger; Lund, Haakon; Ingwersen, Peter Published in: Advances

More information

Real-time Query Expansion in Relevance Models

Real-time Query Expansion in Relevance Models Real-time Query Expansion in Relevance Models Victor Lavrenko and James Allan Center for Intellignemt Information Retrieval Department of Computer Science 140 Governor s Drive University of Massachusetts

More information

This literature review provides an overview of the various topics related to using implicit

This literature review provides an overview of the various topics related to using implicit Vijay Deepak Dollu. Implicit Feedback in Information Retrieval: A Literature Analysis. A Master s Paper for the M.S. in I.S. degree. April 2005. 56 pages. Advisor: Stephanie W. Haas This literature review

More information

Automatic Translation in Cross-Lingual Access to Legislative Databases

Automatic Translation in Cross-Lingual Access to Legislative Databases Automatic Translation in Cross-Lingual Access to Legislative Databases Catherine Bounsaythip, Aarno Lehtola, Jarno Tenni VTT Information Technology P. Box 1201, FIN-02044 VTT, Finland Phone: +358 9 456

More information

Making Retrieval Faster Through Document Clustering

Making Retrieval Faster Through Document Clustering R E S E A R C H R E P O R T I D I A P Making Retrieval Faster Through Document Clustering David Grangier 1 Alessandro Vinciarelli 2 IDIAP RR 04-02 January 23, 2004 D a l l e M o l l e I n s t i t u t e

More information

Tilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval

Tilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval Tilburg University Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval Publication date: 2006 Link to publication Citation for published

More information

Amit Singhal, Chris Buckley, Mandar Mitra. Department of Computer Science, Cornell University, Ithaca, NY 14853

Amit Singhal, Chris Buckley, Mandar Mitra. Department of Computer Science, Cornell University, Ithaca, NY 14853 Pivoted Document Length Normalization Amit Singhal, Chris Buckley, Mandar Mitra Department of Computer Science, Cornell University, Ithaca, NY 8 fsinghal, chrisb, mitrag@cs.cornell.edu Abstract Automatic

More information

A Formal Approach to Score Normalization for Meta-search

A Formal Approach to Score Normalization for Meta-search A Formal Approach to Score Normalization for Meta-search R. Manmatha and H. Sever Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst, MA 01003

More information

The Utrecht Blend: Basic Ingredients for an XML Retrieval System

The Utrecht Blend: Basic Ingredients for an XML Retrieval System The Utrecht Blend: Basic Ingredients for an XML Retrieval System Roelof van Zwol Centre for Content and Knowledge Engineering Utrecht University Utrecht, the Netherlands roelof@cs.uu.nl Virginia Dignum

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied

Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied Information Processing and Management 43 (2007) 1044 1058 www.elsevier.com/locate/infoproman Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied Anselm Spoerri

More information

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten

More information

Performance Measures for Multi-Graded Relevance

Performance Measures for Multi-Graded Relevance Performance Measures for Multi-Graded Relevance Christian Scheel, Andreas Lommatzsch, and Sahin Albayrak Technische Universität Berlin, DAI-Labor, Germany {christian.scheel,andreas.lommatzsch,sahin.albayrak}@dai-labor.de

More information

IITH at CLEF 2017: Finding Relevant Tweets for Cultural Events

IITH at CLEF 2017: Finding Relevant Tweets for Cultural Events IITH at CLEF 2017: Finding Relevant Tweets for Cultural Events Sreekanth Madisetty and Maunendra Sankar Desarkar Department of CSE, IIT Hyderabad, Hyderabad, India {cs15resch11006, maunendra}@iith.ac.in

More information

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Wolfgang Tannebaum, Parvaz Madabi and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna

More information

Indexing and Query Processing

Indexing and Query Processing Indexing and Query Processing Jaime Arguello INLS 509: Information Retrieval jarguell@email.unc.edu January 28, 2013 Basic Information Retrieval Process doc doc doc doc doc information need document representation

More information

A Fusion Approach to XML Structured Document Retrieval

A Fusion Approach to XML Structured Document Retrieval A Fusion Approach to XML Structured Document Retrieval Ray R. Larson School of Information Management and Systems University of California, Berkeley Berkeley, CA 94720-4600 ray@sims.berkeley.edu 17 April

More information

ITERATIVE SEARCHING IN AN ONLINE DATABASE. Susan T. Dumais and Deborah G. Schmitt Cognitive Science Research Group Bellcore Morristown, NJ

ITERATIVE SEARCHING IN AN ONLINE DATABASE. Susan T. Dumais and Deborah G. Schmitt Cognitive Science Research Group Bellcore Morristown, NJ - 1 - ITERATIVE SEARCHING IN AN ONLINE DATABASE Susan T. Dumais and Deborah G. Schmitt Cognitive Science Research Group Bellcore Morristown, NJ 07962-1910 ABSTRACT An experiment examined how people use

More information

This paper studies methods to enhance cross-language retrieval of domain-specific

This paper studies methods to enhance cross-language retrieval of domain-specific Keith A. Gatlin. Enhancing Cross-Language Retrieval of Comparable Corpora Through Thesaurus-Based Translation and Citation Indexing. A master s paper for the M.S. in I.S. degree. April, 2005. 23 pages.

More information

A probabilistic description-oriented approach for categorising Web documents

A probabilistic description-oriented approach for categorising Web documents A probabilistic description-oriented approach for categorising Web documents Norbert Gövert Mounia Lalmas Norbert Fuhr University of Dortmund {goevert,mounia,fuhr}@ls6.cs.uni-dortmund.de Abstract The automatic

More information

Noida institute of engineering and technology,greater noida

Noida institute of engineering and technology,greater noida Impact Of Word Sense Ambiguity For English Language In Web IR Prachi Gupta 1, Dr.AnuragAwasthi 2, RiteshRastogi 3 1,2,3 Department of computer Science and engineering, Noida institute of engineering and

More information

Effective Information Retrieval using Genetic Algorithms based Matching Functions Adaptation

Effective Information Retrieval using Genetic Algorithms based Matching Functions Adaptation Effective Information Retrieval using Genetic Algorithms based Matching Functions Adaptation Praveen Pathak Michael Gordon Weiguo Fan Purdue University University of Michigan pathakp@mgmt.purdue.edu mdgordon@umich.edu

More information

Melbourne University at the 2006 Terabyte Track

Melbourne University at the 2006 Terabyte Track Melbourne University at the 2006 Terabyte Track Vo Ngoc Anh William Webber Alistair Moffat Department of Computer Science and Software Engineering The University of Melbourne Victoria 3010, Australia Abstract:

More information

Pseudo-Relevance Feedback and Title Re-Ranking for Chinese Information Retrieval

Pseudo-Relevance Feedback and Title Re-Ranking for Chinese Information Retrieval Pseudo-Relevance Feedback and Title Re-Ranking Chinese Inmation Retrieval Robert W.P. Luk Department of Computing The Hong Kong Polytechnic University Email: csrluk@comp.polyu.edu.hk K.F. Wong Dept. Systems

More information

VIDEO SEARCHING AND BROWSING USING VIEWFINDER

VIDEO SEARCHING AND BROWSING USING VIEWFINDER VIDEO SEARCHING AND BROWSING USING VIEWFINDER By Dan E. Albertson Dr. Javed Mostafa John Fieber Ph. D. Student Associate Professor Ph. D. Candidate Information Science Information Science Information Science

More information

TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University

TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University of Maryland, College Park, MD 20742 oard@glue.umd.edu

More information

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University CS6200 Information Retrieval Jesse Anderton College of Computer and Information Science Northeastern University Major Contributors Gerard Salton! Vector Space Model Indexing Relevance Feedback SMART Karen

More information

Automatic Lemmatizer Construction with Focus on OOV Words Lemmatization

Automatic Lemmatizer Construction with Focus on OOV Words Lemmatization Automatic Lemmatizer Construction with Focus on OOV Words Lemmatization Jakub Kanis, Luděk Müller University of West Bohemia, Department of Cybernetics, Univerzitní 8, 306 14 Plzeň, Czech Republic {jkanis,muller}@kky.zcu.cz

More information

TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback

TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback RMIT @ TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback Ameer Albahem ameer.albahem@rmit.edu.au Lawrence Cavedon lawrence.cavedon@rmit.edu.au Damiano

More information

Evaluating the effectiveness of content-oriented XML retrieval

Evaluating the effectiveness of content-oriented XML retrieval Evaluating the effectiveness of content-oriented XML retrieval Norbert Gövert University of Dortmund Norbert Fuhr University of Duisburg-Essen Gabriella Kazai Queen Mary University of London Mounia Lalmas

More information

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks University of Amsterdam at INEX 2010: Ad hoc and Book Tracks Jaap Kamps 1,2 and Marijn Koolen 1 1 Archives and Information Studies, Faculty of Humanities, University of Amsterdam 2 ISLA, Faculty of Science,

More information

The Effectiveness of a Dictionary-Based Technique for Indonesian-English Cross-Language Text Retrieval

The Effectiveness of a Dictionary-Based Technique for Indonesian-English Cross-Language Text Retrieval University of Massachusetts Amherst ScholarWorks@UMass Amherst Computer Science Department Faculty Publication Series Computer Science 1997 The Effectiveness of a Dictionary-Based Technique for Indonesian-English

More information

A Search Relevancy Tuning Method Using Expert Results Content Evaluation

A Search Relevancy Tuning Method Using Expert Results Content Evaluation A Search Relevancy Tuning Method Using Expert Results Content Evaluation Boris Mark Tylevich Chair of System Integration and Management Moscow Institute of Physics and Technology Moscow, Russia email:boris@tylevich.ru

More information

Retrieval and Feedback Models for Blog Distillation

Retrieval and Feedback Models for Blog Distillation Retrieval and Feedback Models for Blog Distillation Jonathan Elsas, Jaime Arguello, Jamie Callan, Jaime Carbonell Language Technologies Institute, School of Computer Science, Carnegie Mellon University

More information

Query Expansion for Noisy Legal Documents

Query Expansion for Noisy Legal Documents Query Expansion for Noisy Legal Documents Lidan Wang 1,3 and Douglas W. Oard 2,3 1 Computer Science Department, 2 College of Information Studies and 3 Institute for Advanced Computer Studies, University

More information

Using a Medical Thesaurus to Predict Query Difficulty

Using a Medical Thesaurus to Predict Query Difficulty Using a Medical Thesaurus to Predict Query Difficulty Florian Boudin, Jian-Yun Nie, Martin Dawes To cite this version: Florian Boudin, Jian-Yun Nie, Martin Dawes. Using a Medical Thesaurus to Predict Query

More information

Information filtering and information retrieval: Two sides of the same coin? by Nicholas J. Belkin and W. Bruce Croft

Information filtering and information retrieval: Two sides of the same coin? by Nicholas J. Belkin and W. Bruce Croft Communications of the ACM Dec 1992 v35 n12 p29(10) Page 1 by Nicholas J. Belkin and W. Bruce Croft Information filtering systems are designed for unstructured or semistructured data, as opposed to database

More information

Robust Relevance-Based Language Models

Robust Relevance-Based Language Models Robust Relevance-Based Language Models Xiaoyan Li Department of Computer Science, Mount Holyoke College 50 College Street, South Hadley, MA 01075, USA Email: xli@mtholyoke.edu ABSTRACT We propose a new

More information

Document Structure Analysis in Associative Patent Retrieval

Document Structure Analysis in Associative Patent Retrieval Document Structure Analysis in Associative Patent Retrieval Atsushi Fujii and Tetsuya Ishikawa Graduate School of Library, Information and Media Studies University of Tsukuba 1-2 Kasuga, Tsukuba, 305-8550,

More information

Using XML Logical Structure to Retrieve (Multimedia) Objects

Using XML Logical Structure to Retrieve (Multimedia) Objects Using XML Logical Structure to Retrieve (Multimedia) Objects Zhigang Kong and Mounia Lalmas Queen Mary, University of London {cskzg,mounia}@dcs.qmul.ac.uk Abstract. This paper investigates the use of the

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

Query Likelihood with Negative Query Generation

Query Likelihood with Negative Query Generation Query Likelihood with Negative Query Generation Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 ylv2@uiuc.edu ChengXiang Zhai Department of Computer

More information

Document Filtering Method Using Non-Relevant Information Profile

Document Filtering Method Using Non-Relevant Information Profile Document Filtering Method Using Non-Relevant Information Profile Keiichiro Hoashi Kazunori Matsumoto Naomi Inoue Kazuo Hashimoto KDD R&D Laboratories, Inc. 2-1-15 Ohaxa Kamifukuoka, Saitama 356-8502 JAPAN

More information

Quoogle: A Query Expander for Google

Quoogle: A Query Expander for Google Quoogle: A Query Expander for Google Michael Smit Faculty of Computer Science Dalhousie University 6050 University Avenue Halifax, NS B3H 1W5 smit@cs.dal.ca ABSTRACT The query is the fundamental way through

More information

A Corpus Analysis Approach for Automatic Query Expansion and Its Extension to Multiple Databases

A Corpus Analysis Approach for Automatic Query Expansion and Its Extension to Multiple Databases A Corpus Analysis Approach for Automatic Query Expansion and Its Extension to Multiple Databases SUSAN GAUCH, JIANYING WANG, and SATYA MAHESH RACHAKONDA University of Kansas Searching online text collections

More information

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Abstract We intend to show that leveraging semantic features can improve precision and recall of query results in information

More information

Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task

Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Walid Magdy, Gareth J.F. Jones Centre for Next Generation Localisation School of Computing Dublin City University,

More information

A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems

A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems Dorothee Blocks Hypermedia Research Unit School of Computing University of Glamorgan, UK NKOS workshop

More information

Evaluation of Web Search Engines with Thai Queries

Evaluation of Web Search Engines with Thai Queries Evaluation of Web Search Engines with Thai Queries Virach Sornlertlamvanich, Shisanu Tongchim and Hitoshi Isahara Thai Computational Linguistics Laboratory 112 Paholyothin Road, Klong Luang, Pathumthani,

More information

The Effectiveness of Thesauri-Aided Retrieval

The Effectiveness of Thesauri-Aided Retrieval The Effectiveness of Thesauri-Aided Retrieval Kazem Taghva, Julie Borsack, and Allen Condit Information Science Research Institute University of Nevada, Las Vegas Las Vegas, NV 89154-4021 USA ABSTRACT

More information

Evaluating a Visual Information Retrieval Interface: AspInquery at TREC-6

Evaluating a Visual Information Retrieval Interface: AspInquery at TREC-6 Evaluating a Visual Information Retrieval Interface: AspInquery at TREC-6 Russell Swan James Allan Don Byrd Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts

More information

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS 1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

More information

Relevance in XML Retrieval: The User Perspective

Relevance in XML Retrieval: The User Perspective Relevance in XML Retrieval: The User Perspective Jovan Pehcevski School of CS & IT RMIT University Melbourne, Australia jovanp@cs.rmit.edu.au ABSTRACT A realistic measure of relevance is necessary for

More information

Searchers Selection of Search Keys: III. Searching Styles

Searchers Selection of Search Keys: III. Searching Styles Searchers Selection of Search Keys: III. Searching Styles Raya Fidel Graduate School of Library and information Science, University of Washington, Seattle, WA 98195 Individual searching style has a primary

More information

Retrieval Quality vs. Effectiveness of Relevance-Oriented Search in XML Documents

Retrieval Quality vs. Effectiveness of Relevance-Oriented Search in XML Documents Retrieval Quality vs. Effectiveness of Relevance-Oriented Search in XML Documents Norbert Fuhr University of Duisburg-Essen Mohammad Abolhassani University of Duisburg-Essen Germany Norbert Gövert University

More information

Report on the TREC-4 Experiment: Combining Probabilistic and Vector-Space Schemes

Report on the TREC-4 Experiment: Combining Probabilistic and Vector-Space Schemes Report on the TREC-4 Experiment: Combining Probabilistic and Vector-Space Schemes Jacques Savoy, Melchior Ndarugendamwo, Dana Vrajitoru Faculté de droit et des sciences économiques Université de Neuchâtel

More information

An Investigation of Basic Retrieval Models for the Dynamic Domain Task

An Investigation of Basic Retrieval Models for the Dynamic Domain Task An Investigation of Basic Retrieval Models for the Dynamic Domain Task Razieh Rahimi and Grace Hui Yang Department of Computer Science, Georgetown University rr1042@georgetown.edu, huiyang@cs.georgetown.edu

More information

number of documents in global result list

number of documents in global result list Comparison of different Collection Fusion Models in Distributed Information Retrieval Alexander Steidinger Department of Computer Science Free University of Berlin Abstract Distributed information retrieval

More information

CLARIT Compound Queries and Constraint-Controlled Feedback in TREC-5 Ad-Hoc Experiments

CLARIT Compound Queries and Constraint-Controlled Feedback in TREC-5 Ad-Hoc Experiments CLARIT Compound Queries and Constraint-Controlled Feedback in TREC-5 Ad-Hoc Experiments Natasa Milic-Frayling 1, Xiang Tong 2, Chengxiang Zhai 2, David A. Evans 1 1 CLARITECH Corporation 2 Laboratory for

More information

A Comparison of Text Retrieval Models

A Comparison of Text Retrieval Models A Comparison of Text Retrieval Models HOWARD R. TURTLE 1 AND W. BRUCE CROFT 2 1 West Publishing Company, St Paul, MN 55164, USA 'Computer Science Department, University of Massachusetts, Amherst, MA 01002,

More information

Experiments with Automatic Query Formulation in the Extended Boolean Model

Experiments with Automatic Query Formulation in the Extended Boolean Model Experiments with Automatic Query Formulation in the Extended Boolean Model Lucie Skorkovská and Pavel Ircing University of West Bohemia, Faculty of Applied Sciences, Dept. of Cybernetics Univerzitní 8,

More information

Fault Class Prioritization in Boolean Expressions

Fault Class Prioritization in Boolean Expressions Fault Class Prioritization in Boolean Expressions Ziyuan Wang 1,2 Zhenyu Chen 1 Tsong-Yueh Chen 3 Baowen Xu 1,2 1 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093,

More information

Query-Sensitive Similarity Measure for Content-Based Image Retrieval

Query-Sensitive Similarity Measure for Content-Based Image Retrieval Query-Sensitive Similarity Measure for Content-Based Image Retrieval Zhi-Hua Zhou Hong-Bin Dai National Laboratory for Novel Software Technology Nanjing University, Nanjing 2193, China {zhouzh, daihb}@lamda.nju.edu.cn

More information

University of Santiago de Compostela at CLEF-IP09

University of Santiago de Compostela at CLEF-IP09 University of Santiago de Compostela at CLEF-IP9 José Carlos Toucedo, David E. Losada Grupo de Sistemas Inteligentes Dept. Electrónica y Computación Universidad de Santiago de Compostela, Spain {josecarlos.toucedo,david.losada}@usc.es

More information

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

Access-Ordered Indexes

Access-Ordered Indexes Access-Ordered Indexes Steven Garcia Hugh E. Williams Adam Cannane School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne 3001, Australia. {garcias,hugh,cannane}@cs.rmit.edu.au

More information

Cross-Language Information Retrieval using Dutch Query Translation

Cross-Language Information Retrieval using Dutch Query Translation Cross-Language Information Retrieval using Dutch Query Translation Anne R. Diekema and Wen-Yuan Hsiao Syracuse University School of Information Studies 4-206 Ctr. for Science and Technology Syracuse, NY

More information

M erg in g C lassifiers for Im p ro v ed In fo rm a tio n R e triev a l

M erg in g C lassifiers for Im p ro v ed In fo rm a tio n R e triev a l M erg in g C lassifiers for Im p ro v ed In fo rm a tio n R e triev a l Anette Hulth, Lars Asker Dept, of Computer and Systems Sciences Stockholm University [hulthi asker]ø dsv.su.s e Jussi Karlgren Swedish

More information

CS54701: Information Retrieval

CS54701: Information Retrieval CS54701: Information Retrieval Basic Concepts 19 January 2016 Prof. Chris Clifton 1 Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful

More information

A User Study on Features Supporting Subjective Relevance for Information Retrieval Interfaces

A User Study on Features Supporting Subjective Relevance for Information Retrieval Interfaces A user study on features supporting subjective relevance for information retrieval interfaces Lee, S.S., Theng, Y.L, Goh, H.L.D., & Foo, S. (2006). Proc. 9th International Conference of Asian Digital Libraries

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

Effects of Advanced Search on User Performance and Search Efforts: A Case Study with Three Digital Libraries

Effects of Advanced Search on User Performance and Search Efforts: A Case Study with Three Digital Libraries 2012 45th Hawaii International Conference on System Sciences Effects of Advanced Search on User Performance and Search Efforts: A Case Study with Three Digital Libraries Xiangmin Zhang Wayne State University

More information

A World Wide Web-based HCI-library Designed for Interaction Studies

A World Wide Web-based HCI-library Designed for Interaction Studies A World Wide Web-based HCI-library Designed for Interaction Studies Ketil Perstrup, Erik Frøkjær, Maria Konstantinovitz, Thorbjørn Konstantinovitz, Flemming S. Sørensen, Jytte Varming Department of Computing,

More information

Verbose Query Reduction by Learning to Rank for Social Book Search Track

Verbose Query Reduction by Learning to Rank for Social Book Search Track Verbose Query Reduction by Learning to Rank for Social Book Search Track Messaoud CHAA 1,2, Omar NOUALI 1, Patrice BELLOT 3 1 Research Center on Scientific and Technical Information 05 rue des 03 frères

More information

Information Retrieval by Enhanced Boolean Model

Information Retrieval by Enhanced Boolean Model Information Retrieval by Enhanced Boolean Model K.Madhuri 1, L.Divya 2 Assistant Professor, Department of Information Technology, Vasavi College of Engineering, Hyderabad, Telangana, India 1 Assistant

More information

DCU at FIRE 2013: Cross-Language!ndian News Story Search

DCU at FIRE 2013: Cross-Language!ndian News Story Search DCU at FIRE 2013: Cross-Language!ndian News Story Search Piyush Arora, Jennifer Foster, and Gareth J. F. Jones CNGL Centre for Global Intelligent Content School of Computing, Dublin City University Glasnevin,

More information

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL Lim Bee Huang 1, Vimala Balakrishnan 2, Ram Gopal Raj 3 1,2 Department of Information System, 3 Department

More information