ITERATIVE SEARCHING IN AN ONLINE DATABASE. Susan T. Dumais and Deborah G. Schmitt Cognitive Science Research Group Bellcore Morristown, NJ


ABSTRACT

An experiment examined how people use an online retrieval system. Subjects solved general topical search problems using a database containing the full text of news articles (e.g., find articles about the "Background of the new prime minister of Great Britain"). Time, accuracy and content of the searches were recorded. Of particular interest was the use of two iterative search methods available in the interface: a Lookup function that allowed users to explicitly specify an alternative query, and a LikeThese function that could be used to automatically generate a new query using articles the user marked as relevant. Results showed that subjects could easily use both query reformulation methods. Subjects generated much more effective LikeThese searches than Lookup searches. An analysis of individual subject differences suggests that the LikeThese method is more accessible to a wide range of users.

Figure 1. Example of InfoSearch interface. Response of the system to the search problem "Leaders who figure in discussions of the future of the West German chancellorship" is shown.

Figure 2. Examples of InfoSearch iterative search functions: (a) Lookup search; (b) LikeThese search. Only the List of Documents and Lookup Windows are shown. Panel (a) shows use of the Lookup function to enter the query "new west german chancellor"; Panel (b) shows a LikeThese search using documents 81 and 103.

INTRODUCTION

This paper describes an experiment examining how people use an online retrieval system. The InfoSearch interface (Dumais & Littman, 1990) was used to present a textual database of news articles to users. This interface incorporates features that have been shown to improve retrieval performance in simulations (e.g., Latent Semantic Indexing and iterative query specification). The experiment examines the extent to which these methods are effective in practice. Of particular interest are the strategies people use for modifying their initial requests. Before describing the experiment and results, we briefly review the Latent Semantic Indexing retrieval method and describe the InfoSearch interface.

Latent Semantic Indexing (LSI)

Latent Semantic Indexing (LSI) is a method that can improve people's access to textual information (Deerwester, et al., 1990; Dumais, et al., 1988). Most textual retrieval systems operate by matching words in users' queries with words in database objects. Because of the tremendous variability in the words people use to describe objects or topics of interest, word-matching methods are far from perfect. The fact that the same word can be used to refer to many different things means that irrelevant objects will be retrieved (e.g., the word "mouse" means different things in different contexts). Conversely, the fact that different authors use different words to describe essentially the same idea means that many relevant objects will be missed (e.g., articles about mice, track balls, and pointing devices might also be relevant to someone asking about a mouse). LSI tries to overcome these problems by using statistical methods to model the associations of terms and objects, and to automatically construct a "semantic" space more appropriate for information retrieval.

LSI provides several advantages over standard word-matching methods. First, LSI allows objects which share no words with a user's query to be quite similar to it, resulting in up to 30% improvement in retrieval performance. Second, in response to a query, LSI returns a list of all objects ranked from most similar to least similar, allowing the user to view as many as necessary for a particular task. Finally, since both terms and text objects are represented in the LSI space, any combination of words and objects can be used as a query.

InfoSearch retrieval interface

The InfoSearch Retrieval Interface is a program that allows users to see the results of an LSI search and to interactively specify new queries (Dumais & Littman, 1990; also see METHOD section below). Multiple tiled windows allow users to see brief titles, to view the full text of selected objects, and to construct queries. Users specify initial queries by typing. A rank-ordered list of objects (based on LSI-matching) is returned.

InfoSearch also provides two mechanisms for iterative query formulation. A Lookup function can be used to explicitly specify an alternative query. Essentially, this lets users try again. There is little data on the effectiveness of this method, although it is generally assumed that users can use the results of previous searches to modify subsequent attempts. In addition, a LikeThese function can be used to automatically generate a new query using the full text of objects the user has marked as relevant. If some of the initial responses are on the right track, users mark them and ask the system to find more "like these". Information retrieval simulations and psychological theory suggest that this so-called relevance feedback can improve users' ability to find relevant objects by 60% or more (Oddy, 1977; Salton & Buckley, 1990; Stanfill & Kahle, 1986; Williams, 1984).
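As a rough illustration of the LSI ranking described in this introduction, the following Python sketch builds a term-by-document count matrix, keeps the k largest singular dimensions, and ranks documents by cosine similarity to a query projected into the same reduced space. It is a toy sketch under stated assumptions, not the InfoSearch implementation: the function name `lsi_rank`, the raw-count weighting, the choice k = 2, and the four-document collection are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def lsi_rank(query, docs, k=2):
    """Rank docs against a query in a k-dimensional LSI space (toy sketch)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}

    # Term-by-document count matrix X: terms are rows, documents are columns.
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[row[w], j] += 1.0

    # Truncated SVD: keep only the k largest singular dimensions.
    U, _s, _Vt = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k]

    # Project the documents and the query onto the same k dimensions.
    doc_vecs = X.T @ U_k
    q = np.zeros(len(vocab))
    for w in query.lower().split():
        if w in row:  # words outside the vocabulary are ignored
            q[row[w]] += 1.0
    q_vec = q @ U_k

    # Cosine similarity between the query and every document.
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    sims = np.divide(doc_vecs @ q_vec, norms,
                     out=np.zeros(len(docs)), where=norms > 0)
    return sorted(((float(s), j) for j, s in enumerate(sims)), reverse=True)

# Hypothetical mini-collection echoing the "mouse" example in the text.
docs = [
    "mouse trackball pointing device",
    "trackball pointing device input",
    "mouse cheese cat",
    "cat cheese rodent",
]
ranked = lsi_rank("input", docs, k=2)
```

In this toy collection the query "input" shares no words with document 0, yet document 0 ranks alongside document 1 because the two documents use overlapping vocabulary. That is the behavior the first advantage above refers to.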

METHOD

Design

Fifty-seven college students took part in the experiment. The database was a collection, often used in information science research, of the full text of several hundred international news articles from 1963. There were three experimental conditions designed to manipulate the search strategies subjects used. In the Lookup condition, subjects were encouraged to use the Lookup function to find additional articles. In the LikeThese condition, subjects were encouraged to use the LikeThese function. In the Both condition, both search strategies were given equal emphasis during training. Subjects were free to use either method at any time after training.

Procedure

Subjects were taught to use the InfoSearch interface and practiced on a small collection of information science articles. They were then given ten topical search problems that could be answered using the news database, and asked to find as many articles as they could that were relevant to each question. The questions were general topical searches, e.g., find articles about the "Background of the new prime minister of Great Britain" or find articles about the "Leaders who figure in discussions of the future of the West German chancellorship".

At the beginning of each search problem, the display was initialized to what it would have looked like if subjects had literally typed the search problem as a query. Subjects searched until they thought they had found all relevant articles. The experiment was self-paced, with the average subject completing the experiment in three hours. All keystrokes were collected by the InfoSearch program. Measures of primary interest included problem solving time, accuracy and the content of subjects' searches. On a separate day, demographic and technical aptitude information about the subjects was collected.

Interface

A screen dump of the InfoSearch retrieval interface is presented in Figure 1. This example shows the system's response to the query: "Leaders who figure in discussions of the future of the West German chancellorship". There are four main windows in the experimental system.

(1) The List of Documents Window (upper left) displays a list of the titles of articles that best match the query. Articles are ranked from most to least similar to the query. The numbers at the far left (e.g., 0.84) are the LSI-based similarity between the query and each article. These numbers can range up to 1.00 (indicating a perfect match between query and article). The numbers in parentheses (e.g., 266) are article identification numbers. The scroll bar can be used to display the titles of additional articles.

(2) The full text of the first article is shown in the large Page of Text Window (upper right). The full text of other articles can be displayed by pointing to the corresponding article in the List of Documents Window, or by scrolling through the text in the Text Window until the next article appears.

(3) Queries are entered in the Lookup Window (bottom left). InfoSearch provides two mechanisms for query formulation: the Lookup and LikeThese buttons at the bottom of the window. When users select the Lookup button, a query window appears and they can enter any query by typing (Figure 2a). Alternatively, users can use the LikeThese function to search for additional articles. If some articles contain relevant information, users can mark them and ask the system to find more articles "like these" (Figure 2b). In this case, articles 81 and 103 are marked as relevant. The system automatically constructs a query using the full text of these articles when the LikeThese function is selected. All previous queries are saved in the Lookup Window and users can easily re-execute them. Note that the query #31 is a shorthand for the search problem "Leaders who figure in discussions of the future of the West German chancellorship".

(4) The Experimental Control Window (lower right) is used to present search problems to subjects and to collect their responses.
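The LikeThese step described above, in which the query is built from the full text of the marked articles, can be sketched in a few lines of Python. This is a simplified illustration, not the InfoSearch code: it scores articles with plain term-count cosine similarity rather than in the LSI space, and the function names and the short article texts are hypothetical (only the identification numbers 81, 103 and 266 echo the figures).

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term counts for one piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def like_these(articles, marked_ids):
    """Build a query from the full text of the marked articles and
    rank every article against it, most similar first."""
    query = Counter()
    for doc_id in marked_ids:
        query.update(term_vector(articles[doc_id]))
    scored = [(cosine(query, term_vector(text)), doc_id)
              for doc_id, text in articles.items()]
    return sorted(scored, reverse=True)

# Hypothetical article texts keyed by identification number.
articles = {
    81: "chancellor adenauer successor erhard",
    103: "erhard chancellor bonn coalition",
    266: "bonn coalition talks adenauer",
    12: "cricket scores london",
}
ranked = like_these(articles, [81, 103])
```

Because the generated query contains every word of the marked articles, article 266 is retrieved even though it shares only some words with each marked article individually. This is the sense in which LikeThese yields a much richer query than a few typed words.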

RESULTS

Search strategies

Subjects in all conditions could easily use both query reformulation methods. On average, subjects tried more than four searches (Lookup or LikeThese) in addition to the original problem statement to answer each question. The experimental manipulation was effective in influencing the search strategies subjects used. The ratio of the number of LikeThese searches to the total number of searches was .62 in the LikeThese condition, .52 in the Both condition, and .26 in the Lookup condition (F(2,54) = 18.2, p < .001).

Effectiveness of searches

Analyses were performed using answers provided by outside judges as target responses for each search problem. Subjects' answers were compared with the judges' "correct" answers. The proportion correct, the number of intrusions, and total time all favored the LikeThese condition, although none of the differences was statistically reliable. It is important to note, however, that since subjects were free to use either search method at any time, this is a very weak comparison.

A more sensitive measure of performance can be obtained by separately examining the quality of Lookup and LikeThese searches independent of condition. Because subjects generated several searches in solving each problem, it is difficult to know which particular searches resulted in which final answers. To examine the effectiveness of each search, we simply calculate the number of relevant articles in the top 10. That is, for each of the 10 search problems, we look at the articles returned in response to each Lookup and each LikeThese search and count the number of relevant articles among the first 10 articles returned.

Table 1 summarizes the results of this analysis. On average subjects try more Lookup searches (2.5) than LikeThese searches (1.9). Lookup searches are, however, generally much less effective than LikeThese searches. The average Lookup search results in fewer relevant articles (3.3) than the average LikeThese search (4.4) (F(1,9) = 27.8, p < .001). Similar advantages for LikeThese searches are obtained for the best and worst queries generated by each subject for each search problem (F(1,9) = 9.2, p < .014 and F(1,9) = 56.5, p < .001 for the best and worst queries, respectively). The best Lookup search returns the same number of relevant articles as the worst LikeThese search. It is also interesting that only LikeThese searches reliably improve on performance obtained using the original problem statement as a query. The single best LikeThese search, for example, results in a 37% improvement over the original problem statement. Finally, we note that the average number of relevant articles is 6.8, so there is still room for improvement relative to even the best LikeThese searches, which return 4.7 relevant articles among the top 10.
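The effectiveness measure used in this analysis, the number of relevant articles among the first 10 returned, is simple to compute. The Python sketch below is illustrative only: the function names and the example ranking are hypothetical, and the relative-difference figure at the end is plain arithmetic on the averages reported in the text (3.3 and 4.4), not a statistic the paper itself reports.

```python
def relevant_in_top_n(ranked_ids, relevant_ids, n=10):
    """Count how many of the first n returned articles are relevant."""
    relevant = set(relevant_ids)
    return sum(1 for doc_id in ranked_ids[:n] if doc_id in relevant)

def percent_improvement(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

# Hypothetical search result: article ids in ranked order, plus the
# judges' relevant set for the same problem.
ranked_ids = [5, 3, 9, 1, 7, 2, 8, 4, 6, 0, 11, 12]
judged_relevant = {3, 7, 11, 12}
hits = relevant_in_top_n(ranked_ids, judged_relevant)

# Average LikeThese search (4.4) relative to average Lookup search (3.3).
gain = percent_improvement(4.4, 3.3)
```

Articles 11 and 12 are relevant but fall outside the top 10, so they do not count toward `hits`. The measure rewards only what a user would see near the top of the list.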

Table 1. Effectiveness of Lookup vs. LikeThese searches: number of relevant articles in the top 10 articles returned. Columns: original problem statement; users' iterative "Lookup" searches; users' iterative "LikeThese" searches. Rows: number of searches; number relevant in top 10 (average, best, worst).

These results confirm informal observations that subjects find it difficult to generate effective Lookup search queries. This is in spite of the fact that InfoSearch is an interactive retrieval system in which results of previous search attempts could be used to modify subsequent searches. We believe that part of the problem results from the fact that users typically generate short queries (an average of 3 words per Lookup search). Given the variability in the way different authors describe the same topic, many relevant articles will be missed with such short queries. The LikeThese method, on the other hand, provides users with an easy way to construct what is in effect a very rich query (the system automatically constructs a query using the full text of the selected articles), and this appears to be necessary for success.

Individual differences

There were large and interesting individual differences in performance in the experiment. For most dependent variables, a range of about 4:1 was observed between the best and worst subject. In general, technical aptitudes and background variables did not reliably predict performance, suggesting that success with the InfoSearch interface is not limited to people with high aptitudes or particular kinds of previous experience. (See Egan, 1989, for a review of other retrieval interfaces that require specific technical aptitudes or background characteristics for success.)

LikeThese searches were particularly effective for most people, regardless of aptitude. For subjects who used LikeThese searches more frequently than Lookup searches (n = 27), performance was predicted only by reading ability, which is not surprising since they had to read the articles to answer the search problems. For subjects who preferred Lookup searches (n = 27), performance depended on verbal fluency and spatial ability, as well as reading ability. This pattern suggests that additional verbal and spatial abilities may be required when subjects must explicitly generate alternative queries.

Figure 3 shows the average time per correct response plotted as a function of "associational fluency" for subjects who prefer to use Lookup searches (top curve), and for subjects who prefer to use LikeThese searches (bottom curve). Associational fluency is a composite factor reflecting the ability to quickly generate words that are semantically or phonemically related to target words (as measured by the Associational Fluency and Word Fluency tests from Ekstrom et al., 1976). This factor does not reflect general reading comprehension or vocabulary. The lines depict the regression of time per correct response on associational fluency. The difference between the simple correlations for these two groups is reliable (z = 1.95; p = .05). For subjects who prefer Lookup searches, performance depends on associational fluency: subjects with low fluency scores take 50% longer to find articles than subjects with high fluency scores. For subjects who prefer LikeThese searches, performance is independent of fluency and is generally better. This pattern suggests that LikeThese searches can be used more effectively by more people than Lookup searches.
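The paper does not say which test produced the z statistic for the difference between the two groups' correlations. One common choice for comparing correlations from two independent groups is Fisher's r-to-z transformation, sketched below in Python as an assumption rather than as the authors' procedure. The group sizes (27 and 27) come from the text; the correlation values passed in are hypothetical, since the paper does not report them.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent
    correlations, using Fisher's r-to-z transformation."""
    z1 = math.atanh(r1)  # Fisher transform of each correlation
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / standard_error

def two_tailed_p(z):
    """Two-tailed p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical correlations for the Lookup and LikeThese groups.
z = fisher_z_difference(0.5, 27, 0.0, 27)

# The reported statistic, z = 1.95, corresponds to p of about .05.
p_reported = two_tailed_p(1.95)
```

Under this assumption, a z of 1.95 gives a two-tailed p value just above .05, which agrees with the value reported in the text.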

Figure 3. Mean time per correct response is plotted as a function of "associational fluency" for subjects who prefer Lookup searches (top curve), and for those subjects who prefer LikeThese searches (bottom curve).

DISCUSSION

The InfoSearch interface to textual databases appears to be easy to use for novice searchers. Subjects use both available iterative retrieval mechanisms (Lookup and LikeThese). They are much more effective using LikeThese to find additional relevant articles than they are at explicitly constructing their own alternative searches using Lookup. These results support previous theoretical and simulation results suggesting that relevance feedback methods should improve performance (e.g., Oddy, 1977; Salton & Buckley, 1990; Stanfill & Kahle, 1986; Williams, 1984). In addition, an analysis of individual subject differences suggests that the LikeThese method may be more accessible to a wider range of users.

REFERENCES

[1] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. A. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990, 41(6).
[2] Dumais, S. T. and Littman, M. L. InfoSearch: A program for iterative retrieval using Latent Semantic Indexing. Poster presented at CHI '90.
[3] Dumais, S. T., Furnas, G. W., Landauer, T. K., and Deerwester, S. Using latent semantic analysis to improve information retrieval. In CHI '88 Proceedings, 1988.
[4] Egan, D. E. Individual differences in human-computer interaction. In: M. Helander (Ed.), Handbook of Human-Computer Interaction, Elsevier Science Publishers (North-Holland), 1988.
[5] Ekstrom, R. B., French, J. W., Harman, H. H., and Dermen, D. Manual for Kit of Factor-Referenced Cognitive Tests. Princeton, NJ: Educational Testing Service, 1976.
[6] Oddy, R. N. Information retrieval through man-machine dialogue. Journal of Documentation, 1977, 33.
[7] Salton, G. and Buckley, C. Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 1990, 41(4).
[8] Stanfill, C. and Kahle, B. Parallel free-text search on the connection machine system. Communications of the ACM, 1986, 29(12).
[9] Williams, M. D. What makes RABBIT run? International Journal of Man-Machine Studies, 1984, 21.
