ITERATIVE SEARCHING IN AN ONLINE DATABASE
Susan T. Dumais and Deborah G. Schmitt
Cognitive Science Research Group
Bellcore
Morristown, NJ


ABSTRACT
An experiment examined how people use an online retrieval system. Subjects solved general topical search problems using a database containing the full text of news articles (e.g., find articles about the "Background of the new prime minister of Great Britain"). Time, accuracy, and content of the searches were recorded. Of particular interest was the use of two iterative search methods available in the interface: a Lookup function that allowed users to explicitly specify an alternative query, and a LikeThese function that could be used to automatically generate a new query from articles the user marked as relevant. Results showed that subjects could easily use both query reformulation methods. Subjects generated much more effective LikeThese searches than Lookup searches. An analysis of individual subject differences suggests that the LikeThese method is accessible to a wider range of users than the Lookup method.

Figure 1. Example of InfoSearch interface. Response of the system to the search problem "Leaders who figure in discussions of the future of the West German chancellorship" is shown.

Figure 2. Examples of InfoSearch iterative search functions: (a) Lookup search; (b) LikeThese search. Only the List of Documents and Lookup Windows are shown. Panel (a) shows use of the Lookup function to enter the query "new west german chancellor"; panel (b) shows a LikeThese search using documents 81 and 103.

INTRODUCTION
This paper describes an experiment examining how people use an online retrieval system. The InfoSearch interface (Dumais & Littman, 1990) was used to present a textual database of news articles to users. This interface incorporates features that have been shown to improve retrieval performance in simulations (e.g., Latent Semantic Indexing and iterative query specification). The experiment examines the extent to which these methods are effective in practice. Of particular interest are the strategies people use to modify their initial requests. Before describing the experiment and results, we briefly review the Latent Semantic Indexing retrieval method and describe the InfoSearch interface.

Latent Semantic Indexing (LSI)
Latent Semantic Indexing (LSI) is a method that can improve people's access to textual information (Deerwester et al., 1990; Dumais et al., 1988). Most textual retrieval systems operate by matching words in users' queries with words in database objects. Because of the tremendous variability in the words people use to describe objects or topics of interest, word-matching methods are far from perfect. Because the same word can refer to many different things, irrelevant objects will be retrieved (e.g., the word "mouse" means different things in different contexts). Conversely, because different authors use different words to describe essentially the same idea, many relevant objects will be missed (e.g., articles about mice, track balls, and pointing devices might also be relevant to someone asking about a mouse). LSI tries to overcome these problems by using statistical methods to model the associations between terms and objects, and to automatically construct a "semantic" space more appropriate for information retrieval.

LSI provides several advantages over standard word-matching methods. First, LSI allows objects that share no words with a user's query to be quite similar to it, resulting in up to 30% improvement in retrieval performance. Second, in response to a query, LSI returns a list of all objects ranked from most to least similar, allowing the user to view as many as necessary for a particular task. Finally, since both terms and text objects are represented in the LSI space, any combination of words and objects can be used as a query.

InfoSearch retrieval interface
The InfoSearch Retrieval Interface is a program that allows users to see the results of an LSI search and to interactively specify new queries (Dumais & Littman, 1990; see also the METHOD section below). Multiple tiled windows allow users to browse brief titles, to view the full text of selected objects, and to construct queries. Users specify initial queries by typing, and a rank-ordered list of objects (based on LSI matching) is returned. InfoSearch also provides two mechanisms for iterative query formulation. A Lookup function can be used to explicitly specify an alternative query; essentially, this lets users try again. There is little evidence on the effectiveness of this method, although it is generally assumed that users can use the results of previous searches to modify subsequent attempts. In addition, a LikeThese function can be used to automatically generate a new query using the full text of objects the user has marked as relevant. If some of the initial responses are on the right track, users mark them and ask the system to find more "like these". Information retrieval simulations and psychological theory suggest that this so-called relevance feedback can improve users' ability to find relevant objects by 60% or more (Oddy, 1977; Salton & Buckley, 1990; Stanfill & Kahle, 1986; Williams, 1984).
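The LSI procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the InfoSearch implementation. It uses raw term counts with no term weighting, whitespace tokenization, and a hypothetical three-document corpus, and the function names (build_lsi, fold_in, rank) are invented for this example. It factors the term-by-document matrix with a singular value decomposition, keeps the k largest dimensions, projects ("folds in") the query, and returns every document ranked by cosine similarity.

```python
import numpy as np


def build_lsi(docs, k):
    """Build a rank-k LSI space from whitespace-tokenized documents.

    Assumption for this sketch: raw term counts, no weighting or stemming.
    """
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))  # rows = terms, columns = documents
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]          # term loadings on the k largest singular values
    doc_vecs = A.T @ Uk    # equals V_k S_k: one k-dimensional row per document
    return index, Uk, doc_vecs


def fold_in(text, index, Uk):
    """Project a query's term counts into the LSI space (q^T U_k).

    Words outside the indexed vocabulary are ignored.
    """
    q = np.zeros(Uk.shape[0])
    for w in text.split():
        if w in index:
            q[index[w]] += 1.0
    return q @ Uk


def rank(query_vec, doc_vecs):
    """Return (document index, cosine) pairs for all documents, best first."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (doc_vecs @ query_vec) / np.where(norms == 0, 1.0, norms)
    return [(int(j), float(sims[j])) for j in np.argsort(-sims)]


docs = ["mouse pointing device", "trackball pointing device", "cheese wine"]
index, Uk, doc_vecs = build_lsi(docs, k=2)
print(rank(fold_in("trackball", index, Uk), doc_vecs))
```

With k = 2, the query "trackball" gives document 0 ("mouse pointing device") a cosine of about 1.0 even though the two share no words, because "mouse" and "trackball" both co-occur with "pointing" and "device". The unrelated document 2 scores about 0. Exact word matching would give document 0 a score of zero. The sketch computes document vectors as A^T U_k rather than reading them from V. This keeps queries and documents in the same coordinates, so an arbitrary sign flip in the SVD output cannot change the cosines.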

METHOD
Design
Fifty-seven college students took part in the experiment. The database was a collection of the full text of several hundred international news articles from 1963 that is often used in information science research. There were three experimental conditions designed to manipulate the search strategies subjects used. In the Lookup condition, subjects were encouraged to use the Lookup function to find additional articles. In the LikeThese condition, subjects were encouraged to use the LikeThese function. In the Both condition, the two search strategies were given equal emphasis during training. Subjects were free to use either method at any time after training.

Procedure
Subjects were taught to use the InfoSearch interface and practiced on a small collection of information science articles. They were then given ten topical search problems that could be answered using the news database, and asked to find as many articles as they could that were relevant to each question. The questions were general topical searches, e.g., find articles about the "Background of the new prime minister of Great Britain" or find articles about the "Leaders who figure in discussions of the future of the West German chancellorship".

At the beginning of each search problem, the display was initialized to what it would have looked like if subjects had literally typed the search problem as a query. Subjects searched until they thought they had found all relevant articles. The experiment was self-paced, and the average subject completed it in three hours. All keystrokes were collected by the InfoSearch program. Measures of primary interest included problem-solving time, accuracy, and the content of subjects' searches. On a separate day, demographic and technical aptitude information about the subjects was collected.

Interface
A screen dump of the InfoSearch retrieval interface is presented in Figure 1. This example shows the system's response to the query "Leaders who figure in discussions of the future of the West German chancellorship". There are four main windows in the experimental system.

(1) The List of Documents Window (upper left) displays the titles of the articles that best match the query, ranked from most to least similar to the query. The numbers at the far left (e.g., 0.84) are the LSI-based similarity between the query and each article. These numbers can range from 1.00 (indicating a perfect match between query and article) to [...]. The numbers in parentheses (e.g., 266) are article identification numbers. The scroll bar can be used to display the titles of additional articles.

(2) The full text of the first article is shown in the large Page of Text Window (upper right). The full text of other articles can be displayed by pointing to the corresponding article in the List of Documents Window, or by scrolling through the text in the Text Window until the next article appears.

(3) Queries are entered in the Lookup Window (bottom left). InfoSearch provides two mechanisms for query formulation: the Lookup and LikeThese buttons at the bottom of the window. When users select the Lookup button, a query window appears and they can enter any query by typing (Figure 2a). Alternatively, users can use the LikeThese function to search for additional articles. If some articles contain relevant information, users can mark them and ask the system to find more articles "like these" (Figure 2b). In this example, articles 81 and 103 are marked as relevant. When the LikeThese function is selected, the system automatically constructs a query using the full text of these articles. All previous queries are saved in the Lookup Window, and users can easily re-execute them. Note that query #31 is shorthand for the search problem "Leaders who figure in discussions of the future of the West German chancellorship".

(4) The Experimental Control Window (lower right) is used to present search problems to subjects and to collect their responses.
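The LikeThese step described above builds a query from the full text of the marked articles. The following minimal sketch illustrates that idea under several assumptions. For brevity, it matches plain word-count vectors by cosine similarity, whereas InfoSearch performed the match in the LSI space. The article IDs and texts are invented, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    na = sqrt(sum(c * c for c in a.values()))
    nb = sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def like_these(articles, marked_ids):
    """Use the concatenated full text of the marked articles as one query
    and rank every article by similarity to it, best first."""
    query = vectorize(" ".join(articles[i] for i in marked_ids))
    scores = {aid: cosine(query, vectorize(text)) for aid, text in articles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical mini-collection; "a" and "b" play the role of marked articles.
articles = {
    "a": "west german chancellor successor named",
    "b": "bonn coalition backs chancellor successor",
    "c": "bonn coalition weighs candidates",
    "d": "british prime minister background",
}
print(like_these(articles, ["a", "b"]))
```

Article "c" shares no words with a short typed query such as "new west german chancellor" (the Lookup query in Figure 2a). It still receives a nonzero score here because it shares "bonn" and "coalition" with the marked articles. The marked articles themselves rank at the top; whether InfoSearch filtered them out of the results is not stated.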

RESULTS
Search strategies
Subjects in all conditions could easily use both query reformulation methods. On average, subjects tried more than four searches (Lookup or LikeThese) in addition to the original problem statement to answer each question. The experimental manipulation was effective in influencing the search strategies subjects used. The ratio of the number of LikeThese searches to the total number of searches was .62 in the LikeThese condition, .52 in the Both condition, and .26 in the Lookup condition (F(2,54) = 18.2, p < .001).

Effectiveness of searches
Analyses were performed using answers provided by outside judges as target responses for each search problem. Subjects' answers were compared with the judges' "correct" answers. The proportion correct, the number of intrusions, and total time all favored the LikeThese condition, although none of the differences was statistically reliable. It is important to note, however, that since subjects were free to use either search method at any time, this is a very weak comparison.
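The degrees of freedom in F(2,54) follow from the design: 57 subjects in three between-subjects conditions give k - 1 = 2 and N - k = 54. The paper reports only the resulting statistic, so the following is a generic sketch of a one-way between-subjects ANOVA on per-subject scores (for example, each subject's proportion of LikeThese searches). It is not the authors' analysis code. The function name is hypothetical and the example scores are invented.

```python
from statistics import fmean


def one_way_anova(groups):
    """F statistic and (df_between, df_within) for a one-way
    between-subjects ANOVA; each group is a list of per-subject scores."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = fmean(scores)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)


# Usage with invented scores: two groups of three subjects.
f, df = one_way_anova([[1, 2, 3], [4, 5, 6]])
print(f, df)  # 13.5 (1, 4)
```

Passing three lists of 19 scores (hypothetical equal group sizes, since the paper does not report them) returns degrees of freedom (2, 54). This matches the reported test.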

A more sensitive measure of performance can be obtained by separately examining the quality of Lookup and LikeThese searches, independent of condition. Because subjects generated several searches in solving each problem, it is difficult to know which particular searches led to which final answers. To examine the effectiveness of each search, we simply count the number of relevant articles in the top 10. That is, for each of the 10 search problems, we look at the articles returned in response to each Lookup and each LikeThese search and count the relevant articles among the first 10 returned. Table 1 summarizes the results of this analysis. On average, subjects try more Lookup searches (2.5) than LikeThese searches (1.9). Lookup searches are, however, generally much less effective than LikeThese searches. The average Lookup search returns fewer relevant articles (3.3) than the average LikeThese search (4.4), F(1,9) = 27.8, p < .001. Similar advantages for LikeThese searches are obtained for the best and worst queries generated by each subject for each search problem: F(1,9) = 9.2, p < .014 and F(1,9) = 56.5, p < .001 for the best and worst queries, respectively. The best Lookup search returns the same number of relevant articles as the worst LikeThese search. It is also interesting that only LikeThese searches reliably improve on the performance obtained using the original problem statement as a query. The single best LikeThese search, for example, results in a 37% improvement over the original problem statement. Finally, we note that the average number of relevant articles per search problem is 6.8, so there is still room for improvement relative to even the best LikeThese searches, which return 4.7 relevant articles among the top 10.
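The "relevant in the top 10" measure is simple enough to state as code. This sketch assumes that each search yields a ranked list of article IDs and that the judges' answers form a set of relevant IDs. The function names and data are hypothetical. It computes the per-search count, plus the average, best, and worst counts over one subject's searches of one type for one problem.

```python
from statistics import mean


def relevant_in_top_k(ranked_ids, relevant_ids, k=10):
    """Number of judged-relevant articles among the first k returned."""
    relevant = set(relevant_ids)
    return sum(1 for article_id in ranked_ids[:k] if article_id in relevant)


def summarize_searches(rankings, relevant_ids, k=10):
    """Average, best and worst top-k counts over one subject's searches of
    one type (e.g., all Lookup searches) for one search problem."""
    counts = [relevant_in_top_k(r, relevant_ids, k) for r in rankings]
    return {"avg": mean(counts), "best": max(counts), "worst": min(counts)}


# Invented judge answers and two invented search results.
judged_relevant = {3, 8, 15, 21, 40}
searches = [
    [3, 5, 8, 9, 11, 12, 13, 14, 16, 17, 21],  # article 21 is 11th, so it is not counted
    [40, 3, 8, 15, 2, 4, 6, 7, 10, 1],
]
print(summarize_searches(searches, judged_relevant))
```

The count is capped by the number of relevant articles a problem has. With an average of 6.8 relevant articles per problem, a perfect search would typically score well below 10.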

Table 1. Effectiveness of Lookup vs. LikeThese searches: number of relevant articles in the top 10 articles returned. Columns: original problem statement; users' iterative "Lookup" searches; users' iterative "LikeThese" searches. Rows: number of searches; number relevant in top 10 (avg, best, worst).

These results confirm informal observations that subjects find it difficult to generate effective Lookup queries. This is despite the fact that InfoSearch is an interactive retrieval system in which the results of previous search attempts could be used to modify subsequent searches. We believe that part of the problem stems from the fact that users typically generate short queries (an average of 3 words per Lookup search). Given the variability in the way different authors describe the same topic, such short queries will miss many relevant articles. The LikeThese method, on the other hand, gives users an easy way to construct what is in effect a very rich query (the system automatically constructs a query from the full text of the selected articles), and this appears to be necessary for success.

Individual differences
There were large and interesting individual differences in performance in the experiment. For most dependent variables, a range of about 4:1 was observed between the best and worst subjects. In general, technical aptitudes and background variables did not reliably predict performance, suggesting that success with the InfoSearch interface is not limited to people with high aptitudes or particular kinds of previous experience. (See Egan, 1988, for a review of other interfaces that require specific technical aptitudes or background characteristics for success.) LikeThese searches were particularly effective for most people, regardless of aptitude. For subjects who used LikeThese searches more frequently than Lookup searches (n = 27), performance was predicted only by reading ability, which is not surprising since they had to read the articles to answer the search problems. For subjects who preferred Lookup searches (n = 27), performance depended on verbal fluency and spatial ability, as well as reading ability. This pattern suggests that additional verbal and spatial abilities may be required when subjects must explicitly generate alternative queries.
Figure 3 shows the average time per correct response plotted as a function of "associational fluency" for subjects who prefer to use Lookup searches (top curve) and for subjects who prefer to use LikeThese searches (bottom curve). Associational fluency is a composite factor reflecting the ability to quickly generate words that are semantically or phonemically related to target words (as measured by the Associational Fluency and Word Fluency tests from Ekstrom et al., 1976). This factor does not reflect general reading comprehension or vocabulary. The lines depict the regression of time per correct response on associational fluency. The difference between the simple correlations for these two groups is reliable (z = 1.95, p = .05). For subjects who prefer Lookup searches, performance depends on associational fluency: subjects with low fluency scores take 50% longer to find articles than subjects with high fluency scores. For subjects who prefer LikeThese searches, performance is independent of fluency and is generally better. This pattern suggests that LikeThese searches can be used effectively by a wider range of people than Lookup searches.
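The paper reports z = 1.95 for the difference between the two groups' correlations but does not name the test. A standard choice for comparing correlations from two independent samples is Fisher's r-to-z transformation. The sketch below assumes that test. The correlation values in the example are invented; only the group sizes (n = 27 each) come from the text.

```python
from math import atanh, erfc, sqrt


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns the z statistic and its two-sided normal p-value.
    """
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (atanh(r1) - atanh(r2)) / se
    p = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Invented correlations between fluency and time per correct response.
z, p = compare_independent_correlations(r1=-0.5, n1=27, r2=0.0, n2=27)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A negative r for the Lookup group corresponds to faster responses at higher fluency, which is the direction the paper describes. With these invented inputs, the difference falls just short of the .05 level.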

Figure 3. Mean time per correct response plotted as a function of "associational fluency" for subjects who prefer Lookup searches (top curve) and for subjects who prefer LikeThese searches (bottom curve).

DISCUSSION
The InfoSearch interface to textual databases appears to be easy for novice searchers to use. Subjects use both available iterative retrieval mechanisms (Lookup and LikeThese). They are much more effective at finding additional relevant articles with LikeThese than at explicitly constructing their own alternative searches with Lookup. These results support previous theoretical and simulation results suggesting that relevance feedback methods should improve performance (e.g., Oddy, 1977; Salton & Buckley, 1990; Stanfill & Kahle, 1986; Williams, 1984). In addition, an analysis of individual subject differences suggests that the LikeThese method may be accessible to a wider range of users.

REFERENCES
[1] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. A. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990, 41(6).
[2] Dumais, S. T. and Littman, M. L. InfoSearch: A program for iterative retrieval using Latent Semantic Indexing. Poster presented at CHI 90.
[3] Dumais, S. T., Furnas, G. W., Landauer, T. K., and Deerwester, S. Using latent semantic analysis to improve information retrieval. In CHI 88 Proceedings, 1988.
[4] Egan, D. E. Individual differences in human-computer interaction. In M. Helander (Ed.), Handbook of Human-Computer Interaction. Elsevier Science Publishers (North-Holland), 1988.
[5] Ekstrom, R. B., French, J. W., Harman, H. H., and Dermen, D. Manual for Kit of Factor-Referenced Cognitive Tests. Princeton, NJ: Educational Testing Service, 1976.
[6] Oddy, R. N. Information retrieval through man-machine dialogue. Journal of Documentation, 1977, 33.
[7] Salton, G. and Buckley, C. Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 1990, 41(4).
[8] Stanfill, C. and Kahle, B. Parallel free-text search on the Connection Machine system. Communications of the ACM, 1986, 29(12).
[9] Williams, M. D. What makes RABBIT run? International Journal of Man-Machine Studies, 1984, 21.


More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data EXERCISE Using Excel for Graphical Analysis of Data Introduction In several upcoming experiments, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

Iteration vs Recursion in Introduction to Programming Classes: An Empirical Study

Iteration vs Recursion in Introduction to Programming Classes: An Empirical Study BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 4 Sofia 2016 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2016-0068 Iteration vs Recursion in Introduction

More information

Organizing Information. Organizing information is at the heart of information science and is important in many other

Organizing Information. Organizing information is at the heart of information science and is important in many other Dagobert Soergel College of Library and Information Services University of Maryland College Park, MD 20742 Organizing Information Organizing information is at the heart of information science and is important

More information

A Model for Information Retrieval Agent System Based on Keywords Distribution

A Model for Information Retrieval Agent System Based on Keywords Distribution A Model for Information Retrieval Agent System Based on Keywords Distribution Jae-Woo LEE Dept of Computer Science, Kyungbok College, 3, Sinpyeong-ri, Pocheon-si, 487-77, Gyeonggi-do, Korea It2c@koreaackr

More information

Please note: Only the original curriculum in Danish language has legal validity in matters of discrepancy. CURRICULUM

Please note: Only the original curriculum in Danish language has legal validity in matters of discrepancy. CURRICULUM Please note: Only the original curriculum in Danish language has legal validity in matters of discrepancy. CURRICULUM CURRICULUM OF 1 SEPTEMBER 2008 FOR THE BACHELOR OF ARTS IN INTERNATIONAL COMMUNICATION:

More information

Title Core TIs Optional TIs Core Labs Optional Labs. All None 1.1.4a, 1.1.4b, 1.1.4c, 1.1.5, WAN Technologies All None None None

Title Core TIs Optional TIs Core Labs Optional Labs. All None 1.1.4a, 1.1.4b, 1.1.4c, 1.1.5, WAN Technologies All None None None CCNA 4 Plan for Academy Student Success (PASS) CCNA 4 v3.1 Instructional Update # 2006-1 This Instructional Update has been issued to provide guidance to the Academy instructors on the flexibility that

More information

Internet Usage Transaction Log Studies: The Next Generation

Internet Usage Transaction Log Studies: The Next Generation Internet Usage Transaction Log Studies: The Next Generation Sponsored by SIG USE Dietmar Wolfram, Moderator. School of Information Studies, University of Wisconsin-Milwaukee Milwaukee, WI 53201. dwolfram@uwm.edu

More information

The Semantic Conference Organizer

The Semantic Conference Organizer 34 The Semantic Conference Organizer Kevin Heinrich, Michael W. Berry, Jack J. Dongarra, Sathish Vadhiyar University of Tennessee, Knoxville, USA CONTENTS 34.1 Background... 571 34.2 Latent Semantic Indexing...

More information

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! (301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation

More information

A Breakdown of the Psychomotor Components of Input Device Usage

A Breakdown of the Psychomotor Components of Input Device Usage Page 1 of 6 February 2005, Vol. 7 Issue 1 Volume 7 Issue 1 Past Issues A-Z List Usability News is a free web newsletter that is produced by the Software Usability Research Laboratory (SURL) at Wichita

More information

Eight units must be completed and passed to be awarded the Diploma.

Eight units must be completed and passed to be awarded the Diploma. Diploma of Computing Course Outline Campus Intake CRICOS Course Duration Teaching Methods Assessment Course Structure Units Melbourne Burwood Campus / Jakarta Campus, Indonesia March, June, October 022638B

More information

Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied

Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied Information Processing and Management 43 (2007) 1044 1058 www.elsevier.com/locate/infoproman Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied Anselm Spoerri

More information

Shedding Light on the Graph Schema

Shedding Light on the Graph Schema Shedding Light on the Graph Schema Raj M. Ratwani (rratwani@gmu.edu) George Mason University J. Gregory Trafton (trafton@itd.nrl.navy.mil) Naval Research Laboratory Abstract The current theories of graph

More information

Dynamic Visualization of Hubs and Authorities during Web Search

Dynamic Visualization of Hubs and Authorities during Web Search Dynamic Visualization of Hubs and Authorities during Web Search Richard H. Fowler 1, David Navarro, Wendy A. Lawrence-Fowler, Xusheng Wang Department of Computer Science University of Texas Pan American

More information

Developing a Test Collection for the Evaluation of Integrated Search Lykke, Marianne; Larsen, Birger; Lund, Haakon; Ingwersen, Peter

Developing a Test Collection for the Evaluation of Integrated Search Lykke, Marianne; Larsen, Birger; Lund, Haakon; Ingwersen, Peter university of copenhagen Københavns Universitet Developing a Test Collection for the Evaluation of Integrated Search Lykke, Marianne; Larsen, Birger; Lund, Haakon; Ingwersen, Peter Published in: Advances

More information

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Gautam Bhat, Rajeev Kumar Singh Department of Computer Science and Engineering Shiv Nadar University Gautam Buddh Nagar,

More information

Clustered SVD strategies in latent semantic indexing q

Clustered SVD strategies in latent semantic indexing q Information Processing and Management 41 (5) 151 163 www.elsevier.com/locate/infoproman Clustered SVD strategies in latent semantic indexing q Jing Gao, Jun Zhang * Laboratory for High Performance Scientific

More information

Detection of Web-Site Usability Problems: Empirical Comparison of Two Testing Methods

Detection of Web-Site Usability Problems: Empirical Comparison of Two Testing Methods Detection of Web-Site Usability Problems: Empirical Comparison of Two Testing Methods Mikael B. Skov and Jan Stage Department of Computer Science Aalborg University Fredrik Bajers Vej 7 9220 Aalborg East,

More information

Multivariate Data & Tables and Graphs. Agenda. Data and its characteristics Tables and graphs Design principles

Multivariate Data & Tables and Graphs. Agenda. Data and its characteristics Tables and graphs Design principles Topic Notes Multivariate Data & Tables and Graphs CS 7450 - Information Visualization Aug. 27, 2012 John Stasko Agenda Data and its characteristics Tables and graphs Design principles Fall 2012 CS 7450

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

Heuristic Evaluation of Groupware. How to do Heuristic Evaluation of Groupware. Benefits

Heuristic Evaluation of Groupware. How to do Heuristic Evaluation of Groupware. Benefits Kimberly Tee ketee@ucalgary.ca CPSC 681 Topic Heuristic Evaluation of Groupware Heuristic evaluation [9] is a discount evaluation method for finding usability problems in a singleuser interface design.

More information

A Study for Documents Summarization based on Personal Annotation

A Study for Documents Summarization based on Personal Annotation A Study for Documents Summarization based on Personal Annotation Haiqin Zhang University of Science and Technology of China face@mail.ustc.edu. cn ZhengChen Wei-yingMa Microsoft Research Asia zhengc@microsoft.com

More information

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Deep Character-Level Click-Through Rate Prediction for Sponsored Search Deep Character-Level Click-Through Rate Prediction for Sponsored Search Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as

More information

Multivariate Data & Tables and Graphs. Agenda. Data and its characteristics Tables and graphs Design principles

Multivariate Data & Tables and Graphs. Agenda. Data and its characteristics Tables and graphs Design principles Multivariate Data & Tables and Graphs CS 7450 - Information Visualization Aug. 24, 2015 John Stasko Agenda Data and its characteristics Tables and graphs Design principles Fall 2015 CS 7450 2 1 Data Data

More information

NPTEL Computer Science and Engineering Human-Computer Interaction

NPTEL Computer Science and Engineering Human-Computer Interaction M4 L5 Heuristic Evaluation Objective: To understand the process of Heuristic Evaluation.. To employ the ten principles for evaluating an interface. Introduction: Heuristics evaluation is s systematic process

More information

A Knowledge-Based Approach to Organizing Retrieved Documents

A Knowledge-Based Approach to Organizing Retrieved Documents A Knowledge-Based Approach to Organizing Retrieved Documents Wanda Pratt Information & Computer Science University of California, Irvine Irvine, CA 92697-3425 pratt@ics.uci.edu From: AAAI-99 Proceedings.

More information

Conceptions of Features and Semantic Clusters as Search Mechanisms: A Pilot Study 1

Conceptions of Features and Semantic Clusters as Search Mechanisms: A Pilot Study 1 Conceptions of Features and Semantic Clusters as Search Mechanisms: A Pilot Study 1 Barbara M. Wildemuth *, Meng Yang *, Gary Geisler, Tom Tolleson *, Jon Elsas *, Jei Luo *, and Gary Marchionini * * Open

More information

Title Core TIs Optional TIs Core Labs Optional Labs. 1.1 WANs All None None None. All None None None. All None 2.2.1, 2.2.4, 2.2.

Title Core TIs Optional TIs Core Labs Optional Labs. 1.1 WANs All None None None. All None None None. All None 2.2.1, 2.2.4, 2.2. CCNA 2 Plan for Academy Student Success (PASS) CCNA 2 v3.1 Instructional Update # 2006-1 This Instructional Update has been issued to provide guidance on the flexibility that Academy instructors now have

More information

Visualization of Text Document Corpus

Visualization of Text Document Corpus Informatica 29 (2005) 497 502 497 Visualization of Text Document Corpus Blaž Fortuna, Marko Grobelnik and Dunja Mladenić Jozef Stefan Institute Jamova 39, 1000 Ljubljana, Slovenia E-mail: {blaz.fortuna,

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,

More information

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Yikun Guo, Henk Harkema, Rob Gaizauskas University of Sheffield, UK {guo, harkema, gaizauskas}@dcs.shef.ac.uk

More information

An Attempt to Identify Weakest and Strongest Queries

An Attempt to Identify Weakest and Strongest Queries An Attempt to Identify Weakest and Strongest Queries K. L. Kwok Queens College, City University of NY 65-30 Kissena Boulevard Flushing, NY 11367, USA kwok@ir.cs.qc.edu ABSTRACT We explore some term statistics

More information

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014.

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014. A B S T R A C T International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Information Retrieval Models and Searching Methodologies: Survey Balwinder Saini*,Vikram Singh,Satish

More information

Web document summarisation: a task-oriented evaluation

Web document summarisation: a task-oriented evaluation Web document summarisation: a task-oriented evaluation Ryen White whiter@dcs.gla.ac.uk Ian Ruthven igr@dcs.gla.ac.uk Joemon M. Jose jj@dcs.gla.ac.uk Abstract In this paper we present a query-biased summarisation

More information

Multimodal Information Spaces for Content-based Image Retrieval

Multimodal Information Spaces for Content-based Image Retrieval Research Proposal Multimodal Information Spaces for Content-based Image Retrieval Abstract Currently, image retrieval by content is a research problem of great interest in academia and the industry, due

More information

A Semi-Discrete Matrix Decomposition for Latent. Semantic Indexing in Information Retrieval. December 5, Abstract

A Semi-Discrete Matrix Decomposition for Latent. Semantic Indexing in Information Retrieval. December 5, Abstract A Semi-Discrete Matrix Decomposition for Latent Semantic Indexing in Information Retrieval Tamara G. Kolda and Dianne P. O'Leary y December 5, 1996 Abstract The vast amount of textual information available

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Using Clusters on the Vivisimo Web Search Engine

Using Clusters on the Vivisimo Web Search Engine Using Clusters on the Vivisimo Web Search Engine Sherry Koshman and Amanda Spink School of Information Sciences University of Pittsburgh 135 N. Bellefield Ave., Pittsburgh, PA 15237 skoshman@sis.pitt.edu,

More information

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working

More information

Monroe Township Middle School Monroe Township, New Jersey

Monroe Township Middle School Monroe Township, New Jersey Monroe Township Middle School Monroe Township, New Jersey Middle School 8 th Grade *PREPARATION PACKET* Welcome to 8 th Grade Mathematics! Our 8 th Grade Mathematics Course is a comprehensive survey course

More information

Content-based Dimensionality Reduction for Recommender Systems

Content-based Dimensionality Reduction for Recommender Systems Content-based Dimensionality Reduction for Recommender Systems Panagiotis Symeonidis Aristotle University, Department of Informatics, Thessaloniki 54124, Greece symeon@csd.auth.gr Abstract. Recommender

More information

MULTIMEDIA RETRIEVAL

MULTIMEDIA RETRIEVAL MULTIMEDIA RETRIEVAL Peter L. Stanchev *&**, Krassimira Ivanova ** * Kettering University, Flint, MI, USA 48504, pstanche@kettering.edu ** Institute of Mathematics and Informatics, BAS, Sofia, Bulgaria,

More information

Assessing the Impact of Sparsification on LSI Performance

Assessing the Impact of Sparsification on LSI Performance Accepted for the Grace Hopper Celebration of Women in Computing 2004 Assessing the Impact of Sparsification on LSI Performance April Kontostathis Department of Mathematics and Computer Science Ursinus

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are

More information

TREC 2017 Dynamic Domain Track Overview

TREC 2017 Dynamic Domain Track Overview TREC 2017 Dynamic Domain Track Overview Grace Hui Yang Zhiwen Tang Ian Soboroff Georgetown University Georgetown University NIST huiyang@cs.georgetown.edu zt79@georgetown.edu ian.soboroff@nist.gov 1. Introduction

More information

Methods for closed loop system identification in industry

Methods for closed loop system identification in industry Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2015, 7(1):892-896 Review Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Methods for closed loop system identification in industry

More information

Using Query History to Prune Query Results

Using Query History to Prune Query Results Using Query History to Prune Query Results Daniel Waegel Ursinus College Department of Computer Science dawaegel@gmail.com April Kontostathis Ursinus College Department of Computer Science akontostathis@ursinus.edu

More information

How to use indexing languages in searching

How to use indexing languages in searching Indexing, searching, and retrieval 6.3.1. How to use indexing languages in searching Overview This module explains how you can become a better searcher by exploiting the power of indexing and indexing

More information

99 /

99 / 99 / 2 3 : : / 90 ««: : Nbahreyni68@gmailcom ( Mmirzabeigi@gmailcom 2 Sotudeh@shirazuacir 3 8 / 00 : (203 «(2000 2 (998 985 3 5 8 202 7 2008 6 2007 Kinley 2 Wilson 3 Elm, & Woods Mcdonald, & Stevenson

More information

Understanding the Relationship between Searchers Queries and Information Goals

Understanding the Relationship between Searchers Queries and Information Goals Understanding the Relationship between Searchers Queries and Information Goals Doug Downey University of Washington Seattle, WA 9895 ddowney@cs.washington.edu Susan Dumais, Dan Liebling, Eric Horvitz Microsoft

More information

second_language research_teaching sla vivian_cook language_department idl

second_language research_teaching sla vivian_cook language_department idl Using Implicit Relevance Feedback in a Web Search Assistant Maria Fasli and Udo Kruschwitz Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom fmfasli

More information

Collaborative Filtering based on User Trends

Collaborative Filtering based on User Trends Collaborative Filtering based on User Trends Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos Papadopoulos, and Yannis Manolopoulos Aristotle University, Department of Informatics, Thessalonii 54124,

More information

Information Retrieval. hussein suleman uct cs

Information Retrieval. hussein suleman uct cs Information Management Information Retrieval hussein suleman uct cs 303 2004 Introduction Information retrieval is the process of locating the most relevant information to satisfy a specific information

More information

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University

More information

ESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN ,

ESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN , An Integrated Neural IR System. Victoria J. Hodge Dept. of Computer Science, University ofyork, UK vicky@cs.york.ac.uk Jim Austin Dept. of Computer Science, University ofyork, UK austin@cs.york.ac.uk Abstract.

More information

Chapter 8. Evaluating Search Engine

Chapter 8. Evaluating Search Engine Chapter 8 Evaluating Search Engine Evaluation Evaluation is key to building effective and efficient search engines Measurement usually carried out in controlled laboratory experiments Online testing can

More information

Text Modeling with the Trace Norm

Text Modeling with the Trace Norm Text Modeling with the Trace Norm Jason D. M. Rennie jrennie@gmail.com April 14, 2006 1 Introduction We have two goals: (1) to find a low-dimensional representation of text that allows generalization to

More information