Using Text Elements by Context to Display Search Results in Information Retrieval Systems - Model and Research Results

Offer Drori
SHAAM Information Systems, The Hebrew University of Jerusalem
offerd@ {shaam.gov.il, cs.huji.ac.il}
Tel. +972-2-5688439 Fax +972-2-5688681

Abstract

Information retrieval systems display search results by various methods. This paper focuses on a model for displaying a list of search results by means of textual elements that utilize a new information unit, which replaces the currently used information unit. The paper includes a short description of several studies that support the model.

1. Introduction

Because of the growth in the number and scope of global databases, locating information requires a special approach from the perspective of the user interface. The Internet, as it exists today, is an outstanding example of a broad-based, unfocused database. Most Internet search engines display their information as a serially ordered list of results (with a partial attempt at ranking the results). In most cases, this list includes the document title, URL and, at times, the first few lines of the document. The information, as currently displayed to the user, is incomplete and insufficiently focused on the search query. This forces the user to actually read all the documents in the list, without being able to discriminate among them in advance. With today's search engines, most search transactions yield a list of hundreds and even thousands of documents, while studies show that the average user only looks at the first 1-2 results (Kirsch 1998). Finding a solution to this paradox presents a serious challenge to researchers in the field. This paper suggests a way to locate the relevant document without having to read the listed documents.

2. A model for displaying textual search results

To deal with the challenge presented above, this section defines a hierarchical structure containing three levels for displaying search results (see Figure No. 1).
Search results from textual databases can be displayed by relying on two basic principles: visualization of the results, and the use of textual components to design the list of results. This article focuses solely on the use of textual components to display search results, where the textual components fall into two categories: internal document information and external document information.

2.1 Results based on internal document information

In this category, a number of techniques are used, most of which include information components related to the search topic. Following is a description of the various methods.

Significant sentences
Significant sentences can be descriptive sentences based on defined paragraphs in the document, for example: Abstract, Introduction, Conclusion. Alternatively, sentences relevant to the search query can be used, that is, sentences that include the terms that caused the document to be chosen (Luhn 1960), (Drori 1998).

Significant words
Significant words in the document are intrinsic descriptors, such as keywords or frequently repeated words. Keywords are determined by the document's author, or they can be produced automatically. Frequently repeated words that are computer generated (including a Stop List operation) can yield results that are similar but less exact (Baldonado & Winograd 1997).

Information from HTML tags
The language tags can provide information about the document. For example, paragraph or subtitle headings can be located by using the <H> tags, and can even be used to generate a table of contents. <META> tags contain information about the document as recorded by the document author, such as abstract, keywords, and others. A certain amount of noise must be taken into account with these tags because of commercial rating considerations. The following studies utilized tables of contents: Egan et al 1989, Chimera & Shneiderman 1994, and Hertzum & Frokjaer 1996.

Additional information
Additional information can be generated from within the actual document; for example, when a document includes citations from other documents, the titles of the cited documents can be used, assuming that they have a subject in common (Pitkow & Pirolli 1997).

Displaying Search Results
    Textual Techniques
        Internal Document Information
            Significant Sentences
            Significant Words
            Information from HTML Tags
            Additional Information
        External Document Information
            Document Classification
            Cited Documents
            Information from the Data Base
    Graphic Techniques

Figure No. 1 - Model for displaying search results

2.2 Results based on external document information

This category utilizes a number of methods that include information components based on the document's subject field and not contained within the actual document. A description of these methods follows.
Document classification
This method displays the category with which the document is associated. Search engines that manually define document categories (such as Yahoo) can be used for this purpose. It is also possible to create categories with the aid of computerized algorithms, and the subject association of the document can be established by clustering all the search results (Allen et al 1993), (Zamir & Etzioni 1999).

Citing documents
This refers to a situation in which one document cites another, where both have a subject in common. The citing documents can be located directly via the Internet, or by using a subject-oriented database such as the Science Citation Index. When the citing documents are located, either their titles or, alternatively, the paragraphs that contain the citation can be used (Amento et al 1999).

Information from the database
The database in which the document is located can provide an indication of the document's subject in a number of ways. Subject-oriented databases usually specify the database subject field. An attempt can also be made to determine the database subject field from the titles of additional documents contained in the database.

3. Research into displaying search results

The research objective was to locate those information components in the search result display that are most relevant to the user, in order to make the task of locating information both more efficient and more effective. The research questions were:

1. What are the most important information details for display in the search results?
2. When comparing the various methods of displaying search result information, which method is preferable in terms of accomplishing the user's task?

The research agenda included the performance of various search tasks by a number of user groups. The tasks were carried out using various interfaces. For research purposes, a database was created that included response documents (in English) for defined search queries. In all the studies, 3-4 different interfaces were used. The effectiveness of the search, and user satisfaction, were checked using two dimensions:

Objective data: response time and accuracy.
Subjective data: convenience, sense of confidence, satisfaction, and the relevancy of information components.
The participants in the research came from several groups and included students from the School of Business Administration (MBA), technical support personnel from the computer field, and information specialists and librarians from the information field. Statistical analyses included the ANOVA test for examining the significance of the difference between the methods, the P test to determine the significance of the results and, naturally, standard statistical analyses of averages, standard deviations, etc.

In the initial study, 128 participants worked with 3 different interfaces. The subject examined was the contribution made by displaying lines of information from the document in addition to the title. The interfaces were:

T - titles only;
TFL - titles + first lines from the document head (refers to internal document information/significant sentences/descriptive sentences of the model in Figure No. 1);
TLC - titles + lines by search context (refers to internal document information/significant sentences/sentences relevant to the search query of the model).
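The "lines by search context" component of the TLC interface can be illustrated with a minimal sketch. It is not the implementation used in the studies: the function names, the naive sentence splitting, and the limit of three lines per document are all assumptions made for illustration.

```python
import re


def lines_in_context(text, query, max_lines=3):
    """Return up to max_lines sentences of text that contain a query term.

    Hypothetical helper: sentence splitting is a naive split after '.', '!'
    or '?', and matching is on whole words, ignoring case.
    """
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        words = {w.lower() for w in re.findall(r"\w+", sentence)}
        if terms & words:  # the sentence shares at least one term with the query
            hits.append(sentence)
        if len(hits) == max_lines:
            break
    return hits


def format_result(title, text, query):
    """Build one entry of a TLC-style list: the title, then indented context lines."""
    return "\n".join([title] + ["  " + line for line in lines_in_context(text, query)])


doc = ("Search engines return long lists. Users rarely read every document. "
       "Showing lines that match the query helps users judge relevance.")
print(format_result("Displaying search results", doc, "query relevance"))
```

A TFL-style entry would instead take the first sentences of the document regardless of the query, which is the difference the first study measures.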

[Figure: Research 1 - The difference between the methods. Panels: subjective ratings (Comfort, Confidence, Relevancy) and search time in seconds (difficult tasks, simple tasks) for the T, TFL and TLC methods.]

A significant difference was found between the methods (P<.1). The TLC method (displaying lines by search context) was preferable in all aspects of the subjective dimension (search convenience, feeling of confidence during use, and relevancy of information). For the objective dimension of search time, the TLC method had an advantage (31%) in the case of complicated tasks, while the other methods had an advantage when handling simple or moderately complex tasks.

[Figure: Research 1 - Snapshot of the TLC interface (titles in blue, lines in context in black, search terms in blue).]

In the second study, 51 participants worked with 3 different interfaces. The subject checked was the contribution made by displaying keywords in addition to the information items that were displayed in the first study. Keywords refers to internal document information/significant words/intrinsic descriptors of the model in Figure No. 1. The interfaces were:

TK - titles + keywords;
TFLK - titles + first lines from the document head + keywords;
TLCK - titles + lines by search context + keywords.

[Figure: Research 2 - Snapshot of the TLCK interface (titles in blue, keywords in green, lines in context in black, search terms in red).]

A significant difference was found between the methods (P<.1). The TLCK method (lines by search context + keywords) is preferable for the subjective dimension. The TLCK method possesses an advantage (33%) over the other methods in the case of search times for moderate and difficult tasks.
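When the author supplies no keywords, the model allows significant words to be generated automatically from frequently repeated words after a stop list has been applied. The following is a minimal sketch of that idea only. The stop list contents, the minimum word length and the function name are assumptions, and the sketch does not describe how keywords were produced in the studies.

```python
import re
from collections import Counter

# A tiny illustrative stop list; a real one would be much longer (assumption).
STOP_LIST = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "for", "by", "are"}


def significant_words(text, top_n=5):
    """Return the top_n most frequent words of text that are not in the stop list."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    # Drop stop-list words and very short tokens before counting.
    counts = Counter(w for w in words if w not in STOP_LIST and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


sample = "Search results and search interfaces: the search list displays results."
print(significant_words(sample, top_n=2))
```

As the model notes, words obtained this way are similar to author-assigned keywords but less exact, so they are best treated as a fallback.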

[Figure: Research 2 - The differences between the methods. Panels: subjective ratings (Comfort, Confidence, Relevancy) and search time in seconds (difficult tasks, simple tasks) for the TK, TFLK and TLCK methods.]

In the third study, 75 participants worked with 4 interfaces. The subject checked was the contribution made by displaying the document category in addition to lines from the document (category refers to external document information/document classification of the model in Figure No. 1). The interfaces were:

TFL - titles + first lines;
TFLC - titles + first lines + categories;
TLC - titles + lines by search context;
TLCC - titles + lines by search context + categories.

[Figure: Research 3 - The differences between the methods. Panels: subjective ratings (Comfort, Confidence, Relevancy) and search time in seconds (difficult tasks, simple tasks) for the TFL, TFLC, TLC and TLCC methods.]

A significant difference was found between the methods (P<.1). The TLCC method possesses an advantage in the subjective dimension and also in search times (67%) across all task difficulty levels. In this study, we also examined which search results display parameters are important to the user. The findings showed that confidence in the answer is the most important parameter (78%), followed by search time (73%) and then the ability to find the answer without reading all the documents (54%). User convenience was found to be a less important parameter for the search process (44%).

[Figure: Research 3 - Snapshot of the TLCC interface (titles in blue, categories in red, lines in context in black).]

In the fourth study, 61 participants worked with 4 interfaces. The subject checked was the contribution made by displaying the document category, the document address, common words, and the organization that published the paper, in addition to lines from the document. Category refers to external document information/document classification of the model in Figure No. 1. Document address refers to external document information/information from the database. Common words refers to internal document information/significant words, and the organization that published the paper refers to external document information/information from the database. The interfaces were:

TLCC - titles + lines by search context + categories;
TLCA - titles + lines by search context + Internet address;
TLCCW - titles + lines by search context + common words;
TLCO - titles + lines by search context + organization name.

[Figure: Research 4 - The differences between the methods. Panels: subjective ratings in percent (Comfort, Confidence, Relevancy) and search time in minutes (difficult tasks, simple tasks) for the TLCC, TLCA, TLCCW and TLCO methods.]

A significant difference was found between some of the methods (P<.1). The TLCCW method possesses an advantage in the subjective dimension and also in search times across all task difficulty levels. In this study, we also examined which search results display parameters are important to the user. The findings showed that the ability to find the answer without reading all the documents is the most important parameter (87%), followed by confidence (77%) and then search time (67%). User convenience was found to be a less important parameter for the search process (51%).

[Figure: Research 4 - Snapshot of the TLCCW interface (titles in blue, common words in red, lines in context in black, search terms in bold).]

4. Conclusion and findings

The objective of the studies was to examine some of the components of the search results display model. The studies that were performed enable the definition of a new information unit that can replace the unit currently used. We found that, in addition to the title, the alternative information unit must include lines by search context, keywords, and an indication of the document category. The document category can be indicated by common words. Authors of articles and database administrators can benefit by including the suggested information components in each document using standardized means (such as XML).

An interesting finding was revealed in the feedback on the parameters that the users considered important when using data retrieval systems. The feeling of confidence when using a system is perceived as having a higher priority than speed and than locating the answer without having to read the list of documents. On the other hand, users assigned ease of use a low priority. A study that we plan will include an in-depth evaluation of additional information components of the model.

Acknowledgements

I wish to thank Eliezer Lozinski for his excellent suggestions in the course of the research. My thanks also to Nir Alon and Aliza Weisberg for their assistance in gathering the data in Research 3, and to Liora Halevi and Yifat Betser-Nahum for their assistance in gathering the data in Research 4.

References

1. Allen, R., Obry, P., Littman, M., An interface for navigating clustered document sets returned by queries, Proceedings of the ACM Conference on Organizational Computing Systems (COOCS93), 1993, 166-171.
2. Amento, B., Hill, W., Terveen, L., Hix, D., Ju, P., An Empirical Evaluation of User Interfaces for Topic Management of Web Sites, Proceedings of CHI '99, ACM Press, Pittsburgh PA, May 1999, 552-559.
3. Baldonado, M., Winograd, T., SenseMaker: an information-exploration interface supporting the contextual evolution of a user's interests, Proceedings of CHI '97, 1997, 11-18.
4. Chimera, R., Shneiderman, B., An Exploratory Evaluation of Three Interfaces for Browsing Large Hierarchical Tables of Contents, ACM Transactions on Information Systems, New York: ACM, 12 (4), October 1994, 383-406.
5. Drori, O., The User Interface in Text Retrieval Systems, SIGCHI Bulletin, New York: ACM, July 1998, 30 (3), 26-29.
6. Egan, D., Remde, J., Landauer, T., Lochbaum, C., Gomez, L., Behavioral Evaluation and Analysis of a Hypertext Browser, Proceedings of CHI '89, New York: ACM, 1989, 205-210.
7. Hertzum, M., Frokjaer, E., Browsing and Querying in Online Documentation: A Study of User Interface and the Interaction Process, ACM Transactions on Computer-Human Interaction, New York: ACM, 3 (2), June 1996, 136-161.
8. Kirsch, S., Infoseek's experiences searching the Internet, SIGIR Forum, New York: ACM, Vol. 32, 1998, Num. 2, 3.
9. Luhn, H., Keyword in Context Index for Technical Literature, American Documentation, XI (4), 1960, 288-295.
10. Pitkow, J., Pirolli, P., Life, Death, and Lawfulness on the Electronic Frontier, CHI '97 Proceedings, New York: ACM, April 1996, 213-22.
11. Zamir, O., Etzioni, O., Grouper: A Dynamic Clustering Interface to Web Search Results, WWW8 Proceedings, Toronto: WWW, 1999.