Quoogle: A Query Expander for Google

Michael Smit
Faculty of Computer Science, Dalhousie University
6050 University Avenue, Halifax, NS B3H 1W5
smit@cs.dal.ca

Whitney Thiele
Faculty of Computer Science, Dalhousie University
6050 University Avenue, Halifax, NS B3H 1W5
thiele@cs.dal.ca

ABSTRACT
The query is the fundamental way in which a user interacts with a body of data, and the only efficient vehicle for obtaining the information they are looking for. Hence, giving the user a means to improve the query itself is an essential step toward obtaining better results. We investigate query revision by creating a system that allows users to refine their query with vocabulary obtained during the searching process. Common terms in the hit set generated by the user's query are shown to the user as suggestions for interactively refining the query. This directs the search to particular clusters of results that should be more relevant to the user.

Keywords
Information retrieval, query expansion, query augmentation

INTRODUCTION
There has been much discussion in the literature regarding query reformulation. Different techniques, such as using a thesaurus or a corpus analysis [3] to expand the query, have been suggested. Other techniques include automatic query reformulation or expansion [4], nearest neighbor expansions [14], and direct relevance feedback [1]. It has been shown that under certain circumstances, these methods are effective in providing better results. However, each method has its own strengths and weaknesses.

Direct relevance feedback is not commonly used in today's commercial information retrieval systems. The cost of learning to use the feedback mechanism often outweighs the perceived benefit, and users want a click-and-go search, not complicated feedback measures. Our investigation attempts to approach the issue of user feedback in a way that is intuitive and easily understood by the end user. The search algorithm should do any complicated computations such as clustering, indexing or topic relevance determinations in the background, where the user can't see the details. However, the user should be given the final choice on what modifications are made to their query, if any. In short, we attempt to refine (or broaden) a query by suggesting possible additions to it, guided by the user's feedback.

We have created a system that allows users to refine their query with vocabulary obtained from the results of their initial query. High-weighted terms in the hit set returned by the user's query are shown to the user as suggestions for ways to enhance their queries. This directs the search to particular clusters of results that are more relevant to the user. The choice the user makes is stored to aid in the automatic refinement of other queries. Query term suggestions should be a tool that both gives the user insight into the representative terms in the resulting hit set and helps the user augment the query quickly.

BACKGROUND AND MOTIVATION
One shortcoming in many modern information retrieval systems is the interface between the retrieval system and the end user. Specifically, the query formulated by the user is the only input a system has to determine the intentions of the user and provide suitable results. As data sets like the World Wide Web get larger and larger, and as they cover an ever-widening range of topics, more sophistication in our query generation is required. However, new users of search engines that search these large data sets are generally not capable of entering advanced queries; they expect relevant results while providing only the minimum of input.
It has been shown that, in some cases, query expansion improves the results obtained compared to doing no expansion at all [13]. Unfortunately, the cumbersome interfaces and complex processes of some current query modification techniques deter the users who would benefit most from these methods. Users do not perceive any immediate benefit to providing extra information to the search engine, and view more complicated methods with skepticism.

There have been many investigations into different aspects of query modification using various techniques. These methods include relevance feedback, term co-occurrence, word stemming, and thesauri.

The use of terms obtained through relevance feedback to augment the query has been shown in [6] to produce better results. Relevance feedback is the method in which the system adjusts the query to produce a greater number of relevant documents. In the vector space model, this is accomplished by adjusting the term weights in the query so that the resulting query vector moves closer to large groupings or clusters of relevant documents. This is dependent on factors such as the document-ranking algorithm, which determines which documents are relevant.

Another modification that has been examined is the use of terms that co-occur in the document set. Terms that frequently occur in the same document are added to the query, thereby expanding it. This has been shown to be a highly variable technique that can produce both relevant and irrelevant results depending on the terms that are added to the query [5].

Word stemming algorithms are used in query expansion by conflating word stems with a collection of suffixes. In other words, this extends the terms in the query by adding similar terms based on lexical structure. This strategy for automatically expanding the query terms is also highly variable, depending on the term root, or stem. In some cases the returned results are improved, and in others they are poorer [7][11].

A thesaurus is used to provide terms that are related to the query terms by similarity in definition instead of by word stem. This strategy has also been studied for use in automatic query expansion.
The query term is augmented with synonyms from a thesaurus in an attempt to include related terms that might occur in related documents. This increases the recall of the system, since a wider variety of terms will be matched [12]. However, it can also increase the size of the hit set, because the new query matches any of the added terms, not all of them. The increase in recall comes at a cost to precision, and forces the user to sort through a larger hit set if their desired document is not the first result.

New retrieval algorithms use different techniques to rank documents in an attempt to provide better results to the end user. For example, some systems use link topology to assign a measure of popularity, and others use geographical constraints to localize potential search results. In commercial systems, especially Internet search engines, the retrieval mechanism and any resultant processing is done out of the user's sight. This means that additional processing of the query can be done without the knowledge of the user, with the user seeing only the suggested queries.

IMPLEMENTATION
To test our approach to query augmentation, we developed a system that makes use of an existing search engine which has the infrastructure and a proven searching method already in place. Quoogle, our augmentation to the existing search engine Google, was written in Perl and tested on a 2.4 GHz Pentium machine running Windows XP and Apache. To use Quoogle, a user enters their query into the Quoogle search page. The results page is divided into two sections: the top portion holds Quoogle's suggested augmented queries, and the bottom section is the standard Google results page.
If one of the top ten Google responses is sufficient for the user, they can ignore Quoogle's suggestions; if they find that their query was too broad, they can select one of the augmented queries, or try another query of their own invention. This method of user choice regarding query augmentation is very similar to Google's own spelling correction suggestions. Although Google was chosen for our implementation, other search engines such as AltaVista or Yahoo could have been augmented in a similar fashion.

To generate the augmented queries, the Quoogle engine performs an analysis of the hit set. It takes a random sample of 30 result pages from the first 100 results returned by Google. It then downloads each of those pages, removes HTML tags and comments, tokenizes the documents on white space, and removes any words that are on the stop word list. It then calculates normalized term frequency and inverse document frequency scores for each of the documents and terms in this set of 30, and assigns each term a normalized weight according to this formula:

[Formula not reproduced in this transcription.]
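As a rough sketch of the analysis steps just described, the following Python outlines sampling, HTML stripping, white-space tokenization, stop word removal, and term weighting. It is not the authors' Perl code. The weight used here is a conventional length-normalized tf-idf, assumed in place of the paper's own formula, which is not available in this copy. The stop word list, the regex-based HTML stripping, and all function names are likewise hypothetical.

```python
import math
import random
import re
from collections import Counter

# Hypothetical, tiny stop word list; the paper does not give its list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def strip_html(page: str) -> str:
    """Remove HTML comments and tags (regex-based, adequate for a sketch)."""
    page = re.sub(r"<!--.*?-->", " ", page, flags=re.DOTALL)
    return re.sub(r"<[^>]+>", " ", page)


def tokenize(page: str) -> list[str]:
    """Split on white space, lowercase, and drop stop words."""
    return [t for t in strip_html(page).lower().split() if t not in STOP_WORDS]


def sample_pages(results: list[str], k: int = 30, seed=None) -> list[str]:
    """Pick k pages at random from the result list (30 of 100 in the paper)."""
    rng = random.Random(seed)
    return rng.sample(results, min(k, len(results)))


def rank_terms(pages: list[str]) -> list[tuple[str, float]]:
    """Rank terms by summed tf-idf, with tf normalized by document length.

    Assumed weighting, not necessarily the formula used by Quoogle.
    """
    docs = [Counter(tokenize(p)) for p in pages]
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for d in docs for term in d)
    weights: Counter = Counter()
    for d in docs:
        length = sum(d.values())
        for term, count in d.items():
            tf = count / length                      # length-normalized tf
            idf = math.log(n_docs / doc_freq[term])  # 0 if term is in every doc
            weights[term] += tf * idf
    # Highest weight first; ties broken alphabetically for determinism.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
```

One consequence of this assumed weighting is that a term occurring in every sampled page, typically the original query term itself, receives a weight of zero and so is never suggested back to the user.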

This normalization is done to minimize the effect that document length would have on the term weights. The terms are ranked according to their weight, and all but the top 5 terms are discarded. These remaining terms are used to generate five suggested queries that are displayed to the user. If the user clicks on one of these links, the system stores which keyword was chosen, and redirects the user to a Google search page with the results for the augmented query. This ends the user's interaction with Quoogle.

Figure 1: The Quoogle user interface.

Table 1: The term suggestions for gambling, their relevance, and the reason given for their relevance.
Term      Relevant  Explanation
casinos   yes       Casino gambling
sports    yes       Gambling on sports
cyber     yes       Gambling online
thrill    no
money     yes       Gambling with money

Limitations
There are a number of limitations to the current implementation of Quoogle. In particular, Quoogle is an augmentation to an existing system, Google. Although Google provides a convenient method for programmers to interface with their search engine, this method was not suited to this application. It limits each query to 10 results; to obtain the top 100 documents, ten separate queries were run. Once the 30 sample documents were selected, each page had to be downloaded. Running the queries and downloading the pages is very time consuming; the time taken for the actual analysis of the pages is insignificant in comparison. This prevents the current implementation from being an effective real-time system. If Google chose to deploy an augmentation such as ours, these limitations would not be a factor.

Look and Feel
The interface to the Quoogle system was carefully designed so that it remains consistent with the interface of the augmented system. In this particular case, the term suggestions are presented via simple, single-click hypertext links which are located in a frame directly above the main Google search results page.
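The single-click suggestion links can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `suggestion_links` is hypothetical, the augmented query is assumed to be the original query with the suggested term appended, and the `https://www.google.com/search?q=...` URL form is assumed as the redirect target. The paper does not specify any of these details.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"  # assumed redirect target


def suggestion_links(query: str, ranked_terms: list[str], k: int = 5) -> list[tuple[str, str]]:
    """Return (augmented query, URL) pairs for the top-k suggested terms."""
    links = []
    for term in ranked_terms[:k]:
        augmented = f"{query} {term}"  # assumed: append the term to the query
        url = f"{GOOGLE_SEARCH}?{urlencode({'q': augmented})}"
        links.append((augmented, url))
    return links
```

Calling `suggestion_links("gambling", ["casinos", "sports", "cyber", "thrill", "money"])` yields five pairs, one per suggested term, each of which could be rendered as a hypertext link above the results.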
Figure 1 illustrates how an augmented system can be easily integrated into an existing system.

Experiment
The Quoogle system was evaluated by considering the relevance of the term suggestions that were returned for a typical query. The hypothesis was that the Quoogle system would be able to return term suggestions that were related or relevant to the original query. The first step in determining the feasibility of this approach was to analyze the terms which were returned by Quoogle from a typical search. A collection of 100 search queries was randomly selected from the Metaspy website. This site provides a random sample of queries that users have made on a selection of online search engines. Each search query was entered into the Quoogle system, and the resulting suggested terms were judged to be either relevant or not relevant to the original query. The system created 500 suggested terms over the course of the evaluation of 100 queries.

The determination of relevance was based on relational information from WordNet [10] and a system similar to Joho [8] in which conceptual or contextual relations are considered. These types of relations can be considered for terms that have a conceptual relation or association outside the entries of a typical thesaurus or WordNet. For instance, the term elevator is related to the terms stairs and office tower (Table 1). The resulting size of the hit set after the query modification was recorded for further analysis.

RESULTS
The Quoogle augmentation of the Google system showed that the system may provide some useful suggested terms. The analysis of the suggested terms shows an 85.2% relevance rating when compared to the original terms in the query. Some terms will co-occur over a specific topic category and may be included in the returned suggestions, which might account for such a high relevance result. There may be some natural clustering of vocabulary terms in the returned hit set which should be picked up by the term-weighting mechanism.

Another interesting feature of the relevance result is that, on average, there was a substantial drop in relevance for the last suggestion returned by the system (Table 2). This could be explained by terms which should have been captured by Quoogle in a stop list. For instance, in a number of specific cases, there were errors on the hit set pages that caused some HTML to appear in the term index. This in turn allowed some erroneous results to be returned to the end user in the form of suggestions that had little value for query modification.

Table 2: Relevance by rank of weighted term suggestions.
Term Position  Average Relevance
1              94%
2              88%
3              80%
4              87%
5              28%

The hit set of the augmented queries was reduced on average by 71% from the hit set of the original query.
It should be noted that although we measure relevance to the initial query, we have no way of knowing how frequently the terms are relevant to what the user wished to find. There is a random factor in the equation, as there could be many terms relevant to the initial query. It is impossible to determine what terms the original creator of the query would find relevant.

Table 3: The first four queries from Metaspy and the terms suggested for addition to the queries. Notice that airline and airlines both appear in the cheap tickets expansion, which indicates that some type of stemming may be useful for the query suggestions.
Query               Suggested terms
Cheap cigarettes    Tobacco, search, information, free, products
Affirmative Action  Admissions, court, office, black, opportunity
Cell phones         Radiation, cellular, cancer, rf, middot
Cheap Tickets       Airline, pm, air, airlines, flights, eacute

FUTURE WORK
In its current form, Quoogle presents relevant terms to the user that might narrow and focus their search. There have been many studies on hypertext systems and how the structure of hypertext can have an effect on the end user. Some studies illustrate that this structure is important in developing mental models of concepts [2]. It is therefore important to facilitate the information retrieval task while also helping in the development of the end user's model.

The addition of term suggestions to current online systems should help the end user focus their query and obtain better results. The mechanism of this is dependent on several usability issues that should be investigated to determine the role they may play. For instance, there have been investigations into different interface presentation mechanisms for interactive query expansion systems [8]. Hierarchical lists are another option, alongside single-click expansion tools, that could be provided to the end user to limit the amount of typing required.
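The airline/airlines duplication noted in the Table 3 caption suggests conflating suggestions before they are displayed. A minimal Python sketch follows. It assumes a deliberately crude plural-stripping rule rather than a real stemming algorithm, and both helper names are hypothetical, so it should be read as an illustration of the idea and not as part of Quoogle.

```python
def crude_stem(term: str) -> str:
    """Strip a trailing plural 's' (but not 'ss'); a stand-in for a real stemmer."""
    t = term.lower()
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        return t[:-1]
    return t


def dedupe_suggestions(terms: list[str]) -> list[str]:
    """Keep the first (highest-ranked) term for each stem, preserving rank order."""
    seen: set[str] = set()
    kept = []
    for term in terms:
        stem = crude_stem(term)
        if stem not in seen:
            seen.add(stem)
            kept.append(term)
    return kept
```

Applied to the cheap tickets row of Table 3, this keeps Airline and drops the lower-ranked airlines, which frees a slot for another suggestion.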
Some research suggests that there is little difference in performance between automatic and interactive query expansion [9]. However, other studies show that the end user may prefer to have control of the system and make the decisions instead of having the system automatically choose for them [8]. This effect should be investigated further in a comprehensive user study.

Other future work may include ways to add more intelligence to the weighting mechanism, and attempts to reduce the random factor. For example, words that co-occur in the hit set with the original query terms could be given a higher weight. A user's history could be stored to assign additional weight to hit set terms based on the past interests of the user.

CONCLUSION
We have presented a system to augment a user's query using relevant terms from the hit set. The system is an addition to an existing service, and therefore has some speed limitations. However, initial results show that the system holds some promise. Future work should include a user study to perform a more detailed analysis.

ACKNOWLEDGMENTS
We thank Google for providing a method for researchers to query their system. Their API made the development of Quoogle easier. Thanks also to the CSCI 6403 class for their helpful comments, questions, and criticisms.

REFERENCES
1. ALLAN, J. (1996) Incremental relevance feedback for information filtering. Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, August 18-22, Zurich, Switzerland.
2. DILLON, A. (1991) Readers' models of text structures: the case of academic articles. International Journal of Man-Machine Studies, 35.
3. GAUCH, S., WANG, J., AND MAHESH, S. (1999) A corpus analysis approach for automatic query expansion and its extension to multiple databases. ACM Transactions on Information Systems (TOIS), v.17 n.3.
4. GAUCH, S. AND SMITH, J. B. (1991) Search improvement via automatic query reformulation. ACM Transactions on Information Systems (TOIS), v.9 n.3.
5. HARMAN, D. (1988) Towards interactive query expansion. In Proceedings of the Eleventh International Conference on Research & Development in Information Retrieval (New York, NY).
6. HARMAN, D. (1992) Relevance feedback revisited. In Proceedings of the 15th International ACM/SIGIR Conference on Research and Development in Information Retrieval.
7. HARMAN, D. (1992) A failure analysis on the limitations of suffixing in an online environment. In SIGIR '87.
8. JOHO, H. et al. (2002) Hierarchical presentation of expansion terms. Proceedings of the 2002 ACM symposium on applied computing, March 11-14, Madrid, Spain.
9. MAGENNIS, M. AND VAN RIJSBERGEN, C. J. V. (1997) The potential and actual effectiveness of interactive query expansion. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
10. MILLER, G. A. (1995) WordNet: A lexical database for English. Communications of the ACM, 38(11).
11. PAICE, C. D. (1994) An evaluation method for stemming algorithms. Proceedings of the 17th annual International ACM-SIGIR conference on Research and Development in Information Retrieval.
12. QIU, Y. AND FREI, H. P. (1993) Concept Based Query Expansion. Proceedings of SIGIR-93, 16th ACM International Conference on Research and Development in Information Retrieval.
13. RUTHVEN, I. (2003) Re-examining the potential effectiveness of interactive query expansion. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval.
14. SMEATON, A. F. AND VAN RIJSBERGEN (1981) The nearest neighbour problem in information retrieval: an algorithm using upperbounds. Proceedings of the 4th annual international ACM SIGIR conference on Information storage and retrieval: theoretical issues in information retrieval, Oakland, California.


Information Retrieval Introduction to Information Retrieval SCCS414: Information Storage and Retrieval Christopher Manning and Prabhakar Raghavan Lecture 10: Text Classification; Vector Space Classification (Rocchio) Relevance

More information

A User Study on Features Supporting Subjective Relevance for Information Retrieval Interfaces

A User Study on Features Supporting Subjective Relevance for Information Retrieval Interfaces A user study on features supporting subjective relevance for information retrieval interfaces Lee, S.S., Theng, Y.L, Goh, H.L.D., & Foo, S. (2006). Proc. 9th International Conference of Asian Digital Libraries

More information

Identification and Classification of A/E/C Web Sites and Pages

Identification and Classification of A/E/C Web Sites and Pages Construction Informatics Digital Library http://itc.scix.net/ paper w78-2002-34.content Theme: Title: Author(s): Institution(s): E-mail(s): Abstract: Keywords: Identification and Classification of A/E/C

More information

Combining Information Retrieval and Relevance Feedback for Concept Location

Combining Information Retrieval and Relevance Feedback for Concept Location Combining Information Retrieval and Relevance Feedback for Concept Location Sonia Haiduc - WSU CS Graduate Seminar - Jan 19, 2010 1 Software changes Software maintenance: 50-90% of the global costs of

More information

What is this Song About?: Identification of Keywords in Bollywood Lyrics

What is this Song About?: Identification of Keywords in Bollywood Lyrics What is this Song About?: Identification of Keywords in Bollywood Lyrics by Drushti Apoorva G, Kritik Mathur, Priyansh Agrawal, Radhika Mamidi in 19th International Conference on Computational Linguistics

More information

second_language research_teaching sla vivian_cook language_department idl

second_language research_teaching sla vivian_cook language_department idl Using Implicit Relevance Feedback in a Web Search Assistant Maria Fasli and Udo Kruschwitz Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom fmfasli

More information

A Model for Interactive Web Information Retrieval

A Model for Interactive Web Information Retrieval A Model for Interactive Web Information Retrieval Orland Hoeber and Xue Dong Yang University of Regina, Regina, SK S4S 0A2, Canada {hoeber, yang}@uregina.ca Abstract. The interaction model supported by

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Searching For Healthcare Information

Searching For Healthcare Information Searching For Healthcare Information Accessing the Databases Go to https://www.evidence.nhs.uk/ and select Journals and Databases. Click on Healthcare Databases Advanced Search (HDAS). You will then need

More information

CABI Training Materials. Ovid Silver Platter (SP) platform. Simple Searching of CAB Abstracts and Global Health KNOWLEDGE FOR LIFE.

CABI Training Materials. Ovid Silver Platter (SP) platform. Simple Searching of CAB Abstracts and Global Health KNOWLEDGE FOR LIFE. CABI Training Materials Ovid Silver Platter (SP) platform Simple Searching of CAB Abstracts and Global Health www.cabi.org KNOWLEDGE FOR LIFE Contents The OvidSP Database Selection Screen... 3 The Ovid

More information

Semantic Search in s

Semantic Search in  s Semantic Search in Emails Navneet Kapur, Mustafa Safdari, Rahul Sharma December 10, 2010 Abstract Web search technology is abound with techniques to tap into the semantics of information. For email search,

More information

Question Answering Approach Using a WordNet-based Answer Type Taxonomy

Question Answering Approach Using a WordNet-based Answer Type Taxonomy Question Answering Approach Using a WordNet-based Answer Type Taxonomy Seung-Hoon Na, In-Su Kang, Sang-Yool Lee, Jong-Hyeok Lee Department of Computer Science and Engineering, Electrical and Computer Engineering

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of

More information

Searching the Evidence in the Cochrane Library

Searching the Evidence in the Cochrane Library CAMBRIDGE UNIVERSITY LIBRARY MEDICAL LIBRARY Searching the Evidence Searching the Evidence in the Cochrane Library January 2014 (due for revision July 2014) Searching the Evidence 1. How to access The

More information

Utilizing a Common Language as a Generative Software Reuse Tool

Utilizing a Common Language as a Generative Software Reuse Tool Utilizing a Common Language as a Generative Software Reuse Tool Chris Henry and Stanislaw Jarzabek Department of Computer Science School of Computing, National University of Singapore 3 Science Drive,

More information

Lecture 7: Relevance Feedback and Query Expansion

Lecture 7: Relevance Feedback and Query Expansion Lecture 7: Relevance Feedback and Query Expansion Information Retrieval Computer Science Tripos Part II Ronan Cummins Natural Language and Information Processing (NLIP) Group ronan.cummins@cl.cam.ac.uk

More information

IBE101: Introduction to Information Architecture. Hans Fredrik Nordhaug 2008

IBE101: Introduction to Information Architecture. Hans Fredrik Nordhaug 2008 IBE101: Introduction to Information Architecture Hans Fredrik Nordhaug 2008 Objectives Defining IA Practicing IA User Needs and Behaviors The anatomy of IA Organizations Systems Labelling Systems Navigation

More information

A Model for Information Retrieval Agent System Based on Keywords Distribution

A Model for Information Retrieval Agent System Based on Keywords Distribution A Model for Information Retrieval Agent System Based on Keywords Distribution Jae-Woo LEE Dept of Computer Science, Kyungbok College, 3, Sinpyeong-ri, Pocheon-si, 487-77, Gyeonggi-do, Korea It2c@koreaackr

More information

VISUAL RERANKING USING MULTIPLE SEARCH ENGINES

VISUAL RERANKING USING MULTIPLE SEARCH ENGINES VISUAL RERANKING USING MULTIPLE SEARCH ENGINES By Dennis Lim Thye Loon A REPORT SUBMITTED TO Universiti Tunku Abdul Rahman in partial fulfillment of the requirements for the degree of Faculty of Information

More information

Similarity search in multimedia databases

Similarity search in multimedia databases Similarity search in multimedia databases Performance evaluation for similarity calculations in multimedia databases JO TRYTI AND JOHAN CARLSSON Bachelor s Thesis at CSC Supervisor: Michael Minock Examiner:

More information

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline

More information

Inverted Index for Fast Nearest Neighbour

Inverted Index for Fast Nearest Neighbour Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

A Parallel Computing Architecture for Information Processing Over the Internet

A Parallel Computing Architecture for Information Processing Over the Internet A Parallel Computing Architecture for Information Processing Over the Internet Wendy A. Lawrence-Fowler, Xiannong Meng, Richard H. Fowler, Zhixiang Chen Department of Computer Science, University of Texas

More information

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks University of Amsterdam at INEX 2010: Ad hoc and Book Tracks Jaap Kamps 1,2 and Marijn Koolen 1 1 Archives and Information Studies, Faculty of Humanities, University of Amsterdam 2 ISLA, Faculty of Science,

More information

How SPICE Language Modeling Works

How SPICE Language Modeling Works How SPICE Language Modeling Works Abstract Enhancement of the Language Model is a first step towards enhancing the performance of an Automatic Speech Recognition system. This report describes an integrated

More information

The Topic Specific Search Engine

The Topic Specific Search Engine The Topic Specific Search Engine Benjamin Stopford 1 st Jan 2006 Version 0.1 Overview This paper presents a model for creating an accurate topic specific search engine through a focussed (vertical)

More information

A NEW CLUSTER MERGING ALGORITHM OF SUFFIX TREE CLUSTERING

A NEW CLUSTER MERGING ALGORITHM OF SUFFIX TREE CLUSTERING A NEW CLUSTER MERGING ALGORITHM OF SUFFIX TREE CLUSTERING Jianhua Wang, Ruixu Li Computer Science Department, Yantai University, Yantai, Shandong, China Abstract: Key words: Document clustering methods

More information

GIR experiements with Forostar at GeoCLEF 2007

GIR experiements with Forostar at GeoCLEF 2007 GIR experiements with Forostar at GeoCLEF 2007 Simon Overell 1, João Magalhães 1 and Stefan Rüger 2,1 1 Multimedia & Information Systems Department of Computing, Imperial College London, SW7 2AZ, UK 2

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search Relevance Feedback. Query Expansion Instructor: Rada Mihalcea Intelligent Information Retrieval 1. Relevance feedback - Direct feedback - Pseudo feedback 2. Query expansion

More information

Enhancing E-Journal Access In A Digital Work Environment

Enhancing E-Journal Access In A Digital Work Environment Enhancing e-journal access in a digital work environment Foo, S. (2006). Singapore Journal of Library & Information Management, 34, 31-40. Enhancing E-Journal Access In A Digital Work Environment Schubert

More information

PERSONAL WEB REVISITATION BY CONTEXT AND CONTENT KEYWORDS WITH RELEVANCE FEEDBACK

PERSONAL WEB REVISITATION BY CONTEXT AND CONTENT KEYWORDS WITH RELEVANCE FEEDBACK PERSONAL WEB REVISITATION BY CONTEXT AND CONTENT KEYWORDS WITH RELEVANCE FEEDBACK Getting back to previously viewed web pages is a common yet uneasy task for users due to the large volume of personally

More information

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Outline Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Lecture 10 CS 410/510 Information Retrieval on the Internet Query reformulation Sources of relevance for feedback Using

More information

OvidSP Quick Reference Guide

OvidSP Quick Reference Guide OvidSP Quick Reference Guide Select Resources On the Select a Database to Begin Searching page, select one resource by clicking on the database name link, or select several resources by clicking the checkbox

More information

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search Introduction to IR models and methods Rada Mihalcea (Some of the slides in this slide set come from IR courses taught at UT Austin and Stanford) Information Retrieval

More information

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM Myomyo Thannaing 1, Ayenandar Hlaing 2 1,2 University of Technology (Yadanarpon Cyber City), near Pyin Oo Lwin, Myanmar ABSTRACT

More information

Content Based Cross-Site Mining Web Data Records

Content Based Cross-Site Mining Web Data Records Content Based Cross-Site Mining Web Data Records Jebeh Kawah, Faisal Razzaq, Enzhou Wang Mentor: Shui-Lung Chuang Project #7 Data Record Extraction 1. Introduction Current web data record extraction methods

More information

Introduction. What do you know about web in general and web-searching in specific?

Introduction. What do you know about web in general and web-searching in specific? WEB SEARCHING Introduction What do you know about web in general and web-searching in specific? Web World Wide Web (or WWW, It is called a web because the interconnections between documents resemble a

More information

---(Slide 0)--- Let s begin our prior art search lecture.

---(Slide 0)--- Let s begin our prior art search lecture. ---(Slide 0)--- Let s begin our prior art search lecture. ---(Slide 1)--- Here is the outline of this lecture. 1. Basics of Prior Art Search 2. Search Strategy 3. Search tool J-PlatPat 4. Search tool PATENTSCOPE

More information

Information Retrieval. Information Retrieval and Web Search

Information Retrieval. Information Retrieval and Web Search Information Retrieval and Web Search Introduction to IR models and methods Information Retrieval The indexing and retrieval of textual documents. Searching for pages on the World Wide Web is the most recent

More information

Improving Suffix Tree Clustering Algorithm for Web Documents

Improving Suffix Tree Clustering Algorithm for Web Documents International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal

More information

Ranking in a Domain Specific Search Engine

Ranking in a Domain Specific Search Engine Ranking in a Domain Specific Search Engine CS6998-03 - NLP for the Web Spring 2008, Final Report Sara Stolbach, ss3067 [at] columbia.edu Abstract A search engine that runs over all domains must give equal

More information

Information discovery and retrieval

Information discovery and retrieval Outline of today s lecture IMS2603 Lecture 13 1. Seeking information 2. Thinking about retrieval 3. Controlled vocabulary and natural language Information discovery and retrieval 4. Relevance 5. Other

More information

ResPubliQA 2010

ResPubliQA 2010 SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first

More information

Information discovery and retrieval

Information discovery and retrieval IMS2603 Lecture 13 Information discovery and retrieval Outline of today s lecture 1. Seeking information 2. Thinking about retrieval 3. Controlled vocabulary and natural language 4. Relevance 5. Other

More information

Chapter 1: The Cochrane Library Search Tour

Chapter 1: The Cochrane Library Search Tour Chapter : The Cochrane Library Search Tour Chapter : The Cochrane Library Search Tour This chapter will provide an overview of The Cochrane Library Search: Learn how The Cochrane Library new search feature

More information

EBSCOhost Web 6.0. User s Guide EBS 2065

EBSCOhost Web 6.0. User s Guide EBS 2065 EBSCOhost Web 6.0 User s Guide EBS 2065 6/26/2002 2 Table Of Contents Objectives:...4 What is EBSCOhost...5 System Requirements... 5 Choosing Databases to Search...5 Using the Toolbar...6 Using the Utility

More information

Program Synthesis. SWE 795, Spring 2017 Software Engineering Environments

Program Synthesis. SWE 795, Spring 2017 Software Engineering Environments Program Synthesis SWE 795, Spring 2017 Software Engineering Environments Today HW3 is due next week in class! Part 1 (Lecture)(~50 mins) Break! Part 2 (Discussion)(~60 mins) Discussion of readings Part

More information

Session 10: Information Retrieval

Session 10: Information Retrieval INFM 63: Information Technology and Organizational Context Session : Information Retrieval Jimmy Lin The ischool University of Maryland Thursday, November 7, 23 Information Retrieval What you search for!

More information

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation Volume 3, No.5, May 24 International Journal of Advances in Computer Science and Technology Pooja Bassin et al., International Journal of Advances in Computer Science and Technology, 3(5), May 24, 33-336

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Web Search Engine Question Answering

Web Search Engine Question Answering Web Search Engine Question Answering Reena Pindoria Supervisor Dr Steve Renals Com3021 07/05/2003 This report is submitted in partial fulfilment of the requirement for the degree of Bachelor of Science

More information

Summarizing Public Opinion on a Topic

Summarizing Public Opinion on a Topic Summarizing Public Opinion on a Topic 1 Abstract We present SPOT (Summarizing Public Opinion on a Topic), a new blog browsing web application that combines clustering with summarization to present an organized,

More information

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION Ms. Nikita P.Katariya 1, Prof. M. S. Chaudhari 2 1 Dept. of Computer Science & Engg, P.B.C.E., Nagpur, India, nikitakatariya@yahoo.com 2 Dept.

More information

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,

More information

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Yikun Guo, Henk Harkema, Rob Gaizauskas University of Sheffield, UK {guo, harkema, gaizauskas}@dcs.shef.ac.uk

More information

Searching the Evidence using EBSCOHost

Searching the Evidence using EBSCOHost CAMBRIDGE UNIVERSITY LIBRARY MEDICAL LIBRARY Supporting Literature Searching Searching the Evidence using EBSCOHost ATHENS CINAHL Use to search CINAHL with an NHS ATHENS login (or PsycINFO with University

More information

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS 1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

More information