
Searching Information Servers Based on Customized Profiles

Technical Report USC-CS

Shih-Hao Li and Peter B. Danzig
Computer Science Department
University of Southern California
Los Angeles, California

Abstract

We investigate the effect of using customized profiles to help search for relevant servers on the Internet. Our experiments demonstrate that using customized profiles with the latent semantic indexing (LSI) technique can improve the performance of Internet searching.

1 Introduction

When searching for information in a retrieval system, people use different terms to describe their information needs. The retrieval system searches its database and returns documents indexed with matching terms. Since a concept can be represented by a variety of terms, users may fail to obtain the information they require. This is called the vocabulary problem [1]. The vocabulary problem occurs not only in traditional information retrieval, but also in Internet resource discovery, where users seek relevant information servers to which they can submit their queries.

Previously, we proposed using Latent Semantic Indexing (LSI) [2] to ameliorate the vocabulary problem in Internet search [3]. Here, we extend the idea by integrating a customized profile with LSI to assist the search. We demonstrate that customized profiles can help a retrieval system understand a user's terminology better, and thus improve its performance.

2 Background

LSI [2] was originally developed to address the vocabulary problem in Salton's Vector Space Model (VSM) [4], where documents and queries are represented as vectors of term frequencies or weights. It assumes that some underlying semantic structure exists in the pattern of term usage across documents. To capture this structure, LSI applies Singular Value Decomposition (SVD) to a term-document matrix representing a database and generates vectors of k (typically 100 to 300) orthogonal indexing dimensions, where each dimension represents a linearly independent concept.
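The decomposition step can be sketched in a few lines. This is an illustration only, not the authors' implementation: it assumes NumPy is available, applies `numpy.linalg.svd` to a hypothetical 5-term by 4-document toy matrix, and keeps the k largest singular values (the experiment in Section 3 uses k = 100 on much larger matrices). The helper name `lsi_decompose` is our own.

```python
import numpy as np


def lsi_decompose(term_doc: np.ndarray, k: int):
    """Rank-k SVD of a term-by-document matrix (rows = terms, columns = documents).

    Returns U_k (terms x k), the k largest singular values s_k, and
    V_k (documents x k), so that A ~= U_k @ diag(s_k) @ V_k.T.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)  # s is sorted in descending order
    k = min(k, s.size)  # cannot keep more dimensions than min(#terms, #documents)
    return u[:, :k], s[:k], vt[:k, :].T


# Hypothetical toy matrix: 5 terms x 4 documents (raw term frequencies).
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
], dtype=float)

U_k, s_k, V_k = lsi_decompose(A, k=2)
print(U_k.shape, s_k.shape, V_k.shape)  # (5, 2) (2,) (4, 2)
```

Keeping only the k largest singular values gives the best rank-k least-squares approximation of A; this truncation is what lets related terms share indexing dimensions.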
The decomposed vectors represent both documents and query terms in the same semantic space, and their values indicate the degrees of association with the k underlying concepts. A query vector in LSI is a weighted sum of its component term vectors; for example, a p-term query is represented as the average of its p decomposed term vectors. To determine relevant documents, the query vector is compared with all document vectors, and the documents with the highest cosine coefficients [5] are returned. Notice that a query can hit documents with which it shares no terms, because the k indexing dimensions represent concepts, not the exact terms used. Hence, LSI improves search performance by ameliorating the vocabulary problem.

A profile (or user profile) is a collection of data, specified by users to reflect their interests. It can be used as a filter to select new documents or information that match users' interests [6, 7, 8], or to augment the query to improve retrieval effectiveness [9, 10]. Foltz and Dumais used LSI to filter new incoming documents based on user profiles [8]. They compared new documents against users' word and document profiles, and ranked them by their similarity to the profile. For the word profile, users indicate words or phrases of interest; each is represented as a separate vector and compared with new documents using both the standard vector method and the LSI vector method. Similarly, each document in the document profile is expressed as a vector and compared to all new documents using the same matching methods. In their experiment, they found that LSI matching with the document profile gave the best performance.

Earlier, we proposed Two-Level LSI for the Internet environment [3], where a "directory of services" records the descriptions of information servers using LSI. A user sends a query to the directory of services, which determines and ranks the information servers relevant to the user's request. The user then uses the ranking to select the most relevant information servers to query directly.

Here we investigate the use of customized profiles in two-level LSI. In this research, a profile provides background information for the query. It could be a set of documents reflecting a user's interests, or a discipline's taxonomy representing specialized knowledge. To distinguish these two types of profile, we call the former a "user profile" and the latter a "taxonomy" in this paper.
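To make the query representation and cosine ranking of Section 2 concrete, here is a minimal NumPy sketch under stated assumptions. Term vectors are taken as rows of U_k and document vectors as rows of V_k S_k, which is one common LSI convention; the report does not specify the scaling. A query is the average of its term vectors, as described above. `expand_with_profile` is a hypothetical helper illustrating the profile-vector expansion described in Section 3.1; scaling both vectors to unit length before adding them is our assumption, since the report does not state a weighting.

```python
import numpy as np


def lsi_spaces(term_doc, k):
    """Rank-k SVD; returns term vectors (rows of U_k) and document vectors (rows of V_k S_k)."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, s.size)
    return u[:, :k], vt[:k, :].T * s[:k]


def query_vector(term_vecs, term_ids):
    """A p-term query is represented as the average of its p decomposed term vectors."""
    return term_vecs[term_ids].mean(axis=0)


def expand_with_profile(q, profile_doc_vecs):
    """Hypothetical LSI-PRO step: add the centroid of the profile's document vectors.

    Assumption: both vectors are scaled to unit length first.
    """
    centroid = profile_doc_vecs.mean(axis=0)
    return q / np.linalg.norm(q) + centroid / np.linalg.norm(centroid)


def rank_by_cosine(q, doc_vecs):
    """Document indices sorted by decreasing cosine similarity to q, plus the scores."""
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return np.argsort(-scores, kind="stable"), scores


A = np.array([              # hypothetical toy matrix: 5 terms x 4 documents
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
], dtype=float)

term_vecs, doc_vecs = lsi_spaces(A, k=2)
q = query_vector(term_vecs, [0, 1])                 # a query made of terms 0 and 1
order, scores = rank_by_cosine(q, doc_vecs)
expanded = expand_with_profile(q, doc_vecs[[0]])    # profile = document 0
order_pro, _ = rank_by_cosine(expanded, doc_vecs)
```

With k equal to the full rank of A, this ranking reduces to ordinary vector-space cosine ranking; a document sharing no term with the query can receive a nonzero score only when k is smaller than the rank.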
Below, we compare the effect of merging a taxonomy at the directory of services against expanding queries with a user profile at the client site.

3 Experiment

In [3], we showed that two-level LSI can outperform VSM in estimating server rankings in the Internet environment. Here we focus on comparing the use of a user profile and a taxonomy with LSI. We generate three server rankings using (1) the original LSI, (2) LSI with user profile, and (3) LSI with taxonomy. In this experiment, a user profile is used to expand a user's query before it is sent to the directory of services, while a taxonomy is merged with the server descriptions at the directory of services. Figure 1 shows the three processes. We use the standard CACM and MED document collections, for which queries and relevance judgments are available. We compute the rankings estimated by the three methods and calculate their rank-order correlation coefficients and accumulated recall.

Figure 1: The three ranking processes: (1) the original LSI, (2) LSI with user profile, and (3) LSI with taxonomy.

3.1 Methodology

We combine the documents from the CACM and MED collections and divide them into nine sub-collections, each representing a server's database. Notice that these documents may use the same terms with totally different meanings, because they belong to two different disciplines (computer science and medicine); a severe vocabulary problem could therefore exist in such an environment. Documents in each database are indexed with the terms that occur in the title and abstract but not on a stop list of 429 common words. Although queries are written in natural language, a query term is used only if it does not appear on the same stop list and appears in at least one document. All indexed terms are stored in their original forms, without stemming. Table 1 gives additional characteristics of our experiment.

Table 1: The characteristics of the test collection.

  Number of documents                   4237
  Number of queries                     64
  Number of indexing terms
  Mean number of terms per document
  Mean number of terms per query

In LSI ranking, we apply the single-link clustering algorithm [11] to construct server descriptions. Documents are clustered together when their similarity is greater than a predefined threshold, and each remaining document forms a cluster of its own. Each cluster is represented by the mean vector of its component document vectors, and the server description is the set of its cluster vectors. The directory of services collects the server descriptions from all servers and determines the ranking for each user query using SVD. In this experiment, server descriptions are decomposed into vectors of 100 dimensions, as suggested in Deerwester's LSI experiments [2]. The ranking is based on the cosine similarity between the server descriptions and the user query.
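The server-description and directory-ranking steps can be sketched as follows, as an illustration under assumptions rather than the authors' code. Single-link clustering at a fixed threshold is implemented here as the connected components of the threshold graph, using a small union-find. The use of cosine similarity between documents, the threshold values in the example, and the rule that a server's score is the best cosine over its cluster vectors are all assumptions; the text does not say how several cluster vectors per server are combined into one score. `single_link_clusters`, `server_description`, and `rank_servers` are hypothetical names.

```python
import numpy as np


def single_link_clusters(doc_vecs, threshold):
    """Single-link clustering at a fixed similarity threshold.

    Two documents share a cluster if a chain of pairs with cosine similarity
    above `threshold` connects them; a document with no such neighbour is a singleton.
    """
    n = len(doc_vecs)
    unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = unit @ unit.T
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold:
                parent[find(i)] = find(j)  # merge the two components
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def server_description(doc_vecs, threshold):
    """One row per cluster: the mean vector of the cluster's document vectors."""
    return np.array([doc_vecs[c].mean(axis=0)
                     for c in single_link_clusters(doc_vecs, threshold)])


def rank_servers(descriptions, query_term_ids, k):
    """Hypothetical directory-of-services ranking over all servers' cluster vectors.

    Assumption: a server's score is the highest cosine among its clusters.
    """
    A = np.vstack(descriptions).T  # terms x clusters, one column per cluster vector
    owner = np.repeat(np.arange(len(descriptions)), [len(d) for d in descriptions])
    u, s, vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, s.size)
    q = u[query_term_ids, :k].mean(axis=0)  # query = average of its term vectors
    clusters = vt[:k, :].T * s[:k]          # cluster vectors in the k-dim space
    cos = clusters @ q / (np.linalg.norm(clusters, axis=1) * np.linalg.norm(q))
    scores = np.array([cos[owner == srv].max() for srv in range(len(descriptions))])
    return np.argsort(-scores, kind="stable"), scores


# Hypothetical example: two servers over a 4-term vocabulary.
server_a = server_description(np.array([[1.0, 1, 0, 0], [1.0, 0, 0, 0]]), threshold=0.5)
server_b = server_description(np.array([[0.0, 0, 1, 1], [0.0, 0, 0, 1]]), threshold=0.5)
order, scores = rank_servers([server_a, server_b], query_term_ids=[0], k=100)
```

In this sketch, the LSI-TAX variant would roughly correspond to appending taxonomy pseudo-document vectors as extra columns of `A` before the SVD, without assigning them to any server.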

In LSI with user profile ranking, we select half of the relevant documents of a query to construct the user profile for that query. Since those documents have been judged relevant by the user, they reflect the user's interests. Before sending a query to the directory of services, we expand it by adding the "profile vector", which is the centroid of all the document vectors in the profile. The directory of services then applies the standard LSI algorithm to rank servers, and the ranking is evaluated against the remaining half of the relevant documents.

In LSI with taxonomy ranking, we generate "pseudo-documents" from the ACM taxonomy, which contains a listing of computer science classification schemes [12]. We then merge these pseudo-documents with the server descriptions in the directory of services before applying the LSI algorithm. We postulate that adding the taxonomy as pseudo-documents may reinforce the computer science interpretation of the terms in the CACM documents, and therefore increase the likelihood that computer science rather than medical documents are returned from the combined collection.

Below, we use two methods to evaluate the rankings estimated by the three approaches. Our criterion is that the servers containing the most relevant documents should receive the highest ranks.

3.2 Rank-Order Correlation

To verify the estimated rankings, we generate a standard ranking (denoted STD) by sorting servers by their number of relevant documents, excluding those used in the user profile. We calculate the Spearman rank-order correlation coefficient ($r_s$) [13] to measure the closeness of STD and an estimated ranking. $r_s$ ranges between $-1$ and $1$: if two rankings are identical, $r_s = 1$; if one ranking is the reverse of the other, $r_s = -1$. The larger $r_s$, the closer the rankings. The $r_s$ coefficient therefore lets us determine which of the three methods generates the ranking closest to STD.
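The two statistics used in this subsection can be sketched with NumPy and the standard library. The sketch computes Spearman's $r_s$ as the Pearson correlation of tie-averaged ranks, which matters when STD contains ties (for example, servers holding the same number of relevant documents), and the confidence interval for proportion applied to win/loss counts, as in the comparison that follows. The function names are hypothetical; z = 1.960 is the 95% value quoted in the text.

```python
import math

import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman r_s as the Pearson correlation of the two rank vectors.

    Without ties this equals 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); it is
    undefined (NaN) when either input is constant.
    """
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def proportion_ci(n1, n2, z=1.960):
    """Interval p -/+ z * sqrt(p (1 - p) / (n1 + n2)), with p = max(n1, n2) / (n1 + n2)."""
    p = max(n1, n2) / (n1 + n2)
    half_width = z * math.sqrt(p * (1 - p) / (n1 + n2))
    return p - half_width, p + half_width


lo, hi = proportion_ci(32, 18)   # one method better 32 times, worse 18 times (ties ignored)
print(f"({lo:.3f}, {hi:.3f})")   # (0.507, 0.773)
```

Plugging in the counts reported below (32 wins and 18 losses for LSI-PRO against LSI-TAX, with the 14 ties ignored) reproduces the interval (0.507, 0.773); 29 and 16 reproduce (0.505, 0.784).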
To compare the rankings generated by the original LSI (denoted LSI), LSI with user profile (denoted LSI-PRO), and LSI with taxonomy (denoted LSI-TAX), we calculate their $r_s$ against STD for each query. Among the 64 samples, $r_s$(LSI, STD) is larger than, equal to, and less than $r_s$(LSI-TAX, STD) 16, 19, and 29 times, respectively. This indicates that, with 100 indexing dimensions, LSI with taxonomy generates a ranking closer to STD than LSI without it in 29 of the 64 cases, whereas the latter is closer in only 16 of the 64 cases. Similarly, $r_s$(LSI-TAX, STD) is larger than, equal to, and less than $r_s$(LSI-PRO, STD) 18, 14, and 32 times, respectively. Therefore, LSI with user profile generates rankings closer to STD more often than LSI with taxonomy.

To measure our confidence that one method outperforms another, we calculate the confidence interval for a proportion, defined as follows [14]:

$$p = \frac{\max[n_1, n_2]}{n_1 + n_2}$$

$$\text{confidence interval} = p \pm z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n_1 + n_2}}$$

where $n_1$ is the number of times one method is better than the other and $n_2$ is the number of times it is worse. $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of a unit normal variate; for a 95% confidence level, $z_{1-\alpha/2} = 1.960$. If the confidence interval does not include 0.5, we can say with 95% confidence that one method is superior to the other. For $r_s$(LSI-PRO, STD) versus $r_s$(LSI-TAX, STD), the confidence interval is (0.507, 0.773). Because it does not include 0.5, we can say with 95% confidence that LSI with user profile is superior to LSI with taxonomy. Similarly, the confidence interval for $r_s$(LSI-TAX, STD) versus $r_s$(LSI, STD) is (0.505, 0.784); therefore, LSI with taxonomy is superior to LSI with 95% confidence.

From the two ranking comparisons, we conclude that LSI with either a user profile or a taxonomy gives a better ranking than LSI without one, and that LSI with user profile performs better than LSI with taxonomy. One possible reason is that the user profile is query-specific and changes from query to query, whereas the taxonomy acts as a generic profile for all queries; LSI with user profile therefore gives more accurate results.

3.3 Accumulated Recall

To measure the performance obtained with the estimated server rankings, we calculate the "accumulated recall" for the top n of the N servers in a ranking, where servers are numbered in ranking order. Let $rel_i$ be the set of relevant documents and $retr_i$ the set of retrieved documents for a given query on server i. We define:

Document Recall, denoted $R_d(n)$, is the ratio of the number of relevant documents retrieved from the top n servers to the number of relevant documents in all servers:

$$R_d(n) = \frac{\sum_{i=1}^{n} |rel_i \cap retr_i|}{\sum_{i=1}^{N} |rel_i|}$$

Server Recall, denoted $R_s(n)$, is the ratio of the number of the top n servers having relevant documents to the total number of servers having relevant documents:

$$R_s(n) = \frac{|\{server_i \mid rel_i \neq \emptyset,\ 1 \le i \le n\}|}{|\{server_i \mid rel_i \neq \emptyset,\ 1 \le i \le N\}|}$$

Because the returned documents are determined by the query processing engine of each server, we assume for simplicity that all relevant documents are returned. Table 2 shows the average document and server recalls, as a function of the number of servers, for the 64 queries on the test collection. In Table 2, LSI with user profile has the highest document recall except when the number of servers n is 6 or 7, and the original LSI has the lowest value except when n = 1 and 5. This means that when retrieving the top 1, 2, 3, 4, 5, or 8 servers in the ranking estimated by LSI with user profile, we get more relevant documents than with LSI or LSI with taxonomy.
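The two accumulated-recall measures translate directly into code. This is a sketch, not the authors' evaluation script: servers are identified by index, `ranking` lists them best first, and the per-server relevant and retrieved sets are hypothetical toy data. Following the simplification stated in the text, the retrieved set on each server is taken to be its relevant set.

```python
def document_recall(ranking, rel, retr, n):
    """R_d(n): relevant documents retrieved from the top-n servers / relevant documents on all servers."""
    found = sum(len(rel[i] & retr[i]) for i in ranking[:n])
    total = sum(len(r) for r in rel)
    return found / total


def server_recall(ranking, rel, n):
    """R_s(n): top-n servers holding relevant documents / all servers holding any."""
    hits = sum(1 for i in ranking[:n] if rel[i])
    total = sum(1 for r in rel if r)
    return hits / total


# Hypothetical query over 4 servers: the set of relevant document ids on each server.
rel = [{"a", "b", "c", "d", "e"}, set(), {"f", "g", "h"}, {"i", "j"}]
retr = rel                # simplification from the text: every relevant document is returned
ranking = [2, 0, 1, 3]    # estimated server order, best first

for n in range(1, len(ranking) + 1):
    print(n, document_recall(ranking, rel, retr, n), round(server_recall(ranking, rel, n), 3))
```

Averaging these values over all queries for each n, and for each ranking method, gives figures of the kind summarized in Table 2.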
If we retrieve the servers ranked by LSI alone, we obtain fewer relevant documents most of the time, which is consistent with LSI's lower rank-order correlation coefficient. The average orders of the nine document recalls for LSI, LSI-PRO, and LSI-TAX are 2.556, 1.222, and 1.889, respectively, so LSI with user profile performs best among the three methods. For server recall, LSI-PRO and LSI-TAX each take first place 4 out of 9 times. The average orders for LSI, LSI-PRO, and LSI-TAX are 2.333, 1.556, and 1.667, respectively. Thus, LSI with user profile performs slightly better than LSI with taxonomy, while both are much better than the original LSI. Users can therefore find more relevant servers using the ranking estimated by LSI with either a user profile or a taxonomy than without one.

Table 2: The average document recall ($R_d$) and server recall ($R_s$) as a function of the number of servers (n) for 64 queries on the test collection. The numbers in parentheses indicate the order among the three methods for a given n.

               R_d                          R_s
  n     LSI   LSI-PRO   LSI-TAX      LSI   LSI-PRO   LSI-TAX
  1     (2)   (1)       (3)          (2)   (1)       (3)
  2     (3)   (1)       (2)          (3)   (2)       (1)
  3     (3)   (1)       (2)          (3)   (1)       (2)
  4     (3)   (1)       (2)          (3)   (2)       (1)
  5     (2)   (1)       (3)          (1)   (1)       (3)
  6     (3)   (2)       (1)          (2)   (3)       (1)
  7     (3)   (2)       (1)          (3)   (2)       (1)
  8     (3)   (1)       (2)          (3)   (1)       (2)
  9     (1)   (1)       (1)          (1)   (1)       (1)

4 Conclusions

We proposed using Deerwester's latent semantic indexing with customized profiles to search for and rank information servers on the Internet. We conducted experiments on standard document collections and compared the performance of LSI, LSI with user profile, and LSI with taxonomy using the rank-order correlation coefficient and accumulated recall. The results show that LSI with user profile performs best, LSI with taxonomy second, and the original LSI third in estimating and ranking relevant servers for user queries.

A customized profile provides background information for the query. In practice, novice users can use LSI with taxonomy to find initial documents in a specific field, and then construct their own profile for further searching. Users with a variety of interests can use a different profile for each query to obtain higher recall. As the number of servers on the Internet grows rapidly, we believe this technique can ameliorate the vocabulary problem and improve users' searching.

References

[1] George W. Furnas, Thomas K. Landauer, Louis M. Gomez, and Susan T. Dumais, "The vocabulary problem in human-system communication", Communications of the ACM, vol. 30, no. 11, pp. 964-971, November
[2] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman, "Indexing by latent semantic analysis", Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391-407, September

[3] Shih-Hao Li and Peter B. Danzig, "Vocabulary problem in Internet resource discovery", in Proceedings of the Second International Workshop on Next Generation Information Technologies and Systems, Naharia, Israel, June 1995, pp. 139-145. Available from ftp://catarina.usc.edu/shli/ngits.ps.gz.
[4] Gerard Salton and Michael J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill Book Company,
[5] Gerard Salton, Automatic Information Organization and Retrieval, McGraw-Hill Book Company,
[6] K. H. Packer and D. Soergel, "The importance of SDI for current awareness in fields with severe scatter of information", Journal of the American Society for Information Science, vol. 30, no. 3, pp. 125-135,
[7] Shoshana Loeb, "Architecting personalized delivery of multimedia information", Communications of the ACM, vol. 35, no. 12, pp. 39-48, December
[8] Peter W. Foltz and Susan T. Dumais, "Personalized information delivery: An analysis of information filtering methods", Communications of the ACM, vol. 35, no. 12, pp. 51-60, December
[9] H. Grzelak and K. Kowalski, "Automatic construction of information queries", Information Processing and Management, vol. 19, pp. 381-389,
[10] Robert R. Korfhage, "Query enhancement by user profiles", in Proceedings of the Third Joint BCS and ACM Symposium, 1984, pp. 111-122.
[11] Ellen M. Voorhees, "Implementing agglomerative hierarchic clustering algorithms for use in document retrieval", Information Processing and Management, vol. 22, no. 6, pp. 465-476,
[12] Jean E. Sammet and Anthony Ralston, "The new (1982) computing review classification system - final version", Communications of the ACM, vol. 25, no. 1, pp. 13-25, January
[13] Maurice Kendall and Jean D. Gibbons, Rank Correlation Methods, Edward Arnold, London, fifth edition,
[14] Raj Jain, The Art of Computer Systems Performance Analysis, John Wiley & Sons, Inc., New York,


More information

Published in A R DIGITECH

Published in A R DIGITECH IMAGE RETRIEVAL USING LATENT SEMANTIC INDEXING Rachana C Patil*1, Imran R. Shaikh*2 *1 (M.E Student S.N.D.C.O.E.R.C, Yeola) *2(Professor, S.N.D.C.O.E.R.C, Yeola) rachanap4@gmail.com*1, imran.shaikh22@gmail.com*2

More information

Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied

Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied Information Processing and Management 43 (2007) 1044 1058 www.elsevier.com/locate/infoproman Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied Anselm Spoerri

More information

Data Distortion for Privacy Protection in a Terrorist Analysis System

Data Distortion for Privacy Protection in a Terrorist Analysis System Data Distortion for Privacy Protection in a Terrorist Analysis System Shuting Xu, Jun Zhang, Dianwei Han, and Jie Wang Department of Computer Science, University of Kentucky, Lexington KY 40506-0046, USA

More information

Clustering Startups Based on Customer-Value Proposition

Clustering Startups Based on Customer-Value Proposition Clustering Startups Based on Customer-Value Proposition Daniel Semeniuta Stanford University dsemeniu@stanford.edu Meeran Ismail Stanford University meeran@stanford.edu Abstract K-means clustering is a

More information

The Semantic Conference Organizer

The Semantic Conference Organizer 34 The Semantic Conference Organizer Kevin Heinrich, Michael W. Berry, Jack J. Dongarra, Sathish Vadhiyar University of Tennessee, Knoxville, USA CONTENTS 34.1 Background... 571 34.2 Latent Semantic Indexing...

More information

Web personalization using Extended Boolean Operations with Latent Semantic Indexing

Web personalization using Extended Boolean Operations with Latent Semantic Indexing Web personalization using Extended Boolean Operations with Latent Semantic Indexing Preslav Nakov Bulgaria, Sofia, Studentski grad. bl.8/room 723 (preslav@rila.bg) Key words: Information Retrieval and

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN ICED 01 GLASGOW, AUGUST 21-23, 2001

INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN ICED 01 GLASGOW, AUGUST 21-23, 2001 INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN ICED 01 GLASGOW, AUGUST 21-23, 2001 AUTOMATIC COMPOSITION OF XML DOCUMENTS TO EXPRESS DESIGN INFORMATION NEEDS Andy Dong, Shuang Song, Jialong Wu, and Alice

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

NASA Ames Research Center. user groups. Information preferences of specic queries are

NASA Ames Research Center.  user groups. Information preferences of specic queries are Learning Subjective Relevance to Facilitate Information Access James R. Chen & Nathalie Mathe y NASA Ames Research Center Moet Field, CA 94035-1000 jchen@ptolemy.arc.nasa.gov, mathe@ptolemy.arc.nasa.gov

More information

TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University

TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University of Maryland, College Park, MD 20742 oard@glue.umd.edu

More information

3. Information Organization {and,or,vs} Search

3. Information Organization {and,or,vs} Search 1 of 36 8/31/2006 3:14 PM 3. Information Organization {and,or,vs} Search IS 202-5 September 2006 Bob Glushko 2 of 36 8/31/2006 3:14 PM Plan for IO & IR Lecture #3 The Information Life Cycle "Search"!=

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation

More information

Concept Based Search Using LSI and Automatic Keyphrase Extraction

Concept Based Search Using LSI and Automatic Keyphrase Extraction Concept Based Search Using LSI and Automatic Keyphrase Extraction Ravina Rodrigues, Kavita Asnani Department of Information Technology (M.E.) Padre Conceição College of Engineering Verna, India {ravinarodrigues

More information

A Semi-Discrete Matrix Decomposition for Latent. Semantic Indexing in Information Retrieval. December 5, Abstract

A Semi-Discrete Matrix Decomposition for Latent. Semantic Indexing in Information Retrieval. December 5, Abstract A Semi-Discrete Matrix Decomposition for Latent Semantic Indexing in Information Retrieval Tamara G. Kolda and Dianne P. O'Leary y December 5, 1996 Abstract The vast amount of textual information available

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko,

More information

Retrieving Model for Design Patterns

Retrieving Model for Design Patterns Retrieving Model for Design Patterns 51 Retrieving Model for Design Patterns Sarun Intakosum and Weenawadee Muangon, Non-members ABSTRACT The purpose of this research is to develop a retrieving model for

More information

Research on outlier intrusion detection technologybased on data mining

Research on outlier intrusion detection technologybased on data mining Acta Technica 62 (2017), No. 4A, 635640 c 2017 Institute of Thermomechanics CAS, v.v.i. Research on outlier intrusion detection technologybased on data mining Liang zhu 1, 2 Abstract. With the rapid development

More information

TREC-3 Ad Hoc Retrieval and Routing. Experiments using the WIN System. Paul Thompson. Howard Turtle. Bokyung Yang. James Flood

TREC-3 Ad Hoc Retrieval and Routing. Experiments using the WIN System. Paul Thompson. Howard Turtle. Bokyung Yang. James Flood TREC-3 Ad Hoc Retrieval and Routing Experiments using the WIN System Paul Thompson Howard Turtle Bokyung Yang James Flood West Publishing Company Eagan, MN 55123 1 Introduction The WIN retrieval engine

More information

Object classes. recall (%)

Object classes. recall (%) Using Genetic Algorithms to Improve the Accuracy of Object Detection Victor Ciesielski and Mengjie Zhang Department of Computer Science, Royal Melbourne Institute of Technology GPO Box 2476V, Melbourne

More information

Performance Measures for Multi-Graded Relevance

Performance Measures for Multi-Graded Relevance Performance Measures for Multi-Graded Relevance Christian Scheel, Andreas Lommatzsch, and Sahin Albayrak Technische Universität Berlin, DAI-Labor, Germany {christian.scheel,andreas.lommatzsch,sahin.albayrak}@dai-labor.de

More information

Multimodal Medical Image Retrieval based on Latent Topic Modeling

Multimodal Medical Image Retrieval based on Latent Topic Modeling Multimodal Medical Image Retrieval based on Latent Topic Modeling Mandikal Vikram 15it217.vikram@nitk.edu.in Suhas BS 15it110.suhas@nitk.edu.in Aditya Anantharaman 15it201.aditya.a@nitk.edu.in Sowmya Kamath

More information

A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System

A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System Takashi Yukawa Nagaoka University of Technology 1603-1 Kamitomioka-cho, Nagaoka-shi Niigata, 940-2188 JAPAN

More information

2 Partitioning Methods for an Inverted Index

2 Partitioning Methods for an Inverted Index Impact of the Query Model and System Settings on Performance of Distributed Inverted Indexes Simon Jonassen and Svein Erik Bratsberg Abstract This paper presents an evaluation of three partitioning methods

More information

1 Introduction The history of information retrieval may go back as far as According to Maron[7], 1948 signies three important events. The rst is

1 Introduction The history of information retrieval may go back as far as According to Maron[7], 1948 signies three important events. The rst is The MANICURE Document Processing System Kazem Taghva, Allen Condit, Julie Borsack, John Kilburg, Changshi Wu, and Je Gilbreth Technical Report 95-02 Information Science Research Institute University of

More information

Very Fast Image Retrieval

Very Fast Image Retrieval Very Fast Image Retrieval Diogo André da Silva Romão Abstract Nowadays, multimedia databases are used on several areas. They can be used at home, on entertainment systems or even in professional context

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,

More information

Favorites-Based Search Result Ordering

Favorites-Based Search Result Ordering Favorites-Based Search Result Ordering Ben Flamm and Georey Schiebinger CS 229 Fall 2009 1 Introduction Search engine rankings can often benet from knowledge of users' interests. The query jaguar, for

More information

Document Image Restoration Using Binary Morphological Filters. Jisheng Liang, Robert M. Haralick. Seattle, Washington Ihsin T.

Document Image Restoration Using Binary Morphological Filters. Jisheng Liang, Robert M. Haralick. Seattle, Washington Ihsin T. Document Image Restoration Using Binary Morphological Filters Jisheng Liang, Robert M. Haralick University of Washington, Department of Electrical Engineering Seattle, Washington 98195 Ihsin T. Phillips

More information

IRCE at the NTCIR-12 IMine-2 Task

IRCE at the NTCIR-12 IMine-2 Task IRCE at the NTCIR-12 IMine-2 Task Ximei Song University of Tsukuba songximei@slis.tsukuba.ac.jp Yuka Egusa National Institute for Educational Policy Research yuka@nier.go.jp Masao Takaku University of

More information

Recommendation System for Location-based Social Network CS224W Project Report

Recommendation System for Location-based Social Network CS224W Project Report Recommendation System for Location-based Social Network CS224W Project Report Group 42, Yiying Cheng, Yangru Fang, Yongqing Yuan 1 Introduction With the rapid development of mobile devices and wireless

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

Leveraging Transitive Relations for Crowdsourced Joins*

Leveraging Transitive Relations for Crowdsourced Joins* Leveraging Transitive Relations for Crowdsourced Joins* Jiannan Wang #, Guoliang Li #, Tim Kraska, Michael J. Franklin, Jianhua Feng # # Department of Computer Science, Tsinghua University, Brown University,

More information

Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis

Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis Ayman Farahat Palo Alto Research Center 3333 Coyote Hill Road Palo Alto, CA 94304 ayman.farahat@gmail.com Francine Chen

More information

Video Representation. Video Analysis

Video Representation. Video Analysis BROWSING AND RETRIEVING VIDEO CONTENT IN A UNIFIED FRAMEWORK Yong Rui, Thomas S. Huang and Sharad Mehrotra Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign

More information

Automatic New Topic Identification in Search Engine Transaction Log Using Goal Programming

Automatic New Topic Identification in Search Engine Transaction Log Using Goal Programming Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management Istanbul, Turkey, July 3 6, 2012 Automatic New Topic Identification in Search Engine Transaction Log

More information

Document Clustering using Concept Space and Cosine Similarity Measurement

Document Clustering using Concept Space and Cosine Similarity Measurement 29 International Conference on Computer Technology and Development Document Clustering using Concept Space and Cosine Similarity Measurement Lailil Muflikhah Department of Computer and Information Science

More information

Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts

Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts Kwangcheol Shin 1, Sang-Yong Han 1, and Alexander Gelbukh 1,2 1 Computer Science and Engineering Department, Chung-Ang University,

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

An enhanced similarity measure for utilizing site structure in web personalization systems

An enhanced similarity measure for utilizing site structure in web personalization systems University of Wollongong Research Online University of Wollongong in Dubai - Papers University of Wollongong in Dubai 2008 An enhanced similarity measure for utilizing site structure in web personalization

More information

Document Clustering in Reduced Dimension Vector Space

Document Clustering in Reduced Dimension Vector Space Document Clustering in Reduced Dimension Vector Space Kristina Lerman USC Information Sciences Institute 4676 Admiralty Way Marina del Rey, CA 90292 Email: lerman@isi.edu Abstract Document clustering is

More information

Distributed Information Retrieval using LSI. Markus Watzl and Rade Kutil

Distributed Information Retrieval using LSI. Markus Watzl and Rade Kutil Distributed Information Retrieval using LSI Markus Watzl and Rade Kutil Abstract. Latent semantic indexing (LSI) is a recently developed method for information retrieval (IR). It is a modification of the

More information

Collaborative Filtering using Euclidean Distance in Recommendation Engine

Collaborative Filtering using Euclidean Distance in Recommendation Engine Indian Journal of Science and Technology, Vol 9(37), DOI: 10.17485/ijst/2016/v9i37/102074, October 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Collaborative Filtering using Euclidean Distance

More information

Networks for Control. California Institute of Technology. Pasadena, CA Abstract

Networks for Control. California Institute of Technology. Pasadena, CA Abstract Learning Fuzzy Rule-Based Neural Networks for Control Charles M. Higgins and Rodney M. Goodman Department of Electrical Engineering, 116-81 California Institute of Technology Pasadena, CA 91125 Abstract

More information

The Effect of Word Sampling on Document Clustering

The Effect of Word Sampling on Document Clustering The Effect of Word Sampling on Document Clustering OMAR H. KARAM AHMED M. HAMAD SHERIN M. MOUSSA Department of Information Systems Faculty of Computer and Information Sciences University of Ain Shams,

More information

2. PRELIMINARIES MANICURE is specically designed to prepare text collections from printed materials for information retrieval applications. In this ca

2. PRELIMINARIES MANICURE is specically designed to prepare text collections from printed materials for information retrieval applications. In this ca The MANICURE Document Processing System Kazem Taghva, Allen Condit, Julie Borsack, John Kilburg, Changshi Wu, and Je Gilbreth Information Science Research Institute University of Nevada, Las Vegas ABSTRACT

More information

Interme diate DNS. Local browse r. Authorit ative ... DNS

Interme diate DNS. Local browse r. Authorit ative ... DNS WPI-CS-TR-00-12 July 2000 The Contribution of DNS Lookup Costs to Web Object Retrieval by Craig E. Wills Hao Shang Computer Science Technical Report Series WORCESTER POLYTECHNIC INSTITUTE Computer Science

More information

Contextual Search Using Ontology-Based User Profiles Susan Gauch EECS Department University of Kansas Lawrence, KS

Contextual Search Using Ontology-Based User Profiles Susan Gauch EECS Department University of Kansas Lawrence, KS Vishnu Challam Microsoft Corporation One Microsoft Way Redmond, WA 9802 vishnuc@microsoft.com Contextual Search Using Ontology-Based User s Susan Gauch EECS Department University of Kansas Lawrence, KS

More information

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

FEATURE EXTRACTION TECHNIQUES FOR IMAGE RETRIEVAL USING HAAR AND GLCM

FEATURE EXTRACTION TECHNIQUES FOR IMAGE RETRIEVAL USING HAAR AND GLCM FEATURE EXTRACTION TECHNIQUES FOR IMAGE RETRIEVAL USING HAAR AND GLCM Neha 1, Tanvi Jain 2 1,2 Senior Research Fellow (SRF), SAM-C, Defence R & D Organization, (India) ABSTRACT Content Based Image Retrieval

More information

Probabilistic Learning Approaches for Indexing and Retrieval with the. TREC-2 Collection

Probabilistic Learning Approaches for Indexing and Retrieval with the. TREC-2 Collection Probabilistic Learning Approaches for Indexing and Retrieval with the TREC-2 Collection Norbert Fuhr, Ulrich Pfeifer, Christoph Bremkamp, Michael Pollmann University of Dortmund, Germany Chris Buckley

More information

Combining Textual and Visual Cues for Content-based Image Retrieval on the World Wide Web

Combining Textual and Visual Cues for Content-based Image Retrieval on the World Wide Web BU CS TR98-004. To appear in IEEE Workshop on Content-based Access of Image and Video Libraries, June 1998. Combining Textual and Visual Cues for Content-based Image Retrieval on the World Wide Web Marco

More information

Workload Characterization Techniques

Workload Characterization Techniques Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/

More information

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for Comparison of Two Image-Space Subdivision Algorithms for Direct Volume Rendering on Distributed-Memory Multicomputers Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc Dept. of Computer Eng. and

More information

Development of Search Engines using Lucene: An Experience

Development of Search Engines using Lucene: An Experience Available online at www.sciencedirect.com Procedia Social and Behavioral Sciences 18 (2011) 282 286 Kongres Pengajaran dan Pembelajaran UKM, 2010 Development of Search Engines using Lucene: An Experience

More information

perform well on paths including satellite links. It is important to verify how the two ATM data services perform on satellite links. TCP is the most p

perform well on paths including satellite links. It is important to verify how the two ATM data services perform on satellite links. TCP is the most p Performance of TCP/IP Using ATM ABR and UBR Services over Satellite Networks 1 Shiv Kalyanaraman, Raj Jain, Rohit Goyal, Sonia Fahmy Department of Computer and Information Science The Ohio State University

More information

Indexing by Latent Semantic Analysis

Indexing by Latent Semantic Analysis Indexing by Latent Semantic Analysis Scott Deerwester Center for Information and Language Studies, University of Chicago, Chicago, IL 60637 Susan T. Dumais*, George W. Furnas, and Thomas K. Landauer Bell

More information

Spoken Document Retrieval (SDR) for Broadcast News in Indian Languages

Spoken Document Retrieval (SDR) for Broadcast News in Indian Languages Spoken Document Retrieval (SDR) for Broadcast News in Indian Languages Chirag Shah Dept. of CSE IIT Madras Chennai - 600036 Tamilnadu, India. chirag@speech.iitm.ernet.in A. Nayeemulla Khan Dept. of CSE

More information