Two-Dimensional Visualization for Internet Resource Discovery. Shih-Hao Li and Peter B. Danzig. University of Southern California
Two-Dimensional Visualization for Internet Resource Discovery

Shih-Hao Li and Peter B. Danzig
Computer Science Department
University of Southern California
Los Angeles, California
{shli,

Abstract

Traditional information retrieval systems return documents in a list, sorted by publication date, title, or similarity to the user query. Users select documents of interest by scanning the returned list. In a distributed environment, documents are returned from more than one information server, and a simple document list is not an efficient way to present a large volume of data to users. In this paper, we propose a two-dimensional visualization scheme for Internet resource discovery. Our method displays data in clusters according to their cross-similarities while retaining their rankings with respect to the user query. In addition, it provides a customized view that arranges data in favor of the user's preference terms.

Index Terms - information retrieval, resource discovery, similarity measure, visualization.

1 Introduction

Traditional information retrieval systems return documents in lists or sets, sorted by publication date, alphabetical order of title, or computed similarity to the user query. In a distributed environment, documents are returned from more than one information server, and a simple document list is not an efficient way to present a large volume of data to users.

One solution to this problem is to present data using a two-dimensional graphical display. When documents are arranged in a two-dimensional space according to their inter-relationships, users can more easily identify topics and browse topical clusters. In [1], Gower and Digby introduce several methods, such as principal components analysis, biplot, and correspondence analysis, for expressing complex relationships in two dimensions.
These methods use Singular Value Decomposition (SVD) to transform documents into one or more matrices that carry the "compressed" relationships of the original data. In Information Agents [2], Cybenko et al. use Self-Organizing Maps (SOM) [3] to map multi-dimensional data onto a two-dimensional space. SOM is based on neural networks and needs iterative training and adjustment.

In this paper, we propose a two-dimensional visualization scheme based on Latent Semantic Indexing (LSI) [4] for Internet resource discovery. Our method displays data in clusters according to their cross-similarities and ranks documents with respect to the user query. In addition, it provides a customized view that arranges data in favor of the user's preference terms. Below, we introduce LSI and describe how our method applies to Internet resource discovery.

2 Latent Semantic Indexing

LSI was originally developed to solve the vocabulary problem [5], which states that people with different backgrounds or in different contexts describe information differently. LSI assumes there is some underlying semantic structure in the pattern of term usage across documents and uses SVD to capture this structure. Our goal is to present this structure to users instead of a simple ranked list.

In LSI, documents are represented as vectors of term frequencies, following Salton's well-established Vector Space Model (VSM) [6]. An entire database is represented as a term-document matrix, whose rows and columns correspond to terms and documents, respectively. To capture the semantic structure among documents, LSI applies SVD to this matrix and generates vectors of k (typically 100 to 300) orthogonal indexing dimensions, where each dimension represents a linearly independent concept. The decomposed vectors represent both documents and terms in the same semantic space, and their values indicate the degrees of association with the k underlying concepts. Because k is chosen to be much smaller than the number of documents and terms in the database, the decomposed vectors are represented in a compressed dimensional space and are not independent. Therefore, two documents can be relevant to each other without sharing any terms, as long as they share concepts. Figure 1 shows SVD applied to a term-document matrix, and Example 1 illustrates the document representation before and after SVD.

[Figure 1: SVD applied to an m × n term-document matrix, where m and n are the numbers of terms and documents in the database, and k is the dimensionality of the SVD representation.]

Example 1. Let d_i (1 ≤ i ≤ 5) and t_j (1 ≤ j ≤ 8) be a set of documents and their associated terms in an information system, where

d_1 = {t_1, t_3, t_4, t_4};
d_2 = {t_1, t_2, t_2, t_3};
d_3 = {t_2, t_3, t_4};
d_4 = {t_6, t_7, t_8};
d_5 = {t_5, t_6, t_6, t_7}.
Table 1 shows their vector representations in VSM (i.e., before SVD) and LSI (i.e., after SVD). Figure 2 shows the two-dimensional plot of the decomposed document and term vectors in LSI.

document   term description     VSM (t_1 ... t_8)     LSI (dim1 dim2)
d_1        t_1 t_3 t_4 t_4      (1 0 1 2 0 0 0 0)     … -0.508
d_2        t_1 t_2 t_2 t_3      (1 2 1 0 0 0 0 0)     …
d_3        t_2 t_3 t_4          (0 1 1 1 0 0 0 0)     …
d_4        t_6 t_7 t_8          (0 0 0 0 0 1 1 1)     …
d_5        t_5 t_6 t_6 t_7      (0 0 0 0 1 2 1 0)     …

Table 1: The vector representations of documents d_i (1 ≤ i ≤ 5) in VSM (i.e., before SVD) and LSI (i.e., after SVD). The "(t_1 ... t_8)" and "(dim1 dim2)" columns show the vectors in VSM and LSI, respectively.

[Figure 2: Two-dimensional plot of the decomposed vectors in Example 1 (axes: Dimension 1, Dimension 2), showing documents d_i (1 ≤ i ≤ 5) and terms t_j (1 ≤ j ≤ 8) with distinct markers; terms are drawn as "+". Two clusters, A and B, are marked.]

The above example shows that documents sharing a number of terms lie close to each other and to the terms they share. In Figure 2, we can roughly see two clusters, A and B. In cluster A, documents d_1, d_2, and d_3 are close to each other because each pair shares at least one of t_1, t_2, t_3, or t_4. Similarly, d_4 and d_5 are close to each other because both contain t_6 and t_7. With documents and terms presented in this way, their relationships can be read directly from their x-y coordinates.
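
The decomposition in Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes raw term frequencies with no weighting, and, to stay dependency-free, it swaps a library SVD routine for power iteration with deflation on A^T A (the function names `term_document_matrix`, `top_eigenpairs`, and `lsi_document_coordinates` are hypothetical). Document coordinates are taken as sigma_i * v_i for the two largest singular values. Because the paper does not state its weighting or scaling, the printed coordinates need not match Table 1.

```python
import math

# Example 1 documents as bags of term indices (1..8).
DOCS = {
    "d1": [1, 3, 4, 4],
    "d2": [1, 2, 2, 3],
    "d3": [2, 3, 4],
    "d4": [6, 7, 8],
    "d5": [5, 6, 6, 7],
}
NUM_TERMS = 8


def term_document_matrix(docs, num_terms):
    """Rows are terms, columns are documents, entries are raw term frequencies."""
    names = list(docs)
    matrix = [[0.0] * len(names) for _ in range(num_terms)]
    for col, name in enumerate(names):
        for term in docs[name]:
            matrix[term - 1][col] += 1.0
    return names, matrix


def gram(matrix):
    """Return A^T A, the document-by-document inner-product matrix."""
    n = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]


def top_eigenpairs(sym, k, iterations=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [1.0 / math.sqrt(n)] * n
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue (a squared singular value of A).
        value = sum(vec[i] * work[i][j] * vec[j]
                    for i in range(n) for j in range(n))
        pairs.append((value, vec))
        # Deflate: subtract the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def lsi_document_coordinates(matrix, k=2):
    """Coordinates sigma_i * v_i of each document in the k leading dimensions."""
    pairs = top_eigenpairs(gram(matrix), k)
    n = len(matrix[0])
    return [[math.sqrt(value) * vec[doc] for value, vec in pairs]
            for doc in range(n)]


names, A = term_document_matrix(DOCS, NUM_TERMS)
coords = lsi_document_coordinates(A, k=2)
for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

Under these assumptions the two term-disjoint groups separate cleanly: d_1, d_2, and d_3 load only on the first dimension, and d_4 and d_5 only on the second.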

3 Internet Resource Discovery

In a distributed environment, people search for information by sending requests to the associated information servers using a client-server model (Figure 3(a)). In this approach, the client needs to know the server's name or address before sending a request. On the Internet, where thousands of servers provide information, it becomes difficult and inefficient to search all the servers manually.

[Figure 3: (a) Client-Server Model. (b) Client-Directory-Server Model.]

One step toward solving this problem is to keep a "directory of services" that records a description or summary of each information server. A user sends a query to the directory of services, which determines and ranks the information servers relevant to the user's request. The user then uses the rankings to select which information servers to query directly. We call this the client-directory-server model (Figure 3(b)).

Internet resource discovery services [7], such as Archie [8], WAIS [9], WWW [10], Gopher [11], and Indie [12], allow users to retrieve information throughout the Internet. All of the above systems provide services similar to the client-directory-server model. For example, Archie, WAIS, and Indie support a global index that plays the role of the directory of services in their systems. WWW and Gopher do not have a global index of their own. Instead, they rely on add-on indexing schemes built by other tools, such as Harvest [13] and WWWW [14] for WWW, and Veronica [15] for Gopher.

4 Visualization in Client-Directory-Server Model

In Internet resource discovery, users search data from thousands of information servers. A simple ranked list is an inefficient way to present a large volume of data to users. To solve this problem, we propose an LSI-based two-dimensional visualization scheme. Our method presents documents in a two-dimensional space based on their cross-similarities. Documents sharing a number of terms are placed close to each other and to the terms they share.
In addition, documents are colored according to their similarities to the user query. Documents with higher similarities are painted with darker colors or gray levels. The darkest document in a cluster serves as the representative document of that cluster. Documents with similarities below a certain value are left colorless. In this way, relevant documents are grouped together while still showing their individual similarities to the query. Users can browse the common terms or representative documents and decide whether to see the details. In contrast, if returned documents are presented in a ranked list, the first and second documents may have identical similarity values with respect to the query but cover completely different topics. A user who wants to focus on the second topic still needs to search the whole list to find other similar documents.
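
The coloring rule described above can be sketched as follows. This is a hedged illustration only: the paper does not say which similarity measure, threshold, or gray scale it uses, so cosine similarity over raw term-frequency vectors, the threshold of 0.2, the 0 (black) to 255 (white) scale, and the query {t_1, t_3} are all assumptions, and the function names are hypothetical.

```python
import math

# Term-frequency vectors (t1..t8) for the documents of Example 1.
VECTORS = {
    "d1": [1, 0, 1, 2, 0, 0, 0, 0],
    "d2": [1, 2, 1, 0, 0, 0, 0, 0],
    "d3": [0, 1, 1, 1, 0, 0, 0, 0],
    "d4": [0, 0, 0, 0, 0, 1, 1, 1],
    "d5": [0, 0, 0, 0, 1, 2, 1, 0],
}
QUERY = [1, 0, 1, 0, 0, 0, 0, 0]  # hypothetical query containing t1 and t3


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def gray_level(similarity, threshold=0.2):
    """Map similarity to a gray level: None means colorless, lower means darker."""
    if similarity < threshold:
        return None
    return round(255 * (1.0 - similarity))


def representative(cluster, similarities):
    """The darkest (most similar) document in a cluster; ties go to the first listed."""
    return max(cluster, key=lambda name: similarities[name])


similarities = {name: cosine(vec, QUERY) for name, vec in VECTORS.items()}
grays = {name: gray_level(sim) for name, sim in similarities.items()}
for name in VECTORS:
    print(name, round(similarities[name], 3), grays[name])
print("representative:", representative(["d1", "d2", "d3"], similarities))
```

With this query, d_4 and d_5 share no terms with it and stay colorless, while d_1 and d_2 tie as the darkest documents of their cluster.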

To implement our visualization method, we collect all the returned documents at the client site, generate a term-document matrix from them, and apply SVD to this matrix. The resulting terms and documents are plotted on a two-dimensional graphical display and presented to the user. The relevant servers returned by the directory of services are displayed in the same manner. Figure 4 shows the architecture of our proposed method in the client-directory-server environment.

[Figure 4: Two-dimensional visualization for (a) returned servers from the directory of services, and (b) returned documents from relevant servers in the client-directory-server environment. The returned data are colored based on their similarities to the user query.]

Because users have their own preference terms or taxonomies, displaying data in a customized view can assist users in searching and browsing documents. To achieve this, we rearrange the returned data by merging them with a user profile, which consists of a set of documents (called pseudo documents) selected by the user to represent his or her interests. For each user profile, we generate a "term-profile" matrix analogous to the term-document matrix. We merge it with the term-document matrix of the returned documents and apply SVD to the result. The outcome is a customized view that contains the decomposed term and document vectors arranged in favor of the terminology used by the merged user profile. To scale with the number of returned documents, we replicate the pseudo documents or increase their weights in the merged matrix. Below, we define the merge operation and demonstrate the customization in Example 2.

Definition 1. Let A and B be two term-document matrices, where A consists of m_1 terms (rows) and n_1 documents (columns), and B consists of m_2 terms (rows) and n_2 documents (columns). Let T_A and T_B be the sets of terms in A and B, respectively. If C is the result of merging A with B, then C is a term-document matrix consisting of m terms (rows) and n documents (columns), where n = n_1 + n_2 and m = |T_A ∪ T_B|.

Example 2. Let p_1 and p_2 be two pseudo documents in a user profile, where

p_1 = {t_5, t_8, t_8};
p_2 = {t_1, t_1, t_3}.

We generate a term-profile matrix from p_1 and p_2 and merge it with the original term-document matrix in Example 1 (see Figure 5). Figure 6 shows the two-dimensional plot of the merged matrix after SVD.

[Figure 5: The term-document matrix in Example 1 (terms t_1 to t_8, documents d_1 to d_5) merged with the term-pseudo-document matrix generated from a user profile (pseudo documents p_1 and p_2), giving a matrix with columns d_1 to d_5, p_1, p_2.]

To examine the effect of merging a user profile, we calculate the Euclidean distance between documents d_i and d_j in each cluster before and after the merge. Table 2 and Figure 7 show the distances and two-dimensional coordinates of the decomposed document vectors before and after merging with the user profile.

[Figure 6: Two-dimensional plot of the term-document matrix after merging with a user profile (axes: Dimension 1, Dimension 2). The pseudo documents p_1 and p_2, the documents d'_i (1 ≤ i ≤ 5), and the terms t'_j (1 ≤ j ≤ 8) are drawn with distinct markers; terms are drawn as "+".]

document pair   distance before merging   distance after merging   change
(d_1, d_2)      …                         …                        -100.00%
(d_1, d_3)      …                         …                        -69.87%
(d_2, d_3)      …                         …                        -70.20%
(d_4, d_5)      …                         …                        -48.53%

Table 2: Document distances before and after merging with a user profile. The "change" column shows the rate at which each distance increases ("+") or decreases ("-").

[Figure 7: The documents in two-dimensional space before and after merging with a user profile (axes: Dimension 1, Dimension 2). The pseudo documents p_1 and p_2 are shown with their own marker. The documents before the merge, d_i (1 ≤ i ≤ 5), are linked by dashed lines. The documents after the merge, d'_i (1 ≤ i ≤ 5), are linked by solid lines.]

In Figure 6, the documents and terms in each cluster are closer to each other than those in Figure 2. The changes in distance between documents before and after the merge can be easily seen in Figure 7. In Table 2, the distances among documents d_1, d_2, and d_3 all decrease. These changes occur because pseudo document p_2 contains both t_1 and t_3, which increases the correlation between any documents containing either t_1 (such as d_1 and d_2) or t_3 (such as d_1, d_2, and d_3). Similarly, the distance between d_4 and d_5 decreases because p_1 contains t_5 and t_8.

From the example above, we can see that the distance changes reflect the correlations between terms and documents in the semantic space. When a user profile is merged in, documents that share terms with the pseudo documents move toward each other.

5 Summary

In Internet resource discovery, a simple ranked list is an inefficient way to present a large volume of data to users. In this paper, we have proposed a two-dimensional visualization scheme based on Deerwester's latent semantic indexing. Our method displays data according to their cross-similarities while retaining their rankings with respect to the query. We believe that integrating two-dimensional visualization with traditional information retrieval techniques, such as query reformulation and relevance feedback, can greatly improve the user's search process.

References

[1] J. C. Gower and P. G. N. Digby, "Expressing complex relationships in two dimensions," in Interpreting Multivariate Data (V. Barnett, ed.), pp. 83-118, John Wiley & Sons, Inc.
[2] G. Cybenko, Y. Wu, A. Khrabrov, and R. Gray, "Information agents as organizers," in CIKM Workshop on Intelligent Information Agents, December.
[3] T. Kohonen, "The self organizing map," Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480.
[4] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, "Indexing by latent semantic analysis," Journal of the American Society for Information Science, vol. 41, pp. 391-407, September.
[5] G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais, "The vocabulary problem in human-system communication," Communications of the ACM, vol. 30, pp. 964-971, November.
[6] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. McGraw-Hill Book Company.
[7] K. Obraczka, P. B. Danzig, and S.-H. Li, "Internet resource discovery services," Computer, vol. 26, pp. 8-22, September.
[8] A. Emtage and P. Deutsch, "Archie: An electronic directory service for the internet," in Proceedings of the Winter 1992 USENIX Conference, pp. 93-110.
[9] B. Kahle and A. Medlar, "An information system for corporate users: Wide Area Information Servers," ConneXions - The Interoperability Report, vol. 5, no. 11, pp. 2-9.
[10] T. Berners-Lee, R. Cailliau, J.-F. Groff, and B. Pollermann, "World-Wide Web: The information universe," Electronic Networking: Research, Applications and Policy, vol. 1, no. 2, pp. 52-58.
[11] M. McCahill, "The internet gopher protocol: A distributed server information system," ConneXions - The Interoperability Report, vol. 6, pp. 10-14, July.
[12] P. B. Danzig, S.-H. Li, and K. Obraczka, "Distributed indexing of autonomous internet services," Computing Systems, vol. 5, no. 4, pp. 433-459.
[13] C. M. Bowman, P. B. Danzig, D. R. Hardy, U. Manber, and M. F. Schwartz, "Harvest: A scalable, customizable discovery and access system," Technical Report CU-CS, University of Colorado.
[14] O. A. McBryan, "GENVL and WWWW: Tools for taming the Web," in Proceedings of the First International World-Wide Web Conference, May.
[15] S. Foster, "About the Veronica service," November. Electronic bulletin board posting on the comp.infosystems.gopher newsgroup.
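
As a supplementary sketch of the merge operation in Definition 1 and Example 2, the Python below builds the merged matrix C with n = n_1 + n_2 columns and m = |T_A ∪ T_B| rows. The sparse representation (document name mapped to term counts), the function names `to_columns` and `merge`, and the `weight` parameter are assumptions made for illustration. The `weight` parameter stands in for the paper's remark that pseudo documents may be replicated or given larger weights; the paper does not specify how the weights are chosen.

```python
from collections import Counter

EXAMPLE_1 = {
    "d1": ["t1", "t3", "t4", "t4"],
    "d2": ["t1", "t2", "t2", "t3"],
    "d3": ["t2", "t3", "t4"],
    "d4": ["t6", "t7", "t8"],
    "d5": ["t5", "t6", "t6", "t7"],
}
PROFILE = {
    "p1": ["t5", "t8", "t8"],
    "p2": ["t1", "t1", "t3"],
}


def to_columns(docs):
    """Sparse term-document matrix: document name -> Counter of term frequencies."""
    return {name: Counter(terms) for name, terms in docs.items()}


def merge(a, b, weight=1):
    """Merge two sparse term-document matrices as in Definition 1.

    Rows of the result cover the union of both term sets; columns are the
    documents of `a` followed by those of `b`. Entries from `b` are scaled
    by `weight` (an assumed way to emphasize pseudo documents).
    """
    overlap = set(a) & set(b)
    if overlap:
        raise ValueError(f"duplicate document names: {sorted(overlap)}")
    terms = sorted(set().union(*a.values(), *b.values()))
    doc_names = list(a) + list(b)
    columns = dict(a)
    for name, column in b.items():
        columns[name] = Counter({t: c * weight for t, c in column.items()})
    dense = [[columns[d].get(t, 0) for d in doc_names] for t in terms]
    return terms, doc_names, dense


terms, doc_names, C = merge(to_columns(EXAMPLE_1), to_columns(PROFILE))
print("columns:", doc_names)
for term, row in zip(terms, C):
    print(term, row)
```

Applying SVD to C instead of the original matrix would then give the customized view; calling `merge(..., weight=3)` triples the pseudo-document entries, which is one way to keep a small profile from being swamped by many returned documents.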
Clustering Startups Based on Customer-Value Proposition Daniel Semeniuta Stanford University dsemeniu@stanford.edu Meeran Ismail Stanford University meeran@stanford.edu Abstract K-means clustering is a
More informationThe Use of Biplot Analysis and Euclidean Distance with Procrustes Measure for Outliers Detection
Volume-8, Issue-1 February 2018 International Journal of Engineering and Management Research Page Number: 194-200 The Use of Biplot Analysis and Euclidean Distance with Procrustes Measure for Outliers
More informationVector Space Models: Theory and Applications
Vector Space Models: Theory and Applications Alexander Panchenko Centre de traitement automatique du langage (CENTAL) Université catholique de Louvain FLTR 2620 Introduction au traitement automatique du
More informationFeature Selection for Clustering. Abstract. Clustering is an important data mining task. Data mining
Feature Selection for Clustering Manoranjan Dash and Huan Liu School of Computing, National University of Singapore, Singapore. Abstract. Clustering is an important data mining task. Data mining often
More informationCollaborative Filtering based on User Trends
Collaborative Filtering based on User Trends Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos Papadopoulos, and Yannis Manolopoulos Aristotle University, Department of Informatics, Thessalonii 54124,
More informationThe Curated Web: A Recommendation Challenge. Saaya, Zurina; Rafter, Rachael; Schaal, Markus; Smyth, Barry. RecSys 13, Hong Kong, China
Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available. Title The Curated Web: A Recommendation Challenge
More informationReddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011
Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions
More informationDistance (*40 ft) Depth (*40 ft) Profile A-A from SEG-EAEG salt model
Proposal for a WTOPI Research Consortium Wavelet Transform On Propagation and Imaging for seismic exploration Ru-Shan Wu Modeling and Imaging Project, University of California, Santa Cruz August 27, 1996
More informationEffective Latent Space Graph-based Re-ranking Model with Global Consistency
Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case
More informationResource and Knowledge Discovery in Global Information Systems: A Preliminary Design and Experiment
From: KDD-95 Proceedings. Copyright 1995, AAAI (www.aaai.org). All rights reserved. Resource and Knowledge Discovery in Global Information Systems: A Preliminary Design and Experiment Osmar R. Zaisne and
More informationSPE Copyright 2010, Society of Petroleum Engineers
SPE-132629 Intelligent model management and Visualization for smart oilfields Charalampos Chelmis 1, Amol Bakshi 3*, Burcu Seren 2, Karthik Gomadam 3, Viktor K. Prasanna 3 1 Department of Computer Science,
More informationAnalysis and Latent Semantic Indexing
18 Principal Component Analysis and Latent Semantic Indexing Understand the basics of principal component analysis and latent semantic index- Lab Objective: ing. Principal Component Analysis Understanding
More informationstorage, access, search, and organization. interpreted by remote applications. File systems support storage, providing a repository where
Prospero: A Base for Building Information Infrastructure B. Cliæord Neuman Steven Seger Augart Information Sciences Institute University of Southern California Abstract The recent introduction of new network
More informationPrincipal Component Image Interpretation A Logical and Statistical Approach
Principal Component Image Interpretation A Logical and Statistical Approach Md Shahid Latif M.Tech Student, Department of Remote Sensing, Birla Institute of Technology, Mesra Ranchi, Jharkhand-835215 Abstract
More information[8] Peter W. Foltz and Susan T. Dumais. Personalized information delivery: An analysis of information
[8] Peter W. Foltz and Susan T. Dumais. Personalized information delivery: An analysis of information ltering methods. Communications of the ACM, 35(12), December 1992. [9] Christopher Fox. A stop list
More informationDocument Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps
Proceedings of the Twenty-Second International FLAIRS Conference (2009) Document Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps Jeremy R. Millar and Gilbert L. Peterson
More informationVector Semantics. Dense Vectors
Vector Semantics Dense Vectors Sparse versus dense vectors PPMI vectors are long (length V = 20,000 to 50,000) sparse (most elements are zero) Alterna>ve: learn vectors which are short (length 200-1000)
More informationRecommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu
Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu 1 Introduction Yelp Dataset Challenge provides a large number of user, business and review data which can be used for
More informationDocument Clustering using Concept Space and Cosine Similarity Measurement
29 International Conference on Computer Technology and Development Document Clustering using Concept Space and Cosine Similarity Measurement Lailil Muflikhah Department of Computer and Information Science
More information/00/$10.00 (C) 2000 IEEE
A SOM based cluster visualization and its application for false coloring Johan Himberg Helsinki University of Technology Laboratory of Computer and Information Science P.O. Box 54, FIN-215 HUT, Finland
More informationCalibrating a Structured Light System Dr Alan M. McIvor Robert J. Valkenburg Machine Vision Team, Industrial Research Limited P.O. Box 2225, Auckland
Calibrating a Structured Light System Dr Alan M. McIvor Robert J. Valkenburg Machine Vision Team, Industrial Research Limited P.O. Box 2225, Auckland New Zealand Tel: +64 9 3034116, Fax: +64 9 302 8106
More informationA World Wide Web Resource Discovery System. Budi Yuwono Savio L. Lam Jerry H. Ying Dik L. Lee. Hong Kong University of Science and Technology
A World Wide Web Resource Discovery System Budi Yuwono Savio L. Lam Jerry H. Ying Dik L. Lee Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Hong Kong Abstract
More informationA Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System
A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System Takashi Yukawa Nagaoka University of Technology 1603-1 Kamitomioka-cho, Nagaoka-shi Niigata, 940-2188 JAPAN
More informationVideo Representation. Video Analysis
BROWSING AND RETRIEVING VIDEO CONTENT IN A UNIFIED FRAMEWORK Yong Rui, Thomas S. Huang and Sharad Mehrotra Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign
More informationMichael F. Schwartz. March 12, (Original Date: August 1994 Revised March 1995) Abstract
Harvest: A Scalable, Customizable Discovery and Access System C. Mic Bowman Transarc Corp. Udi Manber University of Arizona Peter B. Danzig University of Southern California Michael F. Schwartz University
More informationAlgebraic Techniques for Analysis of Large Discrete-Valued Datasets
Algebraic Techniques for Analysis of Large Discrete-Valued Datasets Mehmet Koyutürk 1,AnanthGrama 1, and Naren Ramakrishnan 2 1 Dept. of Computer Sciences, Purdue University W. Lafayette, IN, 47907, USA
More informationExplore Co-clustering on Job Applications. Qingyun Wan SUNet ID:qywan
Explore Co-clustering on Job Applications Qingyun Wan SUNet ID:qywan 1 Introduction In the job marketplace, the supply side represents the job postings posted by job posters and the demand side presents
More informationUsing the Self-Organizing Map (SOM) Algorithm, as a Prototype E-Content Retrieval Tool
Using the Self-Organizing Map (SOM) Algorithm, as a Prototype E-Content Retrieval Tool Athanasios S. Drigas and John Vrettaros Department of Applied Technologies,NCSR DEMOKRITOS Ag. Paraskeui, Greece {dr,jvr}.imm.demokritos.gr
More informationIndexing by Latent Semantic Analysis
Indexing by Latent Semantic Analysis Scott Deerwester Center for Information and Language Studies, University of Chicago, Chicago, IL 60637 Susan T. Dumais*, George W. Furnas, and Thomas K. Landauer Bell
More informationCSE 252B: Computer Vision II
CSE 5B: Computer Vision II Lecturer: Ben Ochoa Scribes: San Nguyen, Ben Reineman, Arturo Flores LECTURE Guest Lecture.. Brief bio Ben Ochoa received the B.S., M.S., and Ph.D. degrees in electrical engineering
More informationIdentifying Value Mappings for Data Integration: An Unsupervised Approach
Identifying Value Mappings for Data Integration: An Unsupervised Approach Jaewoo Kang 1, Dongwon Lee 2, and Prasenjit Mitra 2 1 NC State University, Raleigh NC 27695, USA 2 Penn State University, University
More informationESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN ,
Interpretation and Comparison of Multidimensional Data Partitions Esa Alhoniemi and Olli Simula Neural Networks Research Centre Helsinki University of Technology P. O.Box 5400 FIN-02015 HUT, Finland esa.alhoniemi@hut.fi
More informationManaging Changes to Schema of Data Sources in a Data Warehouse
Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2001 Proceedings Americas Conference on Information Systems (AMCIS) December 2001 Managing Changes to Schema of Data Sources in
More informationRequest for Comments: Network Research/UCSD September 1997
Network Working Group Request for Comments: 2186 Category: Informational D. Wessels K. Claffy National Laboratory for Applied Network Research/UCSD September 1997 Status of this Memo Internet Cache Protocol
More informationClassification of Web Resident Sensor Resources using Latent Semantic Indexing and Ontologies
Classification of Web Resident Sensor Resources using Latent Semantic Indexing and Ontologies Wabo Majavu and Terence van Zyl Meraka Institute, CSIR P.O. Box 395 Pretoria 0001, South Africa (wmajavu/tvzyl@csir.co.za)
More informationSkill. Robot/ Controller
Skill Acquisition from Human Demonstration Using a Hidden Markov Model G. E. Hovland, P. Sikka and B. J. McCarragher Department of Engineering Faculty of Engineering and Information Technology The Australian
More informationRobust Pole Placement using Linear Quadratic Regulator Weight Selection Algorithm
329 Robust Pole Placement using Linear Quadratic Regulator Weight Selection Algorithm Vishwa Nath 1, R. Mitra 2 1,2 Department of Electronics and Communication Engineering, Indian Institute of Technology,
More informationModule 9 : Numerical Relaying II : DSP Perspective
Module 9 : Numerical Relaying II : DSP Perspective Lecture 36 : Fast Fourier Transform Objectives In this lecture, We will introduce Fast Fourier Transform (FFT). We will show equivalence between FFT and
More informationInfo Agent USER. External Retrieval Agent. Internal Services Agent. Interface Agent
The Info Agent: an Interface for Supporting Users in Intelligent Retrieval Daniela D'Aloisi and Vittorio Giannini Fondazione Ugo Bordoni Via B. Castiglione 59, I-00142, Rome, Italy Voice +39 6 5480 3422/3425
More informationRouting and Ad-hoc Retrieval with the. Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers. University of Dortmund, Germany.
Routing and Ad-hoc Retrieval with the TREC-3 Collection in a Distributed Loosely Federated Environment Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers University of Dortmund, Germany
More informationImproving Probabilistic Latent Semantic Analysis with Principal Component Analysis
Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis Ayman Farahat Palo Alto Research Center 3333 Coyote Hill Road Palo Alto, CA 94304 ayman.farahat@gmail.com Francine Chen
More informationThe Design and Implementation of an Intelligent Online Recommender System
The Design and Implementation of an Intelligent Online Recommender System Rosario Sotomayor, Joe Carthy and John Dunnion Intelligent Information Retrieval Group Department of Computer Science University
More information