LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier


Wang Ding, Songnian Yu, Shanqing Yu, Wei Wei, and Qianfeng Wang
School of Computer Engineering and Science, Shanghai University, Shanghai, China

Abstract. The task of Text Classification (TC) is to automatically assign natural language texts to thematic categories from a predefined category set. Latent Semantic Indexing (LSI) is a well-known technique in Information Retrieval, especially for dealing with polysemy (one word can have different meanings) and synonymy (different words are used to describe the same concept), but it is not an optimal representation for text classification. Applied to the whole training set (global LSI), it usually degrades classification performance, because this completely unsupervised method concentrates on representation while ignoring class discrimination. Some local LSI methods have been proposed to improve classification by exploiting class discrimination information, but their improvements over the original term vectors are still very limited. In this paper, we propose a new local LSI method, Local Relevancy Ladder-Weighted LSI (LRLW-LSI), to improve text classification. A separate singular value decomposition (SVD) is used to reduce the dimension of the vector space on the transformed local region of each class. Experimental results show that our method is much better than global LSI and traditional local LSI methods on classification, within a much smaller LSI dimension.

Key words: Data mining, Text classification, Latent Semantic Indexing (LSI).

1 Introduction

Traditional text classification is based on explicit term features: the common approach is to represent textual materials as vectors using the Vector Space Model (VSM) [3][4][5][6], and then to determine the category of a test document by comparing degrees of similarity.
With more and more textual information available on the Internet, conceptual retrieval has become more important than word-matching retrieval. Traditional information retrieval systems such as VSM retrieve relevant documents by lexically matching them against the query. The drawback of VSM is that it cannot retrieve documents that are conceptually relevant to the query, and semantic information may be lost in the process.
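To make this drawback concrete, here is a small illustrative sketch (the vocabulary, documents, and weights are invented for the example): under pure lexical matching, a query and a document that use different words for the same concept get zero cosine similarity.

```python
import numpy as np

# Toy vocabulary; "car" and "automobile" name the same concept.
vocab = ["car", "automobile", "engine", "flower"]
doc1 = np.array([1.0, 0.0, 1.0, 0.0])   # "car engine"
doc2 = np.array([0.0, 1.0, 1.0, 0.0])   # "automobile engine"
query = np.array([1.0, 0.0, 0.0, 0.0])  # query: "car"

def cosine(a, b):
    # Cosine similarity as used in VSM retrieval.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(query, doc1))  # ~0.707: lexical match is found
print(cosine(query, doc2))  # 0.0: conceptually relevant document is missed
```

LSI addresses exactly this failure by projecting terms and documents into a shared latent space where synonymous terms end up close together.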

In order to overcome the drawbacks of VSM, the proposed method applies Latent Semantic Indexing (LSI), a widely used information retrieval technique. When LSI is applied to text classification, there are two common approaches. The first, called global LSI, performs SVD directly on the entire training document collection to generate the new feature space. This method is completely unsupervised, i.e., it pays no attention to the class labels of the training data. It does nothing to improve the discriminative power over document classes, so it often yields no better, and sometimes even worse, classification performance than the original term vectors [3]. The second, called local LSI, performs a separate SVD on the local region of each topic. Compared with global LSI, this method uses the class information effectively and therefore improves greatly on global LSI. However, due to the same weighting problem, the improvements over the original term vectors are still very limited: typically, all documents in the local region are treated equally in the SVD computation. Intuitively, however, first, documents that are more relevant to the topic should contribute more to the local semantic space than less relevant ones; second, documents that are slightly less locally relevant may be more globally relevant. Based on these ideas, we propose a new local LSI method, Local Relevancy Ladder-Weighted LSI (LRLW-LSI), which selects documents for the local region in a ladder-wise way, so that the local semantic space can be extracted more accurately, taking both local and global relevancy into account. The experimental results reported later confirm this idea: LRLW-LSI performs much better than global LSI and ordinary local LSI on classification, within a much smaller LSI dimension.

2 Preliminaries

2.1 Singular Value Decomposition

SVD is one of the most important matrix decompositions in numerical linear algebra.
It has been applied in many fields such as image processing [1], neural networks [2], and others. In this paper, we use the SVD to reduce the dimension of the vector space and to remove the influence of synonymy and polysemy. Let A denote an m x n matrix of real values. Without loss of generality, assume m >= n and rank(A) = r. The singular value decomposition (SVD) of A is its factorization into a product of three matrices:

A = U S V^T    (1)

where U is an m x m orthogonal matrix, V an n x n orthogonal matrix, and S an m x n diagonal matrix, with U^T U = I_m, V^T V = I_n, and S = diag(sigma_1, ..., sigma_n), where sigma_i > 0 for 1 <= i <= r and sigma_j = 0 for j >= r + 1. The first r columns of U and V are the orthonormal eigenvectors associated with the r nonzero eigenvalues of AA^T and A^T A, respectively. The columns of U and V are referred to as the left and

right singular vectors, respectively, and the singular values of A are the diagonal elements of S, i.e., the nonnegative square roots of the n eigenvalues of A^T A [4]. The following two theorems illustrate how the SVD reveals important information about the structure of a matrix.

Theorem 1: Let the SVD of A be given by Equation (1), with sigma_1 >= sigma_2 >= ... >= sigma_r > sigma_{r+1} = ... = sigma_n = 0, and let R(A) and N(A) denote the range and null space of A, respectively. Then:
(1) Rank property: rank(A) = r, N(A) = span{v_{r+1}, ..., v_n}, and R(A) = span{u_1, ..., u_r}, where U = [u_1 u_2 ... u_m] and V = [v_1 v_2 ... v_n].
(2) Dyadic decomposition: A = sum_{i=1}^{r} u_i sigma_i v_i^T.
(3) Norms: ||A||_F^2 = sigma_1^2 + sigma_2^2 + ... + sigma_r^2 and ||A||_2 = sigma_1.
Proof: see [4].

Theorem 2: Let the SVD of A be given by Equation (1), and define

A_k = sum_{i=1}^{k} u_i sigma_i v_i^T    (2)

Then (proof: see [5])

min_{rank(B)=k} ||A - B||_F^2 = ||A - A_k||_F^2 = sigma_{k+1}^2 + ... + sigma_p^2

In other words, A_k, constructed from the k largest singular triplets of A, is the closest rank-k matrix to A. In fact, A_k is the best rank-k approximation to A for any unitarily invariant norm. Hence,

min_{rank(B)=k} ||A - B||_2 = ||A - A_k||_2 = sigma_{k+1}    (3)

2.2 Latent Semantic Indexing (LSI)

First, we briefly describe the Vector Space Model (VSM), put forward by Salton [6] and used in the SMART system. In the VSM, a document is represented by a vector of words, and a word-by-document matrix A is used to represent a collection of documents, where each entry reflects the occurrences of a word in a document: A = (a_wd), where a_wd is the weight of word w in document d. To implement Latent Semantic Indexing [7], such a term-by-document matrix must be constructed; in this paper, we use the VSM described above to construct the original, rough matrix A for LSI. Since most words do not appear in every document, the matrix A is usually sparse.
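As a quick numerical check of Theorem 2, the following sketch (an illustration on random data, not part of the paper's experiments) truncates an SVD to rank k and verifies that the squared Frobenius error of the rank-k approximation equals the sum of the discarded squared singular values:

```python
import numpy as np

# Random m x n matrix with m >= n, matching the setup of Section 2.1.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))

# Thin SVD: A = U S V^T with singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Rank-k truncation A_k from the k largest singular triplets (Equation (2)).
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young (Theorem 2): ||A - A_k||_F^2 = sigma_{k+1}^2 + ... + sigma_n^2.
err2 = np.linalg.norm(A - A_k, "fro") ** 2
assert np.isclose(err2, np.sum(s[k:] ** 2))
print(err2)
```

This is exactly the approximation LSI exploits: A_k keeps the dominant term-document structure while discarding the smallest singular directions, which tend to carry noise and idiosyncratic word usage.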
The matrix A is factored into the product of three matrices using the singular value decomposition. The SVD derives the latent semantic structure model from the orthogonal matrices U and V, containing the left and right singular vectors of A

respectively, and the diagonal matrix S of singular values of A. These matrices reflect a breakdown of the original relationships into linearly independent vectors or factor values. Using k factors, i.e., the k largest singular triplets, is equivalent to approximating the original term-document matrix by A_k in Equation (2). The SVD captures most of the important underlying structure in the association of terms and documents, while removing the noise or variability in word usage.

3 Proposed Algorithm

The most straightforward way of applying LSI to text classification is the global LSI method discussed in Section 2.2, which performs SVD directly on the entire training set; test documents are then transformed by simply projecting them onto the left singular matrix produced in the original decomposition. However, global LSI has the drawbacks discussed above. To overcome them, we propose a local LSI method, which we name Local Relevancy Ladder-Weighted LSI (LRLW-LSI). In local LSI, each document in the training set is first assigned a relevancy score with respect to a topic, and the documents whose scores exceed a predefined threshold are selected to form the local region. SVD is then performed on the local region to produce a local semantic space. This process can be described by the jump curve in Figure 4.2: a 0/1 weighting method generates the local region, where documents whose scores exceed the threshold are weighted 1 and all others are weighted 0. The 0/1 weighting method is a simple but crude way to generate the local region, as it assumes that the selected documents are all equally important in the SVD computation.
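The 0/1 scheme just described can be sketched as follows (the matrix sizes, scores, and the standard LSI folding-in step are illustrative assumptions, not the paper's actual data or code):

```python
import numpy as np

# Sketch of local LSI with 0/1 weighting: documents scoring above a
# threshold for a topic form the local region; SVD runs on that submatrix.
rng = np.random.default_rng(1)
A = rng.random((100, 40))         # term-by-document matrix (100 terms, 40 docs)
scores = rng.random(40)           # relevancy scores from an initial classifier
threshold = 0.5

local = A[:, scores > threshold]  # 0/1 weighting: selected docs get weight 1
U, s, Vt = np.linalg.svd(local, full_matrices=False)

k = min(10, local.shape[1])       # local LSI dimension
U_k, S_k = U[:, :k], np.diag(s[:k])

# Folding in: project any document d into the local k-dimensional
# semantic space via d_hat = S_k^{-1} U_k^T d.
d = A[:, 0]
d_hat = np.linalg.inv(S_k) @ U_k.T @ d
print(d_hat.shape)
```

Every selected document influences the decomposition equally here, which is precisely the limitation the ladder weighting of LRLW-LSI is designed to remove.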
But intuitively, first, different documents should play different roles in the final feature space: documents more relevant to the topic should contribute more to the local semantic space than less relevant ones; second, documents that are less locally relevant may be more globally relevant. Based on these ideas, we propose the local LSI method Local Relevancy Ladder-Weighted LSI (LRLW-LSI), which selects documents for the local region in a ladder-wise way. In other words, before performing SVD, LRLW-LSI gives the same weight to documents within a ladder range and different weights to documents in different ladder ranges, according to their relevance. The local semantic space can thus be extracted more accurately, considering both local and global relevancy, and more relevant documents enter with higher weights, making them contribute more to the SVD computation. Hence, a better local semantic space can be extracted, which separates positive documents from negative ones and results in better classification performance. The ladder curve is depicted in Figure 1.

LRLW-LSI Algorithm

For each class, assume an initial classifier C_0 has been trained on the training documents in term-vector representation; here we use an SVM classifier. The training process of LRLW-LSI then consists of the following six steps.

Fig. 1: Local Relevancy Ladder-Weighted LSI (LRLW-LSI)

(1) The initial classifier C_0 of topic c is used to assign an initial relevancy score RS_0 to each training document.
(2) Each training document is first weighted according to Equation (4), whose weighting function is a sigmoid with two parameters a and b. Each document is then assigned the average weight of the ladder its raw weight falls into. E.g., if the raw weight is 0.91, the new weight becomes 0.95, the average of the top ladder.
(3) The top n documents are selected to generate the local term-by-document matrix of topic c.
(4) The SVD is performed to generate the local semantic space.
(5) All other weighted training documents are folded into the new space.
(6) All training documents, in their local LSI vector representation, are used to train the real classifier RC of topic c.

t' = t * f(RS_i), where f(RS_i) = 1 / (1 + e^{-a(RS_i + b)})    (4)

The testing process of LRLW-LSI then consists of three steps. When a test document comes in:
(1) It is classified by the initial classifier C_0 to get its initial relevancy score.
(2) It is weighted according to Equation (4) and then folded into the local semantic space to get its local LSI vector.
(3) The local LSI vector generated in step 2 is classified by the classifier RC to decide whether the document belongs to the topic or not.

4 Experiment Results

In this section, we evaluate the Local Relevancy Ladder-Weighted LSI method. SVMlight is chosen as the classification algorithm, SVDPACKC/sis is used to perform SVD, and

F-Measure is used to evaluate the classification results. Two common data sets are used: Reuters and Industry Sector. Before classification, a standard stop-word list is used to remove common stop words, and stemming is used to convert variants of the same word into its base form. Terms that appear in fewer than 3 documents are then removed. Finally, tf*idf (with the ltc option) is used to assign the weight of each term in each document.

4.1 Data Set

Two text collections, Reuters and Industry Sector (www-2.cs.cmu.edu/afs/cs.cmu.edu), are used in our experiments. Reuters is the most widely used text collection for text classification; this corpus contains 135 categories. In our experiments, we chose only the 25 most frequent topics and used the Lewis split, which yields 6314 training examples and 2451 testing examples. Industry Sector (IS) is a collection of web pages of companies from various economic sectors. There are 105 topics and 9652 web pages in this dataset. A subset of the 14 categories containing more than 130 pages each is selected for the experiments.

4.2 Experimental Results and Discussion

For Local Relevancy Ladder-Weighted LSI, we use an SVM classifier as the initial classifier C_0 to generate each document's initial relevancy score. The parameters a and b of the sigmoid function are initially set to 5.0 and 0.2, and the number of ladder steps is set to 10. Figure 2 and Figure 3 show the classification results on the Reuters and Industry Sector data sets. The term-vector lines are displayed only as reference points for performance comparison. From these figures, the following observations can be made. First, compared to the term vector, LRLW-LSI improves both F1 measures greatly on both data sets.
For example, using 20 dimensions on Reuters, the micro-averaging F1 is improved by 1.1% and the macro-averaging F1 by 3.7%; using 50 dimensions on Industry Sector, the micro-averaging F1 is improved by 7.2% and the macro-averaging F1 by 9.8%. Second, Figure 4 shows the run time of the different LSI methods on a PC with a Pentium IV 1.7 GHz CPU and 256 MB of memory. The runtime includes both the training and testing procedures. As can be seen, the term vector is the fastest, needing only on the order of a hundred seconds. Global LSI needs much more time than the term vector due to the costly SVD computation on the entire training set. Although each SVD computation on a local region is very fast, the overall computation over all topics is extremely high, so local LSI is not expected to be used in practice. Similar to

local LSI, LRLW-LSI has to perform a separate SVD on the local region of each topic, but its low LSI dimension makes LRLW-LSI extremely fast: it needs less than three times the runtime of the term vector, so it can readily be used in practice. Third, as the LSI dimension increases, performance decreases slowly, but even at a relatively high dimension it remains above that of the term vector. Using 150 dimensions, for example, on Reuters the micro-averaging F1 is improved by 1.1% and the macro-averaging F1 by 1.2%; on Industry Sector, the micro-averaging F1 and the macro-averaging F1 are both still improved by 3.7%.

Fig. 2: Results on Reuters
Fig. 3: Results on Industry Sector

In this paper, we propose the Local Relevancy Ladder-Weighted LSI (LRLW-LSI) method to improve text classification performance. The method is developed from local LSI, but differs in that the documents in the local region are introduced using a ladder-descending curve, so that documents more relevant to the topic are assigned higher weights while global relevancy is also considered. Therefore, the local SVD can concentrate on modeling the semantic information that is

actually most important for the classification task. The experimental results verify this idea and show that LRLW-LSI is quite effective.

Fig. 4: Run time of different methods

Acknowledgements

This research was supported by the international cooperation project of the Ministry of Science and Technology of P.R. China, grant No. CB, and by the SEC E-Institute: Shanghai High Institutions Grid project.

References

1. Liu, J., Niu, X.M., Kong, W.H.: Image Watermarking Based on Singular Value Decomposition. In: Proceedings of the 2006 International Conference on Intelligent Information Hiding and Multimedia Signal Processing (2006)
2. Kanjilal, P.P., Dey, P.K., Banerjee, D.N.: Reduced-size neural networks through singular value decomposition and subset selection. Electronics Letters 29 (1993)
3. Torkkola, K.: Linear Discriminant Analysis in Document Classification. In: Proceedings of the 2001 IEEE ICDM Workshop on Text Mining (2001)
4. Golub, G., Van Loan, C.: Matrix Computations, 2nd ed. Johns Hopkins University Press, Baltimore (1989)
5. Golub, G., Reinsch, C.: Handbook for Automatic Computation: Linear Algebra. Springer-Verlag, New York (1971)
6. Salton, G.: Introduction to Modern Information Retrieval. McGraw-Hill, Auckland (1983)
7. Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R.: Indexing by latent semantic analysis. Journal of the American Society for Information Science (1990)


More information

Singular Value Decomposition, and Application to Recommender Systems

Singular Value Decomposition, and Application to Recommender Systems Singular Value Decomposition, and Application to Recommender Systems CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Recommendation

More information

String Vector based KNN for Text Categorization

String Vector based KNN for Text Categorization 458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research

More information

Encoding Words into String Vectors for Word Categorization

Encoding Words into String Vectors for Word Categorization Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,

More information

Sparse Matrices in Image Compression

Sparse Matrices in Image Compression Chapter 5 Sparse Matrices in Image Compression 5. INTRODUCTION With the increase in the use of digital image and multimedia data in day to day life, image compression techniques have become a major area

More information

Document Clustering using Concept Space and Cosine Similarity Measurement

Document Clustering using Concept Space and Cosine Similarity Measurement 29 International Conference on Computer Technology and Development Document Clustering using Concept Space and Cosine Similarity Measurement Lailil Muflikhah Department of Computer and Information Science

More information

Dimension Reduction CS534

Dimension Reduction CS534 Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

More information

General Instructions. Questions

General Instructions. Questions CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These

More information

Chapter 1 AN INTRODUCTION TO TEXT MINING. 1. Introduction. Charu C. Aggarwal. ChengXiang Zhai

Chapter 1 AN INTRODUCTION TO TEXT MINING. 1. Introduction. Charu C. Aggarwal. ChengXiang Zhai Chapter 1 AN INTRODUCTION TO TEXT MINING Charu C. Aggarwal IBM T. J. Watson Research Center Yorktown Heights, NY charu@us.ibm.com ChengXiang Zhai University of Illinois at Urbana-Champaign Urbana, IL czhai@cs.uiuc.edu

More information

ENHANCEMENT OF METICULOUS IMAGE SEARCH BY MARKOVIAN SEMANTIC INDEXING MODEL

ENHANCEMENT OF METICULOUS IMAGE SEARCH BY MARKOVIAN SEMANTIC INDEXING MODEL ENHANCEMENT OF METICULOUS IMAGE SEARCH BY MARKOVIAN SEMANTIC INDEXING MODEL Shwetha S P 1 and Alok Ranjan 2 Visvesvaraya Technological University, Belgaum, Dept. of Computer Science and Engineering, Canara

More information

1 INTRODUCTION The LMS adaptive algorithm is the most popular algorithm for adaptive ltering because of its simplicity and robustness. However, its ma

1 INTRODUCTION The LMS adaptive algorithm is the most popular algorithm for adaptive ltering because of its simplicity and robustness. However, its ma MULTIPLE SUBSPACE ULV ALGORITHM AND LMS TRACKING S. HOSUR, A. H. TEWFIK, D. BOLEY University of Minnesota 200 Union St. S.E. Minneapolis, MN 55455 U.S.A fhosur@ee,tewk@ee,boley@csg.umn.edu ABSTRACT. The

More information

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings.

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings. Online Review Spam Detection by New Linguistic Features Amir Karam, University of Maryland Baltimore County Bin Zhou, University of Maryland Baltimore County Karami, A., Zhou, B. (2015). Online Review

More information

Semantic text features from small world graphs

Semantic text features from small world graphs Semantic text features from small world graphs Jurij Leskovec 1 and John Shawe-Taylor 2 1 Carnegie Mellon University, USA. Jozef Stefan Institute, Slovenia. jure@cs.cmu.edu 2 University of Southampton,UK

More information

Feature Selection for fmri Classification

Feature Selection for fmri Classification Feature Selection for fmri Classification Chuang Wu Program of Computational Biology Carnegie Mellon University Pittsburgh, PA 15213 chuangw@andrew.cmu.edu Abstract The functional Magnetic Resonance Imaging

More information

An ICA based Approach for Complex Color Scene Text Binarization

An ICA based Approach for Complex Color Scene Text Binarization An ICA based Approach for Complex Color Scene Text Binarization Siddharth Kherada IIIT-Hyderabad, India siddharth.kherada@research.iiit.ac.in Anoop M. Namboodiri IIIT-Hyderabad, India anoop@iiit.ac.in

More information

LATENT SEMANTIC ANALYSIS AND WEIGHTED TREE SIMILARITY FOR SEMANTIC SEARCH IN DIGITAL LIBRARY

LATENT SEMANTIC ANALYSIS AND WEIGHTED TREE SIMILARITY FOR SEMANTIC SEARCH IN DIGITAL LIBRARY 6-02 Latent Semantic Analysis And Weigted Tree Similarity For Semantic Search In Digital Library LATENT SEMANTIC ANALYSIS AND WEIGHTED TREE SIMILARITY FOR SEMANTIC SEARCH IN DIGITAL LIBRARY Umi Sa adah

More information

Applications Video Surveillance (On-line or off-line)

Applications Video Surveillance (On-line or off-line) Face Face Recognition: Dimensionality Reduction Biometrics CSE 190-a Lecture 12 CSE190a Fall 06 CSE190a Fall 06 Face Recognition Face is the most common biometric used by humans Applications range from

More information

VK Multimedia Information Systems

VK Multimedia Information Systems VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Information Retrieval Basics: Agenda Vector

More information

A Patent Retrieval Method Using a Hierarchy of Clusters at TUT

A Patent Retrieval Method Using a Hierarchy of Clusters at TUT A Patent Retrieval Method Using a Hierarchy of Clusters at TUT Hironori Doi Yohei Seki Masaki Aono Toyohashi University of Technology 1-1 Hibarigaoka, Tenpaku-cho, Toyohashi-shi, Aichi 441-8580, Japan

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

Comparing and Combining Dimension Reduction Techniques for Efficient Text Clustering

Comparing and Combining Dimension Reduction Techniques for Efficient Text Clustering Comparing and Combining Dimension Reduction Techniques for Efficient Text Clustering Bin Tang, Michael Shepherd, Evangelos Milios, Malcolm I. Heywood {btang, shepherd, eem, mheywood}@cs.dal.ca Faculty

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Collaborative Filtering for Netflix

Collaborative Filtering for Netflix Collaborative Filtering for Netflix Michael Percy Dec 10, 2009 Abstract The Netflix movie-recommendation problem was investigated and the incremental Singular Value Decomposition (SVD) algorithm was implemented

More information

Document Clustering in Reduced Dimension Vector Space

Document Clustering in Reduced Dimension Vector Space Document Clustering in Reduced Dimension Vector Space Kristina Lerman USC Information Sciences Institute 4676 Admiralty Way Marina del Rey, CA 90292 Email: lerman@isi.edu Abstract Document clustering is

More information

Face detection and recognition. Detection Recognition Sally

Face detection and recognition. Detection Recognition Sally Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification

More information

COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS

COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS Toomas Kirt Supervisor: Leo Võhandu Tallinn Technical University Toomas.Kirt@mail.ee Abstract: Key words: For the visualisation

More information

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging 1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant

More information

Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis

Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis Ayman Farahat Palo Alto Research Center 3333 Coyote Hill Road Palo Alto, CA 94304 ayman.farahat@gmail.com Francine Chen

More information

Image Compression using Singular Value Decomposition

Image Compression using Singular Value Decomposition Applications of Linear Algebra 1/41 Image Compression using Singular Value Decomposition David Richards and Adam Abrahamsen Introduction The Singular Value Decomposition is a very important process. In

More information

A Feature Selection Method to Handle Imbalanced Data in Text Classification

A Feature Selection Method to Handle Imbalanced Data in Text Classification A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University

More information

Essential Dimensions of Latent Semantic Indexing (LSI)

Essential Dimensions of Latent Semantic Indexing (LSI) Essential Dimensions of Latent Semantic Indexing (LSI) April Kontostathis Department of Mathematics and Computer Science Ursinus College Collegeville, PA 19426 Email: akontostathis@ursinus.edu Abstract

More information

Supplementary Material : Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision

Supplementary Material : Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision Supplementary Material : Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision Due to space limitation in the main paper, we present additional experimental results in this supplementary

More information

Real-time Background Subtraction via L1 Norm Tensor Decomposition

Real-time Background Subtraction via L1 Norm Tensor Decomposition Real-time Background Subtraction via L1 Norm Tensor Decomposition Taehyeon Kim and Yoonsik Choe Yonsei University, Seoul, Korea E-mail: pyomu@yonsei.ac.kr Tel/Fax: +82-10-2702-7671 Yonsei University, Seoul,

More information

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Presented by Hu Han Jan. 30 2014 For CSE 902 by Prof. Anil K. Jain: Selected

More information

Content-based Dimensionality Reduction for Recommender Systems

Content-based Dimensionality Reduction for Recommender Systems Content-based Dimensionality Reduction for Recommender Systems Panagiotis Symeonidis Aristotle University, Department of Informatics, Thessaloniki 54124, Greece symeon@csd.auth.gr Abstract. Recommender

More information

A Spectral-based Clustering Algorithm for Categorical Data Using Data Summaries (SCCADDS)

A Spectral-based Clustering Algorithm for Categorical Data Using Data Summaries (SCCADDS) A Spectral-based Clustering Algorithm for Categorical Data Using Data Summaries (SCCADDS) Eman Abdu eha90@aol.com Graduate Center The City University of New York Douglas Salane dsalane@jjay.cuny.edu Center

More information

An efficient algorithm for sparse PCA

An efficient algorithm for sparse PCA An efficient algorithm for sparse PCA Yunlong He Georgia Institute of Technology School of Mathematics heyunlong@gatech.edu Renato D.C. Monteiro Georgia Institute of Technology School of Industrial & System

More information

Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering

Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering A. Anil Kumar Dept of CSE Sri Sivani College of Engineering Srikakulam, India S.Chandrasekhar Dept of CSE Sri Sivani

More information

Video annotation based on adaptive annular spatial partition scheme

Video annotation based on adaptive annular spatial partition scheme Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory

More information

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics

More information

Recognition, SVD, and PCA

Recognition, SVD, and PCA Recognition, SVD, and PCA Recognition Suppose you want to find a face in an image One possibility: look for something that looks sort of like a face (oval, dark band near top, dark band near bottom) Another

More information

Term Graph Model for Text Classification

Term Graph Model for Text Classification Term Graph Model for Text Classification Wei Wang, Diep Bich Do, and Xuemin Lin University of New South Wales, Australia {weiw, s2221417, lxue}@cse.unsw.edu.au Abstract. Most existing text classification

More information

Text Similarity Based on Semantic Analysis

Text Similarity Based on Semantic Analysis Advances in Intelligent Systems Research volume 133 2nd International Conference on Artificial Intelligence and Industrial Engineering (AIIE2016) Text Similarity Based on Semantic Analysis Junli Wang Qing

More information

Influence of Word Normalization on Text Classification

Influence of Word Normalization on Text Classification Influence of Word Normalization on Text Classification Michal Toman a, Roman Tesar a and Karel Jezek a a University of West Bohemia, Faculty of Applied Sciences, Plzen, Czech Republic In this paper we

More information

Explore Co-clustering on Job Applications. Qingyun Wan SUNet ID:qywan

Explore Co-clustering on Job Applications. Qingyun Wan SUNet ID:qywan Explore Co-clustering on Job Applications Qingyun Wan SUNet ID:qywan 1 Introduction In the job marketplace, the supply side represents the job postings posted by job posters and the demand side presents

More information

Image and Video Quality Assessment Using Neural Network and SVM

Image and Video Quality Assessment Using Neural Network and SVM TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 18/19 pp112-116 Volume 13, Number 1, February 2008 Image and Video Quality Assessment Using Neural Network and SVM DING Wenrui (), TONG Yubing (), ZHANG Qishan

More information

Fast Linear Discriminant Analysis using QR Decomposition and Regularization

Fast Linear Discriminant Analysis using QR Decomposition and Regularization 1 Fast Linear Discriminant Analysis using QR Decomposition Regularization Haesun Park, Barry L. Drake, Sangmin Lee, Cheong Hee Park College of Computing, Georgia Institute of Technology, 2 Ferst Drive,

More information

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto Zemel, Urtasun, Fidler (UofT) CSC 411: 14-PCA & Autoencoders 1 / 18

More information

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Raquel Urtasun & Rich Zemel University of Toronto Nov 4, 2015 Urtasun & Zemel (UofT) CSC 411: 14-PCA & Autoencoders Nov 4, 2015 1 / 18

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue:

More information

Non-negative Matrix Factorization for Multimodal Image Retrieval

Non-negative Matrix Factorization for Multimodal Image Retrieval Non-negative Matrix Factorization for Multimodal Image Retrieval Fabio A. González PhD Bioingenium Research Group Computer Systems and Industrial Engineering Department Universidad Nacional de Colombia

More information

Principal Coordinate Clustering

Principal Coordinate Clustering Principal Coordinate Clustering Ali Sekmen, Akram Aldroubi, Ahmet Bugra Koku, Keaton Hamm Department of Computer Science, Tennessee State University Department of Mathematics, Vanderbilt University Department

More information

PARALLEL COMPUTATION OF THE SINGULAR VALUE DECOMPOSITION ON TREE ARCHITECTURES

PARALLEL COMPUTATION OF THE SINGULAR VALUE DECOMPOSITION ON TREE ARCHITECTURES PARALLEL COMPUTATION OF THE SINGULAR VALUE DECOMPOSITION ON TREE ARCHITECTURES Zhou B. B. and Brent R. P. Computer Sciences Laboratory Australian National University Canberra, ACT 000 Abstract We describe

More information

ESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN ,

ESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN , An Integrated Neural IR System. Victoria J. Hodge Dept. of Computer Science, University ofyork, UK vicky@cs.york.ac.uk Jim Austin Dept. of Computer Science, University ofyork, UK austin@cs.york.ac.uk Abstract.

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover

More information

1) Give a set-theoretic description of the given points as a subset W of R 3. a) The points on the plane x + y 2z = 0.

1) Give a set-theoretic description of the given points as a subset W of R 3. a) The points on the plane x + y 2z = 0. ) Give a set-theoretic description of the given points as a subset W of R. a) The points on the plane x + y z =. x Solution: W = {x: x = [ x ], x + x x = }. x b) The points in the yz-plane. Solution: W

More information