Vector Space Models: Theory and Applications

Alexander Panchenko
Centre de traitement automatique du langage (CENTAL), Université catholique de Louvain
FLTR 2620 Introduction au traitement automatique du langage (Introduction to Natural Language Processing)
8 December 2010
FLTR Vector-Space Models 1/55

Plan
1. Vector Algebra Basics: Vector Space; Euclidean Space; Vector Space Basis; Matrices
2. Vector Space Model: Definition; Basis Elements; Weighting Function; Similarity Function; Transformation
3. Applications of the Vector Space Models
4. References and Further Reading

Vector Space
A set L of elements $x_1, x_2, x_3, \ldots$ is called a vector space if it is closed under vector addition and scalar multiplication. The elements of this set are called vectors. The following conditions must hold for all $x_1, x_2, x_3 \in L$ and all scalars $\alpha, \beta$:
1. Commutativity of vector addition: $x_1 + x_2 = x_2 + x_1$.
2. Associativity of vector addition: $(x_1 + x_2) + x_3 = x_1 + (x_2 + x_3)$.
3. Additive identity: there is a zero vector $0$ such that $0 + x = x + 0 = x$ for all $x$.
4. Existence of an additive inverse: for any $x$ there exists $-x$ such that $x + (-x) = 0$.
5. Associativity of scalar multiplication: $\alpha(\beta x) = (\alpha\beta)x$.
6. Distributivity over scalar sums: $(\alpha + \beta)x = \alpha x + \beta x$.
7. Distributivity over vector sums: $\alpha(x_1 + x_2) = \alpha x_1 + \alpha x_2$.
8. Scalar multiplication identity: $1x = x$.

Euclidean Space
The n-dimensional Euclidean space $\mathbb{R}^n$ is a vector space in which (1) scalars are real numbers, (2) every element is represented by an n-tuple of real numbers, (3) addition is componentwise, and (4) scalar multiplication multiplies each component separately. A scalar $\alpha$ is an element of the field of real numbers, $\alpha \in \mathbb{R}$, for example $\alpha = 3.14$ or $\beta = 5.25$.

Euclidean Space: Vectors
A vector $x$ is an n-tuple of real numbers, an element of the n-dimensional Euclidean space $\mathbb{R}^n$:
$$x = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_{n},$$
for example $x_1 = (3.14, 5.25, 1.45)^T \in \mathbb{R}^3$.

Euclidean Space: Column and Row Vectors
By default, vectors are column vectors:
$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
The transpose of a column vector is a row vector:
$$x^T = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}^T = (x_1, x_2, x_3).$$

Euclidean Space: Vector Addition, Scalar Multiplication
Vector addition is componentwise:
$$x_1 + x_2 = (x_{11} + x_{21}, x_{12} + x_{22}, \ldots, x_{1n} + x_{2n})^T,$$
for example, for $x_1 = (3.14, 5.25, 1.45)^T$ and $x_2 = (1.45, 5.25, 3.14)^T$, $x_1 + x_2 = (4.59, 10.50, 4.59)^T$.
Multiplication of a vector $x$ by a scalar $\alpha$:
$$\alpha x = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n)^T,$$
for example, for $\alpha = 2$ and $x = (3.14, 5.25, 1.45)^T$, $\alpha x = (6.28, 10.50, 2.90)^T$.
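
Both componentwise operations can be sketched in plain Python, assuming vectors are stored as lists of floats (an illustrative choice; numerical code would normally use an array library). The two examples above are reproduced; results are rounded because binary floating point cannot store values such as 4.59 exactly.

```python
def vec_add(x, y):
    """Return x + y, adding the vectors component by component."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return [xi + yi for xi, yi in zip(x, y)]


def scalar_mul(alpha, x):
    """Return alpha * x, multiplying every component by the scalar alpha."""
    return [alpha * xi for xi in x]


x1 = [3.14, 5.25, 1.45]
x2 = [1.45, 5.25, 3.14]

# Round for display only.
print([round(v, 2) for v in vec_add(x1, x2)])    # [4.59, 10.5, 4.59]
print([round(v, 2) for v in scalar_mul(2, x1)])  # [6.28, 10.5, 2.9]
```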

Euclidean Space: Geometrical Interpretation [figure]
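
Euclidean Space: Dot Product, Vector Norm
The dot (inner) product of two vectors:
$$x_1 \cdot x_2 = x_{11}x_{21} + x_{12}x_{22} + \cdots + x_{1n}x_{2n} = \sum_{i=1}^{n} x_{1i}x_{2i},$$
for example, for $x_1 = (3.14, 5.25, 1.45)^T$ and $x_2 = (1.45, 5.25, 3.14)^T$,
$$x_1 \cdot x_2 = 3.14 \cdot 1.45 + 5.25 \cdot 5.25 + 1.45 \cdot 3.14 = 4.553 + 27.5625 + 4.553 = 36.6685.$$
The Euclidean norm of a vector:
$$\|x\| = \sqrt{x \cdot x} = \sqrt{\sum_{i=1}^{n} x_i^2},$$
for example
$$\|x_1\| = \sqrt{3.14^2 + 5.25^2 + 1.45^2} = \sqrt{39.5246} \approx 6.29.$$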
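
A minimal sketch of the inner product and the norm it induces, again assuming list-based vectors; it recomputes the two worked values above.

```python
import math


def dot(x, y):
    """Inner product: the sum of componentwise products."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Euclidean norm: the square root of the inner product of x with itself."""
    return math.sqrt(dot(x, x))


x1 = [3.14, 5.25, 1.45]
x2 = [1.45, 5.25, 3.14]
print(round(dot(x1, x2), 4))  # 36.6685
print(round(norm(x1), 2))     # 6.29
```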

Euclidean Space: Cosine
The cosine between two vectors:
$$\cos(x_1, x_2) = \frac{x_1 \cdot x_2}{\|x_1\| \, \|x_2\|},$$
for example, for $x_1 = (3.14, 5.25, 1.45)^T$ and $x_2 = (0, 0, 1)^T$,
$$\cos(x_1, x_2) = \frac{1.45}{6.29 \cdot 1} \approx 0.23 \quad (\text{an angle of about } 77^\circ).$$
The cosine is defined only in terms of the inner product and the norm it induces. Therefore, every vector space with an inner product defines a cosine between vectors.
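
The cosine example can be checked with a short sketch; the zero-vector guard is an addition of this sketch, since the formula divides by both norms.

```python
import math


def cosine(x, y):
    """Cosine of the angle between x and y: x.y / (||x|| ||y||)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("the cosine is undefined for a zero vector")
    return dot / (norm_x * norm_y)


x1 = [3.14, 5.25, 1.45]
x2 = [0.0, 0.0, 1.0]
c = cosine(x1, x2)
print(round(c, 2))                        # 0.23
print(round(math.degrees(math.acos(c))))  # 77
```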

Euclidean Space: Geometrical Interpretation
The length of a vector $a$ is its norm $\|a\|$. The length of the projection of $a$ onto a unit vector $i$ (with $\|i\| = 1$) equals
$$a_x = \|a\| \cos(a, i) = \frac{a \cdot i}{\|i\|} = a \cdot i.$$
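
The projection formula can be sketched as follows (the function name is hypothetical). The result is signed: it is negative when the angle between the vectors exceeds 90 degrees. Scaling $v$ does not change the result, which is why only the unit-length case reduces to the plain inner product.

```python
import math


def projection_length(a, v):
    """Signed length of the projection of a onto the direction of v:
    ||a|| cos(a, v) = a.v / ||v||, which reduces to a.v when ||v|| = 1."""
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_v == 0:
        raise ValueError("cannot project onto the zero vector")
    return sum(ai * vi for ai, vi in zip(a, v)) / norm_v


a = [3.14, 5.25, 1.45]
i = [1.0, 0.0, 0.0]  # unit vector along the first axis
print(projection_length(a, i))                # 3.14
print(projection_length(a, [2.0, 0.0, 0.0]))  # 3.14: only the direction of v matters
```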

Vector Space Basis: Linear Independence
A linear combination of k vectors is an expression of the form
$$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_k x_k,$$
where $\alpha_1, \alpha_2, \ldots, \alpha_k \in \mathbb{R}$ are scalars.
Vectors $x_1, x_2, \ldots, x_k$ are linearly dependent iff there exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_k$, not all zero, such that
$$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_k x_k = 0.$$
If no such scalars exist, the vectors are said to be linearly independent.
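
One way to test the definition numerically is to compute the rank of the matrix whose rows are the vectors: k vectors are linearly independent exactly when that rank equals k. The sketch below uses Gaussian elimination with partial pivoting (a swapped-in technique, not taken from the slides); the tolerance `tol` is an assumption needed because floating-point pivots are rarely exactly zero.

```python
def rank(vectors, tol=1e-10):
    """Rank of the matrix whose rows are `vectors`, by Gaussian elimination
    with partial pivoting; pivots with |value| <= tol are treated as zero."""
    rows = [[float(c) for c in v] for v in vectors]
    if not rows:
        return 0
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        if r == len(rows):
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(r, len(rows)), key=lambda k: abs(rows[k][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for k in range(r + 1, len(rows)):
            f = rows[k][col] / rows[r][col]
            rows[k] = [a - f * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """True iff no nontrivial linear combination of the vectors equals 0."""
    return rank(vectors) == len(vectors)


print(linearly_independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # True
print(linearly_independent([[1, 2, 3], [2, 4, 6]]))             # False: x2 = 2 x1
print(linearly_independent([[1, 2], [3, 4], [5, 6]]))           # False: 3 vectors in R^2
```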

Vector Space Basis: Basis
A basis of a vector space L is a set $\{b_1, b_2, \ldots, b_n\}$ of vectors in L such that the basis vectors are linearly independent and every vector $x \in L$ can be represented as a linear combination of them: for every $x \in L$ there exist $\alpha_1, \alpha_2, \ldots, \alpha_n \in \mathbb{R}$ such that
$$x = \alpha_1 b_1 + \alpha_2 b_2 + \cdots + \alpha_n b_n.$$
Uniqueness of representation: a vector $x \in L$ can be represented in only one way in terms of a given basis of the vector space.

Vector Space Basis: Standard Basis
The standard basis of a Euclidean space consists of one unit vector pointing in the direction of each axis of the Cartesian coordinate system. The standard basis of the three-dimensional Euclidean space $\mathbb{R}^3$ consists of the following three orthogonal unit vectors: $i = (1, 0, 0)$, $j = (0, 1, 0)$, $k = (0, 0, 1)$. The standard basis of the n-dimensional Euclidean space $\mathbb{R}^n$ is the set of vectors
$$b_1 = (1, 0, 0, \ldots, 0), \quad b_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad b_n = (0, 0, \ldots, 0, 1).$$
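
The unique coefficients $\alpha_1, \ldots, \alpha_n$ of a vector in a given basis are the solution of a square linear system. The sketch below (the function name `coordinates` is hypothetical) solves it by Gaussian elimination and back substitution. It reports dependent vectors as "not a basis". In the standard basis the coordinates are simply the components of $x$.

```python
def coordinates(basis, x, tol=1e-12):
    """Solve alpha_1 b_1 + ... + alpha_n b_n = x for the coefficients alpha,
    by Gaussian elimination with partial pivoting and back substitution."""
    n = len(basis)
    if len(x) != n or any(len(b) != n for b in basis):
        raise ValueError("need n basis vectors of dimension n and x of dimension n")
    # Augmented matrix [B | x], where column j of B is the basis vector b_j.
    m = [[float(basis[j][i]) for j in range(n)] + [float(x[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(m[k][c]))
        if abs(m[p][c]) <= tol:
            raise ValueError("the vectors are linearly dependent, so not a basis")
        m[c], m[p] = m[p], m[c]
        for k in range(c + 1, n):
            f = m[k][c] / m[c][c]
            m[k] = [a - f * b for a, b in zip(m[k], m[c])]
    alpha = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * alpha[j] for j in range(i + 1, n))
        alpha[i] = (m[i][n] - s) / m[i][i]
    return alpha


print(coordinates([[1, 1], [1, -1]], [3, 1]))  # [2.0, 1.0]: x = 2 b_1 + 1 b_2
print(coordinates([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [3.14, 5.25, 1.45]))  # [3.14, 5.25, 1.45]
```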
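
Matrices: Matrix
An $m \times n$ matrix X is a rectangular array of scalars $x_{ij} \in \mathbb{R}$:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n}.$$
A matrix X with m rows and n columns can be represented as a set of m row vectors or as a set of n column vectors:
$$X = (x_1, x_2, \ldots, x_m)^T, \qquad X = (x_1, x_2, \ldots, x_n),$$
where in the first form $x_i \in \mathbb{R}^n$ is the i-th row and in the second form $x_j \in \mathbb{R}^m$ is the j-th column.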

Matrices: Matrix Operations
Matrix addition $C = A + B$ is elementwise: $c_{ij} = a_{ij} + b_{ij}$.
Multiplication of a matrix by a scalar, $C = \alpha A$, multiplies each element separately: $c_{ij} = \alpha a_{ij}$.
The Euclidean (Frobenius) norm of an $m \times n$ matrix A is
$$\|A\| = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}.$$
The transpose $A^T$ is the matrix obtained by exchanging the rows and columns of A: $(A^T)_{ij} = a_{ji}$.
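
A minimal sketch of the four operations, assuming a matrix is stored as a list of row lists; the example matrices are hypothetical.

```python
import math


def mat_add(A, B):
    """Elementwise sum: c_ij = a_ij + b_ij."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same shape")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_scale(alpha, A):
    """Scalar multiple: c_ij = alpha * a_ij."""
    return [[alpha * a for a in row] for row in A]


def frobenius_norm(A):
    """Euclidean (Frobenius) norm: square root of the sum of squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))


def transpose(A):
    """Exchange rows and columns: (A^T)_ij = a_ji."""
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3], [4, 5, 6]]
B = [[6, 5, 4], [3, 2, 1]]
print(mat_add(A, B))                # [[7, 7, 7], [7, 7, 7]]
print(mat_scale(2, A))              # [[2, 4, 6], [8, 10, 12]]
print(round(frobenius_norm(A), 3))  # 9.539, i.e. sqrt(91)
print(transpose(A))                 # [[1, 4], [2, 5], [3, 6]]
```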
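
Matrices: Matrix Product, Coordinate Form
Let
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ \vdots & & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nk} \end{pmatrix}.$$
The product $C = AB$ of A and B is defined as
$$c_{ij} = \sum_{l=1}^{n} a_{il} b_{lj} = a_i \cdot b_j.$$
Matrix multiplication is defined only if the dimensions of A and B are compatible:
$$\underbrace{C}_{[m \times k]} = \underbrace{A}_{[m \times n]} \, \underbrace{B}_{[n \times k]}.$$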
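
Matrices: Matrix Product, Vector Form
The row-by-column method: represent A as a set of m row vectors and B as a set of k column vectors. If $C = AB$, the element $c_{ij}$ of C is the inner product of the i-th row of A and the j-th column of B:
$$c_{ij} = a_i \cdot b_j, \qquad i = 1, \ldots, m, \; j = 1, \ldots, k,$$
where
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} = \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & \cdots & b_{1k} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nk} \end{pmatrix} = (b_1, b_2, \ldots, b_k).$$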

Matrices: Matrix Multiplication, Vector Form [figure]

Matrices: Matrix Product, Example
For a $3 \times 3$ matrix A and a $3 \times 2$ matrix B, the dimensions agree, so the product is defined:
$$\underbrace{C}_{[3 \times 2]} = \underbrace{A}_{[3 \times 3]} \, \underbrace{B}_{[3 \times 2]}.$$
Each of the six entries of C is the inner product of a row of A with a column of B.

Matrices: Properties of the Matrix Product
Matrix multiplication is associative: $A(BC) = (AB)C$.
Matrix multiplication is distributive over matrix addition: $A(B + C) = AB + AC$.
The matrix product is compatible with scalar multiplication: $\alpha(AB) = (\alpha A)B = A(\alpha B)$.
Matrix multiplication is NOT commutative: in general, $AB \neq BA$.
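
The row-by-column rule and two of the properties above can be checked with a small sketch. All matrices here are hypothetical values: the first pair only matches the $3 \times 3$ by $3 \times 2$ shapes of the example, and the $2 \times 2$ matrices are chosen so that $AB \neq BA$ is easy to see. Integer entries keep every comparison exact.

```python
def mat_mul(A, B):
    """Row-by-column product: c_ij is the inner product of row i of A
    and column j of B. Requires A to be m x n and B to be n x k."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("inner dimensions do not agree")
    columns = list(zip(*B))  # the k column vectors of B
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in A]


# Hypothetical 3 x 3 and 3 x 2 matrices; the product is 3 x 2.
A = [[1, 0, 2], [0, 1, 0], [1, 1, 1]]
B = [[1, 2], [3, 4], [5, 6]]
print(mat_mul(A, B))  # [[11, 14], [3, 4], [9, 12]]

# Non-commutativity and associativity on small 2 x 2 matrices.
P = [[0, 1], [0, 0]]
Q = [[0, 0], [1, 0]]
R = [[1, 2], [3, 4]]
print(mat_mul(P, Q), mat_mul(Q, P))  # [[1, 0], [0, 0]] [[0, 0], [0, 1]]
print(mat_mul(P, mat_mul(Q, R)) == mat_mul(mat_mul(P, Q), R))  # True
```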

Matrices: Matrix Factorization
The Singular Value Decomposition (SVD) is a factorization of a rectangular $m \times n$ matrix A of the form $A = UDV^T$, where U is an $m \times m$ matrix and V is an $n \times n$ matrix. Both matrices have orthonormal column vectors: $U^T U = I$, $V^T V = I$. The $m \times n$ matrix D has non-negative real numbers, called singular values, along its diagonal and zeros elsewhere.

Vector Space Model: Main Characteristics
The Vector Space Model (VSM) calculates similarity between m homogeneous objects $O = \{o_1, o_2, \ldots, o_m\}$. The model represents an object o as a vector (point) x in an n-dimensional Euclidean space $\mathbb{R}^n$. Every dimension of the vector space corresponds to a feature of the objects. The set of all objects is represented with a feature matrix
$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}.$$
The similarity between objects is modeled in terms of the spatial distance between their vectors (points).

Vector Space Model: Definition
The vector space model is sometimes called a semantic space model. Formally, a vector space model can be represented as a quadruple $\langle A, B, S, M \rangle$, where:
B is a set $\{b_1, \ldots, b_n\}$ of basis elements that determine the dimensionality of the space and the interpretation of each dimension.
A specifies the weighting function $A: \mathbb{R}^n \to \mathbb{R}^n$. It takes as input a vector x representing an object o and returns its normalized version.
S is a similarity function $S: \mathbb{R}^n \times \mathbb{R}^n \to [0; 1]$ that maps a pair of vectors onto a scalar measuring their similarity.
M is a transformation that maps one vector space L onto another vector space $\tilde{L}$ in order to reduce dimensionality.

Basis Elements: Interpretation of Basis Elements and Objects
The basis elements $b_1, \ldots, b_n$ define the interpretation of each dimension, that is, of each standard basis vector. The type of the objects defines the interpretation of each vector represented by a VSM. The bag-of-words (BOW) model is a vector space model in which the objects are text documents and the basis elements are the words of these documents. For example, with four terms and three documents: $b_1$ = car, $b_2$ = auto, $b_3$ = insurance, $b_4$ = best, and $o_1$ = Doc1, $o_2$ = Doc2, $o_3$ = Doc3.
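
A bag-of-words feature matrix can be built by counting, for each document, how often each basis term occurs. This is a minimal sketch: the two documents are hypothetical toy texts (not the documents from the slide), and tokenization is naive whitespace splitting, an assumption made for brevity.

```python
from collections import Counter

# Basis elements b_1..b_4 of the bag-of-words space, in the order used above.
VOCABULARY = ["car", "auto", "insurance", "best"]


def bow_vector(text, vocabulary=VOCABULARY):
    """Term-frequency vector of `text` over `vocabulary` (whitespace tokens)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical toy documents.
docs = {
    "DocA": "best car insurance best car",
    "DocB": "auto insurance auto",
}
X = [bow_vector(text) for text in docs.values()]  # one row per object
print(X)  # [[2, 0, 1, 2], [0, 2, 1, 0]]
```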

Basis Elements: Interpretation of the Feature Matrix
Basis elements (features) can also be lemmas, multi-word expressions, named entities, documents, syntactic dependencies, morphemes, etc.
Term-Document matrix: objects are documents, features are the words of the documents. Problems: information retrieval, text categorization and clustering.
Term-Term matrix: objects are terms, features are context words or words from a dictionary definition. Problems: computational lexical semantics, distributional analysis.
Term Sense-Term matrix: objects are word senses, features are words. Problem: word sense disambiguation.
Term-Syntactic Dependency matrix: objects are terms, features are the syntactic dependencies of a term. Problem: computational lexical semantics.
...

Weighting Function
A weighting function $A: \mathbb{R}^n \to \mathbb{R}^n$ takes as input a vector x representing an object o and returns its normalized version. Weighting adapts each feature value to the feature's actual importance. Common choices:
Identity function (trivial): $A(x) = x$.
Logarithmic weighting: $A(x_{ij}) = 1 + \log(x_{ij})$ if $x_{ij} > 0$, and $0$ otherwise.
Length normalization with the Euclidean norm: $A(x) = x / \|x\|$.
Conversion to a probability distribution (for non-negative counts): $A(x_{ij}) = p(i, j) = \dfrac{x_{ij}}{\sum_{l=1}^{n} x_{il}} = \dfrac{x_{ij}}{\|x_i\|_1}$.
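
Three of the weightings above can be sketched per row, applied to the term frequencies of Doc1 from the weighting example, $(27, 3, 0, 14)$. The natural logarithm is an assumption of this sketch, since the slide does not fix the base.

```python
import math


def identity(x):
    """Trivial weighting: A(x) = x (returns a copy)."""
    return list(x)


def log_weight(x):
    """Logarithmic weighting: 1 + log(x_ij) for x_ij > 0, else 0 (natural log assumed)."""
    return [1 + math.log(v) if v > 0 else 0.0 for v in x]


def to_probability(x):
    """Divide each count by the row total (the L1 norm for non-negative counts)."""
    total = sum(x)
    if total == 0:
        raise ValueError("cannot normalize an all-zero row")
    return [v / total for v in x]


row = [27, 3, 0, 14]  # term frequencies of car, auto, insurance, best in Doc1
print(identity(row))                               # [27, 3, 0, 14]
print([round(v, 2) for v in log_weight(row)])      # [4.3, 2.1, 0.0, 3.64]
print([round(v, 2) for v in to_probability(row)])  # [0.61, 0.07, 0.0, 0.32]
```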

Weighting Function (continued)
Entropy weighting:
$$A(x_{ij}) = x_{ij} \left( -\sum_{k=1}^{n} \frac{p_{ik} \log(p_{ik})}{\log(n)} \right), \qquad p_{ik} = \frac{x_{ik}}{\sum_{l=1}^{n} x_{il}}.$$
Pointwise Mutual Information:
$$A(x_{ij}) = \log \frac{p(i, j)}{p(i)\,p(j)}.$$
TF-IDF (Term Frequency - Inverse Document Frequency):
$$A(x_{ij}) = \underbrace{\frac{x_{ij}}{\sum_{k=1}^{n} x_{ik}}}_{\text{TF}} \cdot \underbrace{\log \frac{m}{|\{l : x_{lj} > 0, \; l = 1, \ldots, m\}|}}_{\text{IDF}}.$$
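
PMI and TF-IDF can be sketched directly from the formulas. Assumptions of this sketch: rows are objects and columns are features, natural logarithms are used, and zero counts (where PMI would be $-\infty$) are mapped to 0.0, a common convention rather than part of the definition. Both input matrices are hypothetical.

```python
import math


def pmi(X):
    """PMI weights log(p(i,j) / (p(i) p(j))) for a count matrix X, where p(i,j)
    is the joint relative frequency and p(i), p(j) are its row and column
    marginals. Zero counts are mapped to 0.0 (a convention assumed here)."""
    total = sum(sum(row) for row in X)
    p_row = [sum(row) / total for row in X]
    p_col = [sum(col) / total for col in zip(*X)]
    return [
        [math.log((x / total) / (p_row[i] * p_col[j])) if x > 0 else 0.0
         for j, x in enumerate(row)]
        for i, row in enumerate(X)
    ]


def tfidf(X):
    """TF-IDF as defined above: (x_ij / sum_k x_ik) * log(m / df_j), where df_j
    is the number of objects (rows) in which feature j occurs."""
    m = len(X)
    df = [sum(1 for v in col if v > 0) for col in zip(*X)]
    weights = []
    for row in X:
        total = sum(row)
        weights.append([
            (v / total) * math.log(m / df[j]) if v > 0 else 0.0
            for j, v in enumerate(row)
        ])
    return weights


counts = [[2, 0], [0, 2]]  # hypothetical co-occurrence counts
print([[round(v, 3) for v in row] for row in pmi(counts)])
# [[0.693, 0.0], [0.0, 0.693]]: log 2, each pair is twice as likely as chance

X = [[2, 0, 1], [0, 3, 1]]  # hypothetical 2 documents x 3 terms
print([[round(v, 3) for v in row] for row in tfidf(X)])
# [[0.462, 0.0, 0.0], [0.0, 0.52, 0.0]]: the third term occurs everywhere, so IDF = 0
```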
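
Weighting Function: Example
Consider a term-document matrix X in which $x_{ij}$ is a term frequency. For example, Doc1 is represented by the frequencies of car, auto, insurance, and best: $x_{Doc1} = (27, 3, 0, 14)^T$. Normalizing it with the Euclidean norm gives
$$\hat{x}_{Doc1} = \frac{x_{Doc1}}{\|x_{Doc1}\|} = \frac{(27, 3, 0, 14)^T}{\sqrt{27^2 + 3^2 + 0^2 + 14^2}} \approx \frac{(27, 3, 0, 14)^T}{30.56} = (0.88, 0.10, 0, 0.46)^T.$$
Applying the same normalization to every document vector yields the normalized term-document matrix.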
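
The normalization of Doc1 can be reproduced with a short sketch; the zero-vector check is an addition, since the formula divides by the norm.

```python
import math


def length_normalize(x):
    """Divide a vector by its Euclidean norm so that the result has length 1."""
    n = math.sqrt(sum(v * v for v in x))
    if n == 0:
        raise ValueError("cannot normalize the zero vector")
    return [v / n for v in x]


doc1 = [27, 3, 0, 14]  # term frequencies of car, auto, insurance, best
print(round(math.sqrt(sum(v * v for v in doc1)), 2))  # 30.56
print([round(v, 2) for v in length_normalize(doc1)])  # [0.88, 0.1, 0.0, 0.46]
```

Applying `length_normalize` to every row of a term-document matrix yields the normalized matrix.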

Similarity Function
A similarity function $S(x, y)$ defines a measure of similarity of two vectors $x, y \in \mathbb{R}^n$. It should satisfy the following properties for any vectors x, y:
Non-negativity: $S(x, y) \geq 0$.
Maximality: $S(x, x) \geq S(x, y)$.
Symmetry: $S(x, y) = S(y, x)$.

Distance Function
A distance (dissimilarity) function $D(x, y)$ defines the distance between two vectors $x, y \in \mathbb{R}^n$. It should satisfy the following properties for any vectors x, y, z:
Non-negativity: $D(x, y) \geq 0$.
Identity of indiscernibles: $D(x, y) = 0$ iff $x = y$.
Symmetry: $D(x, y) = D(y, x)$.
Triangle inequality: $D(x, z) \leq D(x, y) + D(y, z)$.

Converting Distance to Similarity
A distance measure $D(x, y) \in [0; 1]$ between two vectors $x, y \in \mathbb{R}^n$ can be converted to a similarity measure as follows:
$S(x, y) = 1 - D(x, y)$, if $S(x, y) \in [0; 1]$;
$S(x, y) = 1 - 2D(x, y)$, if $S(x, y) \in [-1; +1]$.

Some Similarity and Distance Functions
Minkowski distance ($L_q$ distance): $D(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^q\right)^{1/q}$.
Euclidean distance ($L_2$ distance): $D(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} = \|x - y\|$.
Manhattan or city block distance ($L_1$ distance): $D(x, y) = \sum_{i=1}^{n} |x_i - y_i|$.
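
A minimal sketch of the $L_q$ family, with the Euclidean and Manhattan distances as the special cases $q = 2$ and $q = 1$. Restricting $q \geq 1$ is this sketch's choice, because for $q < 1$ the formula violates the triangle inequality.

```python
def minkowski(x, y, q):
    """L_q distance (sum_i |x_i - y_i|^q)^(1/q); a metric for q >= 1."""
    if q < 1:
        raise ValueError("q must be at least 1")
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def euclidean(x, y):
    return minkowski(x, y, 2)  # L_2 distance


def manhattan(x, y):
    return minkowski(x, y, 1)  # L_1 distance


x, y = [0, 0], [3, 4]
print(euclidean(x, y))  # 5.0
print(manhattan(x, y))  # 7.0
```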

Some Similarity and Distance Functions (continued)
Jaccard similarity: $S(x, y) = \dfrac{\sum_{i=1}^{n} \min(x_i, y_i)}{\sum_{i=1}^{n} \max(x_i, y_i)}$.
Dice similarity: $S(x, y) = \dfrac{2 \sum_{i=1}^{n} \min(x_i, y_i)}{\sum_{i=1}^{n} (x_i + y_i)}$.
Cosine similarity: $S(x, y) = \dfrac{x \cdot y}{\|x\| \, \|y\|}$.
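
The three similarity functions can be sketched for non-negative feature vectors such as term counts, the setting in which the min/max forms of Jaccard and Dice are meaningful. The example vectors are hypothetical, and the zero-denominator guards are additions of this sketch.

```python
import math


def jaccard(x, y):
    """Generalized Jaccard: sum of minima over sum of maxima (non-negative x, y)."""
    denominator = sum(max(a, b) for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("undefined for two all-zero vectors")
    return sum(min(a, b) for a, b in zip(x, y)) / denominator


def dice(x, y):
    """Generalized Dice: twice the sum of minima over the sum of all entries."""
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("undefined for two all-zero vectors")
    return 2 * sum(min(a, b) for a, b in zip(x, y)) / denominator


def cosine(x, y):
    """x.y / (||x|| ||y||)."""
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norms == 0:
        raise ValueError("undefined for a zero vector")
    return sum(a * b for a, b in zip(x, y)) / norms


x, y = [1, 2, 0], [2, 1, 1]
print(jaccard(x, y))           # 0.4
print(round(dice(x, y), 3))    # 0.571
print(round(cosine(x, y), 3))  # 0.73
```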

Transformation: Dimensionality Reduction
A transformation M maps a vector space L onto another vector space $\tilde{L}$ in order to reduce dimensionality, so that $\dim(\tilde{L}) < \dim(L)$. The goal of dimensionality reduction is to find a smaller number of uncorrelated or weakly correlated dimensions. Reasons for dimensionality reduction:
The VSM assumes that dimensions are independent, but in practice some dimensions are (close to) linear combinations of other dimensions: synonyms, spelling variants, etc.
Computation is expensive in a high-dimensional space.
Reduction can help discover latent structure in the data.

Transformation: Dimensionality Reduction (continued)
Simple dimensionality reduction can be done at the preprocessing stage, for example by removing stop words and rare dimensions. In addition, factorization methods applied to the feature matrix can reduce dimensionality:
Truncated Singular Value Decomposition (SVD)
Non-Negative Matrix Factorization (NMF)
...

Transformation: Truncated Singular Value Decomposition [figure]
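
Truncated SVD keeps only the largest singular values and their singular vectors. The sketch below covers the simplest case, truncation to k = 1. It uses a swapped-in technique: power iteration on $A^T A$ instead of a full SVD routine. In practice one would call a library routine such as `numpy.linalg.svd` and keep the first k components. The sketch assumes the all-ones start vector is not orthogonal to the top right singular vector, which holds, for example, for a nonzero matrix with non-negative entries such as a term-document matrix.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("power iteration reached the zero vector")
    return [c / n for c in v]


def top_singular_triplet(A, iters=500):
    """Approximate (sigma_1, u_1, v_1), the largest singular value of A and its
    left/right singular vectors, by power iteration on A^T A."""
    m, n = len(A), len(A[0])
    v = _normalize([1.0] * n)
    sigma, u = 0.0, [0.0] * m
    for _ in range(iters):
        u = _normalize([sum(A[i][j] * v[j] for j in range(n)) for i in range(m)])  # u ~ A v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]  # w = A^T u
        sigma = math.sqrt(sum(c * c for c in w))  # ||A^T u|| converges to sigma_1
        v = [c / sigma for c in w]
    return sigma, u, v


def rank1_approximation(A, iters=500):
    """Truncated SVD with k = 1: the matrix sigma_1 u_1 v_1^T."""
    sigma, u, v = top_singular_triplet(A, iters)
    return [[sigma * ui * vj for vj in v] for ui in u]


A = [[3.0, 0.0], [0.0, 1.0]]
sigma, u, v = top_singular_triplet(A)
print(round(sigma, 6))  # 3.0
print([[round(c, 6) for c in row] for row in rank1_approximation(A)])  # [[3.0, 0.0], [0.0, 0.0]]
```

The rank-1 result keeps the dominant direction and discards the smaller singular value, which is the dimensionality reduction that Latent Semantic Indexing and Latent Semantic Analysis apply with larger k.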

Applications of the Vector Space Models
1. Information Retrieval
2. Computational Lexical Semantics
3. Word Sense Disambiguation
4. Other Applications

Information Retrieval: Problem Formulation
Given a user query q, find the k most relevant documents $\{d_1, \ldots, d_k\}$ in a collection of n documents $\{d_1, \ldots, d_n\}$.
A: TF-IDF
B: terms from all documents
O: documents
S: cosine similarity
M: truncated SVD (Latent Semantic Indexing)
Documents are represented as vectors in the bag-of-words space. The user's text query is represented as a vector in the same space as the documents.

Information Retrieval: Query Representation
Let the search query be q = car. In the basis (car, auto, insurance, best), it is represented as the vector $q = (1, 0, 0, 0)$.
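
Ranking documents for the query q = car can be sketched as scoring every document vector by its cosine with q and keeping the top k. This simplified sketch uses raw term frequencies and omits the TF-IDF weighting and truncated SVD of the full setup above. Doc1 uses the counts from the weighting example; DocA and DocB are hypothetical.

```python
import math


def cosine(x, y):
    """x.y / (||x|| ||y||); returns 0.0 if either vector is zero."""
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / norms if norms else 0.0


def rank_documents(query, docs, k):
    """Return the k (name, score) pairs with the highest cosine to the query."""
    scored = [(name, cosine(query, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Term-frequency vectors over (car, auto, insurance, best).
docs = {
    "Doc1": [27, 3, 0, 14],
    "DocA": [0, 5, 2, 0],   # hypothetical
    "DocB": [4, 0, 10, 0],  # hypothetical
}
q = [1, 0, 0, 0]  # the query "car"
for name, score in rank_documents(q, docs, k=2):
    print(name, round(score, 2))
# Doc1 0.88
# DocB 0.37
```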

Computational Lexical Semantics: Problem Formulation
Given a term t, find the k most semantically similar terms $\{t_1, \ldots, t_k\}$ in a vocabulary of n terms $\{t_1, \ldots, t_n\}$.
A: Pointwise Mutual Information
B: words / terms / syntactic contexts
O: terms
S: cosine similarity / Kullback-Leibler divergence
M: truncated SVD (Latent Semantic Analysis) / Non-Negative Matrix Factorization
Distributional hypothesis of Harris: terms are semantically similar if they appear in similar contexts.

Computational Lexical Semantics [figure]

Word Sense Disambiguation: Problem Formulation
Given an occurrence of a word w, find its sense among the k possible senses $\{s_1, \ldots, s_k\}$.
A: identity function / length normalization
B: words / terms
O: term senses
S: inner product (simplified Lesk)
M: none
Term senses are represented as vectors in the bag-of-words space of their dictionary definitions. The word occurrence is represented, through its context words, as a vector in the same space as the term senses.
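
The simplified Lesk setup can be sketched as follows. Each sense's gloss and the occurrence's context become bag-of-words count vectors, and the sense with the largest inner product wins. The inner product counts the words shared by gloss and context, weighted by frequency. The glosses, the context sentence and the small stop-word list are all hypothetical.

```python
from collections import Counter

# A tiny hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "as", "he", "of", "on", "such", "that", "the"}


def bow(text):
    """Bag-of-words count vector of `text`, ignoring stop words."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)


def inner_product(u, v):
    """Inner product of two sparse count vectors (missing words count as 0)."""
    return sum(count * v[word] for word, count in u.items())


def simplified_lesk(context, sense_glosses):
    """Pick the sense whose gloss vector has the largest inner product with
    the context vector; ties go to the sense listed first."""
    ctx = bow(context)
    scores = {sense: inner_product(bow(gloss), ctx) for sense, gloss in sense_glosses.items()}
    return max(scores, key=scores.get), scores


# Hypothetical glosses for two senses of "bank".
senses = {
    "bank/finance": "a financial institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}
context = "he sat on the bank of the river and watched the water"
print(simplified_lesk(context, senses))
# ('bank/river', {'bank/finance': 0, 'bank/river': 2})
```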

Some Other Applications
Named Entity Disambiguation
Text Document Clustering
Text Document Categorization
Collaborative Recommendations
...

References
Berry, M. W. and Browne, M. (2005). Understanding Search Engines: Mathematical Modeling and Text Retrieval (Software, Environments, Tools), Second Edition. SIAM, Society for Industrial and Applied Mathematics.
Berry, M. W., Drmac, Z., and Jessup, E. R. (1999). Matrices, vector spaces, and information retrieval. SIAM Review, 41.
Lowe, W. Towards a theory of semantic space.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press, 1st edition.
Van de Cruys, T. (2010). Mining for Meaning: The Extraction of Lexicosemantic Knowledge from Text.

Acknowledgments
Some illustrations in this presentation were borrowed from [Manning et al., 2008], [Van de Cruys, 2010], and Wikipedia. I would like to thank the authors of these figures.
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More information10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues
COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization
More informationLecture 4: Transformations and Matrices. CSE Computer Graphics (Fall 2010)
Lecture 4: Transformations and Matrices CSE 40166 Computer Graphics (Fall 2010) Overall Objective Define object in object frame Move object to world/scene frame Bring object into camera/eye frame Instancing!
More informationUnsupervised Feature Selection for Sparse Data
Unsupervised Feature Selection for Sparse Data Artur Ferreira 1,3 Mário Figueiredo 2,3 1- Instituto Superior de Engenharia de Lisboa, Lisboa, PORTUGAL 2- Instituto Superior Técnico, Lisboa, PORTUGAL 3-
More informationMining di Dati Web. Lezione 3 - Clustering and Classification
Mining di Dati Web Lezione 3 - Clustering and Classification Introduction Clustering and classification are both learning techniques They learn functions describing data Clustering is also known as Unsupervised
More informationInformation Retrieval. hussein suleman uct cs
Information Management Information Retrieval hussein suleman uct cs 303 2004 Introduction Information retrieval is the process of locating the most relevant information to satisfy a specific information
More informationFeature selection. LING 572 Fei Xia
Feature selection LING 572 Fei Xia 1 Creating attribute-value table x 1 x 2 f 1 f 2 f K y Choose features: Define feature templates Instantiate the feature templates Dimensionality reduction: feature selection
More informationGeometric transformations assign a point to a point, so it is a point valued function of points. Geometric transformation may destroy the equation
Geometric transformations assign a point to a point, so it is a point valued function of points. Geometric transformation may destroy the equation and the type of an object. Even simple scaling turns a
More informationIn = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most
In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most common word appears 10 times then A = rn*n/n = 1*10/100
More informationThe Semantic Conference Organizer
34 The Semantic Conference Organizer Kevin Heinrich, Michael W. Berry, Jack J. Dongarra, Sathish Vadhiyar University of Tennessee, Knoxville, USA CONTENTS 34.1 Background... 571 34.2 Latent Semantic Indexing...
More informationConvex Optimization. 2. Convex Sets. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University. SJTU Ying Cui 1 / 33
Convex Optimization 2. Convex Sets Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2018 SJTU Ying Cui 1 / 33 Outline Affine and convex sets Some important examples Operations
More informationInformation Retrieval
Information Retrieval Natural Language Processing: Lecture 12 30.11.2017 Kairit Sirts Homework 4 things that seemed to work Bidirectional LSTM instead of unidirectional Change LSTM activation to sigmoid
More informationLecture 7: Relevance Feedback and Query Expansion
Lecture 7: Relevance Feedback and Query Expansion Information Retrieval Computer Science Tripos Part II Ronan Cummins Natural Language and Information Processing (NLIP) Group ronan.cummins@cl.cam.ac.uk
More informationTransformations Computer Graphics I Lecture 4
15-462 Computer Graphics I Lecture 4 Transformations Vector Spaces Affine and Euclidean Spaces Frames Homogeneous Coordinates Transformation Matrices January 24, 2002 [Angel, Ch. 4] Frank Pfenning Carnegie
More informationHomework 5: Transformations in geometry
Math 21b: Linear Algebra Spring 2018 Homework 5: Transformations in geometry This homework is due on Wednesday, February 7, respectively on Thursday February 8, 2018. 1 a) Find the reflection matrix at
More information11/4/2015. Lecture 2: More Retrieval Models. Previous Lecture. Fuzzy Index Terms. Fuzzy Logic. Fuzzy Retrieval: Open Problems
/4/205 Previous Lecture Information Retrieval and Web Search Engines Lecture 2: More Retrieval Models November 05 th, 205 Boolean retrieval: Documents: Sets of words (index terms) Queries: Propositional
More informationClustering and Dimensionality Reduction. Stony Brook University CSE545, Fall 2017
Clustering and Dimensionality Reduction Stony Brook University CSE545, Fall 2017 Goal: Generalize to new data Model New Data? Original Data Does the model accurately reflect new data? Supervised vs. Unsupervised
More informationDocument Clustering using Concept Space and Cosine Similarity Measurement
29 International Conference on Computer Technology and Development Document Clustering using Concept Space and Cosine Similarity Measurement Lailil Muflikhah Department of Computer and Information Science
More informationRecommender System. What is it? How to build it? Challenges. R package: recommenderlab
Recommender System What is it? How to build it? Challenges R package: recommenderlab 1 What is a recommender system Wiki definition: A recommender system or a recommendation system (sometimes replacing
More informationText Modeling with the Trace Norm
Text Modeling with the Trace Norm Jason D. M. Rennie jrennie@gmail.com April 14, 2006 1 Introduction We have two goals: (1) to find a low-dimensional representation of text that allows generalization to
More informationSection III: TRANSFORMATIONS
Section III: TRANSFORMATIONS in 2-D 2D TRANSFORMATIONS AND MATRICES Representation of Points: 2 x 1 matrix: X Y General Problem: [B] = [T] [A] [T] represents a generic operator to be applied to the points
More informationTopics in Information Retrieval
Topics in Information Retrieval FSNLP, chapter 15 Christopher Manning and Hinrich Schütze 1999 2001 159 Information Retrieval Getting information from document repositories Normally text (though spoken,
More informationGeometry. Ed Angel Professor of Computer Science, Electrical and Computer Engineering, and Media Arts University of New Mexico
Geometry Ed Angel Professor of Computer Science, Electrical and Computer Engineering, and Media Arts University of New Mexico 1 Objectives Introduce the elements of geometry - Scalars - Vectors - Points
More informationDocument Clustering: Comparison of Similarity Measures
Document Clustering: Comparison of Similarity Measures Shouvik Sachdeva Bhupendra Kastore Indian Institute of Technology, Kanpur CS365 Project, 2014 Outline 1 Introduction The Problem and the Motivation
More informationJune 15, Abstract. 2. Methodology and Considerations. 1. Introduction
Organizing Internet Bookmarks using Latent Semantic Analysis and Intelligent Icons Note: This file is a homework produced by two students for UCR CS235, Spring 06. In order to fully appreacate it, it may
More information2D Euclidean Geometric Algebra Matrix Representation
2D Euclidean Geometric Algebra Matrix Representation Kurt Nalt March 29, 2015 Abstract I present the well-known matrix representation of 2D Euclidean Geometric Algebra, and suggest a literal geometric
More informationMonday, 12 November 12. Matrices
Matrices Matrices Matrices are convenient way of storing multiple quantities or functions They are stored in a table like structure where each element will contain a numeric value that can be the result
More informationParallel and perspective projections such as used in representing 3d images.
Chapter 5 Rotations and projections In this chapter we discuss Rotations Parallel and perspective projections such as used in representing 3d images. Using coordinates and matrices, parallel projections
More informationVector: A series of scalars contained in a column or row. Dimensions: How many rows and columns a vector or matrix has.
ASSIGNMENT 0 Introduction to Linear Algebra (Basics of vectors and matrices) Due 3:30 PM, Tuesday, October 10 th. Assignments should be submitted via e-mail to: matlabfun.ucsd@gmail.com You can also submit
More informationCS490W. Text Clustering. Luo Si. Department of Computer Science Purdue University
CS490W Text Clustering Luo Si Department of Computer Science Purdue University [Borrows slides from Chris Manning, Ray Mooney and Soumen Chakrabarti] Clustering Document clustering Motivations Document
More informationInformation Retrieval and Data Mining Part 1 Information Retrieval
Information Retrieval and Data Mining Part 1 Information Retrieval 2005/6, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Information Retrieval - 1 1 Today's Question 1. Information
More informationNATURAL LANGUAGE PROCESSING
NATURAL LANGUAGE PROCESSING LESSON 9 : SEMANTIC SIMILARITY OUTLINE Semantic Relations Semantic Similarity Levels Sense Level Word Level Text Level WordNet-based Similarity Methods Hybrid Methods Similarity
More informationIntroduction to Information Retrieval
Introduction Inverted index Processing Boolean queries Course overview Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute for Natural
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 2: More Retrieval Models April 13, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework
More informationTransformations. CSCI 420 Computer Graphics Lecture 4
CSCI 420 Computer Graphics Lecture 4 Transformations Jernej Barbic University of Southern California Vector Spaces Euclidean Spaces Frames Homogeneous Coordinates Transformation Matrices [Angel, Ch. 4]
More informationInforma(on Retrieval
Introduc)on to Informa)on Retrieval CS3245 Informa(on Retrieval Lecture 7: Scoring, Term Weigh9ng and the Vector Space Model 7 Last Time: Index Compression Collec9on and vocabulary sta9s9cs: Heaps and
More informationComputer Science 336 Fall 2017 Homework 2
Computer Science 336 Fall 2017 Homework 2 Use the following notation as pseudocode for standard 3D affine transformation matrices. You can refer to these by the names below. There is no need to write out
More informationText Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering
Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering A. Anil Kumar Dept of CSE Sri Sivani College of Engineering Srikakulam, India S.Chandrasekhar Dept of CSE Sri Sivani
More informationFeature Selection Using Modified-MCA Based Scoring Metric for Classification
2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification
More informationMultiple View Geometry in Computer Vision
Multiple View Geometry in Computer Vision Prasanna Sahoo Department of Mathematics University of Louisville 1 Structure Computation Lecture 18 March 22, 2005 2 3D Reconstruction The goal of 3D reconstruction
More informationObjectives. Geometry. Coordinate-Free Geometry. Basic Elements. Transformations to Change Coordinate Systems. Scalars
Objecties Geometry CS Interactie Computer Graphics Prof. Daid E. Breen Department of Computer Science Introduce the elements of geometry - Scalars - Vectors - Points Deelop mathematical operations among
More informationXPM 2D Transformations Week 2, Lecture 3
CS 430/585 Computer Graphics I XPM 2D Transformations Week 2, Lecture 3 David Breen, William Regli and Maxim Peysakhov Geometric and Intelligent Computing Laboratory Department of Computer Science Drexel
More informationLecture 6: Multimedia Information Retrieval Dr. Jian Zhang
Lecture 6: Multimedia Information Retrieval Dr. Jian Zhang NICTA & CSE UNSW COMP9314 Advanced Database S1 2007 jzhang@cse.unsw.edu.au Reference Papers and Resources Papers: Colour spaces-perceptual, historical
More informationGEOMETRIC TRANSFORMATIONS AND VIEWING
GEOMETRIC TRANSFORMATIONS AND VIEWING 2D and 3D 1/44 2D TRANSFORMATIONS HOMOGENIZED Transformation Scaling Rotation Translation Matrix s x s y cosθ sinθ sinθ cosθ 1 dx 1 dy These 3 transformations are
More informationCALCULATING TRANSFORMATIONS OF KINEMATIC CHAINS USING HOMOGENEOUS COORDINATES
CALCULATING TRANSFORMATIONS OF KINEMATIC CHAINS USING HOMOGENEOUS COORDINATES YINGYING REN Abstract. In this paper, the applications of homogeneous coordinates are discussed to obtain an efficient model
More informationBOOLEAN MATRIX FACTORIZATIONS. with applications in data mining Pauli Miettinen
BOOLEAN MATRIX FACTORIZATIONS with applications in data mining Pauli Miettinen MATRIX FACTORIZATIONS BOOLEAN MATRIX FACTORIZATIONS o THE BOOLEAN MATRIX PRODUCT As normal matrix product, but with addition
More informationModern Multidimensional Scaling
Ingwer Borg Patrick Groenen Modern Multidimensional Scaling Theory and Applications With 116 Figures Springer Contents Preface vii I Fundamentals of MDS 1 1 The Four Purposes of Multidimensional Scaling
More informationToday. Today. Introduction. Matrices. Matrices. Computergrafik. Transformations & matrices Introduction Matrices
Computergrafik Matthias Zwicker Universität Bern Herbst 2008 Today Transformations & matrices Introduction Matrices Homogeneous Affine transformations Concatenating transformations Change of Common coordinate
More informationChapter 18 out of 37 from Discrete Mathematics for Neophytes: Number Theory, Probability, Algorithms, and Other Stuff by J. M. Cargal.
Chapter 8 out of 7 from Discrete Mathematics for Neophytes: Number Theory, Probability, Algorithms, and Other Stuff by J. M. Cargal 8 Matrices Definitions and Basic Operations Matrix algebra is also known
More informationNear Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri
Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri Scene Completion Problem The Bare Data Approach High Dimensional Data Many real-world problems Web Search and Text Mining Billions
More information