Vector Space Models: Theory and Applications


Alexander Panchenko
Centre de traitement automatique du langage (CENTAL), Université catholique de Louvain
FLTR 2620 Introduction au traitement automatique du langage
8 December 2010
FLTR2620 - Vector-Space Models 1/55

Plan
1 Vector Algebra Basics
2 Vector Space Model
3 Applications of the Vector Space Models
4 References and Further Reading


Vector Space
A set of elements x_1, x_2, x_3, ... is called a vector space L if it is closed under the operations of vector addition and scalar multiplication. The elements of this set are called vectors. The following conditions must hold for all x_1, x_2, x_3 in L and scalars α, β:
1 Commutativity of vector addition: x_1 + x_2 = x_2 + x_1.
2 Associativity of vector addition: (x_1 + x_2) + x_3 = x_1 + (x_2 + x_3).
3 Additive identity: for all x, 0 + x = x + 0 = x.
4 Existence of additive inverse: for any x, there exists a -x such that x + (-x) = 0.
5 Associativity of scalar multiplication: α(βx) = (αβ)x.
6 Distributivity over scalar sums: (α + β)x = αx + βx.
7 Distributivity over vector sums: α(x_1 + x_2) = αx_1 + αx_2.
8 Scalar multiplication identity: 1x = x.


Euclidean Space
Euclidean n-dimensional space R^n is a vector space where (1) the scalars are real numbers, (2) every element is represented by an n-tuple of real numbers, (3) addition is componentwise, and (4) scalar multiplication is applied to each component separately. A scalar α is an element of the field of real numbers R: α in R, for example α = 3.14, β = 5.25, γ = 1.45.

Euclidean Space: Vectors
A vector x is an n-tuple of real numbers, an element of n-dimensional Euclidean space R^n:
x = (x_1, x_2, ..., x_n)^T in R^n = R × R × ... × R,
for example
x_1 = (3.14, 5.25, 1.45)^T in R^3, x_2 = (3.14, 5.25, 1.45, 5.33, 6.44)^T in R^5.

Euclidean Space: Column and Row Vectors
By default, vectors are column vectors:
x = (x_1, x_2, x_3)^T.
The transpose of a column vector is a row vector:
x^T = (x_1, x_2, x_3).

Euclidean Space: Vector Addition, Scalar Multiplication
Vector addition is componentwise:
x_1 + x_2 = (x_11 + x_21, x_12 + x_22, ..., x_1n + x_2n)^T,
for example
x_1 = (3.14, 5.25, 1.45)^T, x_2 = (1.45, 5.25, 3.14)^T, x_1 + x_2 = (4.59, 10.50, 4.59)^T.
Multiplication of a vector x by a scalar α:
αx = (αx_1, αx_2, ..., αx_n)^T,
for example α = 2, x = (3.14, 5.25, 1.45)^T, αx = (6.28, 10.50, 2.90)^T.
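The componentwise operations above can be sketched in Python with NumPy (an assumed tool, not part of the lecture), reproducing the slide's numbers:

```python
import numpy as np

x1 = np.array([3.14, 5.25, 1.45])
x2 = np.array([1.45, 5.25, 3.14])

vector_sum = x1 + x2   # componentwise addition: (4.59, 10.50, 4.59)
scaled = 2 * x1        # scalar multiplication:  (6.28, 10.50, 2.90)
```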

Geometrical Interpretation [figure omitted]

Euclidean Space: Dot Product, Vector Norm
The dot (inner) product of two vectors:
x_1 · x_2 = x_11 x_21 + x_12 x_22 + ... + x_1n x_2n = Σ_{i=1}^n x_1i x_2i,
for example
x_1 = (3.14, 5.25, 1.45)^T, x_2 = (1.45, 5.25, 3.14)^T, x_1 · x_2 = 4.55 + 27.56 + 4.55 = 36.66.
The Euclidean norm of a vector:
‖x‖ = √(x · x) = √(Σ_{i=1}^n x_i^2),
for example
‖x_1‖ = √(3.14^2 + 5.25^2 + 1.45^2) = √(9.86 + 27.56 + 2.10) ≈ 6.28.

Euclidean Space: Cosine
The cosine between two vectors:
cos(x_1, x_2) = (x_1 · x_2) / (‖x_1‖ ‖x_2‖),
for example
x_1 = (3.14, 5.25, 1.45)^T, x_2 = (0, 0, 1)^T, cos(x_1, x_2) = (0 + 0 + 1.45) / (6.28 · 1) ≈ 0.23 (≈ 77°).
The cosine is defined in terms of the vector norm and the inner product. Therefore, every linear space with an inner product defines a cosine between vectors.
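The dot product, norm, and cosine can be verified in NumPy (an assumed tool); the values reproduce the slide's example to the lecture's two decimals:

```python
import numpy as np

def cosine(a, b):
    """cos(a, b) = (a . b) / (||a|| * ||b||)"""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

x1 = np.array([3.14, 5.25, 1.45])
x2 = np.array([0.0, 0.0, 1.0])

dot = x1 @ x2                           # 1.45
cos = cosine(x1, x2)                    # about 0.23
angle_deg = np.degrees(np.arccos(cos))  # about 77 degrees
```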

Geometrical Interpretation
The length of a vector a is its norm ‖a‖. The length of the projection of a vector a onto a unit vector i equals:
a_x = ‖a‖ cos(a, i) = ‖a‖ (a · i) / (‖a‖ ‖i‖) = a · i.
[figure omitted]


Linear Independence
Linear combination: a linear combination of k vectors is an expression of the form
α_1 x_1 + α_2 x_2 + ... + α_k x_k,
where α_1, α_2, ..., α_k in R are scalars.
Linearly dependent and independent vectors: vectors x_1, x_2, ..., x_k are linearly dependent iff there exist scalars α_1, α_2, ..., α_k, not all zero, such that
α_1 x_1 + α_2 x_2 + ... + α_k x_k = 0.
If no such scalars exist, the vectors are said to be linearly independent.

Basis
A basis of a vector space L is a subset b_1, b_2, ..., b_n of vectors in L such that the basis vectors are linearly independent and every vector x in L can be represented as a linear combination of them: for every x in L there exist α_1, α_2, ..., α_n in R such that
x = α_1 b_1 + α_2 b_2 + ... + α_n b_n.
Uniqueness of representation: a vector x in L has exactly one such representation with respect to a given basis.

Standard Basis
The standard basis of a Euclidean space consists of one unit vector pointing in the direction of each axis of the Cartesian coordinate system. The standard basis of three-dimensional Euclidean space R^3 consists of the following three orthogonal vectors of unit length:
i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1).
The standard basis of n-dimensional Euclidean space R^n is the set of vectors:
b_1 = (1, 0, 0, 0, ..., 0)
b_2 = (0, 1, 0, 0, ..., 0)
...
b_n = (0, 0, 0, 0, ..., 1).


Matrix
An m × n matrix X is a rectangular array of scalars x_ij in R, X = [x_ij] in R^{m×n}, for example
X =
1.12 0.55 0.58 0.23
5.52 0.03 1.96 0.03
0.37 0.78 2.02 0.03
in R^{3×4}.
A matrix X with m rows and n columns can be viewed as a set of m row vectors or as a set of n column vectors:
X = (x_1, x_2, ..., x_m)^T or X = (x_1, x_2, ..., x_n).

Matrix Operations
Matrix addition C = A + B is elementwise: c_ij = a_ij + b_ij.
Multiplication of a matrix by a scalar C = αA is applied to each element separately: c_ij = αa_ij.
The Euclidean (Frobenius) norm of a matrix equals
‖A‖ = √(Σ_{i=1}^m Σ_{j=1}^n a_ij^2).
The transpose A^T is the matrix obtained by exchanging the rows and columns of A: a^T_ij = a_ji.
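A minimal NumPy sketch of these elementwise operations (NumPy assumed; the matrix values are invented for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

C_add = A + B               # elementwise: c_ij = a_ij + b_ij
C_scaled = 0.5 * A          # scalar multiplication on each element
norm_A = np.linalg.norm(A)  # Frobenius norm: sqrt(1 + 4 + 9 + 16) = sqrt(30)
A_T = A.T                   # transpose: rows and columns exchanged
```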

Matrix Product: Coordinate Form
Let A = [a_ij] in R^{m×n} and B = [b_ij] in R^{n×k}. The product C = AB is defined as follows:
c_ij = Σ_{l=1}^n a_il b_lj = a_i · b_j.
Matrix multiplication is defined only if the dimensions of the matrices A and B are compatible:
[m × k] = [m × n] [n × k].

Matrix Product: Vector Form
The row-by-column method: represent A as a set of m row vectors a_1, ..., a_m, and B as a set of k column vectors b_1, ..., b_k. Then, if C = AB, the element c_ij of C is the inner product of the i-th row of A and the j-th column of B:
c_ij = a_i · b_j, i = 1, ..., m, j = 1, ..., k.

Matrix Multiplication: Vector Form [figure omitted]

Matrix Product: Example
For example, let
A =
2 4 6
5 7 1
2 3 5
and B =
4 1
0 2
5 1
The dimensions of the matrices agree, so the matrix product is defined:
[3 × 2] = [3 × 3] [3 × 2].
The matrix product equals
C = AB =
(2·4 + 4·0 + 6·5) (2·1 + 4·2 + 6·1)
(5·4 + 7·0 + 1·5) (5·1 + 7·2 + 1·1)
(2·4 + 3·0 + 5·5) (2·1 + 3·2 + 5·1)
=
38 16
25 20
33 13
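The example can be checked with NumPy's matrix product (NumPy assumed):

```python
import numpy as np

A = np.array([[2, 4, 6],
              [5, 7, 1],
              [2, 3, 5]])
B = np.array([[4, 1],
              [0, 2],
              [5, 1]])

# Each c_ij is the inner product of row i of A with column j of B,
# e.g. c_11 = 2*4 + 4*0 + 6*5 = 38.
C = A @ B   # shapes: (3, 3) x (3, 2) -> (3, 2)
```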

Properties of Matrix Product
Matrix multiplication is associative: A(BC) = (AB)C.
Matrix multiplication is distributive over matrix addition: A(B + C) = AB + AC.
The matrix product is compatible with scalar multiplication: α(AB) = (αA)B = A(αB).
Matrix multiplication is NOT commutative: in general, AB ≠ BA.

Matrix Factorization
Singular Value Decomposition (SVD) is a factorization of a rectangular m × n matrix A such that
A = UDV^T,
where U is an m × m matrix and V is an n × n matrix. These matrices are composed of orthogonal column vectors: U^T U = I, V^T V = I. The m × n matrix D has nonnegative real numbers along its diagonal, called singular values.
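A sketch of the factorization A = UDV^T with NumPy (NumPy assumed; the matrix A below is invented for illustration):

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A)       # s: singular values, in descending order
D = np.zeros_like(A)              # 2 x 3 matrix of zeros
D[:len(s), :len(s)] = np.diag(s)  # place the singular values on the diagonal

reconstructed = U @ D @ Vt        # equals A up to floating-point error
```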


Main Characteristics of the Vector Space Model
The Vector Space Model (VSM) calculates similarity between m homogeneous objects O = {o_1, o_2, ..., o_m}. The model represents an object o as a vector (point) x in an n-dimensional Euclidean space R^n. Every dimension of the vector space corresponds to a feature of an object. The set of all objects is represented by a feature matrix X:
X = (x_1, x_2, ..., x_m)^T = [x_ij] in R^{m×n}.
The similarity between objects is modeled in terms of the spatial distance between the vectors (points).

Vector Space Model
Formally, a Vector Space Model can be represented as a quadruple ⟨A, B, S, M⟩, where:
B is a set b_1, ..., b_n of basis elements that determine the dimensionality of the space and the interpretation of each dimension.
A is the weighting function A : R^n → R^n. It takes as input a vector x representing an object o and returns its normalized version.
S is a similarity function S : R^n × R^n → [0, 1] that maps pairs of vectors onto a scalar representing the measure of their similarity.
M is a transformation that takes one vector space L and maps it onto another vector space L′, in order to reduce dimensionality.
The vector space model is sometimes called a semantic space model.


Interpretation: Basis Elements and Objects
Basis elements b_1, ..., b_n define the interpretation of each dimension; they correspond to the standard basis vectors. The type of the objects defines the interpretation of each vector represented in a VSM. The bag-of-words (BOW) model is a vector space model where the objects are text documents and the basis elements are the words of these documents. [term-document matrix figure omitted] Here b_1 = car, b_2 = auto, b_3 = insurance, b_4 = best, and o_1 = Doc1, o_2 = Doc2, o_3 = Doc3.
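A bag-of-words term-document matrix can be sketched in plain Python. Only the basis terms (car, auto, insurance, best) come from the slide; the documents and counts below are invented for illustration:

```python
docs = {
    "Doc1": "car insurance best car",
    "Doc2": "auto insurance",
    "Doc3": "best car best auto",
}
basis = ["car", "auto", "insurance", "best"]

# x_ij = frequency of basis term j in document i
X = [[doc.split().count(term) for term in basis] for doc in docs.values()]
```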

Interpretation: Feature Matrix
Basis elements (features) can also be lemmas, multi-word expressions, named entities, documents, syntactic dependencies, morphemes, etc.
Term-Document matrix: objects are documents, features are the words of the documents. Problems: information retrieval, text categorization and clustering.
Term-Term matrix: objects are terms, features are context words / words from a dictionary definition. Problems: computational lexical semantics, distributional analysis.
Term Senses-Terms matrix: objects are word senses, features are words. Problem: word sense disambiguation.
Term-Syntactic Dependencies matrix: objects are terms, features are syntactic dependencies of a term. Problem: computational lexical semantics.
...


Weighting Function
The weighting function A : R^n → R^n takes as input a vector x representing an object o and returns its normalized version. Weighting is used to adapt a feature value according to its actual importance.
Identity function (trivial): A(x) = x.
Logarithmic weighting: A(x_ij) = 1 + log(x_ij), for x_ij > 0.
Length-normalization with the Euclidean norm: A(x) = x / ‖x‖.
Conversion to a probability distribution: A(x_ij) = p(i, j) = x_ij / Σ_{j=1}^n x_ij = x_ij / ‖x_i‖_{L1}.

Weighting Function (continued)
Entropy weighting:
A(x_ij) = x_ij (1 + Σ_{k=1}^n p_ik log(p_ik) / log(n)), where p_ik = x_ik / Σ_{l=1}^n x_il.
Pointwise Mutual Information:
A(x_ij) = log( p(i, j) / (p(i) p(j)) ).
TF-IDF (Term Frequency - Inverse Document Frequency):
A(x_ij) = (x_ij / Σ_{k=1}^n x_ik) × log( m / |{x_lj > 0, l = 1, ..., m}| ),
where the first factor is the TF and the second is the IDF.
...
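The TF-IDF scheme above can be sketched in NumPy (NumPy assumed; the matrix values are invented): rows are objects (documents), columns are features (terms), and df_j counts the rows with a nonzero j-th entry.

```python
import numpy as np

X = np.array([[2.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 2.0]])
m = X.shape[0]                            # number of documents

tf = X / X.sum(axis=1, keepdims=True)     # row-normalized term frequency
df = (X > 0).sum(axis=0)                  # document frequency of each term
idf = np.log(m / df)
weighted = tf * idf                       # A(x_ij) = tf_ij * idf_j
```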

Weighting Function: Example
Consider a term-document matrix X, where x_ij is a term frequency [matrix figure omitted]. Let us normalize its rows with the Euclidean norm, e.g. for x_Doc1 = (27, 3, 0, 14)^T:
x_Doc1 / ‖x_Doc1‖ = (27, 3, 0, 14)^T / √(27^2 + 3^2 + 0^2 + 14^2) = (27, 3, 0, 14)^T / 30.56 = (0.88, 0.10, 0, 0.46)^T.
Normalizing every row, we obtain the normalized term-document matrix [figure omitted].
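The Doc1 computation can be reproduced with NumPy (an assumed tool); the result is a unit-length vector:

```python
import numpy as np

x = np.array([27.0, 3.0, 0.0, 14.0])
x_norm = x / np.linalg.norm(x)  # norm = sqrt(729 + 9 + 0 + 196) ~ 30.56
# x_norm is approximately (0.88, 0.10, 0.00, 0.46)
```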


Similarity Function
A similarity function S(x, y) defines a measure of similarity of two vectors x, y in R^n. It should satisfy the following properties for any vectors x, y:
Non-negativity: S(x, y) ≥ 0.
Maximality: S(x, x) ≥ S(x, y).
Symmetry: S(x, y) = S(y, x).

Distance Function
A distance (dissimilarity) function D(x, y) defines the distance between two vectors x, y in R^n. It should satisfy the following properties for any vectors x, y, z:
Non-negativity: D(x, y) ≥ 0.
Identity of indiscernibles: D(x, y) = 0 iff x = y.
Symmetry: D(x, y) = D(y, x).
Triangle inequality: D(x, z) ≤ D(x, y) + D(y, z).

Converting Distance to Similarity
A distance measure between two vectors x, y in R^n can be converted to a similarity measure between them as follows:
S(x, y) = 1 - D(x, y), if S(x, y) in [0, 1];
S(x, y) = 1 - 2D(x, y), if S(x, y) in [-1, +1].

Some Similarity and Distance Functions
Minkowski distance (L_q distance): D(x, y) = (Σ_{i=1}^n |x_i - y_i|^q)^{1/q}.
Euclidean distance (L_2 distance): D(x, y) = √(Σ_{i=1}^n (x_i - y_i)^2) = ‖x - y‖.
Manhattan or city-block distance (L_1 distance): D(x, y) = Σ_{i=1}^n |x_i - y_i|.

Some Similarity and Distance Functions (continued)
Jaccard similarity: S(x, y) = Σ_{i=1}^n min(x_i, y_i) / Σ_{i=1}^n max(x_i, y_i).
Dice similarity: S(x, y) = 2 Σ_{i=1}^n min(x_i, y_i) / Σ_{i=1}^n (x_i + y_i).
Cosine similarity: S(x, y) = x · y / (‖x‖ ‖y‖).
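The three similarity functions can be sketched in NumPy (an assumed tool) for nonnegative vectors; the test vectors are invented:

```python
import numpy as np

def jaccard(x, y):
    return np.minimum(x, y).sum() / np.maximum(x, y).sum()

def dice(x, y):
    return 2 * np.minimum(x, y).sum() / (x + y).sum()

def cosine(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 0.0])
y = np.array([1.0, 1.0, 1.0])
# jaccard: min = (1,1,0), max = (1,2,1) -> 2/4 = 0.5
# dice:    2*2 / 6 = 2/3
```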


Transformation: Dimensionality Reduction
The transformation M takes a vector space L and maps it onto another vector space L′ in order to reduce dimensionality, so that dim(L′) ≪ dim(L). The goal of dimensionality reduction is to find a smaller number of uncorrelated or weakly correlated dimensions. Reasons for dimensionality reduction:
The VSM assumes independence of the dimensions. In practice, some dimensions are linear combinations of other dimensions: synonyms, spelling variants, etc.
High computational complexity in a high-dimensional space.
It can help discover latent structure in the data.

Transformation: Dimensionality Reduction (continued)
Simple dimensionality reduction can be done at the preprocessing stage: removing stop words, rare dimensions, etc. In addition, feature matrix factorization methods can be used for dimensionality reduction:
Truncated Singular Value Decomposition (SVD)
Non-Negative Matrix Factorization (NMF)
...
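Truncated SVD keeps only the k largest singular values, giving a rank-k approximation X ≈ U_k D_k V_k^T. A NumPy sketch (NumPy assumed; the matrix is invented for illustration):

```python
import numpy as np

X = np.array([[2.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 2.0]])
k = 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation of X

# Rows of U[:, :k] * s[:k] give the objects' coordinates
# in the reduced k-dimensional space.
reduced = U[:, :k] * s[:k]
```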

Truncated Singular Value Decomposition [figure omitted]

Various Applications of the Vector Space Models
1 Information Retrieval
2 Computational Lexical Semantics
3 Word Sense Disambiguation
4 Other Applications

Information Retrieval
Problem formulation: given a user query q, find the k most relevant documents {d_1, ..., d_k} from a collection of n documents {d_1, ..., d_n}.
A: TF-IDF
B: terms from all documents
O: documents
S: cosine similarity
M: truncated SVD (Latent Semantic Indexing)
Documents are represented as vectors in the bag-of-words space. The user's text query is represented as a vector in the same space as the documents.

Information Retrieval
Let the search query be q = car; it is then represented as the following vector: q = (1, 0, 0, 0).
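The retrieval setup can be sketched end-to-end in NumPy (an assumed tool): length-normalize the term-document matrix and rank documents by cosine similarity to the query. Doc1's counts echo the weighting example earlier; the other rows are invented.

```python
import numpy as np

X = np.array([[27.0, 3.0, 0.0, 14.0],   # Doc1
              [4.0, 33.0, 33.0, 0.0],   # Doc2
              [24.0, 0.0, 29.0, 17.0]]) # Doc3
q = np.array([1.0, 0.0, 0.0, 0.0])      # query "car" in the BOW space

X_hat = X / np.linalg.norm(X, axis=1, keepdims=True)  # length-normalize rows
scores = X_hat @ (q / np.linalg.norm(q))              # cosine per document
ranking = np.argsort(-scores)                         # best match first
```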

Computational Lexical Semantics
Problem formulation: given a term t, find the k most semantically similar terms {t_1, ..., t_k} from a vocabulary of n terms {t_1, ..., t_n}.
A: Pointwise Mutual Information
B: words / terms / syntactic contexts
O: terms
S: cosine similarity / Kullback-Leibler divergence
M: truncated SVD (Latent Semantic Analysis) / Non-Negative Matrix Factorization
Distributional hypothesis (Harris): terms are semantically similar if they appear within similar context windows.

Computational Lexical Semantics [figure omitted]

Word Sense Disambiguation
Problem formulation: given a word occurrence w, find its sense among the k possible senses {s_1, ..., s_k}.
A: identity function / length-normalization
B: words / terms
O: term senses
S: inner product (simplified Lesk)
M: none
Term senses are represented as vectors in the bag-of-words space of the dictionary definitions. The term is represented as a vector in the same space as the term senses.
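Simplified Lesk in this VSM reading: each sense is the bag of words of its dictionary definition, and the chosen sense maximizes the inner product with the context vector. The glosses and context below are made up for illustration:

```python
from collections import Counter

senses = {
    "bank/river": "sloping land beside a body of water",
    "bank/finance": "an institution that accepts deposits and lends money",
}
context = "he deposited the money at the bank near his institution"

def disambiguate(context, senses):
    ctx = Counter(context.split())  # context vector in the BOW space
    # inner product of the context vector with each sense's gloss vector
    scores = {s: sum(ctx[w] * c for w, c in Counter(gloss.split()).items())
              for s, gloss in senses.items()}
    return max(scores, key=scores.get)
```

Here the context shares "money" and "institution" with the finance gloss and nothing with the river gloss, so the finance sense wins.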

Some Other Applications
Named Entity Disambiguation
Text Document Clustering
Text Document Categorization
Collaborative Recommendations
...

References
Berry, M. W. and Browne, M. (2005). Understanding Search Engines: Mathematical Modeling and Text Retrieval (Software, Environments, Tools). SIAM, Society for Industrial and Applied Mathematics, second edition.
Berry, M. W., Drmac, Z., and Jessup, E. R. (1999). Matrices, vector spaces, and information retrieval. SIAM Review, 41:335-362.
Lowe, W. Towards a theory of semantic space.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press, first edition.
Van de Cruys, T. (2010). Mining for Meaning. The Extraction of Lexicosemantic Knowledge from Text.

Acknowledgments
Some illustrations in this presentation were borrowed from [Manning et al., 2008], [Van de Cruys, 2010], and Wikipedia. I would like to thank the authors of these figures.