Knowledge Discovery and Data Mining 1 (VO) (707.003)

Similar documents
Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

BBS654 Data Mining. Pinar Duygulu

Instructor: Stefan Savev

Introduction to Data Mining

This lecture: IIR Sections Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

String Vector based KNN for Text Categorization

CS 124/LINGUIST 180 From Languages to Information

Keyword Extraction by KNN considering Similarity among Features

Mining Web Data. Lijun Zhang

Information Retrieval. (M&S Ch 15)

CS 124/LINGUIST 180 From Languages to Information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Vector Space Models: Theory and Applications

Encoding Words into String Vectors for Word Categorization

CS 124/LINGUIST 180 From Languages to Information

Information Retrieval

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri

Natural Language Processing

vector space retrieval many slides courtesy James Amherst

Information Retrieval

COMP6237 Data Mining Searching and Ranking

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Basic techniques. Text processing; term weighting; vector space model; inverted index; Web Search

Mining Web Data. Lijun Zhang

Visualization and text mining of patent and non-patent data

Boolean Model. Hongning Wang

Chapter 2 Basic Structure of High-Dimensional Spaces

In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Digital Libraries: Language Technologies

Knowledge Discovery and Data Mining 1 (KU)

Information Retrieval

Task Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval

SOCIAL MEDIA MINING. Data Mining Essentials

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

Chapter 6: Information Retrieval and Web Search. An introduction

Introduction to Information Retrieval

COMP6237 Data Mining Making Recommendations. Jonathon Hare

CSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 1)

Information Retrieval

Unsupervised Feature Selection for Sparse Data

Machine Learning using MapReduce

Data Mining Lecture 2: Recommender Systems

CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering

modern database systems lecture 4 : information retrieval

Information Retrieval

Recommender Systems - Introduction. Data Mining Lecture 2: Recommender Systems

Clustering and Dimensionality Reduction. Stony Brook University CSE545, Fall 2017

Information Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval: Retrieval Models

Automatic Data Analysis in Visual Analytics Selected Methods

10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues

Predicting Popular Xbox games based on Search Queries of Users

Introduction to Information Retrieval

Problem 1: Complexity of Update Rules for Logistic Regression

Basic Elements. Geometry is the study of the relationships among objects in an n-dimensional space

Automatic Summarization

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

Praxis Elementary Education: Mathematics Subtest (5003) Curriculum Crosswalk. Required Course Numbers. Test Content Categories.

CSAP Achievement Levels Mathematics Grade 6 March, 2006

Databases 2 (VU) ( / )

Lecture on Modeling Tools for Clustering & Regression

Text classification with Naïve Bayes. Lab 3

CS452/552; EE465/505. Geometry Transformations

Today s topic CS347. Results list clustering example. Why cluster documents. Clustering documents. Lecture 8 May 7, 2001 Prabhakar Raghavan

Big Mathematical Ideas and Understandings

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Document Clustering: Comparison of Similarity Measures

Introduction to Information Retrieval

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL

Information Retrieval

Hierarchical Clustering

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

Recommendation Algorithms: Collaborative Filtering. CSE 6111 Presentation Advanced Algorithms Fall Presented by: Farzana Yasmeen

Natural Language Processing Basics. Yingyu Liang University of Wisconsin-Madison

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

CSE 6242 A / CX 4242 DVA. March 6, Dimension Reduction. Guest Lecturer: Jaegul Choo

The influence of social filtering in recommender systems

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable

CS 6320 Natural Language Processing

Models for Document & Query Representation. Ziawasch Abedjan

Clustering and Visualisation of Data

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Supervised Random Walks

Recommender Systems - Content, Collaborative, Hybrid

Introduction to Data Mining

Singular Value Decomposition, and Application to Recommender Systems

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat

Grade Level Expectations for the Sunshine State Standards

V2: Measures and Metrics (II)

Midterm Exam Search Engines ( / ) October 20, 2015

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Exploratory Analysis: Clustering

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.

International Journal of Advanced Research in Computer Science and Software Engineering

Transcription:

Knowledge Discovery and Data Mining 1 (VO) (707.003)
Data Matrices and Vector Space Model
Denis Helic, KTI, TU Graz
Nov 6, 2014

Big picture: KDDM
[Overview diagram: the Knowledge Discovery process (Preprocessing, Transformation) builds on mathematical tools (Probability Theory, Linear Algebra, Information Theory, Statistical Inference) and on infrastructure (Hardware & Programming Model)]

Outline
1. Recap
2. Data Representation
3. Data Matrix
4. Vector Space Model: Document-Term Matrix (Running Example 1)
5. Recommender Systems: Utility Matrix (Running Example 2)

Recap
Review of the preprocessing, feature extraction and selection phase

Recap: Preprocessing
Initial phase of the Knowledge Discovery process:
- acquire the data to be analyzed, e.g. by crawling the data from the Web
- prepare the data, e.g. by cleaning and removing outliers

Recap: Feature Extraction
Examples of features:
- Images: colors, textures, contours, ...
- Signals: frequency, phase, samples, spectrum, ...
- Time series: ticks, trends, self-similarities, ...
- Biomedical data: DNA sequences, genes, ...
- Text: words, POS tags, grammatical dependencies, ...
Features encode these properties in a way suitable for a chosen algorithm

Recap: Feature Selection
Approach: select the subset of all features without redundant or irrelevant features
- Unsupervised, e.g. heuristics (black & white lists, score for ranking, ...)
- Supervised, e.g. using a training data set (calculate a measure to assess how discriminative a feature is, e.g. information gain)
- Penalty for more features, e.g. regularization

Recap: Text mining

Data Representation: Representing data
Once we have the features that we want: how can we represent them?
Two representations:
1. Mathematical (model, analyze, and reason about data and algorithms in a formal way)
2. Representation in computers (data structures, implementation, performance)
In this course we concentrate on mathematical models; KDDM2 is about the second representation

Data Representation: Representing data
Given: preprocessed data objects as a set of features, e.g. for text documents a set of words, bigrams, n-grams, ...
Given: feature statistics for each data object, e.g. number of occurrences, magnitudes, ticks, ...
Find: a mathematical model for calculations, e.g. similarity, distance, add, subtract, transform, ...

Data Representation: Representing data
Let us think (geometrically) about features as dimensions (coordinates) in an m-dimensional space
For two features we have a 2D space, for three features a 3D space, and so on
For numeric features the feature statistics will be numbers, e.g. from $\mathbb{R}$
Then each data object is a point in the m-dimensional space, and the value of a given feature is the value of its coordinate in this space

Data Representation: Representing data (Example)
Text documents. Suppose we have the following documents:

DocID  Document
d1     Chinese Chinese
d2     Chinese Chinese Tokyo Tokyo Beijing Beijing Beijing
d3     Chinese Tokyo Tokyo Tokyo Beijing Beijing
d4     Tokyo Tokyo Tokyo
d5     Tokyo

Data Representation: Representing data (Example)
Now, we take words as features and word occurrences as the feature values.
Three features: Beijing, Chinese, Tokyo

Doc   Beijing  Chinese  Tokyo
d1       0        2       0
d2       3        2       2
d3       2        1       3
d4       0        0       3
d5       0        0       1
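The counts above can be reproduced in a few lines of Python. A minimal sketch, not part of the original slides, using the reconstructed document strings and NumPy only for the final matrix:

```python
from collections import Counter

import numpy as np

docs = {
    "d1": "Chinese Chinese",
    "d2": "Chinese Chinese Tokyo Tokyo Beijing Beijing Beijing",
    "d3": "Chinese Tokyo Tokyo Tokyo Beijing Beijing",
    "d4": "Tokyo Tokyo Tokyo",
    "d5": "Tokyo",
}

features = ["Beijing", "Chinese", "Tokyo"]                   # the three features (columns)
counts = {doc: Counter(text.split()) for doc, text in docs.items()}

# Term-frequency matrix: rows = documents d1..d5, columns = features.
D = np.array([[counts[doc][t] for t in features] for doc in docs])
print(D)
# [[0 2 0]
#  [3 2 2]
#  [2 1 3]
#  [0 0 3]
#  [0 0 1]]
```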

Data Representation: Representing data (Example)
[Figure: geometric representation of the documents as points in 3D space, axes x = Beijing, y = Chinese, z = Tokyo: d1 = (0, 2, 0), d2 = (3, 2, 2), d3 = (2, 1, 3), d4 = (0, 0, 3), d5 = (0, 0, 1)]

Data Matrix: Representing data
Now, an intuitive representation of the data is a matrix. In the general case:
- Columns correspond to features, i.e. dimensions or coordinates in an m-dimensional space
- Rows correspond to data objects, i.e. data points
- The element $d_{ij}$ in the i-th row and the j-th column is the j-th coordinate of the i-th data point
- The data is represented by a matrix $D \in \mathbb{R}^{n \times m}$, where n is the number of data points and m the number of features

Data Matrix: Representing data
For text:
- Columns correspond to terms, words, and so on, i.e. each word is a dimension or a coordinate in an m-dimensional space
- Rows correspond to documents
- The element $d_{ij}$ in the i-th row and the j-th column is, e.g., the number of occurrences of the j-th word in the i-th document
This matrix is called the document-term matrix

Data Matrix: Representing data (example)

$$D = \begin{pmatrix} 0 & 2 & 0 \\ 3 & 2 & 2 \\ 2 & 1 & 3 \\ 0 & 0 & 3 \\ 0 & 0 & 1 \end{pmatrix}$$

Vector interpretation
An alternative and very useful interpretation of data matrices is in terms of vectors
Each data point, i.e. each row in the data matrix, can be interpreted as a vector in an m-dimensional space
This allows us to work with:
1. the vector direction, i.e. the line connecting the origin and a given data point
2. the vector magnitude (length), i.e. the distance from the origin to a given data point (feature weighting)
3. similarity, distance, and correlations between vectors
4. vector manipulation, and so on
This is the vector space model
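As a small illustration (not from the slides), the magnitude and direction of document d2 from the running example can be read off its row vector with NumPy:

```python
import numpy as np

d2 = np.array([3, 2, 2])           # the row of the data matrix for document d2

magnitude = np.linalg.norm(d2)     # vector length (Euclidean norm)
direction = d2 / magnitude         # unit vector: the direction of d2

print(magnitude)                   # 4.123105625617661
print(direction)                   # [0.72760688 0.48507125 0.48507125]
```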

Vector interpretation: Example
[Figure: geometric representation of the documents as vectors in 3D space, axes x = Beijing, y = Chinese, z = Tokyo: d1 = (0, 2, 0), d2 = (3, 2, 2), d3 = (2, 1, 3), d4 = (0, 0, 3), d5 = (0, 0, 1)]

Vector space model: feature weighting
What we did so far: we counted word occurrences and used these counts as the coordinates in a given dimension
This weighting scheme is referred to as term frequency (TF)
Three features: Beijing, Chinese, Tokyo

Doc   Beijing  Chinese  Tokyo
d1       0        2       0
d2       3        2       2
d3       2        1       3
d4       0        0       3
d5       0        0       1

Vector space model: TF
Please note the difference between the vector space model with TF weighting and, e.g., the standard Boolean model in information retrieval
The standard Boolean model uses binary feature values:
- If the feature j occurs in the document i (regardless of how many times), then $d_{ij} = 1$
- Otherwise $d_{ij} = 0$
This allows very simple manipulation of the model, but does not capture the relative importance of the feature j for the document i
If a feature occurs more frequently in a document, then it is more important for that document, and TF captures this intuition
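A minimal sketch of this difference on the running example (not from the slides): the Boolean incidence matrix is simply the TF matrix thresholded at zero.

```python
import numpy as np

# Term-frequency matrix of the running example (rows d1..d5,
# columns Beijing, Chinese, Tokyo).
TF = np.array([[0, 2, 0],
               [3, 2, 2],
               [2, 1, 3],
               [0, 0, 3],
               [0, 0, 1]])

# Standard Boolean model: d_ij = 1 if the term occurs at all, 0 otherwise.
B = (TF > 0).astype(int)
print(B)
```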

Vector space model: feature weighting
However, raw TF has a critical problem: all terms are considered equally important for assessing relevance
In fact, certain terms have no discriminative power at all with respect to relevance
For example, in a collection of documents on the auto industry, the term auto will most probably occur in every document
We need to penalize terms which occur too often in the complete collection
We start with the notion of document frequency (DF) of a term: the number of documents that contain a given term

Vector space model: IDF
If a term is discriminative then it will only appear in a small number of documents
I.e. its DF will be low, or its inverse document frequency (IDF) will be high
In practice we typically calculate IDF as:
$$\mathrm{IDF}_j = \log\left(\frac{N}{\mathrm{DF}_j}\right)$$
where N is the number of documents and $\mathrm{DF}_j$ is the document frequency of the term j.

Vector space model: TFxIDF
Finally, we combine TF and IDF to produce a composite weight for a given feature:
$$\mathrm{TFxIDF}_{ij} = \mathrm{TF}_{ij} \cdot \log\left(\frac{N}{\mathrm{DF}_j}\right)$$
Now, this quantity will be high if the feature j occurs only a couple of times in the whole collection, and most of these occurrences are in the document i
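A short NumPy sketch of this weighting on the running example, not part of the original slides. It assumes a base-10 logarithm, which is what the worked numbers on the following slides suggest, and it matches those tables up to rounding in the last digit:

```python
import numpy as np

# Term-frequency matrix of the running example (rows d1..d5,
# columns Beijing, Chinese, Tokyo).
TF = np.array([[0, 2, 0],
               [3, 2, 2],
               [2, 1, 3],
               [0, 0, 3],
               [0, 0, 1]], dtype=float)

N = TF.shape[0]                  # number of documents (5)
DF = (TF > 0).sum(axis=0)        # document frequency per term: [2 3 4]
IDF = np.log10(N / DF)           # base-10 logarithm, matching the worked example

TFIDF = TF * IDF                 # element-wise product, IDF broadcast over rows
print(np.round(IDF, 4))          # [0.3979 0.2218 0.0969]
print(np.round(TFIDF, 4))
```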

TFxIDF: Example
Text documents. Suppose we have the following documents:

DocID  Document
d1     Chinese Chinese
d2     Chinese Chinese Tokyo Tokyo Beijing Beijing Beijing
d3     Chinese Tokyo Tokyo Tokyo Beijing Beijing
d4     Tokyo Tokyo Tokyo
d5     Tokyo

TFxIDF: Example
Three features: Beijing, Chinese, Tokyo
TF:

Doc   Beijing  Chinese  Tokyo
d1       0        2       0
d2       3        2       2
d3       2        1       3
d4       0        0       3
d5       0        0       1

TFxIDF: Example
Three features: Beijing, Chinese, Tokyo
DF: $\mathrm{DF}_{Beijing} = 2$, $\mathrm{DF}_{Chinese} = 3$, $\mathrm{DF}_{Tokyo} = 4$

TFxIDF: Example
Three features: Beijing, Chinese, Tokyo
IDF: $\mathrm{IDF}_j = \log(N / \mathrm{DF}_j)$
$\mathrm{IDF}_{Beijing} = 0.3979$, $\mathrm{IDF}_{Chinese} = 0.2218$, $\mathrm{IDF}_{Tokyo} = 0.0969$

TFxIDF: Example
Three features: Beijing, Chinese, Tokyo
TFxIDF: $\mathrm{TFxIDF}_{ij} = \mathrm{TF}_{ij} \cdot \log(N / \mathrm{DF}_j)$

Doc   Beijing  Chinese  Tokyo
d1    0        0.4436   0
d2    1.1937   0.4436   0.1938
d3    0.7958   0.2218   0.2907
d4    0        0        0.2907
d5    0        0        0.0969

TFxIDF: Example
[Figure: geometric representation of the TFxIDF-weighted documents d1-d5 in 3D space, axes x = Beijing, y = Chinese, z = Tokyo]

TFxIDF: Example
[Figure: geometric representation of the documents in the raw TF representation: d1 = (0, 2, 0), d2 = (3, 2, 2), d3 = (2, 1, 3), d4 = (0, 0, 3), d5 = (0, 0, 1); axes x = Beijing, y = Chinese, z = Tokyo]

Vector space model: TFxIDF
Further TF variants are possible:
- Sub-linear TF scaling, e.g. $\sqrt{\mathrm{TF}}$ or $1 + \log(\mathrm{TF})$; it is unlikely that, e.g., 20 occurrences of a single term carry 20 times the weight of a single occurrence
- Also very common is TF normalization, e.g. by dividing by the maximal TF in the collection
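A brief sketch of these variants on the running example (not from the slides); the logarithm base and the normalization by the collection-wide maximum follow the wording above:

```python
import numpy as np

TF = np.array([[0, 2, 0],
               [3, 2, 2],
               [2, 1, 3],
               [0, 0, 3],
               [0, 0, 1]], dtype=float)

# Sub-linear scaling: 1 + log(tf) for tf > 0, and 0 otherwise
# (sqrt(tf) would be another common sub-linear choice).
WF = np.zeros_like(TF)
mask = TF > 0
WF[mask] = 1.0 + np.log10(TF[mask])

# Normalization by the maximal TF in the collection.
NTF = TF / TF.max()

print(np.round(WF, 4))
print(np.round(NTF, 4))
```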

Vector space model: vector manipulation
- Vector addition: concatenate two documents
- Vector subtraction: diff two documents
- Transform the vector space into a new vector space: projection and feature selection
- Similarity calculations
- Distance calculations
- Scaling for visualization

Vector space model: similarity and distance
Similarity and distance allow us to order documents
This is useful in many applications:
1. Relevance ranking (information retrieval)
2. Classification
3. Clustering
4. Transformation and projection

Properties of similarity and distance
Definition. Let D be the set of all data objects, e.g. all documents. Similarity and distance are functions $sim: D \times D \to \mathbb{R}$ and $dist: D \times D \to \mathbb{R}$ with the following properties:
Symmetry:
$$sim(d_i, d_j) = sim(d_j, d_i), \qquad dist(d_i, d_j) = dist(d_j, d_i)$$
Self-similarity (self-distance):
$$sim(d_i, d_j) \le sim(d_i, d_i) = sim(d_j, d_j), \qquad dist(d_i, d_j) \ge dist(d_i, d_i) = dist(d_j, d_j) = 0$$

Properties of similarity and distance
Some additional properties of the sim function:
Normalization of similarity to the interval [0, 1]:
$$sim(d_i, d_j) \ge 0, \qquad sim(d_i, d_i) = 1$$
Normalization of similarity to the interval [-1, 1]:
$$sim(d_i, d_j) \ge -1, \qquad sim(d_i, d_i) = 1$$

Properties of similarity and distance
Some additional properties of the dist function:
Triangle inequality:
$$dist(d_i, d_j) \le dist(d_i, d_k) + dist(d_k, d_j)$$
If the triangle inequality is satisfied (together with the properties above), then the dist function is a metric on the vector space

Euclidean distance
Norm: we can apply the Euclidean or $\ell_2$ norm as a distance metric in the vector space model:
$$\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^T x}$$
The Euclidean distance of two vectors $d_i$ and $d_j$ is the length of their displacement vector $x = d_i - d_j$

Euclidean distance: Example
[Figure: geometric representation of the documents as points in 3D space, axes x = Beijing, y = Chinese, z = Tokyo: d1 = (0, 2, 0), d2 = (3, 2, 2), d3 = (2, 1, 3), d4 = (0, 0, 3), d5 = (0, 0, 1)]

Euclidean distance

$$D = \begin{pmatrix} d_1^T \\ d_2^T \\ \vdots \\ d_n^T \end{pmatrix}$$

Euclidean distance

$$D = \begin{pmatrix} 0 & 2 & 0 \\ 3 & 2 & 2 \\ 2 & 1 & 3 \\ 0 & 0 & 3 \\ 0 & 0 & 1 \end{pmatrix}$$

Euclidean distance

$$\mathrm{dist} = \begin{pmatrix}
0 & 3.60555128 & 3.74165739 & 3.60555128 & 2.23606798 \\
3.60555128 & 0 & 1.73205081 & 3.74165739 & 3.74165739 \\
3.74165739 & 1.73205081 & 0 & 2.23606798 & 3 \\
3.60555128 & 3.74165739 & 2.23606798 & 0 & 2 \\
2.23606798 & 3.74165739 & 3 & 2 & 0
\end{pmatrix}$$
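The distance matrix above, together with the symmetry, self-distance, and triangle-inequality properties from the earlier slides, can be checked with a short NumPy sketch (not part of the original slides):

```python
import numpy as np

# Document-term matrix of the running example (rows d1..d5).
D = np.array([[0, 2, 0],
              [3, 2, 2],
              [2, 1, 3],
              [0, 0, 3],
              [0, 0, 1]], dtype=float)

# Pairwise Euclidean distances: ||d_i - d_j||_2 via broadcasting.
diff = D[:, None, :] - D[None, :, :]     # shape (5, 5, 3): displacement vectors
dist = np.linalg.norm(diff, axis=2)      # shape (5, 5)
print(np.round(dist, 8))

# Properties from the previous slides: symmetry, zero self-distance,
# and the triangle inequality dist(i, j) <= dist(i, k) + dist(k, j).
assert np.allclose(dist, dist.T)
assert np.allclose(np.diag(dist), 0.0)
assert np.all(dist[:, :, None] <= dist[:, None, :] + dist[None, :, :] + 1e-12)
```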

Cosine similarity
Angle between vectors: we can think of the angle $\varphi$ between two vectors $d_i$ and $d_j$ as a measure of their similarity
- Smaller angles mean that the vectors point in two directions that are close to each other; thus, smaller angles mean that the vectors are more similar
- Larger angles imply smaller similarity
To obtain a numerical value from the interval [0, 1] we can calculate $\cos\varphi$
The cosine of an angle can also be negative! Why do we always get values from [0, 1] here?

Cosine similarity: Example
[Figure: the documents as vectors in 3D space (axes x = Beijing, y = Chinese, z = Tokyo), with the angles $\varphi_{2,3}$ between d2 and d3 and $\varphi_{1,5}$ between d1 and d5 marked: d1 = (0, 2, 0), d2 = (3, 2, 2), d3 = (2, 1, 3), d4 = (0, 0, 3), d5 = (0, 0, 1)]

Cosine similarity
To calculate the cosine similarity we make use of the dot product and the Euclidean norm:
$$sim(d_i, d_j) = \frac{d_i^T d_j}{\|d_i\|_2 \, \|d_j\|_2}$$

Cosine similarity

$$D = \begin{pmatrix} 0 & 2 & 0 \\ 3 & 2 & 2 \\ 2 & 1 & 3 \\ 0 & 0 & 3 \\ 0 & 0 & 1 \end{pmatrix}$$

Cosine similarity

$$\mathrm{sim} = \begin{pmatrix}
1 & 0.48507125 & 0.26726124 & 0 & 0 \\
0.48507125 & 1 & 0.90748521 & 0.48507125 & 0.48507125 \\
0.26726124 & 0.90748521 & 1 & 0.80178373 & 0.80178373 \\
0 & 0.48507125 & 0.80178373 & 1 & 1 \\
0 & 0.48507125 & 0.80178373 & 1 & 1
\end{pmatrix}$$
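The similarity matrix can likewise be reproduced with a couple of NumPy lines (a sketch, not from the slides). Note that every entry lies in [0, 1] here because the term weights themselves are non-negative:

```python
import numpy as np

D = np.array([[0, 2, 0],
              [3, 2, 2],
              [2, 1, 3],
              [0, 0, 3],
              [0, 0, 1]], dtype=float)

norms = np.linalg.norm(D, axis=1)           # ||d_i||_2 for every row
sim = (D @ D.T) / np.outer(norms, norms)    # sim(d_i, d_j) = d_i^T d_j / (||d_i|| ||d_j||)
print(np.round(sim, 8))
```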

Recommender Systems: Utility Matrix (Running Example 2)

Representing data as matrices
There are many other sources of data that can be represented as large matrices
We also use matrices to represent social networks, measurement data, and much more
In recommender systems, we represent user ratings of items as a utility matrix

Recommender systems
Recommender systems predict user responses to options
- E.g. recommend news articles based on a prediction of user interests
- E.g. recommend products based on predictions of what the user might like
Recommender systems are necessary whenever users interact with huge catalogs of items, e.g. thousands or even millions of products, movies, pieces of music, and so on

Recommender systems
These huge numbers of items arise because online you can also interact with long-tail items
- These are not available in retail because of, e.g., limited shelf space
Recommender systems try to support this interaction by suggesting certain items to the user
The suggestions or recommendations are based on what the system knows about the user and about the items

Formal model: the utility matrix
In a recommender system there are two classes of entities: users and items
Utility Function. Let us denote the set of all users with U and the set of all items with I. We define the utility function $u: U \times I \to R$, where R is a set of ratings and is a totally ordered set.
For example, R = {1, 2, 3, 4, 5} (a set of star ratings), or R = [0, 1] (the set of real numbers from that interval).

The Utility Matrix
The utility function maps pairs of users and items to numbers
These numbers can be represented by a utility matrix $M \in \mathbb{R}^{n \times m}$, where n is the number of users and m the number of items
The matrix gives a value for each user-item pair for which we know the preference of that user for that item
E.g. the values can come from an ordered set (1 to 5) and represent the rating that a user gave to an item

The Utility Matrix
We assume that the matrix is sparse
- This means that most entries are unknown, i.e. the majority of user preferences for specific items are unknown
- An unknown rating means that we have no explicit information; it does not mean that the rating is low
Formally: the goal of a recommender system is to predict the blanks in the utility matrix

The Utility Matrix: example

User   HP1  HP2  HP3  Hobbit  SW1  SW2  SW3
A       4              5       1
B       5    5    4
C                      2       4    5
D            3                           3
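A sketch of this utility matrix in NumPy, with np.nan marking the unknown entries. The placement of the blanks follows the table above, which is itself reconstructed from the flattened slide, so treat the exact layout as illustrative:

```python
import numpy as np

movies = ["HP1", "HP2", "HP3", "Hobbit", "SW1", "SW2", "SW3"]
users = ["A", "B", "C", "D"]

# np.nan marks an unknown rating (not a low one).
M = np.array([
    [4,      np.nan, np.nan, 5,      1,      np.nan, np.nan],  # A
    [5,      5,      4,      np.nan, np.nan, np.nan, np.nan],  # B
    [np.nan, np.nan, np.nan, 2,      4,      5,      np.nan],  # C
    [np.nan, 3,      np.nan, np.nan, np.nan, np.nan, 3],       # D
])

known = ~np.isnan(M)
print(known.sum(), "of", M.size, "entries are known")   # 11 of 28
```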

Recommender systems: short summary
Three key problems in recommender systems:
1. Collecting data for the utility matrix
   - Explicit, e.g. by collecting ratings
   - Implicit, e.g. from interactions (a user buying a product implies a high rating)
2. Predicting missing values in the utility matrix
   - Problems are sparsity, cold start, and so on
   - Content-based, collaborative filtering, matrix factorization (see the sketch below)
3. Evaluating predictions
   - Training dataset, test dataset, error
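As a closing illustration (not covered in this lecture), a minimal user-based collaborative-filtering sketch on the example utility matrix: an unknown rating is predicted as the similarity-weighted average of other users' ratings for that item, with cosine similarity computed over co-rated items only. This is just one of the approaches named above, stripped to its bare essentials.

```python
import numpy as np

# Example utility matrix (same layout as in the previous sketch); np.nan = unknown.
M = np.array([
    [4,      np.nan, np.nan, 5,      1,      np.nan, np.nan],  # A
    [5,      5,      4,      np.nan, np.nan, np.nan, np.nan],  # B
    [np.nan, np.nan, np.nan, 2,      4,      5,      np.nan],  # C
    [np.nan, 3,      np.nan, np.nan, np.nan, np.nan, 3],       # D
])

def cosine_over_shared(u, v):
    """Cosine similarity of two rating vectors over the items both users rated."""
    shared = ~np.isnan(u) & ~np.isnan(v)
    if not shared.any():
        return 0.0
    a, b = u[shared], v[shared]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom > 0 else 0.0

def predict(M, user, item):
    """Predict the blank M[user, item] from the users who rated that item."""
    sims, ratings = [], []
    for other in range(M.shape[0]):
        if other != user and not np.isnan(M[other, item]):
            sims.append(cosine_over_shared(M[user], M[other]))
            ratings.append(M[other, item])
    sims, ratings = np.array(sims), np.array(ratings)
    if len(sims) == 0 or sims.sum() == 0:
        return np.nan            # no usable neighbours
    return float(sims @ ratings) / sims.sum()

# Predict user A's unknown rating for HP2 (column index 1).
print(round(predict(M, 0, 1), 3))
```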