# Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

## Transcription

1 Part 12: Advanced Topics in Collaborative Filtering Francesco Ricci

2 Content

- Generating recommendations in CF using frequency of ratings
- Role of neighborhood size
- Comparison of CF with association rules (traditional data mining)
- Classification and regression learning
- Memory-based CF vs. model-based CF
- Reliability of the user-to-user similarity
- Evaluating the importance of each rating in user-to-user similarity
- Computational complexity of collaborative filtering

3 Example of Evaluation of a Collaborative Filtering Recommender System [Sarwar et al., 2000]

- Movie data: 3500 users, 3000 movies; a random selection of 100,000 ratings gave a matrix of 943 users and 1682 movies
  - Sparsity $= 1 - 100{,}000 / (943 \times 1682) \approx 0.937$
  - On average there are $100{,}000 / 943 \approx 106$ ratings per user
- E-commerce data: 6,502 customers, 23,554 products and 97,045 purchase records
  - Sparsity $= 1 - 97{,}045 / (6{,}502 \times 23{,}554) \approx 0.9994$
  - On average 14.9 ratings per user
- Sparsity is the proportion of missing ratings over all the possible ratings: #missing-ratings / #all-possible-ratings
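The sparsity figures above follow directly from the definition. A minimal sketch, where the function names `sparsity` and `ratings_per_user` are illustrative and not taken from the slides:

```python
def sparsity(n_ratings: int, n_users: int, n_items: int) -> float:
    """Proportion of missing ratings over all possible user-item pairs."""
    return 1.0 - n_ratings / (n_users * n_items)


def ratings_per_user(n_ratings: int, n_users: int) -> float:
    """Average number of known ratings per user."""
    return n_ratings / n_users


# Figures quoted on the slide for the two data sets.
movie_sparsity = sparsity(100_000, 943, 1682)
ecommerce_sparsity = sparsity(97_045, 6_502, 23_554)

print(f"movie data:      sparsity {movie_sparsity:.4f}, "
      f"{ratings_per_user(100_000, 943):.1f} ratings per user")
print(f"e-commerce data: sparsity {ecommerce_sparsity:.5f}, "
      f"{ratings_per_user(97_045, 6_502):.1f} ratings per user")
```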

4 Sparsity: visual example

[Figure: a set of known ratings, shown at 94% sparsity and at 99.9% sparsity]

5 Evaluation Procedure

- They evaluate top-N recommendation (10 recommendations for each user)
- Separate the ratings into training and test sets (80% train, 20% test)
- Use the training set to compute the prediction (top-N)
- Compare (precision and recall) the items in the test set of a user with the top-N recommendations for that user
- The hit set is the intersection of the top-N set with the test set (selected and relevant)
- Precision = size of the hit set / size of the top-N set
- Recall = size of the hit set / size of the test set
  - they assume that all the items not rated are not relevant
- They used the cosine metric in the CF prediction method
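The hit-set definitions above can be sketched as follows. The function names and the example item identifiers are hypothetical; F1 is included as the usual harmonic mean of precision and recall.

```python
def precision_recall_at_n(top_n, test_items):
    """Precision and recall of a top-N list against a user's test set."""
    hit_set = set(top_n) & set(test_items)  # selected and relevant
    precision = len(hit_set) / len(top_n) if top_n else 0.0
    recall = len(hit_set) / len(test_items) if test_items else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical user: 4 recommended items, 3 items in the test set, 2 hits.
p, r = precision_recall_at_n(["p1", "p2", "p3", "p4"], ["p2", "p4", "p7"])
print(p, r, f1(p, r))
```

Because the test set of a user can contain more items than N, recall is bounded above by N / (size of the test set), which is one reason average recall stays low for small N.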

6 Generation of recommendations

Instead of using the weighted average of the ratings

$$v^*_{ij} = \bar{v}_i + K \sum_{k:\, v_{kj} \neq ?} u_{ik}\,(v_{kj} - \bar{v}_k)$$

they used the most-frequent item recommendation method, which:

- Looks in the neighbors (users similar to the target user), scanning the purchase data
- Computes the user frequency of the products in the neighbors' purchases that are not already in the (training part of the) profile of the target user
- Sorts the products according to the frequency
- Returns the N most frequent products
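The most-frequent item method can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation of [Sarwar et al., 2000]: profiles are plain sets of product identifiers, the neighbors are assumed to be already selected, and ties are broken by product identifier (an arbitrary choice).

```python
from collections import Counter


def most_frequent_items(target_profile, neighbor_profiles, n):
    """Return the n products bought most often by the neighbors
    that are not already in the target user's (training) profile."""
    counts = Counter()
    for profile in neighbor_profiles:
        for item in set(profile):           # count each neighbor at most once
            if item not in target_profile:
                counts[item] += 1
    # Sort by decreasing frequency; ties broken by item id (assumption).
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return ranked[:n]


target = {"a", "b"}
neighbors = [{"a", "c", "d"}, {"b", "c"}, {"c", "d", "e"}]
print(most_frequent_items(target, neighbors, 2))
```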

7 Frequency of product in neighbors' purchase data (training)

[Figure: purchase matrix split into train and test parts, with the question "Prediction for this user?". All the depicted users are assumed to be neighbors of the first one. For the top-4 recommended products: P = 2/4, R = 2/3.]

8 Neighbor Sensitivity

[Figure: F1 as a function of the neighborhood size]

- Why is F1 so small? What is the maximum average recall?
- EC = e-commerce data; ML = MovieLens data
- The entire data set is split into 80% train and 20% test
- Top-10 recommendations

9 Neighbor Size

- Reducing the neighbor size is important for performance considerations (why?)
- If the neighbor size is too large, then we are using in the prediction users that are not very similar to the target user, hence accuracy should decrease
- Selection can be made with a similarity threshold: the drawback is that as the number of ratings increases we may have too many neighbors
- Selection can be made with a fixed k: accuracy for users with more unique preferences will be lower
- Advanced techniques use an adaptive neighbor formation algorithm: the size depends on the global data characteristics and on the user- and item-specific ratings
- Remember the discussion on k-NN optimality

10 Users with 2 ratings

[Figure: users plotted by Movie1's rating (horizontal axis) and Movie2's rating (vertical axis), with regions labelled "4 nearest neighbors" and "0 nearest neighbors"]

A fixed similarity threshold at two different points (users) may mean completely different neighbors.

11 Comparison with Association Rules

- Lo and Hi mean low (20) and original dimensionality for the products dimension, achieved with LSI (Latent Semantic Indexing)
- When we do not have enough data, personalization is not useful: all these methods tend to perform similarly

12 Comparison with Association Rules

- Lo and Hi mean low (20) and original dimensionality for the products dimension, achieved with LSI (Latent Semantic Indexing)

13 Association Rules

- Discovering associations between sets of co-purchased products: the presence of a set of products "implies" the presence of others
- $P = \{p_1, \ldots, p_m\}$ are the products, and a transaction $T$ is a subset of the products $P$
- Association rule: $X \rightarrow Y$, where $X$ and $Y$ are non-overlapping subsets of $P$
- The meaning of $X \rightarrow Y$ is that in a collection of transactions $T_j$ ($j = 1, \ldots, N$), if $X$ is present it is likely that $Y$ is also present
- We may generate a transaction for each user in the CF system: it contains the products that have been rated by the user

14 Transactions and Ratings

[Figure: users (transactions) by items matrix, from the 1st to the 14th item, with the current user highlighted; 0 = not bought, 1 = like or dislike]

- What is the rationale?
- In [Sarwar et al., 2000] a transaction is made of all the products bought/rated by a user, not exploiting the rating values

15 Very Basic Probability Concepts

- Probability space $(\Omega, F, P)$: $\Omega$ is the set of outcomes (states of nature), $F$ is a set of subsets of $\Omega$ (events), and $P$ is a probability measure
- Given a probability space $(\Omega, F, P)$ and two events $A$ and $B$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is defined by
  - $P(A \mid B) = P(A \cap B) / P(B)$
- If $P(B) = 0$ then $P(A \mid B)$ is undefined
- Two events $A$ and $B$ are statistically independent if $P(A \cap B) = P(A) \cdot P(B)$
  - Therefore in this case: $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$

16 Support and Confidence

- Support is a measure of the frequency with which a pattern ($X$ and $Y$) is observed in the transactions
  - $S(X \rightarrow Y) = \dfrac{\#\,\text{transactions containing } X \text{ and } Y}{\#\,\text{transactions}}$
- Confidence is a measure of the strength of the rule, i.e., the probability that a transaction that contains $X$ will also contain $Y$
  - $C(X \rightarrow Y) = \dfrac{\#\,\text{transactions containing } X \text{ and } Y}{\#\,\text{transactions containing } X}$
- Confidence is the conditional probability that $Y$ can be observed in a transaction given that we observed $X$
  - $C(X \rightarrow Y) = S(X \rightarrow Y) / S(X)$
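Support and confidence reduce to counting subset relations. A minimal sketch, assuming each transaction is a Python set of product identifiers; the toy transactions are made up so that X and Y occur together in 3 of 6 transactions and X occurs in 4.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, x, y):
    """C(X -> Y) = S(X and Y) / S(X).

    Raises ZeroDivisionError when X never occurs, mirroring the fact that
    the conditional probability is undefined in that case.
    """
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical transactions, one set per user.
transactions = [
    {"x", "y"}, {"x", "y"}, {"x", "y", "z"},
    {"x"}, {"y"}, {"z"},
]
print(support(transactions, {"x", "y"}))       # support of X -> Y
print(confidence(transactions, {"x"}, {"y"}))  # confidence of X -> Y
```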

17 Example

[Figure: users (transactions) by items matrix with the item sets X and Y highlighted]

- Support = proportion of transactions that contain X and Y = 3/6
- Confidence = proportion of the transactions containing X and Y over those containing X = 3/4

18 Example

| Beverage | Meat | Vegetables | Milk | Cafe |
|---|---|---|---|---|
| Coke | Pork | Potato | No | No |
| Coke | Pork | Potato | Yes | No |
| Beer | Pork | Potato | No | Yes |
| Wine | Beef | Potato | No | Yes |
| Wine | Chicken | Tomato | No | Yes |
| Wine | Chicken | Tomato | Yes | No |
| Beer | Chicken | Tomato | Yes | Yes |
| Coke | Beef | Potato | No | No |
| Coke | Chicken | Tomato | No | Yes |
| Wine | Beef | Tomato | No | Yes |
| Coke | Beef | Tomato | Yes | Yes |
| Beer | Beef | Potato | Yes | Yes |
| Beer | Pork | Tomato | No | Yes |
| Wine | Beef | Potato | Yes | No |

19 Item sets (some of them)

The number in brackets is the number of transactions that contain the item set.

| One-item sets | Two-item sets | Three-item sets | Four-item sets |
|---|---|---|---|
| beverage = coke (5) | beverage = coke, meat = beef (2) | beverage = coke, meat = pork, vegetables = potato (2) | beverage = coke, meat = pork, vegetables = potato, cafe = no (2) |
| beverage = beer (4) | beverage = coke, meat = pork (2) | beverage = coke, meat = pork, cafe = no (2) | beverage = coke, vegetables = potato, milk = no, cafe = no (2) |
| beverage = wine (5) | beverage = coke, vegetables = tomato (2) | beverage = coke, vegetables = tomato, cafe = yes (2) | beverage = beer, meat = pork, milk = no, cafe = yes (2) |
| meat = chicken (4) | beverage = coke, vegetables = potato (3) | beverage = coke, vegetables = potato, milk = no (2) | beverage = wine, meat = beef, milk = no, cafe = yes (2) |
| meat = beef (6) | beverage = coke, milk = yes (2) | vegetables = tomato, milk = no, cafe = yes (4) | beverage = wine, vegetables = tomato, milk = no, cafe = yes (2) |
| meat = pork (4) | beverage = coke, milk = no (3) | beverage = coke, vegetables = potato, cafe = no (3) | meat = chicken, vegetables = tomato, milk = no, cafe = yes (2) |

20 Rules

The item set {vegetables = tomato, milk = no, cafe = yes} (4) leads to seven potential rules (why 7?):

- If vegetables = tomato and milk = no → cafe = yes (4/4)
- If vegetables = tomato and cafe = yes → milk = no (4/6)
- If milk = no and cafe = yes → vegetables = tomato (4/6)
- If vegetables = tomato → milk = no and cafe = yes (4/7)
- If milk = no → vegetables = tomato and cafe = yes (4/8)
- If cafe = yes → vegetables = tomato and milk = no (4/9)
- If (nothing) → vegetables = tomato and milk = no and cafe = yes (4/14)

The number in brackets is the confidence (the proportion of the transactions for which the rule is correct).

- E.g., milk = no is found 8 times, but only in 4 cases is the consequence (vegetables = tomato and cafe = yes) true.

The support of these rules is 4/14: (# transactions containing X and Y) / (# transactions).
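The seven rules and their confidences can be reproduced by enumerating every way of splitting the item set into an antecedent and a non-empty consequent (a 3-item set has 2^3 - 1 = 7 such splits). The sketch below encodes the 14 transactions of the beverage/meat example; the names `ROWS`, `count` and `rules_from_itemset` are illustrative.

```python
from itertools import combinations

COLUMNS = ("beverage", "meat", "vegetables", "milk", "cafe")
ROWS = [
    ("coke", "pork", "potato", "no", "no"),
    ("coke", "pork", "potato", "yes", "no"),
    ("beer", "pork", "potato", "no", "yes"),
    ("wine", "beef", "potato", "no", "yes"),
    ("wine", "chicken", "tomato", "no", "yes"),
    ("wine", "chicken", "tomato", "yes", "no"),
    ("beer", "chicken", "tomato", "yes", "yes"),
    ("coke", "beef", "potato", "no", "no"),
    ("coke", "chicken", "tomato", "no", "yes"),
    ("wine", "beef", "tomato", "no", "yes"),
    ("coke", "beef", "tomato", "yes", "yes"),
    ("beer", "beef", "potato", "yes", "yes"),
    ("beer", "pork", "tomato", "no", "yes"),
    ("wine", "beef", "potato", "yes", "no"),
]
# Each transaction is a set of (attribute, value) items.
transactions = [set(zip(COLUMNS, row)) for row in ROWS]


def count(items):
    """Number of transactions containing all the given items."""
    items = set(items)
    return sum(items <= t for t in transactions)


def rules_from_itemset(itemset):
    """All rules antecedent -> consequent obtainable from one item set.

    Returns tuples (antecedent, consequent, hits, covered), where the
    confidence of the rule is hits / covered.
    """
    itemset = set(itemset)
    hits = count(itemset)
    rules = []
    for size in range(len(itemset) - 1, -1, -1):   # largest antecedents first
        for antecedent in combinations(sorted(itemset), size):
            consequent = itemset - set(antecedent)
            rules.append((set(antecedent), consequent, hits, count(antecedent)))
    return rules


rules = rules_from_itemset(
    {("vegetables", "tomato"), ("milk", "no"), ("cafe", "yes")}
)
for antecedent, consequent, hits, covered in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"({hits}/{covered})")
```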

21 Association Rules and Recommender [Sarwar et al., 2000]

- Long-term user profile: a transaction for each user, containing all the products bought in the past
- Short-term user profile: a transaction for each bundle of products bought during a shopping experience (e.g., a trip)

1. Build a set of association rules (e.g., with the "Apriori" algorithm) with at least a minimum confidence and support (using all the users or only the users close to the target)
   - [Sarwar et al., 2000] used all the users' transactions to build the association rules
2. Find the rules R supported by a user profile (i.e., X is in the user profile)
3. Rank each product in the right-hand side of some rule in R with the (maximal) confidence of the rules that predict it
4. Select the top-N
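Steps 2 to 4 can be sketched as below, assuming the rules have already been mined (step 1) and are given as `(antecedent, consequent, confidence)` triples. Skipping products the user already has, and breaking ties by product identifier, are assumptions of this sketch rather than details stated in the slides.

```python
def recommend_from_rules(rules, profile, n):
    """Rank products by the maximal confidence of the supported rules.

    rules:   iterable of (antecedent, consequent, confidence) triples
    profile: set of products in the user's transaction
    """
    profile = set(profile)
    best = {}
    for antecedent, consequent, conf in rules:
        if set(antecedent) <= profile:                 # rule supported by the profile
            for product in set(consequent) - profile:  # skip owned products (assumption)
                best[product] = max(best.get(product, 0.0), conf)
    ranked = sorted(best, key=lambda p: (-best[p], p))
    return ranked[:n]


# Hypothetical mined rules.
rules = [
    ({"a"}, {"c"}, 0.6),
    ({"a", "b"}, {"c"}, 0.9),
    ({"b"}, {"d"}, 0.7),
    ({"e"}, {"f"}, 1.0),   # not supported by the profile below
]
print(recommend_from_rules(rules, {"a", "b"}, 2))
```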

22 Classification Learning

Set of classified examples:

| outlook | temperature | humidity | windy | Play/CLASS |
|---|---|---|---|---|
| sunny | | | FALSE | no |
| sunny | | | TRUE | no |
| overcast | | | FALSE | yes |
| rainy | | | FALSE | yes |
| rainy | | | FALSE | yes |
| rainy | | | TRUE | no |
| overcast | | | TRUE | yes |
| sunny | | | FALSE | no |
| sunny | | | FALSE | yes |
| rainy | | | FALSE | yes |
| sunny | | | TRUE | yes |
| overcast | | | TRUE | yes |
| overcast | | | FALSE | yes |
| rainy | | | TRUE | ? |

Given a set of examples for which we know the class, predict the class for an unclassified example.

23 Regression Learning

Set of examples where the target feature is numeric:

| outlook | temperature | humidity | windy | Play duration |
|---|---|---|---|---|
| sunny | | | FALSE | 5 |
| sunny | | | TRUE | 0 |
| overcast | | | FALSE | 55 |
| rainy | | | FALSE | 40 |
| rainy | | | FALSE | 65 |
| rainy | | | TRUE | 45 |
| overcast | | | TRUE | 60 |
| sunny | | | FALSE | 0 |
| sunny | | | FALSE | 70 |
| rainy | | | FALSE | 45 |
| sunny | | | TRUE | 50 |
| overcast | | | TRUE | 55 |
| overcast | | | FALSE | 75 |
| rainy | | | TRUE | ? |

Given a set of examples for which we know the target (duration), predict it for examples where it is unknown.

24 Learning

In order to predict the class, several machine learning techniques can be used, e.g.:

- Naïve Bayes
- k-NN
- Perceptron
- Neural Network
- Decision Trees (ID3)
- Classification Rules (AC4.5)
- Support Vector Machine

25 Matrix of ratings

[Figure: matrix of ratings with users as rows and items as columns]

26 Recommendation and Classification

- Users = instances (or are the items the instances?)
- Class = any product for which you'd like to predict the actual rating
- Class value = rating
- Differences:
  - Data are sparse
  - There is no preferred class-item to predict
  - In fact, the main problem is determining the classes (items) that it is better to predict
  - You cannot (?) predict all the class values and then choose the class-items with the highest class values (too expensive)
  - Unless there are few items (? generalize the notion of item?)

27 Lazy and Eager Learning

- Lazy: wait for the query before generalizing
  - k-nearest neighbor, case-based reasoning
- Eager: generalize, i.e., build a model, before seeing the query
  - Radial basis function networks, ID3, Neural Networks, Naïve Bayes, SVM
- Does it matter?
  - An eager learner must create a global approximation
  - A lazy learner can create many local approximations

28 Model-Based Collaborative Filtering

- The previously seen approach is called lazy or memory-based, as the original examples (the vectors of user ratings) are used when a prediction is required (no computation when data is collected)
- Model-based approaches build and store a probabilistic model and use it to make the prediction:

$$v^*_{ij} = E(v_{ij}) = \sum_{r=1}^{5} r \cdot P\big(v_{ij} = r \mid \{v_{ik},\ k \in I_i\}\big)$$

- Here $r = 1, \ldots, 5$ are the possible values of the rating, and $I_i$ is the set of (indexes of) products rated by user $i$; the conditioning set $\{v_{ik},\ k \in I_i\}$ is the user model
- $E(X)$ is the expectation (i.e., the average value of the random variable $X$)
- The probabilities above are estimated with a classifier producing the probability for an example to belong to a class (the class of products having a rating $r$), e.g., Naïve Bayes (but also k-nearest neighbor!)
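Once a classifier has produced a probability for each rating value, the prediction is just the expectation of that distribution. A minimal sketch with a made-up distribution (the probabilities below are illustrative, not estimated from any data):

```python
def expected_rating(probabilities):
    """Expectation of the rating, given a dict r -> P(v_ij = r | user model)."""
    return sum(r * p for r, p in probabilities.items())


# Hypothetical classifier output for one user-item pair.
distribution = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.25}
print(expected_rating(distribution))
```

Using the expectation rather than the most probable rating gives a real-valued prediction, which is convenient when ranking items whose most probable rating is the same.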

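29 Naïve Bayes

$$P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}$$

- Example:
  - $P(\text{flu} \mid \text{fever}) = P(\text{fever} \mid \text{flu}) \cdot P(\text{flu}) / P(\text{fever})$
  - $P(\text{flu} \mid \text{fever}) = 0.99 \cdot P(\text{flu}) / P(\text{fever}) = 0.99 \cdot 0.01 / P(\text{fever})$
- In the case of ratings, a class is a particular rating for a particular product (e.g., the first product). $X_i$ is a variable representing the rating of a generic user for product $i$:

$$P(X_1 = r \mid X_2 = v_2, \ldots, X_n = v_n) = \frac{P(X_2 = v_2, \ldots, X_n = v_n \mid X_1 = r)\, P(X_1 = r)}{P(X_2 = v_2, \ldots, X_n = v_n)}$$

- Assuming the independence of the ratings on different products:

$$P(X_1 = r \mid X_2 = v_2, \ldots, X_n = v_n) = \frac{\prod_{j=2}^{n} P(X_j = v_j \mid X_1 = r)\, P(X_1 = r)}{P(X_2 = v_2, \ldots, X_n = v_n)}$$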
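A small sketch of how the factorized formula could be estimated from a ratings matrix. Several choices here are assumptions that do not come from the slides: user profiles are dicts mapping item to rating, likelihoods are estimated only from users who rated both items, Laplace smoothing with constant `alpha` avoids zero probabilities, and the denominator is obtained by normalizing over the five rating values.

```python
from collections import Counter

RATINGS = (1, 2, 3, 4, 5)


def naive_bayes_rating(train, profile, target, alpha=1.0):
    """Return {r: P(X_target = r | profile)} under the independence assumption.

    train:   list of dicts item -> rating, one per training user
    profile: dict item -> rating of the active user (without `target`)
    alpha:   Laplace smoothing constant (assumption of this sketch)
    """
    rated_target = [u for u in train if target in u]
    class_counts = Counter(u[target] for u in rated_target)
    scores = {}
    for r in RATINGS:
        users_r = [u for u in rated_target if u[target] == r]
        # Smoothed prior P(X_target = r).
        score = (class_counts[r] + alpha) / (len(rated_target) + alpha * len(RATINGS))
        for item, value in profile.items():
            # Smoothed likelihood P(X_item = value | X_target = r).
            with_item = [u for u in users_r if item in u]
            matches = sum(u[item] == value for u in with_item)
            score *= (matches + alpha) / (len(with_item) + alpha * len(RATINGS))
        scores[r] = score
    total = sum(scores.values())   # plays the role of the denominator
    return {r: s / total for r, s in scores.items()}


# Hypothetical training users and active-user profile.
train = [
    {"a": 5, "b": 5, "t": 5},
    {"a": 5, "b": 4, "t": 5},
    {"a": 1, "b": 2, "t": 1},
    {"a": 1, "b": 1, "t": 1},
    {"a": 5, "t": 4},
]
posterior = naive_bayes_rating(train, {"a": 5, "b": 5}, "t")
print(max(posterior, key=posterior.get), posterior)
```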

30 Improvements of CF

- Use a significance weighting that tells how dependable the measure of similarity is; it is measured as a count of the number of items used in the similarity computation [Herlocker et al., 1999]
  - $\min(|P_i \cap P_j|, 50) / 50$, where $P_i$ and $P_j$ are the products rated by users $i$ and $j$ (50 is a parameter: 50 items are considered enough to have an optimal similarity computation)
  - This approach improves the prediction accuracy (MAE)
- Not all items may be informative in the similarity computation: items that have low variance in their ratings are supposed to be less informative than items that have more diverse ratings [Herlocker et al., 1999]
  - They gave more importance, in the similarity computation, to the products having larger variance in ratings; this did not improve the accuracy of the prediction
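The significance weight is simple to compute from the two users' sets of rated products. In the sketch below, multiplying the raw similarity by the weight is how such a weight is typically applied; treat that combination step, and the function names, as assumptions of this example.

```python
def significance_weight(items_i, items_j, threshold=50):
    """min(|P_i ∩ P_j|, threshold) / threshold."""
    overlap = len(set(items_i) & set(items_j))
    return min(overlap, threshold) / threshold


def weighted_similarity(similarity, items_i, items_j, threshold=50):
    """Devalue a similarity that is based on few co-rated products."""
    return similarity * significance_weight(items_i, items_j, threshold)


# Two hypothetical users with only 5 co-rated products (ids 5..9).
print(significance_weight(range(10), range(5, 40)))
print(weighted_similarity(0.8, range(10), range(5, 40)))
```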

31 Dependability in the similarity measure

[Figure: users by products (p1, p2, p3, ...) matrix showing, for each user, the ratings overlapping with those of u1; a mark means that there is a rating for that user-item pair]

32 Low variance vs. High variance items

[Figure: users by products matrix showing the ratings overlapping with those of u1]

33 Adaptive Similarity Metric

- Is it possible to identify, for each user-item pair, the best set of ratings upon which to compute the user-to-user similarity?
- This is a feature selection problem
- Example: when predicting a rating for a movie by Schwarzenegger, use in the user-to-user similarity computation only the ratings for other movies by Schwarzenegger and Stallone

34 Correlated vs. De-correlated items

[Figure: users by products matrix with items $p_1$, $p_2$, $p_3$]

Assume that we want to predict the rating of user $u_1$ for item $p_3$. Would it be more useful to base the similarity on the ratings for item $p_1$ or $p_2$?

35 Item Selection

In adaptive user-to-user similarity, use item p if:

- there are ratings for item p in both users' profiles
- item p has a high correlation/importance for the item whose rating has to be predicted (the target)

Item correlation weights:

- Variance weighting: the larger the variance of the item, the larger the weight
- IPCC: the larger the Pearson correlation, the larger the weight
- Mutual information: the weight is computed as the information that the predictive item provides about the target item
- Genre weighting: the more genres the two items share, the larger the weight

36 Performance of Adaptive Item Selection

[Figure: results on the MovieLens dataset]

- Less data can lead to a better prediction
- Selecting a small number of items overlapping in the profiles of the users whose similarity is computed improves the accuracy

37 Problems of CF: Sparsity

- Typically we have large product sets and user ratings for a small percentage of them
- Example, Amazon: millions of books, and a user may have bought hundreds of books
  - the probability that two users who have each bought 100 books have at least one book in common (in a catalogue of 1 million books) is about 0.01 (with 50 books each and 10 million books it is about 0.00025), if all books are equally likely to be bought!
  - Exercise: what is the probability that they have 10 books in common?
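The overlap probability can be checked numerically. A sketch under the slide's assumption that all books are equally likely to be bought: the second user's n distinct books are drawn uniformly from the catalogue, so the chance of avoiding all n books of the first user is the product of (N - n - i) / (N - i) for i = 0, ..., n - 1. The function name is illustrative.

```python
def prob_at_least_one_common(catalogue_size: int, n_books: int) -> float:
    """P(two users who each bought n_books share at least one book)."""
    p_none = 1.0
    for i in range(n_books):
        # i-th book of the second user avoids the first user's n_books titles.
        p_none *= (catalogue_size - n_books - i) / (catalogue_size - i)
    return 1.0 - p_none


print(prob_at_least_one_common(1_000_000, 100))   # 100 books, 1 million titles
print(prob_at_least_one_common(10_000_000, 50))   # 50 books, 10 million titles
```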

38 Sparsity

[Figure: ratings matrix contrasting popular items with an unpopular item]

39 Popular vs Not Popular

- Predicting the rating of popular items is easier than for unpopular ones
  - The prediction can be based on many neighbors
- The usefulness of predicting the rating of popular items is questionable:
  - It could be guessed in a simpler way
  - The system will not appear very smart to the user
- Predicting the rating of unpopular items is:
  - Risky: there are not many neighbors on which to base the prediction
  - But it could really bring a lot of value to the user!

40 Problems of CF: Scalability

- Nearest neighbor algorithms require computations that grow with both the number of customers and the number of products
- With millions of customers and products, a web-based recommender will suffer serious scalability problems
- The worst-case complexity is O(m*n) (m customers and n products)
- But in practice the complexity is O(m + n), since for each customer only a small number of products are considered (one loop on the m customers to compute the similarity PLUS one loop on the n products to compute the prediction)

41 Computational Issues

[Figure: ratings matrix with the row of user $u_1$ highlighted]

- To compute the similarity of $u_1$ with the other users we must scan the users database, of size m (large), but only 3 products will be considered (in this example)
- Represent a user as $u_1 = ((1, v_{1,1}), (6, v_{1,6}), (12, v_{1,12}))$

42 Computational Issues

[Figure: ratings matrix with the row of user $u_1$ and the rows of the selected neighbors highlighted]

- When you have selected the neighbors
  - $u_3 = ((1, v_{3,1}), (3, v_{3,3}), (8, v_{3,8}), (12, v_{3,12}))$
  - $u_4 = ((3, v_{4,3}), (6, v_{4,6}), (12, v_{4,12}))$
- you must only scan the union of the products in the neighbors' profiles and identify those not yet rated by the target user $u_1$
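The sparse representation above maps naturally onto dictionaries keyed by product index. A minimal sketch: the rating values are made up, `cosine` uses the cosine metric with unrated products treated as zeros, and `candidate_items` scans only the union of the neighbors' products.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse profiles (dict product -> rating)."""
    common = u.keys() & v.keys()          # only co-rated products contribute
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_items(target, neighbors):
    """Products rated by some neighbor but not yet by the target user."""
    seen = set()
    for profile in neighbors:
        seen |= profile.keys()
    return seen - target.keys()


# Product indexes follow the slide; the rating values are hypothetical.
u1 = {1: 4, 6: 3, 12: 5}
u3 = {1: 5, 3: 2, 8: 4, 12: 4}
u4 = {3: 5, 6: 2, 12: 4}

print(cosine(u1, u3), cosine(u1, u4))
print(candidate_items(u1, [u3, u4]))
```

The cost of each similarity depends on the (small) profile sizes rather than on the total number of products n, which is why the loop over the m users dominates in practice.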

43 Some Solutions for Addressing the Computational Complexity

- Discard customers with few purchases
- Discard very popular items
- Partition the products into categories
- Dimensionality reduction (LSI or clustering data)
- All of these methods also reduce recommendation quality (according to [Linden et al., 2003])

44 Summary

- Example of usage of precision and recall in the evaluation of a CF system
- CF using only the presence or absence of a rating
- Association rules
- Comparison between CF and association rules
- Illustrated the relationships between CF and classification learning
- Briefly discussed the notion of model-based CF and Naïve Bayes methods
- Discussed some problems of CF recommendation: sparsity and scalability
- Discussed the computational complexity of CF

45 Questions

- What are the two alternative approaches for selecting the nearest neighbors, given a user-to-user similarity metric?
- How could a measure of reliability of the similarity between two users be defined?
- What is the computational complexity of a naïve implementation of the CF algorithm? What is its complexity in practice?
- How does CF compare with association rules?
- What are the main differences between recommendation learning and classification learning?
- How would CF work with very popular or very rare products? Will CF work better for popular or unpopular items?

### CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering

W10.B.0.0 CS435 Introduction to Big Data W10.B.1 FAQs Term project 5:00PM March 29, 2018 PA2 Recitation: Friday PART 1. LARGE SCALE DATA AALYTICS 4. RECOMMEDATIO SYSTEMS 5. EVALUATIO AD VALIDATIO TECHIQUES

### CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING 08: Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu October 24, 2017 Learnt Prediction and Classification Methods Vector Data

### CS570: Introduction to Data Mining

CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.

### Representing structural patterns: Reading Material: Chapter 3 of the textbook by Witten

Representing structural patterns: Plain Classification rules Decision Tree Rules with exceptions Relational solution Tree for Numerical Prediction Instance-based presentation Reading Material: Chapter

### High dim. data. Graph data. Infinite data. Machine learning. Apps. Locality sensitive hashing. Filtering data streams.

http://www.mmds.org High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Network Analysis

### Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules

Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 06/0/ Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

### Chapter 4 Data Mining A Short Introduction

Chapter 4 Data Mining A Short Introduction Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining 3. Clustering 4. Classification Data Mining - 2 2 1. Data Mining Overview

### Supervised and Unsupervised Learning (II)

Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised

### Data Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A.

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Output: Knowledge representation Tables Linear models Trees Rules

### Property1 Property2. by Elvir Sabic. Recommender Systems Seminar Prof. Dr. Ulf Brefeld TU Darmstadt, WS 2013/14

Property1 Property2 by Recommender Systems Seminar Prof. Dr. Ulf Brefeld TU Darmstadt, WS 2013/14 Content-Based Introduction Pros and cons Introduction Concept 1/30 Property1 Property2 2/30 Based on item

### Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

### Mining Web Data. Lijun Zhang

Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

### Weka ( )

Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

### Jarek Szlichta

Jarek Szlichta http://data.science.uoit.ca/ Approximate terminology, though there is some overlap: Data(base) operations Executing specific operations or queries over data Data mining Looking for patterns

### Association Rules. Charles Sutton Data Mining and Exploration Spring Based on slides by Chris Williams and Amos Storkey. Thursday, 8 March 12

Association Rules Charles Sutton Data Mining and Exploration Spring 2012 Based on slides by Chris Williams and Amos Storkey The Goal Find patterns : local regularities that occur more often than you would

### CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

### Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University {tedhong, dtsamis}@stanford.edu Abstract This paper analyzes the performance of various KNNs techniques as applied to the

### VECTOR SPACE CLASSIFICATION

VECTOR SPACE CLASSIFICATION Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. Chapter 14 Wei Wei wwei@idi.ntnu.no Lecture

### Performance Measures

1 Performance Measures Classification F-Measure: (careful: similar but not the same F-measure as the F-measure we saw for clustering!) Tradeoff between classifying correctly all datapoints of the same

### Collaborative Filtering based on User Trends

Collaborative Filtering based on User Trends Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos Papadopoulos, and Yannis Manolopoulos Aristotle University, Department of Informatics, Thessalonii 54124,

### Mining Web Data. Lijun Zhang

Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

### Data Mining and Analytics

Data Mining and Analytics Aik Choon Tan, Ph.D. Associate Professor of Bioinformatics Division of Medical Oncology Department of Medicine aikchoon.tan@ucdenver.edu 9/22/2017 http://tanlab.ucdenver.edu/labhomepage/teaching/bsbt6111/

### COMP33111: Tutorial and lab exercise 7

COMP33111: Tutorial and lab exercise 7 Guide answers for Part 1: Understanding clustering 1. Explain the main differences between classification and clustering. main differences should include being unsupervised

### Probabilistic Learning Classification using Naïve Bayes

Probabilistic Learning Classification using Naïve Bayes Weather forecasts are usually provided in terms such as 70 percent chance of rain. These forecasts are known as probabilities of precipitation reports.

### Data Mining and Knowledge Discovery Practice notes Numeric prediction and descriptive DM

Practice notes 4..9 Practice plan Data Mining and Knowledge Discovery Knowledge Discovery and Knowledge Management in e-science Petra Kralj Novak Petra.Kralj.Novak@ijs.si Practice, 9//4 9//: Predictive

CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Computer Science 591Y Department of Computer Science University of Massachusetts Amherst February 3, 2005 Topics Tasks (Definition, example, and notes) Classification

### WEKA: Practical Machine Learning Tools and Techniques in Java. Seminar A.I. Tools WS 2006/07 Rossen Dimov

WEKA: Practical Machine Learning Tools and Techniques in Java Seminar A.I. Tools WS 2006/07 Rossen Dimov Overview Basic introduction to Machine Learning Weka Tool Conclusion Document classification Demo

### Association Rule Learning

Association Rule Learning 16s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales March 15, 2016 COMP9417 ML & DM (CSE, UNSW) Association

### Recommender Systems. Collaborative Filtering & Content-Based Recommending

Recommender Systems Collaborative Filtering & Content-Based Recommending 1 Recommender Systems Systems for recommending items (e.g. books, movies, CD s, web pages, newsgroup messages) to users based on

### ANU MLSS 2010: Data Mining. Part 2: Association rule mining

ANU MLSS 2010: Data Mining Part 2: Association rule mining Lecture outline What is association mining? Market basket analysis and association rule examples Basic concepts and formalism Basic rule measurements

### Association Pattern Mining. Lijun Zhang

Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms

### Data Mining and Machine Learning. Instance-Based Learning. Rote Learning k Nearest-Neighbor Classification. IBL and Rule Learning

Data Mining and Machine Learning Instance-Based Learning Rote Learning k Nearest-Neighbor Classification Prediction, Weighted Prediction choosing k feature weighting (RELIEF) instance weighting (PEBLS)

### Data Mining Algorithms

Algorithms Fall 2017 Big Data Tools and Techniques Basic Data Manipulation and Analysis Performing well-defined computations or asking well-defined questions ( queries ) Looking for patterns in data Machine

### Jeff Howbert Introduction to Machine Learning Winter

Collaborative Filtering Nearest es Neighbor Approach Jeff Howbert Introduction to Machine Learning Winter 2012 1 Bad news Netflix Prize data no longer available to public. Just after contest t ended d

### Distribution-free Predictive Approaches

Distribution-free Predictive Approaches The methods discussed in the previous sections are essentially model-based. Model-free approaches such as tree-based classification also exist and are popular for

### Classification Algorithms in Data Mining

August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms

### Part I: Data Mining Foundations

Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

### Chapter 4: Association analysis:

Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily

### Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

### CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 1/8/2014 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 Supermarket shelf

### 2. Discovery of Association Rules

2. Discovery of Association Rules Part I Motivation: market basket data Basic notions: association rule, frequency and confidence Problem of association rule mining (Sub)problem of frequent set mining

### Recommendation Systems

Recommendation Systems CS 534: Machine Learning Slides adapted from Alex Smola, Jure Leskovec, Anand Rajaraman, Jeff Ullman, Lester Mackey, Dietmar Jannach, and Gerhard Friedrich Recommender Systems (RecSys)

### 7. Nearest neighbors. Learning objectives. Centre for Computational Biology, Mines ParisTech

Foundations of Machine Learning CentraleSupélec Paris Fall 2016 7. Nearest neighbors Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr Learning

### Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem

Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem Stefan Hauger 1, Karen H. L. Tso 2, and Lars Schmidt-Thieme 2 1 Department of Computer Science, University of

### SUGGEST. Top-N Recommendation Engine. Version 1.0. George Karypis

SUGGEST Top-N Recommendation Engine Version 1.0 George Karypis University of Minnesota, Department of Computer Science / Army HPC Research Center Minneapolis, MN 55455 karypis@cs.umn.edu Last updated on

### Clustering Part 4 DBSCAN

Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

### Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

### CART. Classification and Regression Trees. Rebecka Jörnsten. Mathematical Sciences University of Gothenburg and Chalmers University of Technology

CART Classification and Regression Trees Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology CART CART stands for Classification And Regression Trees.

### 1 Machine Learning System Design

Machine Learning System Design Prioritizing what to work on: Spam classification example Say you want to build a spam classifier Spam messages often have misspelled words We ll have a labeled training

### Summary. Machine Learning: Introduction. Marcin Sydow

Outline of this Lecture Data Motivation for Data Mining and Learning Idea of Learning Decision Table: Cases and Attributes Supervised and Unsupervised Learning Classication and Regression Examples Data:

### CSE 494: Information Retrieval, Mining and Integration on the Internet

CSE 494: Information Retrieval, Mining and Integration on the Internet Midterm. 18th Oct 2011 (Instructor: Subbarao Kambhampati) In-class Duration: Duration of the class 1hr 15min (75min) Total points:

### Prediction. What is Prediction. Simple methods for Prediction. Classification by decision tree induction. Classification and regression evaluation

Prediction Prediction What is Prediction Simple methods for Prediction Classification by decision tree induction Classification and regression evaluation 2 Prediction Goal: to predict the value of a given

### Machine Learning: Symbolische Ansätze

Machine Learning: Symbolische Ansätze Unsupervised Learning Clustering Association Rules V2.0 WS 10/11 J. Fürnkranz Different Learning Scenarios Supervised Learning A teacher provides the value for the

### Nearest Neighbor Classification

Nearest Neighbor Classification Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 11, 2017 Outline 1 Administration 2 First learning algorithm: Nearest

### Data Mining in Bioinformatics Day 1: Classification

Data Mining in Bioinformatics Day 1: Classification Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institute Tübingen and Eberhard Karls

### Web Personalization & Recommender Systems

Web Personalization & Recommender Systems COSC 488 Slides are based on: - Bamshad Mobasher, Depaul University - Recent publications: see the last page (Reference section) Web Personalization & Recommender

### A Dendrogram. Bioinformatics (Lec 17)

A Dendrogram 3/15/05 Hierarchical Clustering [Johnson, SC, 1967] Given n points in $R^d$, compute the distance between every pair of points While (not done) Pick closest pair of points $s_i$ and $s_j$ and

### Semi-supervised Learning

Semi-supervised Learning Piyush Rai CS5350/6350: Machine Learning November 8, 2011 Semi-supervised Learning Supervised Learning models require labeled data Learning a reliable model usually requires plenty

### MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge

### CS249: Advanced Data Mining. Recommender Systems II

CS249: ADVANCED DATA MINING Recommender Systems II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Recommender Systems Recommendation via Information Network Analysis Hybrid Collaborative Filtering

### Evaluating Classifiers

Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with

### Data Mining Concepts & Tasks

Data Mining Concepts & Tasks Duen Horng (Polo) Chau Georgia Tech CSE6242 / CX4242 Sept 9, 2014 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos Last Time

### Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers

Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers James P. Biagioni Piotr M. Szczurek Peter C. Nelson, Ph.D. Abolfazl Mohammadian, Ph.D. Agenda Background

### Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?

Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to

### New user profile learning for extremely sparse data sets

New user profile learning for extremely sparse data sets Tomasz Hoffmann, Tadeusz Janasiewicz, and Andrzej Szwabe Institute of Control and Information Engineering, Poznan University of Technology, pl.

### Survey on Collaborative Filtering Technique in Recommendation System

Survey on Collaborative Filtering Technique in Recommendation System Omkar S. Revankar, Dr.Mrs. Y.V.Haribhakta Department of Computer Engineering, College of Engineering Pune, Pune, India ABSTRACT This

### Introduction. Chapter Background Recommender systems Collaborative based filtering

ii Abstract Recommender systems are used extensively today in many areas to help users and consumers with making decisions. Amazon recommends books based on what you have previously viewed and purchased,

### ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

### Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

### 9/6/14. Our first learning algorithm. Comp 135 Introduction to Machine Learning and Data Mining. knn Algorithm. knn Algorithm (simple form)

Comp 135 Introduction to Machine Learning and Data Mining Our first learning algorithm How would you classify the next example? Fall 2014 Professor: Roni Khardon Computer Science Tufts University

### Tillämpad Artificiell Intelligens Applied Artificial Intelligence Tentamen , , MA:8. 1 Search (JM): 11 points

Lunds Tekniska Högskola EDA132 Institutionen för datavetenskap VT 2017 Tillämpad Artificiell Intelligens Applied Artificial Intelligence Tentamen 2016 03 15, 14.00 19.00, MA:8 You can give your answers

### Michele Gorgoglione Politecnico di Bari Viale Japigia, Bari (Italy)

Does the recommendation task affect a CARS performance? Umberto Panniello Politecnico di Bari Viale Japigia, 82 726 Bari (Italy) +3985962765 m.gorgoglione@poliba.it Michele Gorgoglione Politecnico di Bari

### The Explorer. chapter Getting started

chapter 10 The Explorer Weka's main graphical user interface, the Explorer, gives access to all its facilities using menu selection and form filling. It is illustrated in Figure 10.1. There are six different

### Information Retrieval & Text Mining

Information Retrieval & Text Mining Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References 2 Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

### Data Mining and Knowledge Discovery Practice notes 2

Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

### Semantic text features from small world graphs

Semantic text features from small world graphs Jurij Leskovec 1 and John Shawe-Taylor 2 1 Carnegie Mellon University, USA. Jozef Stefan Institute, Slovenia. jure@cs.cmu.edu 2 University of Southampton,UK

### Frequent Pattern Mining

Frequent Pattern Mining How Many Words Is a Picture Worth? E. Aiden and J-B Michel: Uncharted. Riverhead Books, 2013 Jian Pei: CMPT 741/459 Frequent Pattern Mining (1) Burnt or Burned? E. Aiden and J-B

### Content-based Recommender Systems

Recuperação de Informação Doutoramento em Engenharia Informática e Computadores Instituto Superior Técnico Universidade Técnica de Lisboa Bibliography Pasquale Lops, Marco de Gemmis, Giovanni Semeraro:

### Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp Sumedh Sawant sumedh@stanford.edu Team 38 December 10, 2013 Abstract We implement a personal recommendation

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 1 Acknowledgement Several Slides in this presentation are taken from course slides provided by Han and Kamber (Data Mining Concepts and Techniques) and Tan,

### Intelligent Techniques for Web Personalization

Intelligent Techniques for Web Personalization Sarabjot Singh Anand 1 and Bamshad Mobasher 2 1 Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK s.s.anand@warwick.ac.uk 2 Center

### CS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek

CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Fall, 2015 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function

### Association Rule Mining. Entscheidungsunterstützungssysteme

Association Rule Mining Entscheidungsunterstützungssysteme Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set

### Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

### Data Mining Part 4. Tony C Smith WEKA Machine Learning Group Department of Computer Science University of Waikato

Data Mining Part 4 Tony C Smith WEKA Machine Learning Group Department of Computer Science University of Waikato Algorithms: The basic methods Inferring rudimentary rules Statistical modeling Constructing

### We will be releasing HW1 today It is due in 2 weeks (1/25 at 23:59pm) The homework is long

1/21/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 1 We will be releasing HW1 today It is due in 2 weeks (1/25 at 23:59pm) The homework is long Requires proving theorems

### PV211: Introduction to Information Retrieval

PV211: Introduction to Information Retrieval http://www.fi.muni.cz/~sojka/pv211 IIR 15-1: Support Vector Machines Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University,

### Practical Data Mining COMP-321B. Tutorial 1: Introduction to the WEKA Explorer

Practical Data Mining COMP-321B Tutorial 1: Introduction to the WEKA Explorer Gabi Schmidberger Mark Hall Richard Kirkby July 12, 2006 © 2006 University of Waikato 1 Setting up your Environment Before

### Artificial Neural Networks (Feedforward Nets)

Artificial Neural Networks (Feedforward Nets) 6.034 - Spring. Single Perceptron Unit (network diagrams with inputs x, weights w, and outputs y not recoverable)

### Available online at ScienceDirect. Procedia Technology 17 (2014)

Available online at www.sciencedirect.com ScienceDirect Procedia Technology 17 (2014) 528-533 Conference on Electronics, Telecommunications and Computers CETC 2013 Social Network and Device Aware Personalized

### Association Rule Mining. Introduction 46. Study core 46

Learning Unit 7 Association Rule Mining Introduction 46 Study core 46 1 Association Rule Mining: Motivation and Main Concepts 46 2 Apriori Algorithm 47 3 FP-Growth Algorithm 47 4 Assignment Bundle: Frequent

### Data Mining Part 5. Prediction

Data Mining Part 5. Prediction 5.4. Spring 2010 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms

### Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics

### Machine Learning: Algorithms and Applications Mockup Examination

Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature

### D B M G Data Base and Data Mining Group of Politecnico di Torino

DataBase and Data Mining Group of Data mining fundamentals Data Base and Data Mining Group of Data analysis Most companies own huge databases containing operational data textual documents experiment results

### Model Selection Introduction to Machine Learning. Matt Gormley Lecture 4 January 29, 2018

10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Model Selection Matt Gormley Lecture 4 January 29, 2018 1 Q&A Q: How do we deal