
Abstract

Recommender systems are used extensively today in many areas to help users and consumers make decisions. Amazon recommends books based on what you have previously viewed and purchased, Netflix presents you with shows and movies you might enjoy based on your interactions with the platform, and Facebook serves personalized ads to every user based on gathered browsing information. These systems are based on shared similarities, and there are several ways to develop and model them. This study compares two methods, user-based and item-based filtering, in k nearest neighbours systems. The methods are compared on how much they deviate from the true answer when predicting user ratings of movies based on sparse data. The study showed that neither method could be considered objectively better than the other and that the choice of system should be based on the data set.

Chapter 1

Introduction

1.1 Background

In everyday life, it is often necessary to make choices without sufficient personal experience of the alternatives. We then rely on recommendations from other people to make as smart choices as possible. E.g., when shopping at a shoe store, a customer could describe features of previously owned shoes to a clerk, and the clerk would then recommend new shoes based on the customer's past experiences. A dedicated clerk could, besides providing recommendations, also remember the past choices and experiences of customers. This would allow the clerk to make personalised recommendations to returning customers. The way we transfer this experience to the digital era is by using recommender systems [1].

Recommender systems

Recommender systems can be viewed as a digital representation of the clerk in the previous example. The goal of a recommender system is to predict what items users might be interested in by analysing gathered data. Gathering data can be done with an implicit and/or an explicit approach. An implicit approach records users' behaviour when reacting to incoming data (e.g. by recording how long a user actually watched a movie before switching to something else). This can be done without the user's knowledge. The explicit approach depends on the user explicitly specifying their preferences regarding items, e.g. by rating a movie. The input to a recommender system is the gathered data and the output is a prediction or recommendation for the user [2]. A recommender system's predictions will generally be more accurate the more data it can base its predictions on. Having a small amount of data to base predictions on is known as the sparse data problem and is expanded upon in the Sparse data problem section below.

Collaborative based filtering

Collaborative Filtering (CF) is a common algorithm used in recommender systems. CF provides predictions and recommendations based on other users and/or items in the system.
We assume that similar users or items in the system can be used to predict each other's ratings. If we know that Haris likes the same things as Alex, and Alex also likes candy, then we can predict that Haris will most likely also enjoy candy [3, 4].
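This intuition can be sketched in a few lines of plain Python. The names and ratings below are hypothetical illustration data (not from the study's datasets), and "similarity" is simplified to counting exactly matching ratings:

```python
# Minimal collaborative-filtering intuition: if two users agree on the
# items they have both rated, use one user's rating to predict the other's.
# All names and ratings here are hypothetical illustration data.

ratings = {
    "Alex":  {"chips": 5, "soda": 4, "candy": 5},
    "Haris": {"chips": 5, "soda": 4},            # has not rated candy yet
}

def predict(target, item, ratings):
    """Predict target's rating of item as the rating given by the
    most similar other user (similarity = count of matching ratings)."""
    best_user, best_overlap = None, -1
    for user, user_ratings in ratings.items():
        if user == target or item not in user_ratings:
            continue
        overlap = sum(
            1 for shared in user_ratings
            if shared in ratings[target]
            and user_ratings[shared] == ratings[target][shared]
        )
        if overlap > best_overlap:
            best_user, best_overlap = user, overlap
    return ratings[best_user][item] if best_user else None

print(predict("Haris", "candy", ratings))  # Alex agrees on chips and soda -> 5
```

Real systems replace the exact-match count with a proper similarity measure, as described in the next section.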

Two common methods for implementing collaborative filtering are user-based and item-based filtering. Both of these methods create a similarity matrix where the similarities between users (or items) are calculated and stored. The distance (similarity) between users can be calculated in several ways; two common methods are the Pearson correlation coefficient and the cosine similarity.

Calculating the similarity between users

To calculate how similar users are, a matrix is used where the users are rows and the different items are columns. One can then look at how similar users are by comparing their ratings for every item. Below is an example matrix and table with 3 users (Amy, Bill and Jim) and only 2 items (Snow Crash and Girl with the Dragon Tattoo).

Figure 1.1: Comparison matrix [guidetodatamining.com]

Figure 1.2: Comparison table [guidetodatamining.com]

Figures 1.1 and 1.2 show Bill and Jim having more in common than any other pair. There are several ways to give a value to this similarity. Some common approaches are:

Manhattan distance

The Manhattan distance is a simple form of similarity calculation. It is the sum of the absolute differences between ratings along every axis. In the above case, where the matrix is in 2D, the Manhattan distance between Bill, at index 1, and Jim, at index 2, would be:

|x1 − x2| + |y1 − y2| = 2

Euclidean distance

The Euclidean distance uses the difference along every axis and applies the Pythagorean theorem to calculate the "straight line distance" between two objects in the matrix.

Pythagorean theorem: a² + b² = c²

The Euclidean distance between Jim, at index 1, and Amy, at index 3, is calculated with the equation:

√((x1 − x3)² + (y1 − y3)²) = √((x1 − x3)² + (1 − 5)²)

Correlation

An issue that isn't visualised by this example is what happens when there is incomplete data, i.e. some users haven't rated some items in the matrix. If users A and B have rated the same 100 items but A and C only have 10 rated items in common, the similarity calculation between A and B should obviously be stronger as it is based on more data. Using the Manhattan or Euclidean distance, however, this will not be accounted for, making these methods poor when data is missing [5]. To account for this, two other methods, the Pearson correlation coefficient and cosine similarity, can be used.

Pearson correlation coefficient (PCC)

The PCC fits a line to two users' ratings to get a correlation value, where a straight, increasing line represents a high correlation while a decreasing line shows that the compared units do not correlate much.

Figure 1.3: Example of a correlation table [guidetodatamining.com]

Figures 1.3 and 1.4 show an example of positive correlation. The Pearson correlation coefficient takes what is known as "grade inflation" into account [5]. This is the phenomenon of users rating things differently even though they feel the same way about them. In the above example, Weird Al is the band Clara dislikes the most, yet they are still rated at 4. Robert also dislikes Weird Al but gives them a rating of 1. In the Manhattan or Euclidean calculations, this would represent a big difference between the users, but

Figure 1.4: Graphing the table shows a positive correlation [guidetodatamining.com]

the graph shows that they are very much alike. When placing these 5 bands in order of preference, they agree completely. The formula for calculating the PCC is:

r = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / √( Σᵢ₌₁ⁿ (xᵢ − x̄)² · Σᵢ₌₁ⁿ (yᵢ − ȳ)² )   (1.1)

Cosine similarity

Cosine similarity is another way of calculating the similarity between users' preferences. Here the users and their ratings of items are represented as two vectors, and their similarity is based on the cosine of the angle between them. Cosine similarity is often used for recommender systems since it ignores items which neither user has rated, so-called 0-0 matches, which are in abundance when dealing with sparse data. The cosine similarity is calculated as:

cos(x, y) = (x · y) / (‖x‖ ‖y‖)   (1.2)

where the dot in the numerator represents the dot product and ‖x‖ in the denominator denotes the length of vector x.

k Nearest Neighbours (kNN)

K nearest neighbours is the method of looking at some number (k) of similar users or items when making predictions, meaning that not all users, or items, are accounted for in a prediction. The difference between user and item-based filtering is whether a matrix of similar users or of similar items is created. Similar users are users who often share the same sentiment/rating of items. When recommender systems were first developed, user-based filtering was used, but it has issues with scalability: as the amount of data increases, the cost of calculating the similarity matrix grows rapidly. To combat this, Amazon developed item-based filtering, which labels similar items into groups so that once a user rates some

item highly, the algorithm recommends other similar items from the same group. Item-based filtering scales better than the user-based approach [3, 5, 6].

Evaluation

Two common methods for evaluating recommender systems are used in this study. The Root Mean Squared Error (RMSE) is calculated by:

RMSE = √( (1/n) · Σᵢ₌₁ⁿ dᵢ² )   (1.3)

and the Mean Absolute Error (MAE) is calculated by:

MAE = (1/n) · Σᵢ₌₁ⁿ |dᵢ|   (1.4)

where n is the number of predictions made and dᵢ is the distance between the recommender system's prediction and the correct answer. The closer the RMSE and MAE values are to 0, the better the accuracy of the recommender system. RMSE disproportionately penalizes large errors, while MAE does not reflect many small errors properly, so both measurements should be used when evaluating accuracy [7, 8, 9]. To provide test data for evaluation, a dataset is divided into two parts: one part is used for building the similarity matrix and the other part is used for evaluation.

Sparse data problem

Sparse data is a common problem in recommender systems, where the dataset consists of few ratings compared to the number of users. This issue was simulated by splitting the dataset into two asymmetric parts. The smaller part is then used to make predictions for all objects in the larger part [10].

1.2 Datasets

Three datasets were used in this study. They all contain user ratings of movies and have all been previously used in studies about recommender systems [10]. The datasets are:

FilmTrust

FilmTrust was a film rating website that has since been shut down. The data was crawled from the FilmTrust website in June 2011 as part of a research paper on recommender systems [11]. The rating scale goes from 1 to 5.

CiaoDVD

CiaoDVD was a DVD rating website where users could share their reviews of movies and give recommendations for stores with the best prices.
The data was crawled from dvd.ciao.co.uk in December 2013 as part of a research paper on trust prediction [12]. The

CiaoDVD database has 920 users, and its rating scale goes from 1 to 5.

MovieLens

MovieLens is a well-known dataset used in many scientific papers. It consists of a collection of movie ratings from the MovieLens web site, collected over various periods of time [13]. The rating scale goes from 1 to 5, and all users in the dataset have rated at least 20 items.

1.3 Surprise

There are multiple free implementations of recommender systems available. The algorithms in this study were implemented using the Python library Surprise [14]. Surprise is licensed under the BSD 3-Clause license [15].

1.4 Purpose

The study compares how well the two collaborative filtering approaches, user-based and item-based, perform when predictions are based on sparse data, known as the sparse data problem. The sparse data problem is a common one in the field of machine learning [16], and understanding how effective these different methods are is of great value for future implementations.

1.5 Research question

How do the two filtering systems, user-based and item-based, compare when making predictions based on sparse data?

1.6 Scope and constraints

The datasets that were used are from MovieLens, FilmTrust and CiaoDVD. The Python library Surprise was used to conduct all tests. This study only compares the correctness of predictions when these are based on sparse data. Other factors, such as speed and memory efficiency, are not taken into consideration. Correctness is measured using the RMSE and MAE.
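The similarity measures introduced in chapter 1 can be sketched in plain Python. This is a minimal sketch over two equal-length rating vectors; handling of unrated items (which cosine similarity and the library implementations deal with) is omitted, and the example vectors are hypothetical:

```python
import math

def manhattan(x, y):
    """Sum of absolute differences along every axis."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Straight-line distance via the Pythagorean theorem."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation coefficient, as in equation (1.1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def cosine(x, y):
    """Cosine of the angle between two rating vectors, equation (1.2)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x))
                  * math.sqrt(sum(b * b for b in y)))

# Hypothetical 2D ratings consistent with the Manhattan example's result of 2.
bill, jim = [2, 5], [1, 4]
print(manhattan(bill, jim))  # 2
```

Note how `pearson` subtracts each user's mean rating, which is what makes it robust to the "grade inflation" discussed above, while `cosine` and the two distances compare raw ratings directly.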

Chapter 2

Method

Running the two filtering methods, user-based and item-based filtering, on a dataset is henceforth referred to as a "test". Every test was conducted 10 times with randomized sets of training and test data. The mean value of these 10 runs represents the result of a test.

2.1 Data handling

Before use, the data needed processing. The following methods were used to prepare the data for testing.

Simulating sparse data

In this study, sparse data is defined by using 20% of the dataset for training and 80% for verification. This ratio has been used in similar studies [17].

Formatting data

The datasets provided by MovieLens and FilmTrust use a format that Surprise can handle natively. The dataset from CiaoDVD was formatted before use: the Python script in appendix B.3 was used to retrieve only the columns with user id, movie id and rating.

Creating test data

The data was split using a Python script, see appendix B.2, that first read all the data from file into an array. The array was then shuffled by providing a seed value, ranging from 1 to 10, to the shuffle function in the Python library. After that, every fifth rating (20%) was written to one file and the rest to another. The smaller file was then used as training data for the recommender system and the bigger file was used as test data. This was repeated 10 times with different seeds for each dataset.

2.2 Conducting the tests

The created test and training datasets were used to build models, run the prediction algorithm and evaluate the result. See appendix B.1 for code.
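The seeded 20/80 split described above can be sketched as follows. This is a minimal sketch, not the actual script from appendix B.2; the file reading and writing is replaced by in-memory lists:

```python
import random

def split_ratings(ratings, seed):
    """Shuffle with a fixed seed, then send every fifth rating (20%)
    to the training set and the rest (80%) to the test set."""
    shuffled = list(ratings)
    random.seed(seed)            # seeds 1..10 reproduce the 10 partitions
    random.shuffle(shuffled)
    train = shuffled[::5]        # every fifth rating -> 20%
    test = [r for i, r in enumerate(shuffled) if i % 5 != 0]
    return train, test

# Hypothetical (user, movie, rating) triples.
ratings = [(u, m, r) for u in range(10) for m, r in [(1, 3), (2, 4)]]
train, test = split_ratings(ratings, seed=1)
print(len(train), len(test))  # 4 16
```

Using a fixed seed per run makes each of the 10 partitions reproducible, which is what allows the same splits to be reused for every configuration in table 2.1.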

Building similarity model

A PCC and a cosine similarity model were built for each dataset. Note that the models had to be created for each dataset and only one model could be evaluated in each run. This was configured with built-in functions in the Surprise library.

Building the prediction algorithm

Built-in methods in Surprise were used to create the prediction algorithm. Table 2.1 shows the configurations for the different prediction algorithms. All setups used a minimum of 1 neighbour for predictions.

Test | Filtering method | Similarity model | Max neighbours used
1    | Item-based       | cosine           | 40
2    | User-based       | cosine           | 40
3    | Item-based       | pearson          | 40
4    | User-based       | pearson          | 40

Table 2.1: Configurations for prediction algorithms

Evaluating the algorithms

Evaluation of the algorithms was done with the built-in function, evaluate(), in the Surprise library. Each test was run with all 10 test and training data combinations for each dataset. For both similarity models (PCC and cosine similarity) and each dataset, a mean value for the RMSE and MAE scores was calculated based on the evaluation of the 10 differently seeded partitions of the data. An average was used to prevent strong influence from deviating scores in the case of bad data in the results.
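The prediction-and-evaluation step can be sketched in plain Python. This is a simplified stand-in for the Surprise calls, not their implementation: it uses a similarity-weighted average over at most k neighbours (minimum 1, as in table 2.1) and the error metrics from equations (1.3) and (1.4); the similarity and rating data below are hypothetical:

```python
import math

def knn_predict(target, item, ratings, sim, k=40):
    """Predict target's rating of item from the (at most) k most similar
    users who have rated it, as a similarity-weighted average."""
    neighbours = sorted(
        ((sim[target].get(u, 0.0), r[item])
         for u, r in ratings.items()
         if u != target and item in r and sim[target].get(u, 0.0) > 0),
        reverse=True)[:k]
    if not neighbours:          # fewer than the minimum of 1 neighbour
        return None
    total = sum(s for s, _ in neighbours)
    return sum(s * r for s, r in neighbours) / total

def rmse(errors):
    """Root Mean Squared Error, equation (1.3)."""
    return math.sqrt(sum(d * d for d in errors) / len(errors))

def mae(errors):
    """Mean Absolute Error, equation (1.4)."""
    return sum(abs(d) for d in errors) / len(errors)

errors = [0.5, -1.0, 0.25]          # hypothetical prediction errors
print(round(rmse(errors), 3), round(mae(errors), 3))  # 0.661 0.583
```

The example output also illustrates the relationship discussed in chapter 4: RMSE is never lower than MAE for the same errors, because squaring weights the large error more heavily.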

Chapter 3

Results

The following structure will be used to present the results of the study. Two sections show results based on each of the similarity matrix structures, Pearson correlation coefficient (Pearson) and cosine similarity (Cosine). For all datasets, user and item-based filtering are compared side by side in a plot for each metric, MAE or RMSE. The plot shows the average value of the 10 test runs; the lower the value, the better the predictions. Following the plot of average scores there is another plot which shows the max deviation of the scores. This is the difference between the highest and lowest score of the 10 test runs for each dataset and filtering method. The lower the difference, the smaller the spread observed between different test runs. This plot is included to give an idea of how much the tests varied, which is relevant as we use an average value. The full metrics of the tests are presented in appendix A.

3.1 Pearson

The following results were obtained using the Pearson method for the similarity matrix.

Figure 3.1: MAE, Pearson

The plot in figure 3.1 shows the results for the MAE scores. The plot shows a small advantage for item-based filtering on the FilmTrust dataset, while there is an opposite advantage for the MovieLens dataset. On the CiaoDVD dataset, user and item-based filtering score about the same.

Figure 3.2: Max MAE score deviation for Pearson

The difference plot in figure 3.2 shows that the difference between the max and min values is small for all the datasets. FilmTrust has the highest value for user-based filtering, with a deviation of around 3%. The plot also shows that there is a big difference between user and item-based deviation for FilmTrust.

Figure 3.3: RMSE, Pearson

The RMSE scores, plotted in figure 3.3, hint at the same trends as the MAE scores. The FilmTrust dataset had better accuracy when item-based filtering was used

and MovieLens had better accuracy when user-based filtering was used. CiaoDVD had about the same accuracy with both filtering methods.

Figure 3.4: Max RMSE score deviation for Pearson

The difference plot in figure 3.4 shows the same max deviation for the FilmTrust dataset, with a small difference between the max and min values. The difference between the user and item-based approaches for the FilmTrust dataset which was observed in figure 3.2 is present here as well.

3.2 Cosine

The following results were obtained using the cosine similarity method for the similarity matrix.

Figure 3.5: MAE, Cosine

In figure 3.5, the same trends observed for the Pearson matrices in figure 3.1 are still visible. However, user and item-based filtering scored slightly closer to each other.

Figure 3.6: Max MAE score deviation for cosine

For the cosine similarity matrix, the differences between the max and min scores are much smaller than for the Pearson similarity matrices. From figure 3.6 we see that the max score deviation is less than 0.01 points. However, there is a slightly smaller deviation for

item-based filtering for all datasets. Notice that the big deviation for user-based filtering on the FilmTrust dataset which was observed with the Pearson method is not present here.

Figure 3.7: RMSE, Cosine

The RMSE scores using the cosine similarity matrix, plotted in figure 3.7, show the same trends as the RMSE scores for the Pearson similarity matrix in figure 3.3.

Figure 3.8: Max RMSE score deviation for cosine

As opposed to the MAE scores, we see a slightly smaller deviation of the scores for user-based filtering. The deviation is less than 0.01 points, which is very low.

Chapter 4

Discussion

The discussion is divided into three parts: one discussing our results and how the study was conducted, one covering external dependencies, and one analysing the current state of the art and the relevancy of the study.

The figures in chapter 3 show a clear pattern where neither user nor item-based filtering has a clear advantage over the other, independent of error and correlation measurements (MAE, RMSE and Pearson, cosine). The results suggest that the choice of filtering method should be based on the data set. Exactly what properties of the data set one should look for when determining the filtering method is hard to say based on this study, as it only contains 3 datasets with several differences between them (making it hard to pinpoint determining factors).

Our experiments show a clear correlation between the two error measurements: both give the same answer for every dataset as to which filtering method performed best. The MAE scores being lower than the respective RMSE scores across the board is expected, as MAE can never produce a higher value than RMSE, only an equal one (if all errors have the same magnitude).

The maximum k value for the k-nearest neighbours algorithm, which denotes how many items or users the recommendations are based on, was chosen to be 40 in all tests. Choosing the optimal k value is not a simple task and there are many suggestions for how one should go about doing it, but no agreed-upon best method [18]. Using cross validation with different k values and comparing results is one recommended method, but this approach depends on the data set. Since different data sets are used in this study, different k values might be needed for the datasets to enable the system to perform at optimal capacity. Other ways of calculating an optimal k value are discussed in [19].
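The cross-validation approach to choosing k can be sketched as follows. This is a minimal sketch the study itself did not run; `evaluate` is a hypothetical callback standing in for training and scoring a model (e.g. returning an RMSE), and the toy stand-in at the bottom is fabricated purely to make the example executable:

```python
def best_k(candidate_ks, folds, evaluate):
    """Pick the k with the lowest mean error across train/test folds.
    `evaluate(k, train, test)` is a hypothetical callback returning an
    error score such as RMSE (lower is better)."""
    scores = {}
    for k in candidate_ks:
        errs = [evaluate(k, train, test) for train, test in folds]
        scores[k] = sum(errs) / len(errs)
    return min(scores, key=scores.get), scores

# Toy stand-in: pretend the error happens to be minimised at k = 20.
fake_eval = lambda k, train, test: abs(k - 20) / 100 + 0.9
k, scores = best_k([10, 20, 40, 80], [(None, None)] * 3, fake_eval)
print(k)  # 20
```

As the text notes, the k chosen this way is dataset-dependent, so a search like this would have to be repeated per dataset.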
Calculating an optimal k value for every data set was considered outside this study's scope, and the default value of the Surprise library (40) was used instead. This value is, as stated, the maximum number of neighbours the algorithm will consider. If there are not 40 users (or items) similar enough to be considered neighbours, Surprise will use a lower number (down to a minimum of 1). Using a different maximum k value may have an impact on the results if this study's experiments are remade. Every test result is a mean average of 10 runs where the training and test data sets were randomized. This method was used because it was a fair compromise between correctness and the scope of the study. One can naturally get a more statistically sound value by averaging 1000 test runs instead of 10, but running the tests is

time consuming (computationally) and it is hard to set a limit for how many data points are needed for a fair assessment. One more thing our method doesn't account for is outliers, which can skew the mean considerably. However, running each test only 10 times allowed us to verify that no big statistical outliers were present in the mean calculations. This is shown in figures 3.2, 3.4, 3.6 and 3.8.

4.1 External dependencies

Two of the datasets, FilmTrust and CiaoDVD, were acquired from a scientific paper and not taken directly from their respective sources. They were both collected by crawling the websites while these were online (both have been shut down at the time of writing). This makes it hard to control the correctness of the data. The dataset from CiaoDVD came in a format not compatible with the Python program, so the data had to be processed and formatted, which leaves room for human error. An important attribute of the MovieLens dataset is that all users have made at least 20 ratings. There are no known similar minimum thresholds for the other datasets. To raise the confidence of the drawn conclusions, more datasets should be used, of varying sizes and from areas other than movie ratings. Initially the paper included a dataset from Yelp of restaurant reviews, but because of its different data format and time restrictions, this dataset could not be used in this study.

We have no reason to doubt the Surprise software. All our tests have returned reasonable results, and Surprise looks like a professionally built product for all intents and purposes. It is open source, actively maintained (the latest commit was within 24 hours of writing), well documented and written by a Ph.D. student at IRIT (Toulouse Institute of Computer Science Research). To confirm the accuracy of the software, one can use the same data sets and algorithms as this study, input these into another working recommender system, and check whether the results are identical.
4.2 State of the art and relevancy

Many companies use recommender systems today; some of the bigger ones are Amazon, Facebook, LinkedIn and Youtube. Finding out exactly what algorithms these companies use and how they are implemented has proven very difficult, for two major reasons. One is that such information is part of their (often) closed source code. The other is that there is no simple answer to the question, as most modern recommender systems are based on a plethora of algorithms. One famous case where this was displayed was the Netflix Prize, a contest for developing a better recommender system for Netflix with a prize pool of a million dollars [20]. The best (winning) algorithms were in fact never implemented by Netflix, as their huge complexity and the engineering effort required overshadowed the slightly better predictions they would bring [21].

The relevancy of the study can be questioned since its scope is quite narrow. Limiting itself to comparing only the accuracy of the two methods, and dismissing other factors such as memory efficiency and computational demand/speed, may make the results irrelevant if one of the methods can never feasibly be applied because of such limitations. However, even if such limitations do exist, this and similar studies could provide valuable insight into whether pursuing a solution to such limitations is worth the effort.


More information

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University  Infinite data. Filtering data streams /9/7 Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets Training data 100 million ratings, 80,000 users, 17,770

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu Training data 00 million ratings, 80,000 users, 7,770 movies 6 years of data: 000 00 Test data Last few ratings of

More information

The influence of social filtering in recommender systems

The influence of social filtering in recommender systems The influence of social filtering in recommender systems 1 Introduction Nick Dekkers 3693406 Recommender systems have become more and more intertwined in our everyday usage of the web. Think about the

More information

CSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies.

CSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies. CSE 547: Machine Learning for Big Data Spring 2019 Problem Set 2 Please read the homework submission policies. 1 Principal Component Analysis and Reconstruction (25 points) Let s do PCA and reconstruct

More information

CptS 570 Machine Learning Project: Netflix Competition. Parisa Rashidi Vikramaditya Jakkula. Team: MLSurvivors. Wednesday, December 12, 2007

CptS 570 Machine Learning Project: Netflix Competition. Parisa Rashidi Vikramaditya Jakkula. Team: MLSurvivors. Wednesday, December 12, 2007 CptS 570 Machine Learning Project: Netflix Competition Team: MLSurvivors Parisa Rashidi Vikramaditya Jakkula Wednesday, December 12, 2007 Introduction In current report, we describe our efforts put forth

More information

Recommender Systems New Approaches with Netflix Dataset

Recommender Systems New Approaches with Netflix Dataset Recommender Systems New Approaches with Netflix Dataset Robert Bell Yehuda Koren AT&T Labs ICDM 2007 Presented by Matt Rodriguez Outline Overview of Recommender System Approaches which are Content based

More information

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. + What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and

More information

Statistics can best be defined as a collection and analysis of numerical information.

Statistics can best be defined as a collection and analysis of numerical information. Statistical Graphs There are many ways to organize data pictorially using statistical graphs. There are line graphs, stem and leaf plots, frequency tables, histograms, bar graphs, pictographs, circle graphs

More information

Collaborative Filtering using Euclidean Distance in Recommendation Engine

Collaborative Filtering using Euclidean Distance in Recommendation Engine Indian Journal of Science and Technology, Vol 9(37), DOI: 10.17485/ijst/2016/v9i37/102074, October 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Collaborative Filtering using Euclidean Distance

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

More information

Collaborative Filtering using a Spreading Activation Approach

Collaborative Filtering using a Spreading Activation Approach Collaborative Filtering using a Spreading Activation Approach Josephine Griffith *, Colm O Riordan *, Humphrey Sorensen ** * Department of Information Technology, NUI, Galway ** Computer Science Department,

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

GLOSSARY OF TERMS. Commutative property. Numbers can be added or multiplied in either order. For example, = ; 3 x 8 = 8 x 3.

GLOSSARY OF TERMS. Commutative property. Numbers can be added or multiplied in either order. For example, = ; 3 x 8 = 8 x 3. GLOSSARY OF TERMS Algorithm. An established step-by-step procedure used 1 to achieve a desired result. For example, the 55 addition algorithm for the sum of two two-digit + 27 numbers where carrying is

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Data Matrices and Vector Space Model Denis Helic KTI, TU Graz Nov 6, 2014 Denis Helic (KTI, TU Graz) KDDM1 Nov 6, 2014 1 / 55 Big picture: KDDM Probability

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #7: Recommendation Content based & Collaborative Filtering Seoul National University In This Lecture Understand the motivation and the problem of recommendation Compare

More information

CS 124/LINGUIST 180 From Languages to Information

CS 124/LINGUIST 180 From Languages to Information CS /LINGUIST 80 From Languages to Information Dan Jurafsky Stanford University Recommender Systems & Collaborative Filtering Slides adapted from Jure Leskovec Recommender Systems Customer X Buys CD of

More information

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017 CPSC 340: Machine Learning and Data Mining Kernel Trick Fall 2017 Admin Assignment 3: Due Friday. Midterm: Can view your exam during instructor office hours or after class this week. Digression: the other

More information

Using a percent or a letter grade allows us a very easy way to analyze our performance. Not a big deal, just something we do regularly.

Using a percent or a letter grade allows us a very easy way to analyze our performance. Not a big deal, just something we do regularly. GRAPHING We have used statistics all our lives, what we intend to do now is formalize that knowledge. Statistics can best be defined as a collection and analysis of numerical information. Often times we

More information

CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering

CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering W10.B.0.0 CS435 Introduction to Big Data W10.B.1 FAQs Term project 5:00PM March 29, 2018 PA2 Recitation: Friday PART 1. LARGE SCALE DATA AALYTICS 4. RECOMMEDATIO SYSTEMS 5. EVALUATIO AD VALIDATIO TECHIQUES

More information

ES-2 Lecture: Fitting models to data

ES-2 Lecture: Fitting models to data ES-2 Lecture: Fitting models to data Outline Motivation: why fit models to data? Special case (exact solution): # unknowns in model =# datapoints Typical case (approximate solution): # unknowns in model

More information

Averages and Variation

Averages and Variation Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus

More information

CISC 4631 Data Mining

CISC 4631 Data Mining CISC 4631 Data Mining Lecture 03: Nearest Neighbor Learning Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Prof. R. Mooney (UT Austin) Prof E. Keogh (UCR), Prof. F.

More information

Glossary Common Core Curriculum Maps Math/Grade 6 Grade 8

Glossary Common Core Curriculum Maps Math/Grade 6 Grade 8 Glossary Common Core Curriculum Maps Math/Grade 6 Grade 8 Grade 6 Grade 8 absolute value Distance of a number (x) from zero on a number line. Because absolute value represents distance, the absolute value

More information

Similarity and recommender systems

Similarity and recommender systems Similarity and recommender systems Andreas C. Kapourani January 8 Introduction In this lab session we will work with some toy data and implement a simple collaborative filtering recommender system (RS),

More information

VCEasy VISUAL FURTHER MATHS. Overview

VCEasy VISUAL FURTHER MATHS. Overview VCEasy VISUAL FURTHER MATHS Overview This booklet is a visual overview of the knowledge required for the VCE Year 12 Further Maths examination.! This booklet does not replace any existing resources that

More information

Building Better Parametric Cost Models

Building Better Parametric Cost Models Building Better Parametric Cost Models Based on the PMI PMBOK Guide Fourth Edition 37 IPDI has been reviewed and approved as a provider of project management training by the Project Management Institute

More information

Towards a hybrid approach to Netflix Challenge

Towards a hybrid approach to Netflix Challenge Towards a hybrid approach to Netflix Challenge Abhishek Gupta, Abhijeet Mohapatra, Tejaswi Tenneti March 12, 2009 1 Introduction Today Recommendation systems [3] have become indispensible because of the

More information

Bar Graphs and Dot Plots

Bar Graphs and Dot Plots CONDENSED LESSON 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs find some summary values for a data set draw conclusions about a data set based on graphs

More information

Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track

Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track Alejandro Bellogín 1,2, Thaer Samar 1, Arjen P. de Vries 1, and Alan Said 1 1 Centrum Wiskunde

More information

By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin

By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth Under The Guidance of Dr. Richard Maclin Outline Problem Statement Background Proposed Solution Experiments & Results Related Work Future

More information

Graphical Analysis of Data using Microsoft Excel [2016 Version]

Graphical Analysis of Data using Microsoft Excel [2016 Version] Graphical Analysis of Data using Microsoft Excel [2016 Version] Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Tool 1: Standards for Mathematical ent: Interpreting Functions CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Name of Reviewer School/District Date Name of Curriculum Materials:

More information

Image Compression With Haar Discrete Wavelet Transform

Image Compression With Haar Discrete Wavelet Transform Image Compression With Haar Discrete Wavelet Transform Cory Cox ME 535: Computational Techniques in Mech. Eng. Figure 1 : An example of the 2D discrete wavelet transform that is used in JPEG2000. Source:

More information

A Recommender System Based on Improvised K- Means Clustering Algorithm

A Recommender System Based on Improvised K- Means Clustering Algorithm A Recommender System Based on Improvised K- Means Clustering Algorithm Shivani Sharma Department of Computer Science and Applications, Kurukshetra University, Kurukshetra Shivanigaur83@yahoo.com Abstract:

More information

Technical Arts 101 Prof. Anupam Saxena Department of Mechanical engineering Indian Institute of Technology, Kanpur. Lecture - 7 Think and Analyze

Technical Arts 101 Prof. Anupam Saxena Department of Mechanical engineering Indian Institute of Technology, Kanpur. Lecture - 7 Think and Analyze Technical Arts 101 Prof. Anupam Saxena Department of Mechanical engineering Indian Institute of Technology, Kanpur Lecture - 7 Think and Analyze Last time I asked you to come up with a single funniest

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mining Massive Datasets Jure Leskovec, Stanford University http://cs46.stanford.edu /7/ Jure Leskovec, Stanford C46: Mining Massive Datasets Many real-world problems Web Search and Text Mining Billions

More information

Data Mining. Lecture 03: Nearest Neighbor Learning

Data Mining. Lecture 03: Nearest Neighbor Learning Data Mining Lecture 03: Nearest Neighbor Learning Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Prof. R. Mooney (UT Austin) Prof E. Keogh (UCR), Prof. F. Provost

More information

Sample some Pi Monte. Introduction. Creating the Simulation. Answers & Teacher Notes

Sample some Pi Monte. Introduction. Creating the Simulation. Answers & Teacher Notes Sample some Pi Monte Answers & Teacher Notes 7 8 9 10 11 12 TI-Nspire Investigation Student 45 min Introduction The Monte-Carlo technique uses probability to model or forecast scenarios. In this activity

More information

Preparing for AS Level Further Mathematics

Preparing for AS Level Further Mathematics Preparing for AS Level Further Mathematics Algebraic skills are incredibly important in the study of further mathematics at AS and A level. You should therefore make sure you are confident with all of

More information

Course Outline for Grade 12 College Foundations MAP4C

Course Outline for Grade 12 College Foundations MAP4C Course Outline for Grade 12 College Foundations MAP4C UNIT 1 TRIGONOMETRY Pearson Pg. 8-12 #2-5 1.1, Introduction to Trigonometry, Primary Trig Ratios C3.1 solve problems in two dimensions using metric

More information

International Journal of Advance Engineering and Research Development. A Facebook Profile Based TV Shows and Movies Recommendation System

International Journal of Advance Engineering and Research Development. A Facebook Profile Based TV Shows and Movies Recommendation System Scientific Journal of Impact Factor (SJIF): 4.72 International Journal of Advance Engineering and Research Development Volume 4, Issue 3, March -2017 A Facebook Profile Based TV Shows and Movies Recommendation

More information

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity

More information

Improving Results and Performance of Collaborative Filtering-based Recommender Systems using Cuckoo Optimization Algorithm

Improving Results and Performance of Collaborative Filtering-based Recommender Systems using Cuckoo Optimization Algorithm Improving Results and Performance of Collaborative Filtering-based Recommender Systems using Cuckoo Optimization Algorithm Majid Hatami Faculty of Electrical and Computer Engineering University of Tabriz,

More information

Statistics 1 - Basic Commands. Basic Commands. Consider the data set: {15, 22, 32, 31, 52, 41, 11}

Statistics 1 - Basic Commands. Basic Commands. Consider the data set: {15, 22, 32, 31, 52, 41, 11} Statistics 1 - Basic Commands http://mathbits.com/mathbits/tisection/statistics1/basiccommands.htm Page 1 of 3 Entering Data: Basic Commands Consider the data set: {15, 22, 32, 31, 52, 41, 11} Data is

More information

CS 124/LINGUIST 180 From Languages to Information

CS 124/LINGUIST 180 From Languages to Information CS /LINGUIST 80 From Languages to Information Dan Jurafsky Stanford University Recommender Systems & Collaborative Filtering Slides adapted from Jure Leskovec Recommender Systems Customer X Buys CD of

More information

6 TOOLS FOR A COMPLETE MARKETING WORKFLOW

6 TOOLS FOR A COMPLETE MARKETING WORKFLOW 6 S FOR A COMPLETE MARKETING WORKFLOW 01 6 S FOR A COMPLETE MARKETING WORKFLOW FROM ALEXA DIFFICULTY DIFFICULTY MATRIX OVERLAP 6 S FOR A COMPLETE MARKETING WORKFLOW 02 INTRODUCTION Marketers use countless

More information

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9 Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9 Contents 1 Introduction to Using Excel Spreadsheets 2 1.1 A Serious Note About Data Security.................................... 2 1.2

More information

Hierarchical Clustering

Hierarchical Clustering What is clustering Partitioning of a data set into subsets. A cluster is a group of relatively homogeneous cases or observations Hierarchical Clustering Mikhail Dozmorov Fall 2016 2/61 What is clustering

More information

Personalized Web Search

Personalized Web Search Personalized Web Search Dhanraj Mavilodan (dhanrajm@stanford.edu), Kapil Jaisinghani (kjaising@stanford.edu), Radhika Bansal (radhika3@stanford.edu) Abstract: With the increase in the diversity of contents

More information

COMP 465: Data Mining Recommender Systems

COMP 465: Data Mining Recommender Systems //0 movies COMP 6: Data Mining Recommender Systems Slides Adapted From: www.mmds.org (Mining Massive Datasets) movies Compare predictions with known ratings (test set T)????? Test Data Set Root-mean-square

More information

Lagrange Multipliers and Problem Formulation

Lagrange Multipliers and Problem Formulation Lagrange Multipliers and Problem Formulation Steven J. Miller Department of Mathematics and Statistics Williams College Williamstown, MA 01267 Abstract The method of Lagrange Multipliers (and its generalizations)

More information

CS 124/LINGUIST 180 From Languages to Information

CS 124/LINGUIST 180 From Languages to Information CS /LINGUIST 80 From Languages to Information Dan Jurafsky Stanford University Recommender Systems & Collaborative Filtering Slides adapted from Jure Leskovec Recommender Systems Customer X Buys Metallica

More information

City, University of London Institutional Repository. This version of the publication may differ from the final published version.

City, University of London Institutional Repository. This version of the publication may differ from the final published version. City Research Online City, University of London Institutional Repository Citation: Überall, Christian (2012). A dynamic multi-algorithm collaborative-filtering system. (Unpublished Doctoral thesis, City

More information

Smarter Balanced Vocabulary (from the SBAC test/item specifications)

Smarter Balanced Vocabulary (from the SBAC test/item specifications) Example: Smarter Balanced Vocabulary (from the SBAC test/item specifications) Notes: Most terms area used in multiple grade levels. You should look at your grade level and all of the previous grade levels.

More information

Algebra 2 Chapter Relations and Functions

Algebra 2 Chapter Relations and Functions Algebra 2 Chapter 2 2.1 Relations and Functions 2.1 Relations and Functions / 2.2 Direct Variation A: Relations What is a relation? A of items from two sets: A set of values and a set of values. What does

More information

CPSC 340: Machine Learning and Data Mining

CPSC 340: Machine Learning and Data Mining CPSC 340: Machine Learning and Data Mining Fundamentals of learning (continued) and the k-nearest neighbours classifier Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart.

More information

I can solve simultaneous equations algebraically, where one is quadratic and one is linear.

I can solve simultaneous equations algebraically, where one is quadratic and one is linear. A* I can manipulate algebraic fractions. I can use the equation of a circle. simultaneous equations algebraically, where one is quadratic and one is linear. I can transform graphs, including trig graphs.

More information

Middle School Math Course 3

Middle School Math Course 3 Middle School Math Course 3 Correlation of the ALEKS course Middle School Math Course 3 to the Texas Essential Knowledge and Skills (TEKS) for Mathematics Grade 8 (2012) (1) Mathematical process standards.

More information

Recommender Systems 6CCS3WSN-7CCSMWAL

Recommender Systems 6CCS3WSN-7CCSMWAL Recommender Systems 6CCS3WSN-7CCSMWAL http://insidebigdata.com/wp-content/uploads/2014/06/humorrecommender.jpg Some basic methods of recommendation Recommend popular items Collaborative Filtering Item-to-Item:

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Mastery. PRECALCULUS Student Learning Targets

Mastery. PRECALCULUS Student Learning Targets PRECALCULUS Student Learning Targets Big Idea: Sequences and Series 1. I can describe a sequence as a function where the domain is the set of natural numbers. Connections (Pictures, Vocabulary, Definitions,

More information

Correlation. January 12, 2019

Correlation. January 12, 2019 Correlation January 12, 2019 Contents Correlations The Scattterplot The Pearson correlation The computational raw-score formula Survey data Fun facts about r Sensitivity to outliers Spearman rank-order

More information

Data Mining. ❷Chapter 2 Basic Statistics. Asso.Prof.Dr. Xiao-dong Zhu. Business School, University of Shanghai for Science & Technology

Data Mining. ❷Chapter 2 Basic Statistics. Asso.Prof.Dr. Xiao-dong Zhu. Business School, University of Shanghai for Science & Technology ❷Chapter 2 Basic Statistics Business School, University of Shanghai for Science & Technology 2016-2017 2nd Semester, Spring2017 Contents of chapter 1 1 recording data using computers 2 3 4 5 6 some famous

More information

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017 CPSC 340: Machine Learning and Data Mining Recommender Systems Fall 2017 Assignment 4: Admin Due tonight, 1 late day for Monday, 2 late days for Wednesday. Assignment 5: Posted, due Monday of last week

More information

Fractions. 7th Grade Math. Review of 6th Grade. Slide 1 / 306 Slide 2 / 306. Slide 4 / 306. Slide 3 / 306. Slide 5 / 306.

Fractions. 7th Grade Math. Review of 6th Grade. Slide 1 / 306 Slide 2 / 306. Slide 4 / 306. Slide 3 / 306. Slide 5 / 306. Slide 1 / 06 Slide 2 / 06 7th Grade Math Review of 6th Grade 2015-01-14 www.njctl.org Slide / 06 Table of Contents Click on the topic to go to that section Slide 4 / 06 Fractions Decimal Computation Statistics

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu /2/8 Jure Leskovec, Stanford CS246: Mining Massive Datasets 2 Task: Given a large number (N in the millions or

More information

Recommender Systems using Collaborative Filtering D Yogendra Rao

Recommender Systems using Collaborative Filtering D Yogendra Rao Recommender Systems using Collaborative Filtering D Yogendra Rao Department of Computer Science and Engineering National Institute of Technology Rourkela Rourkela 769 008, India Recommender Systems using

More information

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci Part 12: Advanced Topics in Collaborative Filtering Francesco Ricci Content Generating recommendations in CF using frequency of ratings Role of neighborhood size Comparison of CF with association rules

More information

Tips and Guidance for Analyzing Data. Executive Summary

Tips and Guidance for Analyzing Data. Executive Summary Tips and Guidance for Analyzing Data Executive Summary This document has information and suggestions about three things: 1) how to quickly do a preliminary analysis of time-series data; 2) key things to

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 60 - Section - Fall 06 Lecture Jan-Willem van de Meent (credit: Andrew Ng, Alex Smola, Yehuda Koren, Stanford CS6) Recommender Systems The Long Tail (from: https://www.wired.com/00/0/tail/)

More information

Demystifying movie ratings 224W Project Report. Amritha Raghunath Vignesh Ganapathi Subramanian

Demystifying movie ratings 224W Project Report. Amritha Raghunath Vignesh Ganapathi Subramanian Demystifying movie ratings 224W Project Report Amritha Raghunath (amrithar@stanford.edu) Vignesh Ganapathi Subramanian (vigansub@stanford.edu) 9 December, 2014 Introduction The past decade or so has seen

More information

Lines of Symmetry. Grade 3. Amy Hahn. Education 334: MW 8 9:20 a.m.

Lines of Symmetry. Grade 3. Amy Hahn. Education 334: MW 8 9:20 a.m. Lines of Symmetry Grade 3 Amy Hahn Education 334: MW 8 9:20 a.m. GRADE 3 V. SPATIAL SENSE, GEOMETRY AND MEASUREMENT A. Spatial Sense Understand the concept of reflection symmetry as applied to geometric

More information