COMP 465: Data Mining Recommender Systems


 Nickolas Bradley
 1 years ago
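Slides adapted from: Mining Massive Datasets

[Figure: utility matrix of users by movies; the entries marked "?" are withheld as the test data set.]

Compare predictions with known ratings (test set T).

Root-mean-square error (RMSE):

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(x,i) \in T} \left( \hat{r}_{xi} - r_{xi} \right)^2 } \]

where \( N = |T| \), \( \hat{r}_{xi} \) is the predicted rating, and \( r_{xi} \) is the actual rating of user x on item i.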
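As a minimal sketch of the RMSE definition above, the following Python computes the error over a small test set. The dictionary layout keyed by (user, item) and the sample ratings are assumptions made for illustration; they do not come from the slides.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error over the (user, item) pairs of the test set.

    `actual` maps (user, item) -> true rating; `predicted` must contain a
    prediction for every key of `actual`.
    """
    if not actual:
        raise ValueError("test set is empty")
    squared_errors = [(predicted[key] - r) ** 2 for key, r in actual.items()]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical test set T and predictions (illustrative values only).
test_set = {("joe", "Signs"): 2.0, ("joe", "Amadeus"): 4.0, ("ann", "Signs"): 5.0}
predictions = {("joe", "Signs"): 3.0, ("joe", "Amadeus"): 4.0, ("ann", "Signs"): 3.0}

print(rmse(predictions, test_set))
```

The squared errors here are 1, 0, and 4, so the result is the square root of 5/3.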
A narrow focus on accuracy sometimes misses the point:
  Prediction diversity
  Prediction context
  Order of predictions

In practice, we care only to predict high ratings: RMSE might penalize a method that does well for high ratings and badly for others.
Alternative: precision at top k, the percentage of predictions in the user's top k withheld ratings.

Training data: 100 million ratings, 480,000 users, 17,770 movies; 6 years of data: 2000-2005
Test data: last few ratings of each user (2.8 million)
Evaluation criterion: root-mean-square error

\[ \mathrm{RMSE} = \sqrt{\frac{1}{|R|} \sum_{(i,x) \in R} \left( \hat{r}_{xi} - r_{xi} \right)^2 } \]

Netflix's system RMSE: 0.9514
Competition: 2,700+ teams; $1 million prize for a 10% improvement on Netflix

[Figure: matrix R, 480,000 users by 17,770 movies.]
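The precision-at-top-k alternative mentioned above can be sketched as follows. This is one reasonable reading of the slide's one-line definition, not a definitive one: it takes the k items with the highest predicted ratings and measures what fraction of them are also among the user's k highest withheld ratings. The function name and sample data are hypothetical.

```python
def precision_at_k(predicted, actual, k):
    """Fraction of the k highest-predicted items that are also among the
    user's k highest actual (withheld) ratings.

    Both arguments map item -> rating for a single user.
    """
    top_predicted = sorted(predicted, key=predicted.get, reverse=True)[:k]
    top_actual = set(sorted(actual, key=actual.get, reverse=True)[:k])
    hits = sum(1 for item in top_predicted if item in top_actual)
    return hits / k


# Hypothetical withheld ratings and predictions for one user.
actual = {"A": 5.0, "B": 4.0, "C": 2.0, "D": 1.0}
predicted = {"A": 4.5, "B": 2.0, "C": 4.0, "D": 1.5}

print(precision_at_k(predicted, actual, k=2))
```

Unlike RMSE, this measure ignores how far off the predictions for low-rated items are, which matches the point that only the high ratings matter in practice.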
[Figure: matrix R (480,000 users by 17,770 movies) split into a training data set and a test data set of withheld "?" entries. RMSE is computed over the test entries, comparing the predicted rating \( \hat{r}_{xi} \) with the true rating \( r_{xi} \) of user x on item i.]

The winner of the Netflix Challenge!
Multi-scale modeling of the data: combine top-level, "regional" modeling of the data with a refined, local view:
  Global: overall deviations of users/movies
  Factorization: addressing "regional" effects
  Collaborative filtering: extract local patterns

[Figure: three nested levels: global effects, factorization, collaborative filtering.]

Global:
  Mean movie rating: 3.7 stars
  The Sixth Sense is 0.5 stars above average
  Joe rates 0.2 stars below average
  Baseline estimation: Joe will rate The Sixth Sense 4 stars
Local neighborhood (CF/NN):
  Joe didn't like the related movie Signs
  Final estimate: Joe will rate The Sixth Sense 3.8 stars
Earliest and most popular collaborative filtering method.
Derive unknown ratings from those of "similar" movies (item-item variant).
Define a similarity measure \( s_{ij} \) of items i and j.
Select the k nearest neighbors and compute the rating. N(i; x): items most similar to i that were rated by x.

\[ \hat{r}_{xi} = \frac{\sum_{j \in N(i;x)} s_{ij} \, r_{xj}}{\sum_{j \in N(i;x)} s_{ij}} \]

\( s_{ij} \): similarity of items i and j
\( r_{xj} \): rating of user x on item j
N(i; x): set of items similar to item i that were rated by x

In practice we get better estimates if we model deviations:

\[ \hat{r}_{xi} = b_{xi} + \frac{\sum_{j \in N(i;x)} s_{ij} \, (r_{xj} - b_{xj})}{\sum_{j \in N(i;x)} s_{ij}} \]

\( b_{xi} \): baseline estimate for \( r_{xi} \), with \( b_{xi} = \mu + b_x + b_i \)
\( \mu \) = overall mean rating
\( b_x \) = rating deviation of user x = (avg. rating of user x) - \( \mu \)
\( b_i \) = rating deviation of movie i = (avg. rating of movie i) - \( \mu \)

Problems/issues:
1) Similarity measures are "arbitrary"
2) Pairwise similarities neglect interdependencies among users
3) Taking a weighted average can be restricting
Solution: instead of \( s_{ij} \), use weights \( w_{ij} \) that we estimate directly from data.

RMSE by method:
  Global average: 1.1296
  User average: 1.0651
  Movie average: 1.0533
  Netflix: 0.9514
  Basic collaborative filtering: 0.94
  CF + biases + learned weights: 0.91
  Grand Prize: 0.8563

Goal: make good recommendations.
Quantify goodness using RMSE: lower RMSE means better recommendations.
We want to make good recommendations on items that the user has not yet seen. We can't really do this directly!
Let's build a system that works well on known (user, item) ratings, and hope the system will also predict the unknown ratings well.
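The item-item estimate with baselines described above can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the similarity values are given rather than computed, the ratings and deviations are made up, and the helper names (`baseline`, `predict`) are hypothetical.

```python
def baseline(mu, user_dev, item_dev, x, i):
    """b_xi = mu + b_x + b_i."""
    return mu + user_dev[x] + item_dev[i]


def predict(x, i, ratings, sim, mu, user_dev, item_dev, k=2):
    """Baseline plus the similarity-weighted average deviation of the
    k items most similar to i that user x has rated (the set N(i; x))."""
    rated = [j for j in ratings[x] if j != i and sim.get((i, j), 0.0) > 0]
    neighbors = sorted(rated, key=lambda j: sim[(i, j)], reverse=True)[:k]
    b_xi = baseline(mu, user_dev, item_dev, x, i)
    if not neighbors:
        return b_xi  # nothing to adjust with; fall back to the baseline
    num = sum(sim[(i, j)] * (ratings[x][j] - baseline(mu, user_dev, item_dev, x, j))
              for j in neighbors)
    den = sum(sim[(i, j)] for j in neighbors)
    return b_xi + num / den


# Illustrative values only.
mu = 3.7
user_dev = {"joe": -0.2}
item_dev = {"The Sixth Sense": 0.5, "Signs": 0.1, "Amadeus": 0.0}
ratings = {"joe": {"Signs": 2.0, "Amadeus": 4.0}}
sim = {("The Sixth Sense", "Signs"): 0.8, ("The Sixth Sense", "Amadeus"): 0.2}

print(predict("joe", "The Sixth Sense", ratings, sim, mu, user_dev, item_dev))
```

With these numbers the baseline is 4.0 and the neighborhood term pulls the estimate down, because the most similar rated movie was rated well below its own baseline.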
"SVD" on Netflix data: \( R \approx Q \cdot P^T \)

For now let's assume we can approximate the rating matrix R as a product of "thin" matrices \( Q \cdot P^T \).
R has missing entries, but let's ignore that for now! Basically, we want the reconstruction error to be small on known ratings, and we don't care about the values on the missing ones.

SVD: \( A = U \Sigma V^T \)

[Figure: movies placed in a two-dimensional latent space with one axis running from "females" to "males" and the other from "serious" to "funny": The Color Purple, Sense and Sensibility, The Princess Diaries, Amadeus, Ocean's 11, The Lion King, Braveheart, Independence Day, Lethal Weapon, Dumb and Dumber.]

How to estimate the missing rating of user x for item i?

\[ \hat{r}_{xi} = q_i \cdot p_x = \sum_f q_{if} \, p_{xf} \]

\( q_i \) = row i of Q
\( p_x \) = column x of \( P^T \)
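To make the estimate concrete, here is a small sketch of the prediction as a dot product of an item's factor vector and a user's factor vector. The two-factor vectors below are invented for illustration and are not values from the slides.

```python
def predict_rating(q_i, p_x):
    """Estimated rating: sum over factors f of q_if * p_xf."""
    if len(q_i) != len(p_x):
        raise ValueError("item and user vectors must have the same number of factors")
    return sum(q_if * p_xf for q_if, p_xf in zip(q_i, p_x))


# Hypothetical factor vectors (two latent factors per item and per user).
Q = {"Braveheart": [1.0, -0.5], "Amadeus": [-1.0, 0.5]}
P = {"joe": [2.0, 1.0]}

print(predict_rating(Q["Braveheart"], P["joe"]))
print(predict_rating(Q["Amadeus"], P["joe"]))
```

A user whose vector points the same way as an item's vector in the latent space gets a high estimate for that item, and a low one for items pointing the opposite way.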
[Figure: the same movies plotted against Factor 1 and Factor 2.]

SVD:
  A: input data matrix
  U: left singular vectors
  V: right singular vectors
  \( \Sigma \): singular values

[Figure: the m-by-n matrix A approximated by the product \( U \Sigma V^T \).]

So in our case, "SVD" on Netflix data gives \( R \approx Q \cdot P^T \) with

\[ A = R, \quad Q = U, \quad P^T = \Sigma V^T, \quad \hat{r}_{xi} = q_i \cdot p_x \]
SVD gives the minimum reconstruction error (sum of squared errors):

\[ \min_{U, V, \Sigma} \sum_{ij \in A} \left( A_{ij} - [U \Sigma V^T]_{ij} \right)^2 \]

Note two things:
  SSE and RMSE are monotonically related: \( \mathrm{RMSE} = \frac{1}{c} \sqrt{\mathrm{SSE}} \) for a constant c. Great news: SVD is minimizing RMSE.
  Complication: the sum in the SVD error term is over all entries (no rating is interpreted as a zero rating). But our R has missing entries!

SVD isn't defined when entries are missing! Use specialized methods to find P, Q:

\[ \min_{P, Q} \sum_{(i,x) \in R} \left( r_{xi} - q_i \cdot p_x \right)^2, \qquad \hat{r}_{xi} = q_i \cdot p_x \]

Note:
  We don't require the columns of P, Q to be orthogonal or of unit length.
  P, Q map users/movies to a latent space.
  This was the most popular model among Netflix contestants.

Sudden rise in the average movie rating (early 2004):
  Improvements in Netflix
  GUI improvements
  Meaning of rating changed
Movie age:
  Users prefer new movies without any reasons
  Older movies are just inherently better than newer ones

Y. Koren, Collaborative filtering with temporal dynamics, KDD 2009
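Returning to the factorization objective above (squared error summed over known ratings only), the following is a minimal sketch of one way to fit P and Q. The slides only say "specialized methods"; stochastic gradient descent is assumed here as one common choice, with no regularization or bias terms. The function names, learning rate, epoch count, and toy ratings are all illustrative assumptions.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sse(ratings, P, Q):
    """Sum of squared errors over the known (user, item) ratings only."""
    return sum((r - dot(Q[i], P[x])) ** 2 for (x, i), r in ratings.items())


def init_factors(ratings, n_factors, rng):
    """Small random factor vectors for every user and item that appears."""
    users = sorted({x for x, _ in ratings})
    items = sorted({i for _, i in ratings})
    P = {x: [rng.uniform(-0.5, 0.5) for _ in range(n_factors)] for x in users}
    Q = {i: [rng.uniform(-0.5, 0.5) for _ in range(n_factors)] for i in items}
    return P, Q


def sgd(ratings, P, Q, epochs=500, lr=0.02, rng=None):
    """Update P and Q in place, visiting only the known ratings."""
    rng = rng or random.Random(0)
    known = sorted(ratings.items())
    for _ in range(epochs):
        rng.shuffle(known)
        for (x, i), r in known:
            err = r - dot(Q[i], P[x])
            for f in range(len(P[x])):
                q_if, p_xf = Q[i][f], P[x][f]
                # Gradient step; the constant factor 2 is absorbed into lr.
                Q[i][f] += lr * err * p_xf
                P[x][f] += lr * err * q_if


# Toy data: missing (user, item) pairs are simply absent from the dict.
ratings = {
    ("joe", "Signs"): 2.0, ("joe", "Amadeus"): 4.0,
    ("ann", "Signs"): 5.0, ("ann", "Braveheart"): 3.0,
    ("bob", "Amadeus"): 1.0, ("bob", "Braveheart"): 4.0,
}
rng = random.Random(0)
P, Q = init_factors(ratings, n_factors=2, rng=rng)
before = sse(ratings, P, Q)
sgd(ratings, P, Q, rng=rng)
after = sse(ratings, P, Q)
print(before, after)
```

Because the loop never touches missing entries, they are not treated as zero ratings, which is the difference from plain SVD noted above. Once fitted, a missing rating is estimated as `dot(Q[i], P[x])`.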
[Figure: RMSE versus millions of parameters for CF (no time bias), basic latent factors, CF (time bias), latent factors with biases, + linear time factors, + per-day user biases, + CF.]

RMSE by method:
  Global average: 1.1296
  User average: 1.0651
  Movie average: 1.0533
  Netflix: 0.9514
  Basic collaborative filtering: 0.94
  Collaborative filtering++: 0.91
  Latent factors: 0.90
  Latent factors + biases: 0.89
  Latent factors + biases + time: 0.876
  Grand Prize: 0.8563

Still no prize! Getting desperate. Try a "kitchen sink" approach!

June 26th submission triggers a 30-day "last call".

Ensemble team formed:
  A group of other teams on the leaderboard forms a new team
  Relies on combining their models
  Quickly also gets a qualifying score over 10%
BellKor:
  Continues to get small improvements in its scores
  Realizes that it is in direct competition with Ensemble
Strategy:
  Both teams carefully monitor the leaderboard
  The only sure way to check for improvement is to submit a set of predictions
  This alerts the other team to your latest score
Submissions limited to 1 a day.
  Only 1 final submission could be made in the last 24 hours
24 hours before the deadline:
  A BellKor team member in Austria notices (by chance) that Ensemble has posted a score that is slightly better than BellKor's
Frantic last 24 hours for both teams:
  Much computer time spent on final optimization
  Carefully calibrated to end about an hour before the deadline
Final submissions:
  BellKor submits a little early (on purpose), 40 minutes before the deadline
  Ensemble submits its final entry 20 minutes later
  ... and everyone waits