Author(s): Rahul Sami, 2009

Similar documents
Author(s): Rahul Sami, 2009

For more information about how to cite these materials visit

SI Special Topics: Data Security and Privacy: Legal, Policy and Enterprise Issues, Winter 2010

Author(s): August E. Evrard, PhD. 2010

For more information about how to cite these materials visit

Author(s): Vic Divecha, 2011

For more information about how to cite these materials visit

Recommender Systems: User Experience and System Issues

Machine Learning using MapReduce

Mining Web Data. Lijun Zhang

For more information about how to cite these materials visit

CSE 454 Final Report TasteCliq

Recommender Systems. Collaborative Filtering & Content-Based Recommending

Automatically Building Research Reading Lists

Big Data Analytics CSCI 4030

Mining Web Data. Lijun Zhang

Recommender Systems. Techniques of AI

Project Report. An Introduction to Collaborative Filtering

Towards a hybrid approach to Netflix Challenge

A Time-based Recommender System using Implicit Feedback

SOFIA: Social Filtering for Niche Markets

SI Networks: Theory and Application, Fall 2008

Recommendation Algorithms: Collaborative Filtering. CSE 6111 Presentation Advanced Algorithms Fall Presented by: Farzana Yasmeen

Data Mining Techniques

Recommender Systems 6CCS3WSN-7CCSMWAL

Recommender System. What is it? How to build it? Challenges. R package: recommenderlab

Data Mining Techniques

Property1 Property2. by Elvir Sabic. Recommender Systems Seminar Prof. Dr. Ulf Brefeld TU Darmstadt, WS 2013/14

Introduction to Data Mining

Recommender Systems. Master in Computer Engineering Sapienza University of Rome. Carlos Castillo

Collaborative Filtering using Euclidean Distance in Recommendation Engine

Study on Recommendation Systems and their Evaluation Metrics PRESENTATION BY : KALHAN DHAR

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Recommender Systems: Practical Aspects, Case Studies. Radek Pelánek

Demystifying movie ratings 224W Project Report. Amritha Raghunath Vignesh Ganapathi Subramanian

Part 11: Collaborative Filtering. Francesco Ricci

Predicting User Ratings Using Status Models on Amazon.com

Web Personalization & Recommender Systems

Citation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness

CS570: Introduction to Data Mining

Web Spam. Seminar: Future Of Web Search. Know Your Neighbors: Web Spam Detection using the Web Topology

STREAMING RANKING BASED RECOMMENDER SYSTEMS

5 Choosing keywords Initially choosing keywords Frequent and rare keywords Evaluating the competition rates of search

Absorbing Random walks Coverage

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

Feature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.

Seek and Ye shall Find

Master Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala

Web Personalization & Recommender Systems

Distributed Itembased Collaborative Filtering with Apache Mahout. Sebastian Schelter twitter.com/sscdotopen. 7.

Managing Performance Variance of Applications Using Storage I/O Control

Enterprise Miner Software: Changes and Enhancements, Release 4.1

Effectiveness of Crawling Attacks Against Web-based Recommender Systems

You are Who You Know and How You Behave: Attribute Inference Attacks via Users Social Friends and Behaviors

Singular Value Decomposition, and Application to Recommender Systems

Search Engine Optimization. MBA 563 Week 6

arxiv: v4 [cs.ir] 28 Jul 2016

Recap: Project and Practicum CS276B. Recommendation Systems. Plan for Today. Sample Applications. What do RSs achieve? Given a set of users and items

LECTURE 8 TEST DESIGN TECHNIQUES - I

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning

Predictive Indexing for Fast Search

Search Engines Considered Harmful In Search of an Unbiased Web Ranking

By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin

Recommender Systems New Approaches with Netflix Dataset

Distributed CUR Decomposition for Bi-Clustering

COMP6237 Data Mining Making Recommendations. Jonathon Hare

Robustness and Accuracy Tradeoffs for Recommender Systems Under Attack

Community-Based Recommendations: a Solution to the Cold Start Problem

CC PROCESAMIENTO MASIVO DE DATOS OTOÑO Lecture 7: Information Retrieval II. Aidan Hogan

Recommender Systems (RSs)

The Normal Distribution & z-scores

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.

Link Prediction in Graph Streams

Absorbing Random walks Coverage

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Efficient Orienteering-Route Search over Uncertain Spatial Datasets

Lecture 8: Linkage algorithms and web search

Improving Results and Performance of Collaborative Filtering-based Recommender Systems using Cuckoo Optimization Algorithm

COMS 4721: Machine Learning for Data Science Lecture 23, 4/20/2017

Search Engines Considered Harmful In Search of an Unbiased Web Ranking

Recommender Systems. Nivio Ziviani. Junho de Departamento de Ciência da Computação da UFMG

An Empirical Comparison of Collaborative Filtering Approaches on Netflix Data

Fast Nearest Neighbor Search on Large Time-Evolving Graphs

Introduction to Data Mining

Understanding Rule Behavior through Apriori Algorithm over Social Network Data

Sentiment analysis under temporal shift

Introduction. Chapter Background Recommender systems Collaborative based filtering

1 Overview Definitions (read this section carefully) 2

Part 11: Collaborative Filtering. Francesco Ricci

Influence in Ratings-Based Recommender Systems: An Algorithm-Independent Approach

Seek and Ye shall Find

Information Retrieval

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul

The Normal Distribution & z-scores

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4. Prof. James She

Knowledge Discovery and Data Mining 1 (VO) ( )

Classification: Feature Vectors

Full Text Search Engine as Scalable k-nearest Neighbor Recommendation System

Announcements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday.

Transcription:

Author(s): Rahul Sami, 2009 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Noncommercial Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/ We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material. Copyright holders of content included in this material should contact open.michigan@umich.edu with any questions, corrections, or clarification regarding the use of content. For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use.

Citation Key for more information see: http://open.umich.edu/wiki/citationpolicy Use + Share + Adapt { Content the copyright holder, author, or law permits you to use, share and adapt. } Public Domain Government: Works that are produced by the U.S. Government. (USC 17 105) Public Domain Expired: Works that are no longer protected due to an expired copyright term. Public Domain Self Dedicated: Works that a copyright holder has dedicated to the public domain. Creative Commons Zero Waiver Creative Commons Attribution License Creative Commons Attribution Share Alike License Creative Commons Attribution Noncommercial License Creative Commons Attribution Noncommercial Share Alike License GNU Free Documentation License Make Your Own Assessment { Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. } Public Domain Ineligible: Works that are ineligible for copyright protection in the U.S. (USC 17 102(b)) *laws in your jurisdiction may differ { Content Open.Michigan has used under a Fair Use determination. } Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (USC 17 107) *laws in your jurisdiction may differ Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair. To use this content you should do your own independent analysis to determine whether or not your use will be Fair.

Lecture 12: Explanations; Scalable Implementation; Manipulation SI583: Recommender Systems

4 Explanations in recommender systems Moving away from the black-box oracle model justify why a certain item is recommended maybe also converse to reach a recommendation

5 Amazon.com

Amazon explanations (contd.) 6 Amazon.com

7 Why have explanations? [Tintarev & Masthoff] Transparency Scrutability : correct errors in learnt preference model Trust/Confidence in system Effectiveness & efficiency(speed) Satisfaction/enjoyment

8 Example: explanations for transparency and confidence Movie X was recommended to you because it is similar to movie Y, Z that you recently watched Movie X was recommended to you because you liked other comedies Other users who bought book X also bought book Y

9 Generating explanations Essentially, explain the steps of the CF algorithm, picking the most prominent neighbors User-user Item-item Harder to do for SVD and other abstract model-fitting recommender algorithms

10 Conversational recommenders Example transcript: (from [McSherry, Explanation in Recommender Systems, AI Review 2005]): Top case: please enter your query User: Type = wandering, month = aug Top Case: the target case is aug, tyrol,... other competing cases include... Top case: What is the preferred location? User: why? Top case: It will help eliminate... alternatives User: alps..

11 Conversational recommenders One view: CF using some navigational data as well as ratings More structured approach: incremental collaborative filtering similarity metric changes as the query is refined e.g., incremental Nearest-Neighbor algorithm [McSherry, AI Review 2005]

12 Scalable Implementations Learning objective: see some techniques that are used for large-scale recommenders Know where to start looking for more information

13 Google News Personalization [Das et al, WWW 07] describe algo. and arch. Specific challenges: News relevant items are frequently changing users long-lived, but often new users Very fast response times needed Specific challenges: Google scale! many items, many many users need to parallelize complex computations

14 Algorithms Input data: clicks eg, user J clicked on article X Use a combination of three reco algos: user-user (with a simple similarity measure) SVD ( PLSI ) Item-item (mainly for new users; simple covisitation similarity measure)

15 Tricks/approximations for scalable computing User-user: calculate weighted avg. over only a cluster of users J and K in same cluster if they have a high fraction of overlapped clicks clustering is precomputed offline (using a fast MinHash algorithm) SVD : Precompute user-side weights; update only item-side weights in real time gives an approximate SVD Tweak offline algorithms for parallel computing on Google s map-reduce infrastructure

16 Architecture (from Das et al) Das et al.

17 Experiences with the Netflix prize challenge Difference: static dataset My architecture (such as it was): A clustered user-user randomly chosen clusters (not optimal) cluster size to fit user-user calc in 1GB memory Preprocess, create indices (perl scripts) Calculate similarities (in C) {memory bottleneck} Generate predictions (perl) Evaluate accuracy on test set (perl)

18 Manipulation..

19 Why manipulate a recommender? Examples?

20 Why manipulate a recommender? Examples? Digg/Slashdot: get an article read PageRank: get your site high on results page Books: Author wants his book recommended Spam How?

21 Example: User-User Algorithm user item Joe A B C... X 7 4 4 2 5? Sue 7 5 6 5 6 8 John 2 3 7 2 i s informativeness score = correlation coefficient of i s past ratings with Joe s past ratings Prediction for item X = average of ratings of X, weighted by the rater s scores

22 Cloning Attack: Strategic copying Attacker may copy past ratings to look informative, gain influence. Joe 7 4 4 2 5? FreeMeds 7 4 4 2 5 10 Even if ratings are not directly visible, attacker may be able to infer something about ratings from her own recommendations, publicly available statistics Worse if many accounts can be created (sybil attack)

23 One approach: profile analysis This problem of shilling attacks has been noted earlier [Lam and Riedl] [O Mahoney et al] Many papers on empirical measurements and statistical detection of attack profiles Problem: attackers may get better at disguising their profiles.

24 Results we cannot achieve Prevent any person J from manipulating the prediction on a single item X. Cannot distinguish deliberate manipulation from different tastes on item X Fairness, ie., two raters with identical information get exactly the same influence, regardless of rating order. Cannot distinguish second rater with identical information from an informationless clone.

25 The influence limiter: Key Ideas [Resnick and Sami, Proceedings of RecSys 07 conference] Limit influence until rater demonstrates informativeness Informative only if you re the first to provide the information

26 Predictions on an Item: A Dynamic View 0 predicted 1 probability of HIGH Recommender algorithm ratings

27 Predictions on an Item: A Dynamic View 0 predicted 1 probability of HIGH Recommender algorithm ratings

28 Predictions on an Item: A Dynamic View 0 predicted 1 probability of HIGH Recommender algorithm ratings

29 Predictions on an Item: A Dynamic View eventu al label by target: 0 predicted 1HIGH probability of HIGH Recommender algorithm ratings

30 Predictions on an Item: A Dynamic View eventu al label contribution by target: 0 predicted 1HIGH probability of HIGH Recommender algorithm ratings

31 Our approach Information-theoretic measure of contribution and damage Limit influence a rater can have had based on past contribution This limits net damage an attacker can cause contribution 0 predicted 1 probability ratings Recommender algorithm eventu al label

32 Our Model Binary rating system (HIGH/LOW) Recommendations for a single target person Any recommender algorithm Powerful attackers: Can create up to n sybil identities Can clone existing rating profiles No assumptions on non-attackers: Attacker s sybils may form majority Do not depend on honest raters countering attacks

33 Overview of Results Influence-limiter algorithm can be overlaid on any recommender algorithm to satisfy (with caveats): Limited damage: An attacker with up to n sybils can never cause net total damage greater than O(1) units of prediction error Bounded information loss: In expectation, O(log n) units of information discarded from each genuine rater in total.

34 Influence Limiter: Architecture reputation s R j Scoring q~ 0 ~q 1 ~ q n Influence Limiter target rating to target q 0 q 1 q n Recommender algo ratings

35 Influence Limiter Algorithm: Illustration 0 1 limited ~ prediction q j-1 0 Influence Limiter raw 1 predictions q j-1 ratings Recommender algorithm

36 Influence Limiter Algorithm: Illustration A rater with R=0.25 puts in a rating 0 1 limited prediction ~q j-1 0 Influence Limiter raw 1 q predictions j-1 q j Recommender algorithm ratings

37 Influence Limiter Algorithm: Illustration A rater with R=0.25 puts in a rating 0 1 limited prediction ~q j-1 ~ qj 0 Influence Limiter raw 1 q predictions j-1 q j Recommender algorithm ratings