ELEC6910Q Analytics and Systems for Social Media and Big Data Applications
Lecture 4
Prof. James She (james.she@ust.hk)
Selected Works of Activity 4
Last lecture
Mid-term project (individual, due Oct 22)
1. A 4-page project paper covering:
   - Topic: your own idea related to the course, or one of our suggestions online
   - Datasets: from online sources, or collected yourself if you can
   - Data processing: using techniques from the lectures and tutorials
   - Methodologies for justification: from the lectures
   - Conclusions: what you learnt
   Details and formats are on the course webpage.
2. Deadline: 22 Oct (2 weeks from now).
3. Hint: use interesting datasets.
Summary of this lecture
1. Recommendation
2. Machine Learning
3. Recommendation Examples
4. Scraper
Recommendation
Recall: Software features/functions
[Figure: users access the application from their devices; features include recommendation/feeds/notification, profile/membership/friends network, image/video/location sharing, tags/messages/comments, and search/results]
Different types of recommendations
1. Images (Flickr)
2. Videos (YouTube, Youku, Netflix)
3. Cuisine (OpenRice, Dianping, Foursquare)
4. Friends/members/articles (Facebook, Renren, Google+, WeChat, Line, etc.)
5. Books (Amazon)
6. Webpages/bookmarks (Delicious)
7. Products (eBay)
Recommendation
Predicts users' interests/preferences and suggests items (e.g., books) or social elements (e.g., friends). Examples: Facebook, Amazon.
Recommendations by simple search/matching criteria
Examples: Highlight, Meetup. Items are matched by simple criteria: timing, location and chosen interests.
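As a minimal sketch of criteria matching, the Python below keeps only the events that satisfy all three criteria (timing, location, interest). The `Event` fields, the `match_events` function and the sample data are hypothetical illustrations, not how Highlight or Meetup actually work.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    hour: int        # start hour, 0-23 (hypothetical field)
    location: str
    topic: str

def match_events(events, free_hours, location, interests):
    """Return the events that satisfy every simple matching criterion."""
    return [
        e for e in events
        if e.hour in free_hours         # timing
        and e.location == location      # location
        and e.topic in interests        # interest choice
    ]

# Hypothetical sample events
events = [
    Event("Photo walk", 10, "Central", "photography"),
    Event("Python meetup", 19, "Central", "programming"),
    Event("Hiking club", 8, "Sai Kung", "outdoors"),
]
picks = match_events(events, free_hours=range(18, 23), location="Central",
                     interests={"programming", "outdoors"})
print([e.name for e in picks])  # ['Python meetup']
```

Because every criterion must hold, this approach needs no history or social data, but it also cannot suggest anything outside what the user explicitly asked for.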
Recommendations using trends or other users' information
Examples: Google+, eBay.
Recommendations using social data (e.g., YouTube)
When historical information or matching criteria are unavailable, recommendations can be based on:
1. Your friends' subscriptions/behaviours
2. Like-minded people's subscriptions/behaviours
3. Your interactions with certain things
Common approaches in social networks today
- Statistical/data mining approach: based on stored historical data, such as viewing history
- Social networking approach: based on social information from others, such as friends
User/system histories: the simplest techniques
- Learning user interests by checking users' viewing histories and behavioural records, e.g., content, categories and ratings
- Recommending based on system access statistics and content trends, e.g., the most popular and newest videos
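A minimal sketch of the "most popular" technique, assuming the system access statistics are a simple log of (user, item) view events; `most_popular` and the sample log are hypothetical. It counts views per item and skips items the user has already seen.

```python
from collections import Counter

def most_popular(view_log, seen, n=3):
    """Recommend the n most-viewed items that the user has not seen yet."""
    counts = Counter(item for _, item in view_log)  # views per item
    # most_common() orders by count; ties keep first-encountered order
    ranked = [item for item, _ in counts.most_common() if item not in seen]
    return ranked[:n]

# Hypothetical access log of (user, video) views
view_log = [("ann", "v1"), ("bob", "v1"), ("cat", "v2"),
            ("bob", "v3"), ("ann", "v3"), ("dan", "v3"), ("cat", "v4")]
print(most_popular(view_log, seen={"v3"}, n=2))  # ['v1', 'v2']
```

The same counting could be restricted to a recent time window to approximate "trending" rather than all-time popularity.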
Common techniques in social networks
- Content-based recommendation: understands item and user properties for recommendation, e.g., BoF (bag-of-features) tagging for images
- Collaborative filtering: understands user properties for recommendation, e.g., how users tag images
Content-based recommendation
Match the content characteristics of items against each user's preferences learnt from his or her profile:
1. Analyse the characteristics of an item (e.g., a movie).
2. Compare them with the user's preferences learnt from the profile.
3. Recommend items, as a ranked list, according to how well the results of step 1 match step 2.
Example: the user likes Action and War movies. Which one should be recommended?
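A minimal sketch of the three steps, assuming each movie is described by genre weights and the user profile is the average of the movies the user liked; cosine similarity is one common choice of matching score, not necessarily the one used in the framework on the next slide. The movie data and helper names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_profile(liked_items, item_features):
    """Learn the user's preferences: average the features of liked items."""
    profile = {}
    for item in liked_items:
        for f, w in item_features[item].items():
            profile[f] = profile.get(f, 0.0) + w / len(liked_items)
    return profile

def recommend(profile, item_features, exclude):
    """Steps 1-3: score each unseen item against the profile, best first."""
    scores = {i: cosine(profile, feats)
              for i, feats in item_features.items() if i not in exclude}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical item characteristics (step 1)
movies = {
    "m1": {"action": 1.0, "war": 1.0},
    "m2": {"action": 1.0},
    "m3": {"romance": 1.0, "comedy": 1.0},
    "m4": {"war": 1.0, "drama": 1.0},
    "m5": {"comedy": 1.0},
}
profile = build_profile(["m1", "m2"], movies)   # user liked action/war films
print(recommend(profile, movies, exclude={"m1", "m2"}))  # m4 ranks first
```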
Content-based recommendation: an example framework
[Figure: example content-based recommender framework]
Collaborative filtering (CF)
Leveraging other users' social information. How it works: should item 1 be recommended to Tim? Two approaches:
- User-to-user
- Item-to-item (content-to-content)
[Figure: user-to-user and item-to-item CF]
Collaborative filtering (CF): user-to-user
Find users with similar tastes, e.g., using the Jaccard similarity, where U_i is user i's set of item preferences:
$$J(U_i, U_j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$$
Jane and Tim both liked item 2 and disliked item 3, so they have similar tastes; item 1 is therefore recommended to Tim.
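The user-to-user idea can be sketched in Python, assuming each user is represented as a set of (item, rating) pairs so that shared dislikes count as agreement too. The rating data and the `recommend_user_to_user` helper are hypothetical. It finds the user with the highest Jaccard similarity and recommends the items that user liked but the target has not rated.

```python
def jaccard(a, b):
    """J(U_i, U_j) = |U_i ∩ U_j| / |U_i ∪ U_j|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical ratings: each user is a set of (item, +1 like / -1 dislike) pairs.
ratings = {
    "Tim":  {(2, +1), (3, -1)},
    "Jane": {(1, +1), (2, +1), (3, -1)},
    "Dom":  {(2, -1), (3, +1), (4, +1)},
}

def recommend_user_to_user(target, ratings):
    """Return the most similar user and the items they liked that target has not rated."""
    others = [u for u in ratings if u != target]
    nearest = max(others, key=lambda u: jaccard(ratings[target], ratings[u]))
    rated = {item for item, _ in ratings[target]}
    liked_by_nearest = sorted(i for i, r in ratings[nearest] if r > 0 and i not in rated)
    return nearest, liked_by_nearest

print(recommend_user_to_user("Tim", ratings))  # ('Jane', [1])
```

Here Tim and Jane share 2 of the 3 distinct (item, rating) pairs, giving J = 2/3, while Tim and Dom share none.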
Collaborative filtering (CF): item-to-item
Find items that have similar subscribers. Dom and Sandra both like items 1 and 4, so users who like item 4 also tend to like item 1. Tim likes item 4, so item 1 will be recommended to Tim.
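A minimal item-to-item sketch, assuming each item is represented by the set of users who like it and item similarity is the Jaccard similarity of those sets. The data below is hypothetical, chosen to mirror the slide's story (Dom and Sandra like items 1 and 4; Tim likes item 4).

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of users."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical data: item -> set of users who like it.
likers = {
    1: {"Dom", "Sandra"},
    2: {"Jane"},
    3: {"Jane"},
    4: {"Dom", "Sandra", "Tim"},
}

def recommend_item_to_item(user, likers):
    """Rank unliked items by their best similarity to an item the user likes."""
    liked = [i for i, users in likers.items() if user in users]
    if not liked:
        return []
    scores = {
        cand: max(jaccard(users, likers[i]) for i in liked)
        for cand, users in likers.items()
        if cand not in liked
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [c for c in ranked if scores[c] > 0]   # drop unrelated items

print(recommend_item_to_item("Tim", likers))  # [1]
```

Item 1 and item 4 share 2 of 3 distinct likers (similarity 2/3), while items 2 and 3 share none with item 4.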
Methods/Algorithms for Machine Learning
K-nearest neighbors (k-NN): one of the simplest machine learning algorithms
Two common applications:
- Classification: the output is a class membership; the object is assigned to the class most common among its k nearest neighbors, using some voting mechanism.
- Regression: the output is a property value for the object, namely the average of the values of its k nearest neighbors.
Different neighbors can have different weights, e.g., nearer neighbors get higher weights, such as 1/d where d is the distance.
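A minimal sketch of both applications, assuming objects are points compared by Euclidean distance (`math.dist`, Python 3.8+) and the optional weights are 1/d as described above. The sample points and labels are hypothetical, chosen so that 1/d weighting changes the classification result.

```python
import math
from collections import defaultdict

def nearest(query, samples, k):
    """The k (distance, value) pairs closest to `query`."""
    return sorted((math.dist(query, x), y) for x, y in samples)[:k]

def knn_classify(query, samples, k=3, weighted=False):
    """Vote among the k nearest neighbors; optionally weight each vote by 1/d."""
    votes = defaultdict(float)
    for d, y in nearest(query, samples, k):
        if weighted:
            votes[y] += 1.0 / d if d > 0 else math.inf  # exact match dominates
        else:
            votes[y] += 1.0
    return max(votes, key=votes.get)

def knn_regress(query, samples, k=3, weighted=False):
    """Average the values of the k nearest neighbors; optionally 1/d-weighted."""
    neighbors = nearest(query, samples, k)
    exact = [y for d, y in neighbors if d == 0]
    if exact:                      # query coincides with training point(s)
        return sum(exact) / len(exact)
    weights = [1.0 / d if weighted else 1.0 for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

query = (0.2, 0.2)
labelled = [((0, 0), "+"), ((2, 0), "-"), ((0, 3), "-"), ((9, 9), "+")]
print(knn_classify(query, labelled, k=3))                 # '-' (2 of 3 votes)
print(knn_classify(query, labelled, k=3, weighted=True))  # '+' (nearest one dominates)

rated = [((0, 0), 4.0), ((2, 0), 2.0), ((0, 3), 1.0), ((9, 9), 5.0)]
print(knn_regress(query, rated, k=3))                 # plain average of 4.0, 2.0, 1.0
print(knn_regress(query, rated, k=3, weighted=True))  # pulled towards 4.0
```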
k-NN: an algorithm to find users/objects with similar tastes/subscribers
Step 1: Collect a set of labelled samples, e.g., each item as an object.
Step 2: Find the k nearest neighbors, i.e., the k objects with the highest similarity.
Step 3: Classify the input object, e.g., as liking or disliking the item.
[Figure: labelled users Sandra, Jane, Don and Tim]
If K = 3, the query instance in this case is classified as positive, since 2 of its 3 nearest neighbors are positive.
k-NN: an algorithm to find users/objects with similar tastes/subscribers (second example)
Step 1: Collect a set of labelled samples, e.g., each item as an object.
Step 2: Find the k nearest neighbors, i.e., the k objects with the highest similarity.
Step 3: Classify the input object, e.g., as liking or disliking the item.
If K = 3, the query instance in this case is classified as positive, since all 3 of its nearest neighbors are positive.
k-NN algorithm
Step 1: Collect a set of labelled samples (x_i, y_i), e.g., by scraping, where
- y_i is the label we want to predict (+ or - in the previous slide), and
- x_i are the parameters of the object (e.g., the users, i.e., the positions of the labels in the previous slide).
Step 2: Find the k nearest neighbors of the input object.
Step 3: Classify the input object.
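Steps 2 and 3 can be written compactly using the slide's x_i and y_i. The neighbor set N_k(x), the distance d and the weights w_i are notation introduced here for illustration: w_i = 1 gives the plain majority vote (classification) or plain average (regression), and w_i = 1/d(x, x_i) gives the distance-weighted variant with higher weight for nearer neighbors.

$$
N_k(x) = \{\, i : x_i \text{ is among the } k \text{ samples closest to } x \,\}
$$

$$
\hat{y}_{\text{class}}(x) = \arg\max_{c} \sum_{i \in N_k(x)} w_i \, \mathbb{1}[y_i = c],
\qquad
\hat{y}_{\text{reg}}(x) = \frac{\sum_{i \in N_k(x)} w_i \, y_i}{\sum_{i \in N_k(x)} w_i}
$$

For example, with K = 3, w_i = 1 and neighbor labels (+, +, -), the vote is 2 to 1, so the input object is classified as positive.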
An example using k-NN in Facebook: k-NN user-to-user CF for friend recommendations
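A minimal sketch of k-NN user-to-user CF for friend recommendation, assuming users are compared by the Jaccard similarity of their friend lists and the k most similar non-friends are suggested. This illustrates the idea only and is not Facebook's actual method; the friendship graph and `recommend_friends` helper are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two friend sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical undirected friendship graph: user -> set of friends.
friends = {
    "Tim":    {"Jane", "Dom"},
    "Jane":   {"Tim", "Dom", "Sandra"},
    "Dom":    {"Tim", "Jane", "Sandra", "Ann"},
    "Sandra": {"Jane", "Dom"},
    "Ann":    {"Dom"},
    "Bob":    {"Carl"},
    "Carl":   {"Bob"},
}

def recommend_friends(user, friends, k=2):
    """Suggest the k non-friends whose friend lists are most similar to the user's."""
    mine = friends[user]
    scored = [
        (jaccard(mine, friends[u]), u)
        for u in friends
        if u != user and u not in mine
    ]
    # Nearest neighbors first; users with no overlap at all are dropped.
    ranked = [u for s, u in sorted(scored, reverse=True) if s > 0]
    return ranked[:k]

print(recommend_friends("Tim", friends, k=2))  # ['Sandra', 'Ann']
```

Sandra has exactly the same friends as Tim (similarity 1.0) and Ann shares one of two (0.5), so both are "people you may know"; Bob and Carl share none.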
Recommendation Example
Wouldn't it be cool if you could identify a user's gender? More suggestions could then be made based on gender.
An Analytic System for User Gender Identification Through User Shared Images
Ming Cheung, James She
Our motivation: can users' gender be reflected by the images they share?
Dataset: names, genders and images shared by users. Within the dataset: 6,036/1,414 users and 1,598,769/1,553,575 user-shared images.
Methodology
1. A similarity model to evaluate the tags on user-shared images
2. Machine learning models (e.g., k-NN)
3. Numerical evaluation against ground truth
[Figure: tags of user-shared pictures]
Results (2): impact of k on gender identification
[Figure: accuracy vs. K for Fotolog (500 testing users) and Flickr (150 testing users)]
Results (2): comparison with other approaches
[Figure: maximum, mean and minimum accuracy of k-NN, weighted k-NN (wk-NN), SVM and random guessing (Rand) on Fotolog and Flickr]
Conclusion
- You share more information than just your pictures: your pictures reveal your gender.
- Privacy protection vs. limited intelligence to serve users
- Potential value and issues of user-shared images
Social Media Data Collection: building a data scraper
What if we don't have the dataset?
Recall: architecture of software applications
How do we collect social media data?
- The databases of a social network (e.g., Facebook) may not be directly accessible.
- However, the social network's webpages and web services are accessible.
- Collect data manually?
[Figure: user apps/web API between the social network and the database for your social network analysis (SNA)]
Accessing the web/web services with a scraper
- Let a scraper view a webpage for you.
- The scraper locates and saves the information you need (e.g., profiles, friend lists) into the database for your SNA.
[Figure: a scraper collects SN data (profile, friends list, etc.) through apps/web API into the database for your SNA]
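A minimal sketch of the scraper idea using only Python's standard library: `urllib.request` to download a page, `html.parser.HTMLParser` to locate the wanted elements, and `sqlite3` as the database for your SNA. The CSS class `friend-name`, the table layout, the URL in the comment and the sample HTML are all hypothetical; real social network pages differ, often require login, and may offer an official web API instead.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.request import urlopen

def fetch(url, timeout=10):
    """View a webpage for you: download it and decode it to text."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

class FriendListParser(HTMLParser):
    """Collect the text of elements whose class list contains `target_class`.

    The class name is hypothetical: inspect the real page to find the right one.
    """
    def __init__(self, target_class="friend-name"):
        super().__init__()
        self.target_class = target_class
        self._open_tag = None    # tag name while inside a matching element
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._open_tag is None and self.target_class in classes:
            self._open_tag = tag
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self._open_tag = None

    def handle_data(self, data):
        if self._open_tag is not None:
            self.names[-1] += data   # includes text of nested tags such as <b>

def scrape_friends(html, target_class="friend-name"):
    """Locate the required info: return the cleaned text of matching elements."""
    parser = FriendListParser(target_class)
    parser.feed(html)
    parser.close()
    return [" ".join(n.split()) for n in parser.names]

def save(conn, user, names):
    """Save the scraped friend list into a (hypothetical) SNA table."""
    conn.execute("CREATE TABLE IF NOT EXISTS friends (user TEXT, friend TEXT)")
    conn.executemany("INSERT INTO friends VALUES (?, ?)", [(user, n) for n in names])
    conn.commit()

# html = fetch("https://example.com/tim/friends")  # hypothetical URL
html = ('<ul><li class="friend-name">Jane</li>'
        '<li class="friend-name big">Dom <b>S.</b></li>'
        '<li class="ad">Buy now</li></ul>')
names = scrape_friends(html)
conn = sqlite3.connect(":memory:")
save(conn, "Tim", names)
print(names)  # ['Jane', 'Dom S.']
```

Separating fetching, parsing and saving keeps the parser testable on saved HTML without touching the network.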
- End of Lecture 4 -
Questions / Comments?