ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4. Prof. James She


ELEC6910Q Analytics and Systems for Social Media and Big Data Applications, Lecture 4. Prof. James She (james.she@ust.hk)

Selected Works of Activity 4

Last lecture

Mid-term project (individual, due Oct 22)
1. 4-page project paper.
   - Topic (your idea related to the course, or our suggestions online)
   - Datasets (from online, or your own collection if you can)
   - Data processing (from lecture + tutorial)
   - Methodologies for justifications (from lecture)
   - Conclusions (from what you learnt)
   - Details and formats are on the course webpage.
2. Deadline: 22-Oct (2 weeks from now)
3. Hint: use interesting datasets

Summary of this lecture
1. Recommendation
2. Machine Learning
3. Recommendation Examples
4. Scraper

Recommendation

Recall: Software Features/Functions
Features:
   - Recommendation / Feeds / Notification
   - Profile / Membership / Friends network
   - Image / Video / Location sharing
   - Tags / Messages / Comments
   - Search / Result
(Diagram labels: Users, Accessing, Devices)

Different types of recommendations
1. Image (Flickr)
2. Video (YouTube, Youku, Netflix)
3. Cuisine (Openrice, Dianping, Foursquare)
4. Friend/Member/Articles (Facebook, Renren, Google+, WeChat, Line, etc.)
5. Books (Amazon)
6. Webpage/bookmarks (Delicious)
7. Product (eBay)

Recommendation: predicts users' interests/preferences and suggests items (e.g., books) or social elements (e.g., friends). Examples: Facebook, Amazon.

Recommendations by simple search/matching criteria. Examples: Highlight, Meetup. Matching criteria: timing, location, interest choice.

Recommendations using trends or other users' info. Examples: Google+, eBay.

Recommendations using social data (example: YouTube)
When no historical info or matched criteria are available, recommendations can draw on:
1. Your friends' subscriptions/behaviours
2. Like-minded people's subscriptions/behaviours
3. Your interactions with certain things

Common Approaches in S.N. today
   - Statistical/Data Mining Approach: based on historical data that has been stored, such as viewing history
   - Social Networking Approach: based on social info from others, such as friends

User/System histories: simplest techniques
   - Learning user interests by checking user viewing histories and behavioral records, e.g., content, category and ratings
   - Recommendation based on system access statistics and the trending of content, e.g., most popular and new videos
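The "most popular" idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the access log, the user names and the video names are hypothetical, and a real system would read them from stored access statistics.

```python
from collections import Counter

# Hypothetical access log (an assumption for illustration): one (user, video) pair per view.
view_log = [
    ("jane", "video_a"), ("tim", "video_a"), ("don", "video_a"),
    ("jane", "video_b"), ("sandra", "video_b"),
    ("tim", "video_c"),
]

def most_popular(log, n):
    """Return the n most-viewed items, most viewed first."""
    counts = Counter(item for _user, item in log)
    return [item for item, _count in counts.most_common(n)]

def recommend_popular(log, user, n):
    """Recommend the most popular items that the user has not viewed yet."""
    seen = {item for u, item in log if u == user}
    ranked = most_popular(log, len(log))
    return [item for item in ranked if item not in seen][:n]

print(most_popular(view_log, 2))
print(recommend_popular(view_log, "tim", 2))
```

Here the user's own viewing history is used only to filter out items already seen; the ranking itself comes from system-wide statistics.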

Common Techniques in Social Networks
   - Content-based recommendation: understand the item and user properties for recommendation, e.g., BoF tagging for image
   - Collaborative filtering: understand user properties for recommendation, e.g., tagging for image

Content-based recommendation
Match the content characteristics with the user's preferences learnt from their profile.
Steps:
1. Analyze the characteristics of an item (e.g., movies)
2. Compare them to the user's preferences learnt from their profile
3. Recommend the items for which the results of step 1 and step 2 match, as a ranked list
Example: a user likes Action and War. Which one to recommend?
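The three steps above can be sketched as follows. This is a minimal sketch, not the lecture's framework: the movie names, the genre sets and the overlap-based score are all assumptions chosen for illustration.

```python
def content_score(item_features, user_preferences):
    """Fraction of the item's features that appear in the user's profile."""
    if not item_features:
        return 0.0
    return len(item_features & user_preferences) / len(item_features)

def recommend_by_content(items, user_preferences):
    """Return matching item names as a ranked list, best match first."""
    scored = [(content_score(features, user_preferences), name)
              for name, features in items.items()]
    matches = [(score, name) for score, name in scored if score > 0]
    # Sort by descending score; break ties by name for a stable order.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _score, name in matches]

# Step 1: characteristics of each item (hypothetical movies and genres).
movies = {
    "movie_1": {"action", "war"},
    "movie_2": {"action", "comedy"},
    "movie_3": {"romance"},
}
# Step 2: preferences learnt from the user's profile (likes Action and War).
profile = {"action", "war"}

# Step 3: ranked list of matching items.
print(recommend_by_content(movies, profile))
```

With these assumed genres, the movie that is both Action and War ranks first, the Action comedy second, and the romance is not recommended at all.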

Content-based recommendation: an example framework (figure: Recommender)

Collaborative filtering (CF)
Leveraging other social info.
How it works: should item 1 be recommended to Tim?
2 approaches:
   - User to user
   - Item to item (content to content)

Collaborative filtering (CF): User-to-user
Finding users with similar tastes, e.g., by Jaccard similarity:
$J(U_i, U_j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$
Jane and Tim both liked item 2 and disliked item 3, so they have similar tastes. Item 1 is recommended to Tim.
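A minimal sketch of user-to-user CF with Jaccard similarity follows. The encoding is an assumption: each user U_i is represented as a set of (item, opinion) pairs so that shared dislikes also count as agreement. Jane liking item 1 and Don's ratings are hypothetical values added so that the example has something to recommend and someone dissimilar to compare against.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Each user is a set of (item, opinion) pairs; the values are illustrative assumptions.
ratings = {
    "Jane": {(1, "like"), (2, "like"), (3, "dislike")},
    "Tim": {(2, "like"), (3, "dislike")},
    "Don": {(2, "dislike"), (3, "like"), (4, "like")},
}

def recommend_user_to_user(ratings, target):
    """Recommend items liked by the most similar user that the target has not rated."""
    others = [user for user in ratings if user != target]
    nearest = max(others, key=lambda user: jaccard(ratings[target], ratings[user]))
    rated = {item for item, _opinion in ratings[target]}
    return sorted(item for item, opinion in ratings[nearest]
                  if opinion == "like" and item not in rated)

print(jaccard(ratings["Jane"], ratings["Tim"]))
print(recommend_user_to_user(ratings, "Tim"))
```

Jane and Tim agree on two of the three (item, opinion) pairs in their union, so Jane is Tim's nearest user and her liked item 1 is what gets recommended.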

Collaborative filtering (CF): Item-to-item
Finding items that have similar subscribers.
Dom and Sandra are 2 users who both like items 1 and 4. Users who like item 4 also like item 1, so item 1 will be recommended to Tim.
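Item-to-item CF can be sketched the same way, this time comparing the subscriber sets of items rather than the preference sets of users. The data below is an assumption for illustration; in particular, Tim liking item 4 is assumed, since it is what makes item 1 a candidate for him.

```python
# item -> set of users who like it (hypothetical data; Tim liking item 4 is an assumption).
likes = {
    1: {"Dom", "Sandra"},
    2: {"Jane"},
    3: {"Jane"},
    4: {"Dom", "Sandra", "Tim"},
}

def item_similarity(likes, a, b):
    """Jaccard similarity between the subscriber sets of items a and b."""
    union = likes[a] | likes[b]
    return len(likes[a] & likes[b]) / len(union) if union else 0.0

def recommend_item_to_item(likes, user):
    """Rank items the user has not liked by similarity to the items the user likes."""
    liked = [item for item, users in likes.items() if user in users]
    candidates = [item for item in likes if item not in liked]
    scores = {
        c: max((item_similarity(likes, c, item) for item in liked), default=0.0)
        for c in candidates
    }
    ranked = sorted(candidates, key=lambda c: (-scores[c], c))
    return [c for c in ranked if scores[c] > 0]

print(recommend_item_to_item(likes, "Tim"))
```

Items 1 and 4 share two of three subscribers, so item 1 is the only candidate with a non-zero score for Tim.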

Methods/Algorithms for Machine Learning

K-nearest neighbors (K-NN)
One of the simplest machine learning algorithms.
2 common applications:
   - Classification: the output is a class membership, with the object being assigned to the class most common among its k nearest neighbors after some voting mechanism.
   - Regression: the output is a property value for the object, which is the average of the values of its k nearest neighbours.
Different neighbors can have different weights, e.g., the nearest one has a higher weight, formulated as 1/d where d is the distance.
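The regression case and the 1/d weighting can be sketched as below. This is a minimal sketch assuming Euclidean distance and made-up one-dimensional samples; returning the stored value on an exact match (d = 0) is a design choice added here to avoid dividing by zero, not something stated on the slide.

```python
import math

def knn_regress(samples, query, k, weighted=False):
    """Predict a value as the (optionally 1/d-weighted) average of the k nearest samples.

    samples: list of (point, value) pairs, where point is a tuple of floats.
    """
    by_distance = sorted((math.dist(point, query), value) for point, value in samples)
    nearest = by_distance[:k]
    if not weighted:
        return sum(value for _d, value in nearest) / len(nearest)
    # An exact match has distance 0; return its value instead of dividing by zero.
    for d, value in nearest:
        if d == 0:
            return value
    weights = [1.0 / d for d, _value in nearest]
    total = sum(w * value for w, (_d, value) in zip(weights, nearest))
    return total / sum(weights)

# Hypothetical 1-D samples: (point, value).
samples = [((1.0,), 1.0), ((2.0,), 2.0), ((4.0,), 4.0), ((10.0,), 10.0)]
query = (3.0,)

print(knn_regress(samples, query, k=3))                 # plain average
print(knn_regress(samples, query, k=3, weighted=True))  # 1/d-weighted average
```

With k = 3 the neighbours are at distances 1, 1 and 2, so the weighted average gives the farthest of the three half the influence of the other two.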

k-NN: an algorithm to find users/objects with similar tastes/subscribers
Step 1: Collect a set of labeled samples, e.g., each item as an object
Step 2: Find the k nearest neighbors, i.e., the k objects with the highest similarity
Step 3: Classify the input object, e.g., like or dislike the item
(Figure labels: Sandra, Tim, Jane, Don)
Example 1: if K=3, the query instance is classified as positive since 2 of its nearest neighbors are positive.
Example 2: if K=3, the query instance is classified as positive since all 3 nearest neighbors are positive.

k-NN Algorithm
Step 1: Collect a set of labeled samples (e.g., by scraping)
   - y_i: the prediction we want to make ("+" and "-" in the previous slide)
   - x_i: the parameters of the object (the users / the position of the label in the previous slide)
Step 2: Find the k nearest neighbors of the input object
Step 3: Classify the input object
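The three steps map directly onto a short function. This is a sketch under assumptions: the 2-D points, the labels and the use of Euclidean distance with a simple majority vote are illustrative choices, not the lecture's dataset.

```python
import math
from collections import Counter

def knn_classify(samples, query, k):
    """Classify query by majority vote among its k nearest labeled samples.

    samples: list of (x_i, y_i) pairs; x_i is a tuple of floats, y_i is a label.
    """
    # Step 2: find the k nearest neighbors of the input object.
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    # Step 3: classify the input object by majority vote over the neighbors' labels.
    votes = Counter(label for _x, label in nearest)
    return votes.most_common(1)[0][0]

# Step 1: a set of labeled samples (hypothetical positions and labels).
samples = [
    ((1.0, 1.0), "+"), ((2.0, 1.5), "+"), ((1.5, 3.0), "-"),
    ((6.0, 6.0), "-"), ((7.0, 5.0), "-"),
]
query = (2.0, 2.0)

print(knn_classify(samples, query, k=3))
print(knn_classify(samples, query, k=5))
```

With k = 3, two of the three nearest samples are "+", so the query is classified as positive; with k = 5 every sample votes and the majority label flips to "-", which shows how the choice of k affects the result.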

An example using k-NN in Facebook: k-NN user-to-user based CF for friend recommendations

Recommendation Example

Would it be cool if you could identify user gender? More suggestions can be made based on a user's gender.

An Analytic System for User Gender Identification Through User Shared Images. Ming Cheung, James She

Our Motivation
Can user gender be reflected by their shared images?
Dataset:
   - name, gender and more
   - images shared by users
Within the dataset:
   - 6036/1414 users
   - 1,598,769/1,553,575 user shared images

Methodology
1. Some similarity model to evaluate tags on user shared images
2. Some machine learning models (e.g., k-NN)
3. Numerical evaluation with ground truth
(Figure: tags of user shared pictures)

Results - 2: Impacts of k on gender identification
(Figure: accuracy vs. K; panels: Fotolog, Flickr; # of testing users = 500 and 150)

Results - 2: Comparison with other approaches
(Figure: maximum, mean and minimum accuracy of K-NN, wk-NN, SVM and Rand; panels: Fotolog, Flickr)

Conclusion
   - You share more information than just your pictures
   - Your pictures reveal your gender
   - Privacy protection vs. limited intelligence to serve users
   - Potential value and issues of user shared images

Social Media Data Collection: building a data scraper

What if we don't have the dataset? Recall: Architecture of Software Applications

How do we collect social media data?
   - The databases of a social network (e.g., Facebook) may not be directly accessible
   - However, the view of an SN webpage or web service is accessible
   - Collect data manually?
(Diagram: User, Apps/Web API, database for your SNA)

Accessing the web/web services by a scraper
   - Let the scraper view a webpage for you
   - Have the scraper locate and save the required info
(Diagram: SN data (profile, friends list, etc.), scraper, Apps/Web API, database for your SNA)
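The "locate and save" part of a scraper can be sketched with Python's standard html.parser module. Everything about the page here is an assumption: the HTML is a hard-coded, hypothetical profile page, and the class names "profile-name" and "friend" are invented for illustration. A real scraper would first download the page and would need to respect the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical profile page (an assumption); a real scraper would download this HTML first.
SAMPLE_PAGE = """
<html><body>
  <h1 class="profile-name">Tim</h1>
  <ul class="friends">
    <li class="friend">Jane</li>
    <li class="friend">Sandra</li>
  </ul>
</body></html>
"""

class ProfileScraper(HTMLParser):
    """Locates the profile name and the friends list while the page is parsed."""

    def __init__(self):
        super().__init__()
        self.name = None
        self.friends = []
        self._field = None  # which field the text currently being read belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "profile-name":
            self._field = "name"
        elif css_class == "friend":
            self._field = "friend"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self.name = text
        else:
            self.friends.append(text)

def scrape_profile(html):
    """Return the located info as a dict, ready to be saved to a database."""
    parser = ProfileScraper()
    parser.feed(html)
    parser.close()
    return {"name": parser.name, "friends": parser.friends}

print(scrape_profile(SAMPLE_PAGE))
```

The returned dictionary is the "SN data" of the slide (profile and friends list); storing it in a database for later social network analysis is left out of this sketch.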

End of Lecture 4. Questions / Comments?