COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA

Similar documents
Location-Based Social Networks: Locations

A Novel Method for Activity Place Sensing Based on Behavior Pattern Mining Using Crowdsourcing Trajectory Data

Mining Web Data. Lijun Zhang

Context based Re-ranking of Web Documents (CReWD)

DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li

Recommendation System for Location-based Social Network CS224W Project Report

Mining Web Data. Lijun Zhang

Knowledge Discovery and Data Mining 1 (VO) ( )

Learning Travel Recommendations From User- Generated GPS Traces

Tag Based Image Search by Social Re-ranking

Part 11: Collaborative Filtering. Francesco Ricci

CLR: A Collaborative Location Recommendation Framework based on Co-Clustering

Matrix Co-factorization for Recommendation with Rich Side Information HetRec 2011 and Implicit 1 / Feedb 23

Trajectory analysis. Ivan Kukanov

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning Graph-based POI Embedding for Location-based Recommendation

Introduction to Data Mining

Constructing Popular Routes from Uncertain Trajectories

PERSONALIZED TAG RECOMMENDATION

Information Retrieval

IALP 2016 Improving the Effectiveness of POI Search by Associated Information Summarization

Incorporating Satellite Documents into Co-citation Networks for Scientific Paper Searches

MadLINQ: Large-Scale Disributed Matrix Computation for the Cloud

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li

WebSci and Learning to Rank for IR

Integrating User Preference with Theft Identification and Profile Changer in LBSNs Divya. M, V.R.Balasaraswathi, A. Abirami, G.

Link Prediction for Social Network

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

CSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction

Supervised Random Walks

Preference-Aware POI Recommendation with Temporal and Spatial Influence

Mobile, Smartphones, Wi-Fi, and Apps

Chapter 1 Introduction to Computers

CSE 258 Lecture 5. Web Mining and Recommender Systems. Dimensionality Reduction

Mining User Similarity Based on Location History

Influence Maximization in Location-Based Social Networks Ivan Suarez, Sudarshan Seshadri, Patrick Cho CS224W Final Project Report

Personalized Tour Planning System Based on User Interest Analysis

iaide Mid-term report Moquan Chen & Stian Kilaas

Social Itinerary Recommendation from User-Generated Digital Trails

Collaborative Filtering using Euclidean Distance in Recommendation Engine

Recommender Systems - Content, Collaborative, Hybrid

Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites from User Activity Data

Location, Location, Location

Recommender Systems. Master in Computer Engineering Sapienza University of Rome. Carlos Castillo

Comparative Study of Subspace Clustering Algorithms

Heterogeneous Graph-Based Intent Learning with Queries, Web Pages and Wikipedia Concepts

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp

Density Based Co-Location Pattern Discovery

Place Recommendation from Check-in Spots on Location-Based Online Social Networks

Urban Point-of-Interest Recommendation by Mining User Check-in Behaviors

Trip Mining and Recommendation from Geo-tagged Photos

Research on Destination Prediction for Urban Taxi based on GPS Trajectory

Find, New, Copy, Web, Page - Tagging for the (Re-)Discovery of Web Pages

T2CBS: Mining Taxi Trajectories for Customized Bus Systems

Visual Query Suggestion

A System for Discovering Regions of Interest from Trajectory Data

An Algorithm of Parking Planning for Smart Parking System

Recommender Systems 6CCS3WSN-7CCSMWAL

A System for Identifying Voyage Package Using Different Recommendations Techniques

Mobile based Text Image Translation System for Smart Tourism. Saw Zay Maung Maung UCSY, Myanmar. 23 November 2017, Brunei

Time Distortion Anonymization for the Publication of Mobility Data with High Utility

Chapter 1: Function Sense Activity 1.2 & 3

TriRank: Review-aware Explainable Recommendation by Modeling Aspects

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs

APPENDIX A: INSTRUMENTS

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

idiary: from GPS signals to a text-searchable diary

Bilevel Sparse Coding

PERSONALIZED MOBILE SEARCH ENGINE BASED ON MULTIPLE PREFERENCE, USER PROFILE AND ANDROID PLATFORM

Crime - Based Predictive Analysis and Warning System

Implementation of a High-Performance Distributed Web Crawler and Big Data Applications with Husky

METRO BUS TRACKING SYSTEM

CSE 258. Web Mining and Recommender Systems. Advanced Recommender Systems

Document Clustering: Comparison of Similarity Measures

International Journal of Advanced Research in Computer Science and Software Engineering

Clustering and Visualisation of Data

Extracting Rankings for Spatial Keyword Queries from GPS Data

Tourism Statistics in Azerbaijan

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW

CC PROCESAMIENTO MASIVO DE DATOS OTOÑO Lecture 7: Information Retrieval II. Aidan Hogan

Personalized Web Search

Clustering: Overview and K-means algorithm

Identifying The Stay Point Using GPS Trajectory of Taxis Hao Xiao 1,a, Wenjun Wang 2,b, Xu Zhang 3,c

Mining Social Network Graphs

Tag-based Social Interest Discovery

Word Embeddings in Search Engines, Quality Evaluation. Eneko Pinzolas

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

Reviewer Profiling Using Sparse Matrix Regression

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Part II Workflow discovery algorithms

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul

Link Analysis and Web Search

A Memetic Heuristic for the Co-clustering Problem

Mining the Most Influential k-location Set From Massive Trajectories

Km4City Smart City API: an integrated support for mobility services

CyLab Mobility Research Center. Martin Griss & Priya Narasimhan

Advanced Computer Graphics CS 525M: Crowds replace Experts: Building Better Location-based Services using Mobile Social Network Interactions

TrajAnalytics: A software system for visual analysis of urban trajectory data

Transcription:

COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia WWW 2010

AGENDA Introduction System Architecture Experiments Discussion and Conclusion

INTRODUCTION AND MOTIVATION Users now sharing GPS trajectories on the Web Wisdom of crowd: incorporating users knowledge Travel experience: Some places are more popular than the others User activities: The food is delicious --> dining at that place A comment A GPS trajectory

GOAL: TO ANSWER 2 TYPICAL QUESTIONS Q1: what can I do there if I visit some place? (Activity recommendation given location query) Q2: where should I go if I want to do something? (Location recommendation given activity query) Location query Recommended activity list Activity query Recommended location list A recommended location

PROBLEM DEFINITION How to well model the location-activity relation Encode it into a matrix Activities Activities An entry denotes how popular an activity is performed at a location Locations Locations Ranking along the Columns or rows Example ()*+,--./"0,12" 3,*-45"6.51" 78)/99:;/<:/" Location recommendation =):*,5>" AB8,+,1,)/" C8)DD,/9"!" #" $" #" %" &" &" $" '" Activity recommendation!"#$%&"'()*#"++*',$%&"'( =):*,5>?" ()*+,--./"0,12"@"3,*-45"6.51"@"78)/99:;/<:/" -#%&.&%/()*#"++*',$%&"'( ()*+,--./"0,12?" =):*,5>"@"AB8,+,1,)/"@"C8)DD,/9"

CONTRIBUTIONS In practice, it s sparse! User comments are few (in out dataset, <0.6% entries are filled) 1'2(*34" 567*)*/*'-" 87'99*-:" &'()*++,-".*/0" ;*(+<3"=,3/" >7'-::2?-@2-"!" #" #" #" $" #" $" #" %"!"#$%&"'(.+$,/,$,"'(.+$,/,$,"'( )*+#$,*-'( )*+#$,*-'( 2(.+$,/,$,"'(!"#$%&"'()*'#%&"'$+&%&,-(.#%&/&%0(#"11,+$%&"'-(

SYSTEM ARCHITECTURE 3 Location-Activity Matrix 6 Laptops and PCs PDAs and Smart-phones Location-based Activity Statistics Collaborative Location and Activity Recommender 2 Stay Regions 4 Location-Feature Matrix 5 Activity-Activity Matrix Grid-based Clustering Location Feature Extraction Activity Correlation Mining 1 GPS Log POI Category Database World Wide Web 1 Data Inputs 2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature Extraction 5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

SYSTEM COMPONENTS!"#$%&"'(.+$,/,$,"'(.+$,/,$,"'( )*+#$,*-'( )*+#$,*-'( 2(.+$,/,$,"'(!"#$%&"'()*'#%&"'$+&%&,-(.#%&/&%0(#"11,+$%&"'-( 3 Location-Activity Matrix 6 Laptops and PCs PDAs and Smart-phones Location-based Activity Statistics Collaborative Location and Activity Recommender 2 Stay Regions 4 Location-Feature Matrix 5 Activity-Activity Matrix Grid-based Clustering Location Feature Extraction Activity Correlation Mining 1 GPS Log POI Category Database World Wide Web 1 Data Inputs 2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature Extraction 5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

SYSTEM COMPONENTS!"#$%&"'(.+$,/,$,"'(.+$,/,$,"'( )*+#$,*-'( )*+#$,*-'( 2(.+$,/,$,"'(!"#$%&"'()*'#%&"'$+&%&,-(.#%&/&%0(#"11,+$%&"'-( 3 Location-Activity Matrix 6 Laptops and PCs PDAs and Smart-phones Location-based Activity Statistics Collaborative Location and Activity Recommender 2 Stay Regions 4 Location-Feature Matrix 5 Activity-Activity Matrix Grid-based Clustering Location Feature Extraction Activity Correlation Mining 1 GPS Log POI Category Database World Wide Web 1 Data Inputs 2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature Extraction 5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

GPS LOG PROCESSING GPS trajectories Latitude, Longitude, Arrival Timestamp p 1 : 39.975, 116.331, 9/9/2009 17:54 p 2 : 39.978, 116.308, 9/9/2009 18:08 p K : 39.992, 116.333, 9/12/2009 13:56 a GPS trajectory p 1 P 2 p 3 p 4 a stay point s p 6 p 5 p 7!"#$%&'()*+%!" Raw GPS Points Stay Points Stay Regions Stand for a geo-spot where a user has stayed for a while Preserve the sequence and vicinity info Stand for a geo-region that we may recommend Discover the meaningful locations

Grid-based clustering Greedy algorithm Easy, fast and effective O(n log n), due to sorting Return fixed-size regions Example STAY REGION EXTRACTION 3 10 2 5 1 8 6 A big shopping area ( Zhonggancun ) in west Beijing, >6km2 4!+#,-./012$ 3,$4$"556$!"#$%&'()*$!"#$%$$&'()*+,-.(#(/0$!7#$89:;('$!"#$%$1'()*+,-.(#(/0$!<#$=>?@$ABC2D/>?1E$ 3@47556$

Grid-based clustering Greedy algorithm Easy, fast and effective O(n log n), due to sorting Return fixed-size regions Example STAY REGION EXTRACTION 3 10 2 5 1 8 6 A big shopping area ( Zhonggancun ) in west Beijing, >6km2 4!+#,-./012$ 3,$4$"556$!"#$%&'()*$!"#$%$$&'()*+,-.(#(/0$!7#$89:;('$!"#$%$1'()*+,-.(#(/0$!<#$=>?@$ABC2D/>?1E$ 3@47556$

Grid-based clustering Greedy algorithm Easy, fast and effective O(n log n), due to sorting Return fixed-size regions Example STAY REGION EXTRACTION 3 10 2 5 1 8 6 A big shopping area ( Zhonggancun ) in west Beijing, >6km2 4!+#,-./012$ 3,$4$"556$!"#$%&'()*$!"#$%$$&'()*+,-.(#(/0$!7#$89:;('$!"#$%$1'()*+,-.(#(/0$!<#$=>?@$ABC2D/>?1E$ 3@47556$

Grid-based clustering Greedy algorithm Easy, fast and effective O(n log n), due to sorting Return fixed-size regions Example STAY REGION EXTRACTION 3 10 2 5 1 8 6 A big shopping area ( Zhonggancun ) in west Beijing, >6km2 4!+#,-./012$ 3,$4$"556$!"#$%&'()*$!"#$%$$&'()*+,-.(#(/0$!7#$89:;('$!"#$%$1'()*+,-.(#(/0$!<#$=>?@$ABC2D/>?1E$ 3@47556$

Grid-based clustering Greedy algorithm Easy, fast and effective O(n log n), due to sorting Return fixed-size regions Example STAY REGION EXTRACTION 3 10 2 5 1 8 6 A big shopping area ( Zhonggancun ) in west Beijing, >6km2 4!+#,-./012$ 3,$4$"556$!"#$%&'()*$!"#$%$$&'()*+,-.(#(/0$!7#$89:;('$!"#$%$1'()*+,-.(#(/0$!<#$=>?@$ABC2D/>?1E$ 3@47556$

LOCATION-ACTIVITY EXTRACTION Location-activity matrix GPS: 39.903, 116.391, 14/9/2009 15:25 Stay Region: 39.910, 116.400 (Forbidden City) We took a tour bus to see around along the forbidden city moat Activity: tourism $%&'())*+#,(-.# 34%+5506+70+# 8# /%0&(12#!"# $%%)# 8#!"#$%&"'()#%&*&%+,-$%.&/, User comments are few -> this matrix is sparse! Our objective: to fill this matrix.

SYSTEM COMPONENTS!"#$%&"'(.+$,/,$,"'(.+$,/,$,"'( )*+#$,*-'( )*+#$,*-'( 2(.+$,/,$,"'(!"#$%&"'()*'#%&"'$+&%&,-(.#%&/&%0(#"11,+$%&"'-( 3 Location-Activity Matrix 6 Laptops and PCs PDAs and Smart-phones Location-based Activity Statistics Collaborative Location and Activity Recommender 2 Stay Regions 4 Location-Feature Matrix 5 Activity-Activity Matrix Grid-based Clustering Location Feature Extraction Activity Correlation Mining 1 GPS Log POI Category Database World Wide Web 1 Data Inputs 2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature Extraction 5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

LOCATION FEATURE EXTRACTION Location features: Points of Interests (POIs) Stay Region: 39.980, 116.306 (Zhongguancun)!"#$%&!%'$( [restaurant, bank, shop] = [3, 1, 1] restaurant!"#$%&!%'$( )%'*( TF-IDF style normalization: feature = [0.13, 0.32, 0.18] #+,--.'/(0%11( '()*+,,-.%/+01% )-2034)3.0% *3.5% 6% 78(.9943.:4.%!"#$%!"$&% 6%!"#$%&"'()*$%+,*-.$%,&/-

TF-IDF Term-Frequency Inverse Document Frequency restaurant!"#$%&!%'$(!"#$%&!%'$( )%'*( #+,--.'/(0%11( Example Assume in 10 locations, 8 have restaurants (less distinguishing), while 2 have banks and 4 have shops tf-idf(restaurant) = (3/5)*log(10/8) = 0.13 tf-idf(bank) = (1/5)*log(10/2) = 0.32 tf-idf(shop) = (1/5)*log(10/4) = 0.18

SYSTEM COMPONENTS!"#$%&"'(.+$,/,$,"'(.+$,/,$,"'( )*+#$,*-'( )*+#$,*-'( 2(.+$,/,$,"'(!"#$%&"'()*'#%&"'$+&%&,-(.#%&/&%0(#"11,+$%&"'-( 3 Location-Activity Matrix 6 Laptops and PCs PDAs and Smart-phones Location-based Activity Statistics Collaborative Location and Activity Recommender 2 Stay Regions 4 Location-Feature Matrix 5 Activity-Activity Matrix Grid-based Clustering Location Feature Extraction Activity Correlation Mining 1 GPS Log POI Category Database World Wide Web 1 Data Inputs 2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature Extraction 5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

ACTIVITY CORRELATION EXTRACTION How possible for one activity to happen, if another activity happens? Automatically mined from the Web, potentially useful when #(act) is large Tourism and Amusement and Food and Drink Correlation = h(1.16m), where h is a normalization function

ACTIVITY CORRELATION EXTRACTION (CONT.) Most mined correlations are reasonable Example Tourism with other activities?@g$?@c$?@f$?@e$?@d$?@a$ AB>AC$ >?@A$ ;),<</01$ :,,5$ ;<,'9%$ =,6/"$!"#$%"&'()$*+',-$./012$?@G$?@C$ ;),<</01$?@F$?@E$ =,6/"$?@D$ :,,5$ ;<,'9%$?@A$ AB>AC$ >?@A$ 34-&0$5"%/10$*&6"'&1"$,0$7$%4#8"(9%2$ Tourism-Shopping more likely to happen together than Tourism-Sports

SYSTEM COMPONENTS!"#$%&"'(.+$,/,$,"'(.+$,/,$,"'( )*+#$,*-'( )*+#$,*-'( 2(.+$,/,$,"'(!"#$%&"'()*'#%&"'$+&%&,-(.#%&/&%0(#"11,+$%&"'-( 3 Location-Activity Matrix 6 Laptops and PCs PDAs and Smart-phones Location-based Activity Statistics Collaborative Location and Activity Recommender 2 Stay Regions 4 Location-Feature Matrix 5 Activity-Activity Matrix Grid-based Clustering Location Feature Extraction Activity Correlation Mining 1 GPS Log POI Category Database World Wide Web 1 Data Inputs 2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature Extraction 5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATION (CLAR) Collaborative filtering, with collective matrix factorization Features Activities Activities Locations Y = UW T U Locations X = UV T V Activities Z = VV T Low rank approximation, by minimizing where U, V and W are the low-dimensional representations for the locations, activities and location features, respectively. I is an indicatory matrix. After getting U* and V*, reconstruct the incomplete X Efficient: complexity is linear to #(loc), can handle large data

EXPERIMENTS Data 2.5 years (2007.4-2009.10) 162 users 13K GPS trajectories, 4M GPS points, 140K kilometers 530 comments age<=22 26<=age<29 22<age<=25 age>=30 Microsoft employees Other companies' employees Government staffs College students 7% 8% 16% 45% 40% 62% 16% 6%

EXPERIMENTS Evaluation Invite 5 subjects to give ratings independently Rating criteria

EXPERIMENTS(CONT.) Evaluation Location recommendation Measured on top 10 returned locations for each of the 5 activities Activity recommendation Measured on the 5 activities for top 20 popular locations with most visits ndcg as evaluation matrix

Normalized Discounted Cumulative Gain NDCG For Example: 3, 2, 3, 0, 1, 2 i reli logi 1 3 N/A N/A 2 2 1 2 3 3 1.59 1.887 4 0 2 0 5 1 2.32 0.431 6 2 2.59 0.772

SYSTEM PERFORMANCES Impact of location feature information (i.e. λ1 @ Fig.11) Impact of activity correlation information (i.e. λ2 @ Fig.12) Observations The weight for each information source should be moderate Using both sources outperforms using single source (i.e. λ1=0, λ2=0)

Single collaborative filtering (SCF) Using only the location-activity matrix Unifying collaborative filtering (UCF) BASELINE COMPARISON Using all 3 matrices, but in a different way For each missing entry, combine the entries belonging to the top N similar locations top N similar activities in a weighted way One-tail t-test p1<0.01 Two-tail t-test p2<0.01

IMPACT OF STAY REGION SIZE Stay region: cluster of stay points, i.e. locations We propose a grid-based clustering algorithm to get stay regions d=300 implies a size of 300 300 m2 Stay region size Should not be too small (two regions refer to same place) or too big (hard to find) One-tail t-test p1<0.05 Two-tail t-test p2<0.05

IMPACT OF USER NUMBER #(user) -> data -> #(loc) Run on PC with dual core CPU, 2.33GHz, 2G RAM Running time is linear to #(loc), converge fast (<300 iterations) #(stay point) does not necessary linearly increase w.r.t. #(user)!"#$%&$'()!"#$%&'()*+$,'' 0.'%/.'1./.2'!"-#./,'' (./3)/4%+5.'!"-#./,''

DISCUSSION Impact of the location types to activity recommendation Recommend 5 activities for top 20 locations with most visits Aggregate the evaluations and pick top 2 activities as location types shopping & movie area sports & tourism area Usually also suitable for food hunting and sometimes tourism, dominated by them Fewer comments Usually more comments for tourism 0.8 0.85 0.9 0.95 1 ndcg@5 food & sports area More often happen Strong dependency on location features -More likely to have many restaurant POIs - sports with parks and stadium

DISCUSSION (CONT.) Impact of the activity types to location recommendation Recommend top 10 locations for each activity sports & exercises shopping movie & shows food & drinks More often happen Popular places with higher scores are more likely to be recommended Many of them are available for shows/movies 0.7 0.75 0.8 0.85 0.9 0.95 1 ndcg@10 tourism & amusement Usually more comments Popular places are usually not suitable for sports & exercises Usually fewer comments

CONCLUSION We show how to mine knowledge from the real-world GPS data to answer two typical questions: If we want to do something, where shall we go? If we visit some place, what can we do there? We evaluated our system on a large GPS dataset >7% improvement on activity recommendation >20% improvement on location recommendation over the simple baseline without exploiting any additional info Future Work Incorporate user features to provide personalized recommendation Establish a comprehensive social network based on user activity and location history

COMMENTS Using many state of the art methods in IR field Large GPS dataset (but no need to have labels) GPS comments seems unrealistic Other way to profile the GPS trajectory Future work Incorporate transportation modes Social network recommendation on friends