Dynamic Embeddings for User Profiling in Twitter

Dynamic Embeddings for User Profiling in Twitter
Shangsong Liang (1), Xiangliang Zhang (1), Zhaochun Ren (2), Evangelos Kanoulas (3)
(1) KAUST, Saudi Arabia
(2) JD.com, China
(3) University of Amsterdam, The Netherlands

Overview
- The Task
- Background and Related Work
- Our Method
  - Dynamic User and Word Embedding Model (DUWE)
  - Streaming Keyword Diversification Model (SKDM)
- Experiments
- Conclusion

The Task
- Input: a stream of tweets generated by Twitter users over time.
- Output: a set of keywords that profiles a given user at each point in time t (for example, Sport or Food).

The Task: Requirements
For a given user at time t, the returned keywords should be:
- Relevant
- Diversified
- Dynamic

Background of the User Profiling Problem
- Expert finding task at the TREC 2005 Enterprise Track: given documents that describe expert candidates, answer a query with a ranked list of names in a specific domain, uncovering associations between people and topics.
- A generative language modeling approach in Balog et al. (2007):
  - Works on a static document collection.
  - Assumes that user profiling results do not change.
- Dynamic user profiling is needed.

Dynamic User Profiling Approaches
- ExperTime (Rybak et al. 2014)
- A probabilistic model for learning how personal research interests evolve (Fang and Godavarthy 2014)

Limitations of Current User Profiling Methods
- They treat words as atomic units, leading to a vocabulary mismatch that harms performance.
- They represent words and users in disjoint vocabulary spaces, making it difficult to measure the similarity between users and words when constructing the profile.
Questions:
- Can words and users be embedded in the same semantic space?
- Can their embeddings be modeled in a dynamic environment?

Related Work in Dynamic Topic Models and Dynamic Embedding
Dynamic topic models (modeling dynamic user interests):
- Topic over time model (Wang et al. KDD 2006)
- Topic tracking model (Iwata et al. IJCAI 2009)
- Dynamic user clustering topic model (Liang et al. KDD 2016), etc.
- None of them is designed for user profiling.
Dynamic word embedding:
- Separate the data into time bins and apply word2vec within each bin (Kim et al. 2014, Hamilton et al. 2016).
- Or build on the Bayesian skip-gram model (Bamler and Mandt 2017).
- All of them embed words only, not users.
- None of them is designed for user profiling.

Our Approach
Dynamic User and Word Embedding Model (DUWE):
- Infers both user and word embeddings over time in the same semantic space.
- Makes it possible to measure the similarity between user and word embeddings.
Streaming Keyword Diversification Model (SKDM):
- Retrieves relevant keywords to profile users' current interests over time.
- Diversifies the returned relevant keywords so that they cover all aspects of the users' interests.

Dynamic User and Word Embedding
[Figure: graphical model of DUWE over time steps t-1 and t. Labeled elements: observed co-occurrence of words at t-1, observed user-word pairs at t-1, word representation at t-1, user representation at t. Symbols shown at each step: z, y, n^+, m^+, v, u, V, U, β, α.]
User diffusion: $p(u_t \mid U_{t-1}) \propto \mathcal{N}(U_{t-1}, \sigma_{t-1}^2 I)\, \mathcal{N}(0, \sigma_0^2 I)$
Word diffusion: $p(v_t \mid V_{t-1}) \propto \mathcal{N}(V_{t-1}, \sigma_{t-1}^2 I)\, \mathcal{N}(0, \sigma_0^2 I)$

Diffusion of User Representation
Gaussian prior:
$p(u_t \mid U_{t-1}) \propto \mathcal{N}(U_{t-1}, \sigma_{t-1}^2 I)\, \mathcal{N}(0, \sigma_0^2 I)$
Following Kalman filtering, the variance of the transition kernel for a user embedding from t-1 to t is defined through a term that measures how the word distribution of user u changes from the previous time step t-1 to the current time step t.

Diffusion of Word Representation
Gaussian prior:
$p(v_t \mid V_{t-1}) \propto \mathcal{N}(V_{t-1}, \sigma_{t-1}^2 I)\, \mathcal{N}(0, \sigma_0^2 I)$
Following Kalman filtering, the variance of the transition kernel for a word embedding from t-1 to t is defined through a term that measures how the word distribution changes from t-1 to the current time step t.
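To make the diffusion prior concrete, here is a minimal sketch of the normalized product of the two Gaussian factors, N(prev, sigma_t^2 I) and N(0, sigma_0^2 I). The function names and the parameter names sigma_t and sigma_0 are assumptions made for illustration; the slides do not give the formula that sets the transition variance, so it is passed in as a plain number here.

```python
import random

def diffusion_prior_params(prev, sigma_t, sigma_0):
    """Mean and standard deviation (per coordinate) of the normalized
    product N(prev, sigma_t^2 I) * N(0, sigma_0^2 I)."""
    # Precisions (inverse variances) of the two factors add up.
    precision = 1.0 / sigma_t ** 2 + 1.0 / sigma_0 ** 2
    # The mean is the previous embedding shrunk toward zero.
    shrink = (1.0 / sigma_t ** 2) / precision
    mean = [shrink * x for x in prev]
    std = precision ** -0.5
    return mean, std

def sample_next_embedding(prev, sigma_t, sigma_0, rng):
    """Draw one sample of the embedding at time t given the one at t-1."""
    mean, std = diffusion_prior_params(prev, sigma_t, sigma_0)
    return [rng.gauss(m, std) for m in mean]

# Hypothetical 3-dimensional embedding at time t-1.
rng = random.Random(0)
u_prev = [0.5, -1.0, 2.0]
u_next = sample_next_embedding(u_prev, sigma_t=0.1, sigma_0=1.0, rng=rng)
```

A small sigma_t keeps the new embedding close to the previous one, while the zero-mean factor pulls it slightly toward the origin and keeps the embeddings from drifting without bound.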

DUWE Model Inference
- Apply skip-gram filtering (Bamler and Mandt 2017) together with a variational inference algorithm to obtain the embeddings.
- The posterior distribution over the user and word embeddings is conditioned on the observed statistics:
  - positive and negative indicator matrices for all user-to-word pairs;
  - positive and negative indicator matrices for all word-to-word pairs.
- The posterior is built from four components:
  - model transition for users;
  - model transition for words;
  - skip-gram model for words;
  - skip-gram model for users and words.
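As an illustration of what a skip-gram term over positive and negative indicator matrices can look like, the sketch below uses the standard form from the Bayesian skip-gram model: each pair contributes pos * log sigmoid(r . c) + neg * log sigmoid(-r . c). That this is the exact form used in DUWE is an assumption, and the names `skipgram_log_likelihood`, `rows`, `cols`, `pos`, and `neg` are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def skipgram_log_likelihood(rows, cols, pos, neg):
    """Log-likelihood of positive/negative counts under a skip-gram model.

    rows, cols: lists of embedding vectors (users vs. words for the
                user-to-word term, words vs. words for the word-to-word term).
    pos, neg:   matrices of positive and negative counts, where
                pos[i][j] and neg[i][j] refer to the pair (rows[i], cols[j]).
    """
    total = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            score = dot(r, c)
            total += pos[i][j] * log_sigmoid(score)
            total += neg[i][j] * log_sigmoid(-score)
    return total
```

The same function can serve both skip-gram components: pass user embeddings as `rows` for the user-to-word term and word embeddings as `rows` for the word-to-word term.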

Streaming Keyword Diversification Model
Generates the top-k relevant and diversified keywords for profiling a user's interests at time t.
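The slide does not spell out how SKDM selects keywords, so the following is only a generic sketch of relevance-plus-diversity selection over a shared embedding space. It uses a greedy, MMR-style rule, which is a swapped-in technique and not the authors' model. The function name, the `trade_off` parameter, and the toy vectors are all hypothetical.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def diversified_top_k(user_vec, word_vecs, k, trade_off=0.7):
    """Greedily pick k keywords, balancing similarity to the user
    against similarity to keywords that were already picked."""
    selected = []
    candidates = set(word_vecs)
    while candidates and len(selected) < k:
        def score(word):
            relevance = cosine(user_vec, word_vecs[word])
            redundancy = max(
                (cosine(word_vecs[word], word_vecs[s]) for s in selected),
                default=0.0,
            )
            return trade_off * relevance - (1.0 - trade_off) * redundancy
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy embeddings: two near-duplicate sport words and one food word.
user = [1.0, 1.0]
words = {
    "football": [1.0, 0.1],
    "soccer": [1.0, 0.12],
    "pizza": [0.0, 1.0],
}
profile = diversified_top_k(user, words, k=2)
```

With these toy vectors the second pick is "pizza" rather than "football", because "football" is nearly identical to the already selected "soccer". Setting `trade_off=1.0` reduces the rule to plain relevance ranking.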

Experimental Setup
Dataset:
- 1,375 users randomly sampled from Twitter.
- 3.78 million tweets posted by these users from the time they registered up to May 31, 2015.
- Two types of ground truth: one for evaluating relevance-oriented performance (RGT) and one for evaluating diversity-oriented performance (DGT).
Evaluation metrics:
- Relevance: Pre (Precision), NDCG, MRR, MAP, and their semantic versions, denoted Pre-S, NDCG-S, MRR-S, MAP-S.
- Diversity: Pre-IA (Intent-Aware Precision), α-NDCG, MRR-IA, MAP-IA.
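For reference, the relevance metrics named above can be computed as follows for a single ranked keyword list. This is a minimal sketch that assumes binary relevance judgments; the semantic and intent-aware variants are not defined on the slides and are not covered. The function names and the toy data are illustrative.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k keywords that are relevant."""
    return sum(1 for w in ranked[:k] if w in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant keyword, or 0.0 if there is none.
    MRR is the mean of this value over users or time periods."""
    for rank, w in enumerate(ranked, start=1):
        if w in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary gains and a log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, w in enumerate(ranked[:k], start=1)
        if w in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Toy example: a 4-keyword profile judged against two relevant keywords.
ranked = ["gym", "pizza", "run", "tv"]
relevant = {"gym", "run"}
p4 = precision_at_k(ranked, relevant, 4)
rr = reciprocal_rank(ranked, relevant)
ndcg4 = ndcg_at_k(ranked, relevant, 4)
```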

Experimental Setup: Baselines
Non-dynamic embedding models:
- Skip-Gram Model, i.e., the word2vec model (SGM)
- Distributed Representations of Documents (DRD)
Dynamic traditional profiling model:
- Predictive Language Model (PLM)
Dynamic topic model:
- User Clustering Topic model (UCT)
Dynamic embedding models:
- Dynamic Independent Skip-Gram model (DISG)
- Dynamic Pre-initialized Skip-Gram model (DPSG)
- Dynamic Independent Distributed Representations of documents (DIDR)
- Dynamic Pre-initialized Distributed Representations of documents (DPDR)

Overall Performance
[Table/figure: average relevance performance over time periods of one month each]

Overall Performance
[Table/figure: diversity performance over time periods of one month each]

An Example User's Dynamic Profiling Results over Time
Top-6 keywords of an example user's dynamic profile. The user's interests cover a number of aspects and change dramatically over time, from sport, fitness, kitchen, and exercise to education.

Relevance and Diversity Performance over Time
[Figures: relevance performance over time; diversity performance over time]

Performance w.r.t. Embedding Dimensionality

Conclusions
- Studied the problem of dynamic user profiling in Twitter.
- Proposed a Dynamic User and Word Embedding model (DUWE).
- Proposed a Streaming Keyword Diversification Model (SKDM).
- Evaluated the performance of the proposed models on a real-world Twitter dataset.

Thank you for your attention!
Our paper: http://www.kdd.org/kdd2018/accepted-papers/view/dynamicembeddings-for-user-profiling-in-twitter
Lab of Machine Intelligence and Knowledge Engineering (MINE): http://mine.kaust.edu.sa/