Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

Similar documents
Learning the Structures of Online Asynchronous Conversations

Computing Similarity between Cultural Heritage Items using Multimodal Features

Predicting Messaging Response Time in a Long Distance Relationship

Chat Disentanglement: Identifying Semantic Reply Relationships with Random Forests and Recurrent Neural Networks

What is this Song About?: Identification of Keywords in Bollywood Lyrics

Getting More from Segmentation Evaluation

Using the Kolmogorov-Smirnov Test for Image Segmentation

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Postgraduate Diploma in Marketing

Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison*

Classification. I don't like spam. Spam, Spam, Spam. Information Retrieval

DeepWalk: Online Learning of Social Representations

Statistical parsing. Fei Xia Feb 27, 2009 CSE 590A

Sentiment analysis under temporal shift

Combining Neural Networks and Log-linear Models to Improve Relation Extraction

Automatic Summarization

Inferring Ongoing Activities of Workstation Users by Clustering

Detection of Question-Answer Pairs in Conversations

Implementation of a High-Performance Distributed Web Crawler and Big Data Applications with Husky

MACHINE LEARNING FOR SOFTWARE MAINTAINABILITY

Scalable Object Classification using Range Images

Using Question-Answer Pairs in Extractive Summarization of Conversations

Fast or furious? - User analysis of SF Express Inc

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

Interactive Visual Text Analytics for Decision Making. Shixia Liu Microsoft Research Asia

Annotated Suffix Trees for Text Clustering

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Query Difficulty Prediction for Contextual Image Retrieval

Evaluation. Evaluate what? For really large amounts of data... A: Use a validation set.

Feature LDA: a Supervised Topic Model for Automatic Detection of Web API Documentations from the Web

Exploiting Discourse Analysis for Article-Wide Temporal Classification

BUBBLE RAP: Social-Based Forwarding in Delay-Tolerant Networks

Visualization and text mining of patent and non-patent data

Supervised classification of law area in the legal domain

ECS289: Scalable Machine Learning


VisoLink: A User-Centric Social Relationship Mining

CSE 494 Project C. Garrett Wolf

Making Sense Out of the Web

CHAPTER 4: CLUSTER ANALYSIS

Optimizing feature representation for speaker diarization using PCA and LDA

Day 3 Lecture 1. Unsupervised Learning

Structured Learning. Jun Zhu

Predict Topic Trend in Blogosphere

Unsupervised Rank Aggregation with Distance-Based Models

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Measurement Design for the Comparison of Expert Usability Evaluation and Mobile App User Reviews

Detect Rumors in Microblog Posts Using Propagation Structure via Kernel Learning

Interpreting Document Collections with Topic Models. Nikolaos Aletras University College London

Semi-supervised Learning

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Fuzzy Implementation of proposed hybrid algorithm using Topic Tilling & C99 for Text Segmentation

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks

Applying the Annotation View on Messages for Discussion Search

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Towards Summarizing the Web of Entities

Visual Representations for Machine Learning

A Taxonomy of Semi-Supervised Learning Algorithms

Learning to Reweight Terms with Distributed Representations

Improving Recognition through Object Sub-categorization

Network Traffic Measurements and Analysis

A Document Graph Based Query Focused Multi- Document Summarizer

Lightly Supervised Learning of Procedural Dialog Systems

Multimodal Medical Image Retrieval based on Latent Topic Modeling

Boundary Recognition in Sensor Networks. Ng Ying Tat and Ooi Wei Tsang

CS 534: Computer Vision Segmentation and Perceptual Grouping

Artificial Intelligence. Programming Styles

Applications of admixture models

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

AUTOMATIC TEXT SUMMARIZATION

AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands

Search Engines. Information Retrieval in Practice

Exploratory Analysis: Clustering

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

Semantic Web. Ontology Engineering and Evaluation. Morteza Amini. Sharif University of Technology Fall 93-94

Component ranking and Automatic Query Refinement for XML Retrieval

Detection and Extraction of Events from Emails

CS 229 Midterm Review

Plagiarism Detection Using FP-Growth Algorithm

Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions

Link Mining & Entity Resolution. Lise Getoor University of Maryland, College Park

Interaction Supervisor Printed Help. PureConnect powered by Customer Interaction Center (CIC) 2018 R2. Abstract

INF4820, Algorithms for AI and NLP: Hierarchical Clustering

Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University

Research at Google: With a Case of YouTube-8M. Joonseok Lee, Google Research

CS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep

A Multilingual Social Media Linguistic Corpus

Synchronizing Location Servers with Cisco Wireless LAN Controllers and Cisco WCS

Text Document Clustering Using DPM with Concept and Feature Analysis

Gradient Descent. Wed Sept 20th, James McInenrey Adapted from slides by Francisco J. R. Ruiz

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016

Automatic summarization of video data

Slides for Data Mining by I. H. Witten and E. Frank

Machine Learning. Unsupervised Learning. Manfred Huber

HUKB at NTCIR-12 IMine-2 task: Utilization of Query Analysis Results and Wikipedia Data for Subtopic Mining

Transcription:

Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails. Shafiq Joty, Giuseppe Carenini, Gabriel Murray, Raymond Ng. University of British Columbia, Vancouver, Canada. EMNLP 2010.

Topic Segmentation. A topic is something that the participants of a conversation discuss or argue about. An email thread about arranging a conference can have topics such as: location and time, registration, food menu, workshops. Topic assignment: clustering the sentences of an email thread into a set of coherent topical clusters.

Example.
From: Charles. To: WAI AU Guidelines. Date: Thu May. Subj: Phone connection to ftof meeting.
It is probable that we can arrange a telephone connection, to call in via a US bridge. <Topic id = 1>
Are there people who are unable to make the face to face meeting, but would like us to have this facility? <Topic id = 1>
From: William. To: Charles. Date: Thu May. Subj: Re: Phone connection to ftof meeting.
Are there people who are unable to make the face to face meeting, but would like us to have this facility?
At least one people would. <Topic id = 1> ..
From: Charles. To: WAI AU Guidelines. Date: Mon Jun. Subj: RE: Phone connection to ftof meeting.
Please note the time zone difference, and if you intend to only be there for part of the time let us know which part of the time. <Topic id = 2>
9am - 5pm Amsterdam time is 3am - 11am US Eastern time which is midnight to 8am pacific time. <Topic id = 2>
Until now we have got 12 people who want to have a ptop connection. <Topic id = 1>

Motivation. Our main research goals (on asynchronous conversation): information extraction and summarization. Topic segmentation is often considered a prerequisite for such higher-level conversation analysis. Applications: text summarization, information ordering, automatic QA, information extraction and retrieval, intelligent user interfaces.

Challenges. Emails are different from written monologue and dialog: asynchronous and distributed; informal; different styles of writing; short sentences; the same topic can reappear. Relying on headers is often inadequate. No reliable annotation scheme, no standard corpus, and no agreed-upon metrics are available.

Example of Challenges.
From: William. To: Charles. Date: Thu May. Subj: Re: Phone connection to ftof meeting.
Are there people who are unable to make the face to face meeting, but would like us to have this facility? [short and informal]
At least one people would. <Topic id = 1> ..
From: Charles. To: WAI AU Guidelines. Date: Mon Jun. Subj: RE: Phone connection to ftof meeting. [header is misleading]
Please note the time zone difference, and if you intend to only be there for part of the time let us know which part of the time. <Topic id = 2>
9am - 5pm Amsterdam time is 3am - 11am US Eastern time which is midnight to 8am pacific time. <Topic id = 2>
Until now we have got 12 people who want to have a ptop connection. <Topic id = 1> [topics reappear]

Contributions: Outline of the Rest of the Talk. Corpus: dataset, annotations, metrics, agreement. Segmentation models: existing models (LCSeg, LDA) and extensions (LCSeg+FQG, LDA+FQG). Evaluation. Future work.

BC3 Email Corpus. Dataset: 40 email threads from the W3C corpus; 3222 sentences; on average five emails per thread. Previously annotated with: speech acts and meta sentences, subjectivity, extractive and abstractive summaries. The new topic annotations will be made publicly available: http://www.cs.ubc.ca/labs/lci/bc3.html

Topic Annotation Process. Two-phase pilot study: five randomly picked email threads; five UBC graduate students in the first phase; one postdoc in the second phase. Actual topic annotation: three 4th-year undergraduates (CS majors and native speakers). Participants were also given a human-written summary.

Annotation Tasks. First task: read an email thread and a human-written summary, then list the topics discussed. Example: <Topic id 1, location and time of the ftof mtg.> <Topic id 2, phone connection to the mtg.> Second task: annotate each sentence with the most appropriate topic (id); multiple topics were allowed. Predefined topics: OFF-TOPIC, INTRO, END. 100% agreement on the predefined topics.

Agreement/Evaluation Metrics. The number of topics varies across annotations, so Kappa is not applicable. Segmentation in conversation is not sequential, so WindowDiff (WD) and P_k are also not applicable. More appropriate metrics (Elsner and Charniak, ACL 2008): one-to-one (1-to-1), loc_k, many-to-one (M-to-1).

Metrics (1-to-1). 1-to-1 measures the global similarity by pairing up the clusters of two annotations so as to maximize the total overlap. [Figure: one annotation is transformed according to the optimal mapping and compared against the other; 70% overlap in the example.]
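
Since 1-to-1 is a maximum-weight bipartite matching between the clusters of the two annotations, it can be computed with the Hungarian algorithm. A minimal sketch, assuming per-sentence cluster-id lists and scipy's solver (our framing, not the authors' implementation):

```python
# 1-to-1 (Elsner and Charniak, 2008): pair up clusters of two
# annotations to maximize total overlap, then report the matched
# sentences as a fraction of the thread.
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one(gold, pred):
    """gold, pred: lists of cluster ids, one entry per sentence."""
    g_ids, p_ids = sorted(set(gold)), sorted(set(pred))
    # overlap[i, j] = sentences labelled g_ids[i] in gold, p_ids[j] in pred
    overlap = np.zeros((len(g_ids), len(p_ids)), dtype=int)
    for g, p in zip(gold, pred):
        overlap[g_ids.index(g), p_ids.index(p)] += 1
    rows, cols = linear_sum_assignment(-overlap)  # maximize total overlap
    return overlap[rows, cols].sum() / len(gold)

print(one_to_one([1, 1, 2, 2], [7, 7, 7, 8]))  # 0.75
```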

Metrics (loc_k). loc_k measures the local agreement between two annotations within a context of k sentences. [Figure: pairs of nearby sentences judged Same/Different in each annotation are compared; 66% agreement in the example.]
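
A sketch of loc_k under one reading of the definition: for each sentence, compare the two annotations' same/different-topic judgements over the k immediately preceding sentences. The exact windowing here is an assumption, not a claim about the authors' code:

```python
# loc_k: fraction of nearby sentence pairs on which two annotations
# agree about "same topic" vs. "different topic".
def loc_k(gold, pred, k=3):
    agree = total = 0
    for i in range(1, len(gold)):
        for j in range(max(0, i - k), i):  # up to k preceding sentences
            total += 1
            agree += (gold[i] == gold[j]) == (pred[i] == pred[j])
    return agree / total

print(loc_k([1, 1, 2, 2], [7, 7, 7, 8]))  # 0.5
```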

Inter-annotator Agreement.

          Mean   Max  Min
  1-to-1  0.804  1    0.31
  loc_3   0.831  1    0.43

Agreements are pretty good! How annotators disagree: some are much finer-grained than others.

               Mean  Max  Min
  # of Topics  2.5   7    1
  Entropy      0.94  2.7  0

M-to-1 gives an intuition of an annotator's specificity.

Metrics (M-to-1). M-to-1 maps each cluster of the 1st annotation to the single cluster in the 2nd annotation with which it has the greatest overlap, then computes the percentage of overlap.

          Mean   Max  Min
  M-to-1  0.949  1    0.61

To compare models we should use 1-to-1 and loc_k.
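
For contrast, a minimal sketch of M-to-1 in the same framing. A finer-grained annotation can reach a perfect M-to-1 while 1-to-1 stays low (the 75% vs. 100% example on the backup slide), which is why M-to-1 reflects specificity rather than model quality:

```python
# M-to-1: map each cluster of the first annotation to its best-
# overlapping cluster in the second, then report covered sentences.
from collections import Counter

def m_to_1(first, second):
    matched = 0
    for c in set(first):
        # labels the second annotation gives to the sentences of cluster c
        cooc = Counter(s for f, s in zip(first, second) if f == c)
        matched += cooc.most_common(1)[0][1]  # size of the best overlap
    return matched / len(first)

print(m_to_1([1, 2, 2, 3], [7, 7, 7, 8]))  # 1.0, while 1-to-1 gives 0.75
```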

Outline of the Rest of the Talk. Corpus: dataset, annotations, metrics, agreement. Segmentation models: existing models (LCSeg, LDA) and extensions (LCSeg+FQG, LDA+FQG). Evaluation. Future work.

Related Work: Existing Segmentation Models. Segmentation in monolog and synchronous dialog: supervised: binary classification with features; unsupervised: LCSeg (Galley et al., ACL 2003), LDA (Georgescul et al., ACL 2008). Multi-party chat (conversation disentanglement): graph-based clustering (Elsner and Charniak, ACL 2008). Asynchronous conversations (emails, blogs): to our knowledge, no prior work.

LDA on Email Corpus. Latent Dirichlet Allocation (Blei et al., 2003) is a generative model; the generation process: choose a topic, then choose a word. Each email is a document. Inference gives distributions of words over the topics. Assuming the words in a sentence occur independently, we compute distributions of sentences over topics and assign each sentence its argmax topic.
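
A hedged sketch of that pipeline using gensim's LdaModel (our choice of library; the paper does not name an implementation). Each email is a document, and a sentence's topic score sums per-word topic probabilities, following the independence assumption above:

```python
from gensim import corpora, models

# toy corpus: each email is one (tokenized) document
emails = [["phone", "connection", "bridge", "meeting"],
          ["time", "zone", "difference", "amsterdam", "meeting"]]
dictionary = corpora.Dictionary(emails)
bow = [dictionary.doc2bow(e) for e in emails]
lda = models.LdaModel(bow, id2word=dictionary, num_topics=2)

def sentence_topic(sentence_tokens):
    # p(z | sentence) ~ sum_w p(w | z), words assumed independent
    topics = lda.get_topics()  # num_topics x vocabulary matrix
    ids = [dictionary.token2id[w] for w in sentence_tokens
           if w in dictionary.token2id]
    return int(topics[:, ids].sum(axis=1).argmax())  # argmax over topics

print(sentence_topic(["phone", "bridge"]))
```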

LCSeg on Email Corpus. Lexical Chain Segmenter (Galley et al., 2003): order the emails based on their temporal relation; compute lexical chains based on word repetition; rank the chains according to two measures: number of repetitions and compactness of the chain. The score of the words in a chain equals the rank of the chain. Measure the similarity between two consecutive windows of sentences and assign a boundary wherever the measure falls below a threshold.
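
A simplified sketch of the boundary decision. Real LCSeg scores terms by lexical-chain rank; here raw repetition counts stand in for chain scores, so this is closer to TextTiling than a faithful reimplementation:

```python
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[w] * b.get(w, 0) for w in a)
    den = (math.sqrt(sum(v * v for v in a.values())) *
           math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def boundaries(sentences, window=2, threshold=0.1):
    """sentences: list of token lists; returns boundary positions."""
    cuts = []
    for i in range(window, len(sentences) - window + 1):
        left = Counter(w for s in sentences[i - window:i] for w in s)
        right = Counter(w for s in sentences[i:i + window] for w in s)
        if cosine(left, right) < threshold:  # cohesion dips => boundary
            cuts.append(i)  # boundary just before sentence i
    return cuts
```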

Limitations of the Two Models. Both LDA and LCSeg make bag-of-words (BOW) assumptions and ignore important conversation features: the reply-to relation and the usage of quotations. In our corpus, people use quotations to talk about the same topic. Example:
> Are there people who are unable to make the face to face meeting, but would like us to have this facility?
At least one people would. <Topic id = 1>
In BC3, the average usage of quotations per thread is 6.44.

What We Need We need to: Capture the conversation structure at the quotation level. Incorporate this structure into the models.

Extracting Conversation Structure (Carenini et al., ACL 2008). We analyze the actual body of the emails and find two kinds of fragments: new fragments (depth level 0) and quoted fragments (depth level > 0). We form a fragment quotation graph (FQG): nodes represent fragments; edges represent referential relations.

Fragment Quotation Graph. Nodes: identify quoted and new fragments. Edges: neighbouring quotations. [Figure: an email conversation with 6 emails containing fragments a-j. E1: a. E2: b, >a. E3: c, >b, >>a. E4: d, e, >c, >>b, >>>a. E5: g, h, >>d, >f, >>e. E6: >g, i, >h, j.]
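
A hedged sketch of fragment extraction: maximal runs of lines sharing the same quotation depth become FQG nodes, and neighbouring fragments within one email are linked. Matching quoted fragments back to their source emails is omitted; the real system (Carenini et al., 2008) handles that:

```python
import itertools
import re

def quote_depth(line):
    # count leading '>' markers, ignoring interleaved spaces
    return re.match(r"^(\s*>)*", line).group().count(">")

def fragments(body):
    # maximal runs of lines at the same quote depth = fragments
    return [(depth, "\n".join(l.lstrip("> ") for l in lines))
            for depth, lines in itertools.groupby(body.splitlines(),
                                                  key=quote_depth)]

def neighbour_edges(frags):
    # neighbouring quotations within one email become FQG edges
    return [(frags[i][1], frags[i + 1][1]) for i in range(len(frags) - 1)]

email = ("Please note the time zone difference.\n"
         "> Are there people who are unable to make the meeting?\n"
         "At least one would.")
print(fragments(email))  # [(0, ...), (1, ...), (0, ...)]
```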

LDA with FQG. Our primary goal is to regularize LDA so that sentences in nearby fragments fall in the same topical cluster. We regularize the topic-word distribution with a word network; the standard Dirichlet prior doesn't allow this. Andrzejewski et al. (2009) describe how to encode domain knowledge using a Dirichlet Forest prior. We re-implemented this model (only must-links). We construct the word network by connecting words in the same or adjacent fragments.
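
A sketch of building the must-link word network only (the Dirichlet Forest prior itself is not reproduced here). tokens_of is a hypothetical tokenizer returning the word set of a fragment:

```python
def must_links(fqg_nodes, fqg_edges, tokens_of):
    """Pairs of words that should prefer the same topic."""
    links = set()
    def link(words_a, words_b):
        links.update((a, b) for a in words_a for b in words_b if a != b)
    for frag in fqg_nodes:
        link(tokens_of(frag), tokens_of(frag))  # words in one fragment
    for u, v in fqg_edges:
        link(tokens_of(u), tokens_of(v))        # words across an FQG edge
    return links
```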

LCSeg with FQG. Extract the paths (sub-conversations) of the FQG and run LCSeg on each path. [Figure: example paths through fragments a-j.] Sentences in common fragments fall in multiple segments.
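
A sketch of the path extraction, treating the FQG as a dict from fragment to children. A fragment shared by several paths lands in several segments, as the slide notes; the consolidation step below resolves that:

```python
def paths(graph, roots):
    """All root-to-leaf paths of a DAG given as node -> children."""
    out = []
    def walk(node, sofar):
        kids = graph.get(node, [])
        if not kids:                      # leaf: one complete path
            out.append(sofar + [node])
        for k in kids:
            walk(k, sofar + [node])
    for r in roots:
        walk(r, [])
    return out

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(paths(g, ["a"]))  # [['a', 'b', 'd'], ['a', 'c', 'd']]; 'd' is shared
```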

LCSeg with FQG (cont.). Consolidate the different segments: form a graph where nodes represent sentences and the edge weight w(u,v) represents the number of cases in which sentences u and v fall in the same segment. Find optimal clusters using the normalized cut criterion (Shi & Malik, 2000).
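
A hedged sketch of the consolidation, with sklearn's SpectralClustering standing in for the Shi and Malik normalized-cut algorithm (the paper cites the criterion, not a specific library):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def consolidate(n_sentences, segmentations, n_topics):
    """segmentations: per FQG path, a list of segments (sentence ids)."""
    w = np.zeros((n_sentences, n_sentences))
    for segmentation in segmentations:
        for segment in segmentation:
            for u in segment:
                for v in segment:
                    if u != v:
                        w[u, v] += 1      # u, v co-segmented once more
    return SpectralClustering(n_clusters=n_topics,
                              affinity="precomputed").fit_predict(w)
```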

Outline of the Rest of the Talk. Corpus: dataset, annotations, metrics, agreement. Segmentation models: existing models (LCSeg, LDA) and extensions (LCSeg+FQG, LDA+FQG). Evaluation. Future work.

Evaluation: Baselines. All different: each sentence is a separate topic. All same: the whole thread is a single topic. Speaker: the sentences from each participant constitute a separate topic. Blocks of k (k = 5, 10, 15): each consecutive group of k sentences forms a separate topic. Speaker and Blocks of 5 are two strong baselines.
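
The baselines are trivial to state in code (a sketch under our own framing of the inputs):

```python
def all_different(n):            # each sentence its own topic
    return list(range(n))

def all_same(n):                 # whole thread is one topic
    return [0] * n

def by_speaker(speakers):        # one topic per participant
    ids = {s: i for i, s in enumerate(dict.fromkeys(speakers))}
    return [ids[s] for s in speakers]

def blocks_of_k(n, k=5):         # consecutive blocks of k sentences
    return [i // k for i in range(n)]
```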

Results. (Speaker and Block 5 are baselines; LDA through LCSeg+FQG are our systems.)

              Speaker  Block 5  LDA   LDA+FQG  LCSeg  LCSeg+FQG  Human
  Mean 1-1    0.52     0.38     0.57  0.62     0.62   0.68       0.80
  Mean loc_3  0.64     0.57     0.54  0.61     0.72   0.71       0.83

Our systems perform better than the baselines but worse than humans.

Results (same table as above). LDA performs very disappointingly. FQG helps LDA.

Results (same table as above). LCSeg is a better model than LDA.

Results (same table as above). FQG helps LCSeg on the 1-1 metric. loc_3 suffers a bit, but not significantly. LCSeg+FQG is the best model.

Future Work. Consider other important features: speaker, mention of names, subject of the email, topic-shift cue words. Transfer our approach to other similar domains: synchronous domains (chats, meetings) and asynchronous domains (blogs).

Questions? Thanks.

Acknowledgements. 6 pilot annotators. 3 test annotators. 3 anonymous reviewers. NSERC PGS award. NSERC BIN project. NSERC discovery grant. ICICS at UBC.

Metrics (M-to-1). M-to-1 maps each cluster of the 1st annotation to the single cluster in the 2nd annotation with which it has the greatest overlap, then computes the percentage of overlap. [Figure: an M-to-1 mapping example where 1-to-1 = 75% but M-to-1 = 100%.]

          Mean   Max  Min
  M-to-1  0.949  1    0.61

To compare models we should use 1-to-1 and loc_k.

Results. (Speaker and Block 5 are baselines; LDA through LCSeg+FQG are our systems.)

              Speaker  Block 5  LDA   LDA+FQG  LCSeg  LCSeg+FQG  Human
  Mean 1-1    0.52     0.38     0.57  0.62     0.62   0.68       0.80
  Max 1-1     0.94     0.77     1.00  1.00     1.00   1.00       1.00
  Min 1-1     0.23     0.14     0.24  0.24     0.33   0.33       0.31
  Mean loc_k  0.64     0.57     0.54  0.61     0.72   0.71       0.83
  Max loc_k   0.97     0.73     1.00  1.00     1.00   1.00       1.00
  Min loc_k   0.27     0.42     0.38  0.38     0.40   0.40       0.43