A Class of Submodular Functions for Document Summarization

A Class of Submodular Functions for Document Summarization
Hui Lin and Jeff Bilmes
Dept. of Electrical Engineering, University of Washington, Seattle
June 20, 2011

Extractive Document Summarization

[Figure: the sentences of a document, with the extracted summary sentences shown in green and a newly added sentence in blue.]

We extract sentences (green) as a summary of the full document. Consider two summaries, where the summary on the left is a subset of the summary on the right, and consider adding a new (blue) sentence to each of the two summaries. The marginal (incremental) benefit of adding the new sentence to the smaller (left) summary is no less than the marginal benefit of adding it to the larger (right) summary: diminishing returns, i.e., submodularity.

Outline
1 Background on Submodularity
2 Problem Setup and Algorithm
3 Submodularity in Summarization
4 New Class of Submodular Functions for Document Summarization
5 Experimental Results
6 Summary

Background on Submodularity: Submodular Set Functions

There is a finite ground set of elements V, and we use set functions of the form f : 2^V → R. A set function f is monotone nondecreasing if for all R ⊆ S, f(R) ≤ f(S).

Definition (Submodular Function)
For any R ⊆ S ⊆ V and k ∈ V with k ∉ S, f is submodular if

    f(S + k) − f(S) ≤ f(R + k) − f(R).

This is known as the principle of diminishing returns.

Example: Number of Colors of Balls in Urns

[Figure: two urns of colored balls with R ⊆ S, where f(R) = 3 and f(S) = 4; after adding a new ball k whose color is new to R but already present in S, f(R + k) = 4 while f(S + k) = 4.]

Given a set A of colored balls, f(A) is the number of distinct colors contained in the urn. The incremental value of an object only diminishes in a larger context (diminishing returns).
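A minimal sketch (ours, not from the talk) of the urn example as code, checking the diminishing-returns inequality from the definition above:

```python
# f(A) = number of distinct colors in A, a monotone submodular function.

def f(balls):
    """Number of distinct colors in a list of colored balls."""
    return len(set(balls))

R = ["red", "blue", "green"]         # f(R) = 3
S = R + ["yellow", "red"]            # R is contained in S, f(S) = 4
k = "yellow"                         # new color for R, already present in S

gain_R = f(R + [k]) - f(R)           # 4 - 3 = 1
gain_S = f(S + [k]) - f(S)           # 4 - 4 = 0
assert gain_S <= gain_R              # diminishing returns: k adds less in the larger set
```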

Why is submodularity attractive?

Why is convexity attractive?
- Convexity appears in many mathematical models in economics, engineering, and other sciences.
- The minimum can be found efficiently.
- Convexity has many nice properties; e.g., it is preserved under many natural operations and transformations.

How about submodularity?
- Submodularity arises in many areas: combinatorics, economics, game theory, operations research, machine learning, and (now) natural language processing.
- The minimum can be found in polynomial time.
- Submodularity has many nice properties; e.g., it is preserved under many natural operations and transformations (scaling, addition, convolution, etc.).

2 Problem Setup and Algorithm

Problem Setup

The ground set V corresponds to all the sentences in a document. Extractive document summarization: select a small subset S ⊆ V that accurately represents the entirety (ground set V).

The summary is usually required to be length-limited:
- c_i: cost (e.g., the number of words in sentence i),
- b: the budget (e.g., the largest length allowed),
- knapsack constraint: Σ_{i∈S} c_i ≤ b.

A set function f : 2^V → R measures the quality of the summary S. Thus, the summarization problem is formalized as:

Problem (Document Summarization Optimization Problem)

    S* ∈ argmax_{S⊆V} f(S) subject to Σ_{i∈S} c_i ≤ b.    (1)

A Practical Algorithm for Large-Scale Summarization

When f is both monotone and submodular:
- A greedy algorithm with partial enumeration (Sviridenko, 2004): theoretical guarantee of a near-optimal solution, but not practical for large data sets.
- A greedy algorithm with scaled cost (Lin and Bilmes, 2010): near-optimal with a theoretical guarantee, and practical/scalable! We choose the next element with the largest ratio of gain over scaled cost (G is the current summary, U the remaining candidate sentences):

    k ∈ argmax_{i∈U} [f(G ∪ {i}) − f(G)] / (c_i)^r.    (2)

Scalability: the argmax above can be solved with O(log n) calls of f, thanks to submodularity. Integer linear programming (ILP) takes 17 hours vs. the greedy algorithm, which takes under 1 second!
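A minimal sketch (not the authors' code) of the cost-scaled greedy rule in Eq. (2). It assumes f is a monotone submodular set function over sentence indices, costs[i] is the cost c_i, budget is b, and r is the scaling exponent; all names are ours:

```python
def greedy_summarize(f, V, costs, budget, r=1.0):
    G = set()
    spent = 0.0
    U = set(V)                        # sentences not yet considered
    while U:
        # Largest marginal gain per scaled cost, as in Eq. (2).
        k = max(U, key=lambda i: (f(G | {i}) - f(G)) / costs[i] ** r)
        U.remove(k)
        if spent + costs[k] <= budget:
            G.add(k)
            spent += costs[k]
    # For the theoretical guarantee, Lin and Bilmes (2010) also compare G
    # against the best single sentence that fits within the budget.
    best = max((i for i in V if costs[i] <= budget),
               key=lambda i: f({i}), default=None)
    if best is not None and f({best}) > f(G):
        return {best}
    return G
```

In practice the per-step argmax is usually computed lazily with a priority queue of stale gains, which is valid by submodularity and is one way to realize the O(log n) function evaluations per step mentioned above.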

Objective Function Optimization: Performance in Practice

[Figure: achieved objective function value (y-axis) vs. number of sentences in the summary (x-axis), with curves for the exact (optimal) solution and for the greedy algorithm with r = 0, 0.5, 1, 1.5. Each curve stops when adding more sentences would violate the budget.]

3 Submodularity in Summarization

MMR is Non-Monotone Submodular

Maximal Marginal Relevance (MMR; Carbonell and Goldstein, 1998) is very popular in document summarization. MMR corresponds to an objective function that is submodular but non-monotone (see the paper for details). Therefore, the greedy algorithm's performance guarantee does not apply in this case (since MMR is not monotone).

Moreover, the greedy algorithm of MMR does not take cost into account, and therefore can lead to solutions that are significantly worse than those found by the greedy algorithm with scaled cost.

MMR-like approaches are non-monotone because summary redundancy is penalized negatively.

Concept-Based Approach

Concepts: n-grams, keywords, etc. This approach maximizes the weighted credit of the concepts covered by the summary: submodular! (Similar to the colored-ball example we saw.) The objectives in the nice talk (Berg-Kirkpatrick et al., 2011) we saw at the beginning of this session are, in fact, submodular when value(b) ≥ 0.

Even ROUGE-N Itself is Monotone Submodular!

ROUGE-N has high correlation with human evaluation (Lin, 2004).

Theorem (Lin and Bilmes, 2011)
ROUGE-N is monotone submodular (proof in the paper).

[Figure: oracle experiments on DUC-05, plotting the ROUGE-2 recall (%) of the greedy algorithm against the cost-scaling exponent r from 0 to 1; a red dashed line marks the best ROUGE-2 recall of the human summaries (summary ID C).]
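To make the theorem concrete, here is a minimal sketch (our reading, assuming a single reference summary; names are ours) of ROUGE-N recall written as a sum of clipped n-gram counts. Each term min(count in summary, count in reference) is the minimum of a modular function and a constant, so the sum is monotone submodular in the set of selected sentences:

```python
from collections import Counter

def ngrams(tokens, n):
    return zip(*(tokens[i:] for i in range(n)))

def rouge_n_recall(summary_sents, reference_tokens, n=2):
    ref = Counter(ngrams(reference_tokens, n))
    cand = Counter()
    for sent in summary_sents:                 # union of selected sentences
        cand.update(ngrams(sent, n))
    # Clipped match counts: each reference n-gram saturates at its ref count.
    clipped = sum(min(cand[e], ref[e]) for e in ref)
    return clipped / max(sum(ref.values()), 1)
```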

4 New Class of Submodular Functions for Document Summarization

The General Form of Our Submodular Functions

Two properties of a good summary: relevance and non-redundancy. Common approaches (e.g., MMR) encourage relevance and (negatively) penalize redundancy; the redundancy penalty is usually what violates monotonicity. Our approach instead positively rewards diversity rather than negatively penalizing redundancy.

Definition (The general form of our submodular functions)

    f(S) = L(S) + λ R(S)

- L(S) measures the coverage (or fidelity) of summary set S to the document.
- R(S) rewards diversity in S.
- λ ≥ 0 is a trade-off coefficient.

This is analogous to the objectives widely used in machine learning: loss + regularization.

Coverage Function

    L(S) = Σ_{i∈V} min{ C_i(S), α C_i(V) }

- C_i : 2^V → R is monotone submodular and measures how well sentence i is covered by S.
- 0 ≤ α ≤ 1 is a threshold coefficient: the sufficient coverage fraction.
- If min{C_i(S), α C_i(V)} = α C_i(V), then sentence i is well covered by summary S (saturated).
- After saturation, further increases in C_i(S) do not increase the objective value (the return diminishes). Therefore, a new sentence added to S should focus on sentences that are not yet saturated in order to increase the objective function value.

Coverage Function (continued)

C_i measures how well i is covered by S. One simple choice of C_i (which we use in our experiments, and which works well) is

    C_i(S) = Σ_{j∈S} w_{i,j},

where w_{i,j} ≥ 0 measures the similarity between sentences i and j. With this C_i, L(S) is monotone submodular, as required.
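A minimal sketch (names ours) of L(S) with this similarity-based C_i, assuming a precomputed pairwise similarity matrix w with w[i][j] ≥ 0:

```python
def coverage(S, w, alpha):
    V = range(len(w))
    total = 0.0
    for i in V:
        c_i_S = sum(w[i][j] for j in S)        # C_i(S): how well S covers i
        c_i_V = sum(w[i][j] for j in V)        # C_i(V): full-document coverage of i
        total += min(c_i_S, alpha * c_i_V)     # saturate at alpha * C_i(V)
    return total
```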

Diversity Reward Function

    R(S) = Σ_{i=1}^{K} √( Σ_{j∈P_i∩S} r_j )

- P_i, i = 1, …, K, is a partition of the ground set V.
- r_j ≥ 0 is the singleton reward of j, which represents the importance of j to the summary.
- The square root over the sum of rewards of sentences belonging to the same partition yields diminishing returns.
- R(S) is monotone submodular as well.
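A minimal sketch (names ours) of R(S), assuming each sentence has been assigned to one cluster of the partition:

```python
import math
from collections import defaultdict

def diversity(S, partition, r):
    by_cluster = defaultdict(float)
    for j in S:
        by_cluster[partition[j]] += r[j]       # sum singleton rewards per cluster
    return sum(math.sqrt(v) for v in by_cluster.values())
```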

Diversity Reward Function: How Does It Reward Diversity?

[Figure: three partitions P_1, P_2, P_3; as we read it, sentences 1, 2, and 3 lie in P_1 while sentence 4 lies in P_3.]

- Three partitions: P_1, P_2, P_3.
- Singleton rewards for sentences 1, 2, 3, and 4: r_1 = 5, r_2 = 5, r_3 = 4, r_4 = 3.
- Current summary: S = {1, 2}; consider adding a new sentence, 3 or 4.
- A diverse (non-redundant) summary: {1, 2, 4}.

Modular objective: R({1, 2, 3}) = 5 + 5 + 4 = 14 > R({1, 2, 4}) = 5 + 5 + 3 = 13, so it prefers the redundant sentence 3.

Submodular objective: R({1, 2, 3}) = √(5 + 5 + 4) = √14 ≈ 3.7 < R({1, 2, 4}) = √(5 + 5) + √3 ≈ 4.9, so it prefers the diverse summary {1, 2, 4}.
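Plugging these numbers into the diversity() sketch above reproduces the comparison (the partition assignments are our reading of the figure):

```python
partition = {1: "P1", 2: "P1", 3: "P1", 4: "P3"}   # reconstructed from the figure
r = {1: 5, 2: 5, 3: 4, 4: 3}

print(diversity({1, 2, 3}, partition, r))   # sqrt(14)           ~ 3.74
print(diversity({1, 2, 4}, partition, r))   # sqrt(10) + sqrt(3) ~ 4.89 -> preferred
```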

Singleton Rewards

The singleton reward of j captures the importance of being j (to the summary).

Query-independent (generic) case:

    r_j = (1/N) Σ_{i∈V} w_{i,j}.

Query-dependent case, given a query Q:

    r_j = β (1/N) Σ_{i∈V} w_{i,j} + (1 − β) r_{j,Q},

where r_{j,Q} measures the relevance between j and the query Q.

Multi-resolution diversity reward, using partitions P^(1), P^(2), … of V at different granularities:

    R(S) = Σ_{i=1}^{K_1} √( Σ_{j∈P_i^(1)∩S} r_j ) + Σ_{i=1}^{K_2} √( Σ_{j∈P_i^(2)∩S} r_j ) + ⋯
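A minimal sketch (names ours) of the singleton rewards; rel_q is our stand-in for the query-relevance scores r_{j,Q}:

```python
def singleton_rewards(w, rel_q=None, beta=1.0):
    N = len(w)
    # Generic case: average similarity of j to the whole document.
    r = [sum(w[i][j] for i in range(N)) / N for j in range(N)]
    if rel_q is not None:
        # Query-dependent case: interpolate with query relevance.
        r = [beta * r[j] + (1 - beta) * rel_q[j] for j in range(N)]
    return r
```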

5 Experimental Results

Generic Summarization

DUC-04: generic summarization.

Table: ROUGE-1 recall (R) and F-measure (F) results (%) on DUC-04. DUC-03 was used as the development set.

                                        R       F
L1(S)                                 39.03   38.65
R1(S)                                 38.23   37.81
L1(S) + λ R1(S)                       39.35   38.90
Takamura and Okumura (2009)           38.50   -
Wang et al. (2009)                    39.07   -
Lin and Bilmes (2010)                 -       38.39
Best system in DUC-04 (peer 65)       38.28   37.94

Note: this is the best ROUGE-1 result ever reported on DUC-04.

Query-Focused Summarization

DUC-05, 06, 07: query-focused summarization. For each document cluster, a title and a narrative (query) describing a user's information need are provided. Nelder-Mead (derivative-free optimization) was used for parameter training.

DUC-05 Results

Table: ROUGE-2 recall (R) and F-measure (F) results (%).

                                            R      F
L1(S) + λ R_Q(S)                          7.82   7.72
L1(S) + Σ_{κ=1..3} λ_κ R_{Q,κ}(S)         8.19   8.13
Daumé III and Marcu (2006)                6.98   -
Wei et al. (2010)                         8.02   -
Best system in DUC-05 (peer 15)           7.44   7.43

DUC-06 was used as the training set for the objective function with a single diversity reward. DUC-06 and 07 were used as training sets for the objective function with the multi-resolution diversity reward (new results since the camera-ready version of the paper).

Note: this is the best ROUGE-2 result ever reported on DUC-05.

DUC-06 Results

Table: ROUGE-2 recall (R) and F-measure (F) results (%).

                                            R      F
L1(S) + λ R_Q(S)                          9.75   9.77
L1(S) + Σ_{κ=1..3} λ_κ R_{Q,κ}(S)         9.81   9.82
Celikyilmaz and Hakkani-Tür (2010)        9.10   -
Shen and Li (2010)                        9.30   -
Best system in DUC-06 (peer 24)           9.51   9.51

DUC-05 was used as the training set for the objective function with a single diversity reward. DUC-05 and 07 were used as training sets for the objective function with the multi-resolution diversity reward (new results since the camera-ready version of the paper).

Note: this is the best ROUGE-2 result ever reported on DUC-06.

DUC-07 Results

Table: ROUGE-2 recall (R) and F-measure (F) results (%).

                                                      R       F
L1(S) + λ R_Q(S)                                    12.18   12.13
L1(S) + Σ_{κ=1..3} λ_κ R_{Q,κ}(S)                   12.38   12.33
Toutanova et al. (2007)                             11.89   11.89
Haghighi and Vanderwende (2009)                     11.80   -
Celikyilmaz and Hakkani-Tür (2010)                  11.40   -
Best system in DUC-07 (peer 15), using web search   12.45   12.29

DUC-05 was used as the training set for the objective function with a single diversity reward. DUC-05 and 06 were used as training sets for the objective function with the multi-resolution diversity reward.

Note: this is the best ROUGE-2 F-measure ever reported on DUC-07, and the best ROUGE-2 recall without web search expansion.

6 Summary

Summary

- Submodularity is a natural fit for summarization problems (e.g., even ROUGE-N is submodular).
- A greedy algorithm with scaled cost is both scalable and near-optimal, thanks to submodularity.
- We have introduced a class of submodular functions that is expressive and general (more advanced NLP techniques were not used, but could easily be incorporated into our objective functions).
- We show the best results yet reported on DUC-04, 05, 06, and 07.