How Hard Is Inference for Structured Prediction?


How Hard Is Inference for Structured Prediction? Tim Roughgarden (Stanford University), joint work with Amir Globerson (Tel Aviv), David Sontag (NYU), and Cafer Yildirim (NYU)

Structured Prediction

structured prediction: predict the labels of many objects at once, given information about relationships between the objects

applications:
- computer vision (objects = pixels, prediction = image segmentation) (Borkar et al., 2013)
- NLP (objects = words, prediction = parse tree)
- etc.

today's focus: complexity of inference (given a model), not of learning a model

Recovery From Exact Parities

Setup (noiseless): known graph G = (V, E); unknown labeling X: V -> {0,1}; we are given the parity of each edge: '+' if X(u) = X(v), '-' otherwise.

Goal: recover X.

[figure: a graph with '+'/'-' parities on its edges]

Solution (for connected G): label some vertex arbitrarily and propagate along the edges.
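As a concrete illustration of the propagation step, here is a minimal sketch in Python; the function name recover_exact and the input format (edge list plus a parity dictionary) are illustrative choices, not from the talk.

```python
from collections import deque

def recover_exact(n, edges, parity):
    """Recover a labeling of vertices 0..n-1 from noiseless edge parities.

    Assumes G is connected; the result is unique up to a global flip."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def sign(u, v):
        return parity[(u, v)] if (u, v) in parity else parity[(v, u)]

    labels = [None] * n
    labels[0] = 0                      # arbitrary choice for the starting vertex
    q = deque([0])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if labels[v] is None:
                # '+' edge: endpoints agree; '-' edge: endpoints differ
                labels[v] = labels[u] if sign(u, v) == '+' else 1 - labels[u]
                q.append(v)
    return labels
```

For example, recover_exact(3, [(0, 1), (1, 2)], {(0, 1): '+', (1, 2): '-'}) returns [0, 0, 1]; starting from a different label at the root would give the complementary labeling.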

Recovery From Noisy Parities

Setup (with noise): known graph G = (V, E); unknown labeling X: V -> {0,1}; we are given a noisy parity of each edge, flipped independently with probability p.

Goal: (approximately) recover X.

[figure: the same graph, now with some corrupted edge parities]

Formally: we want an algorithm A: {+,-}^E -> {0,1}^V that minimizes the worst-case expected Hamming error: max_X E_{L ~ D(X)} [error(A(L), X)], where D(X) is the distribution of noisy edge labels given ground truth X. (Since complementing all labels preserves every edge parity, X is identifiable only up to a global flip, and the error is measured up to that flip.)

Research Agenda

Formally: we want an estimator A: {+,-}^E -> {0,1}^V that minimizes the worst-case expected Hamming error: max_X E_{L ~ D(X)} [error(A(L), X)].

(Information-theoretic) What is the minimum expected error possible? How does it depend on p? On the structure of the graph?

(Computational) When can approximate recovery be done efficiently? How does the answer depend on p and the graph?
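To make this evaluation concrete, here is a hedged sketch of the experimental pipeline: generating noisy parities from a ground truth and measuring Hamming error up to the global flip. The names noisy_parities and hamming_error are illustrative; the talk specifies only the criterion itself.

```python
import random

def noisy_parities(edges, X, p, rng=random):
    """Observed labels L: the true parity of each edge, flipped independently w.p. p."""
    labels = {}
    for u, v in edges:
        true = '+' if X[u] == X[v] else '-'
        flipped = rng.random() < p
        labels[(u, v)] = ('-' if true == '+' else '+') if flipped else true
    return labels

def hamming_error(guess, X):
    """Hamming error up to a global flip (parities cannot distinguish X from 1-X)."""
    direct = sum(g != x for g, x in zip(guess, X))
    return min(direct, len(X) - direct)
```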

Analogy: Stochastic Block Model

Stochastic block model setup: [Boppana 87], [Bui/Chaudhuri/Leighton/Sipser 92], [Feige/Kilian 01], [McSherry 01], [Mossel/Neeman/Sly 13,14], [Massoulié 14], [Abbe/Bandeira/Hall 15], [Makarychev/Makarychev/Vijayaraghavan 15,16], [Moitra/Perry/Wein 16], ...

known vertices V, unknown labeling X: V -> {0,1}; for every pair u,v we get a noisy signal of whether X(u) = X(v). With no noise, the signals yield two disjoint cliques. With noise, the signal is corrupted with probability 1 - a/n if X(u) = X(v) and with probability b/n if X(u) ≠ X(v) [where a > b].

[figure: two communities A and B]

Goal: (approximately) recover X.

The Graph Matters

Example: G = path graph with n vertices. Ground truth: the first i vertices are labeled 0, the last n-i are labeled 1 (i is unknown). Take p = Θ(log n/n) [very small].

[figure: a path labeled 0 0 0 1 1 1 with edge signs + + - + +]

W.h.p., the input has Θ(log n) '-' edges, only one of which is consistent with the ground truth; since nothing in the input distinguishes the true '-' edge from the spurious ones, no algorithm can reliably guess it, and Θ(n) expected error is unavoidable.
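A quick simulation (a sketch with illustrative parameters, not from the talk) makes the ambiguity concrete: the true boundary edge is just one of many observed '-' edges.

```python
import math, random

n, i = 1000, 400                  # path on n vertices; true boundary after vertex i
p = math.log(n) / n
X = [0] * i + [1] * (n - i)       # first i vertices labeled 0, the rest 1
minus_edges = []
for u in range(n - 1):            # path edges (u, u+1)
    true = '+' if X[u] == X[u + 1] else '-'
    observed = ('-' if true == '+' else '+') if random.random() < p else true
    if observed == '-':
        minus_edges.append(u)
print(f"'-' edges observed at {minus_edges}; true boundary edge is ({i - 1}, {i})")
```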

The Grid: Empirical Evidence

[figure: empirical recovery error on grid graphs; plot not included in the transcription]

Approximate Recovery

Definition: a family of graphs allows approximate recovery if there exists an algorithm with expected error at most f(p)·n, where f(p) -> 0 as p -> 0 [for n large].

non-example: paths. potential example: grids.

Questions: which graphs admit approximate recovery? in polynomial time? with what functions f(·)?

MLE/Correlation Clustering

Our algorithm: compute the labeling minimizing (number of '+' edges with bichromatic endpoints) + (number of '-' edges with monochromatic endpoints), i.e., the maximum-likelihood estimate.

Fact: this can be implemented in polynomial time in planar (and bounded-genus) graphs.

Fact: it is not information-theoretically optimal; the optimal estimator is marginal inference (predict each vertex's most likely label given the observations).
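Here is a brute-force sketch of this objective, feasible only for tiny graphs; the talk's polynomial-time planar implementation is not shown, and all names are illustrative.

```python
from itertools import product

def disagreements(labeling, edges, obs):
    """(# '+' edges with bichromatic endpoints) + (# '-' edges with monochromatic ones)."""
    total = 0
    for u, v in edges:
        same = labeling[u] == labeling[v]
        sign = obs[(u, v)]        # obs keyed by the edge's orientation in `edges`
        if (sign == '+' and not same) or (sign == '-' and same):
            total += 1
    return total

def mle(n, edges, obs):
    """Labeling in {0,1}^n minimizing disagreements with the observed parities."""
    return min(product((0, 1), repeat=n),
               key=lambda lab: disagreements(lab, edges, obs))
```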

Flipping Lemma

Definition: a bad set S is a maximal connected subgraph of mislabeled nodes (with respect to X and A(L)).

[figure: bad sets S_1, S_2, S_3 inside G, each entirely mislabeled]

Lemma: if S is bad, then at least half the edges of δ(S) were corrupted.

Proof idea: (i) our algorithm's labeling agrees with the input on at least half the edges of δ(S), since otherwise flipping the labels of S would improve the allegedly optimal solution; (ii) but every edge of δ(S) joins a mislabeled vertex to a correctly labeled one, so our labeling disagrees with the ground-truth parity on all of δ(S). Hence at least half the edges of δ(S) carry an input label differing from their true parity, i.e., were corrupted.
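The following sketch extracts bad sets and their boundaries so the lemma can be checked empirically on simulated instances; aligning the global flip by majority agreement is my assumption, and all names are illustrative.

```python
def bad_sets(guess, X, adj):
    """Maximal connected components of mislabeled vertices.

    The guess is first aligned with X up to the global flip (assumption:
    choose the flip that agrees with X on a majority of vertices)."""
    if sum(g != x for g, x in zip(guess, X)) > len(X) / 2:
        guess = [1 - g for g in guess]
    wrong = {v for v in range(len(X)) if guess[v] != X[v]}
    comps, seen = [], set()
    for s in wrong:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(w for w in adj[u] if w in wrong)
        seen |= comp
        comps.append(comp)
    return comps

def boundary(S, adj):
    """Edges of delta(S): one endpoint inside S, the other outside."""
    return [(u, v) for u in S for v in adj[u] if v not in S]
```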

Warm-Up #1: Expanders

Graph family: d-regular expander graphs, with |δ(S)| ≥ c·d·|S| for all |S| ≤ n/2 (for a constant c).

Analysis: let the bad sets (maximal connected subgraphs of mislabeled nodes) be C_1, ..., C_k. Note: error = Σ_i |C_i|. The flipping lemma implies that at least half of the ≥ c·d·|C_i| edges of δ(C_i) were corrupted, and each corrupted edge lies on the boundary of at most two bad sets, so

E[error] ≤ 4·E[# corrupted edges]/(c·d) ≤ 2pn/c [so f(p) = 2p/c].

Open Questions

Open Question #1: polynomial-time recovery. Correlation clustering is NP-hard in expanders (it is roughly equivalent to minimum multicut [Demaine/Emanuel/Fiat/Immorlica 05]). Does semidefinite programming help?

Open Question #2: determine the optimal error rate. The upper bound works even in a bounded adversary model (a budget of p·|E| edges to corrupt), where expected error O(pn) is optimal. Conjecture: O(p^{d/2}·n) is tight for random errors.

Warm-Up #2: Large Min Cut

Graph family: graphs whose global min cut c* is Ω(log n).

Easy (Chernoff): for a connected subgraph S with |δ(S)| = i, Pr[S is bad] ≤ p^{i/2}.

Key fact: (e.g., [Karger 93]) for every α ≥ 1, the number of α-approximate minimum cuts is at most n^{2α}.

Result:

E[error] ≤ Σ_{i ≥ c*} Σ_{S: |δ(S)| = i} |S|·Pr[S bad] ≤ Σ_{i ≥ c*} n·n^{2(i/c*)}·p^{i/2} = o(1).

Grid-Like Graphs

Graph family: √n × √n grids (n vertices).

Key properties: planar, with each face having O(1) sides; weak expansion: for every S with |S| ≤ n/2, |δ(S)| ≥ |S|^c (for some constant c > 0).

Theorem: computationally efficient recovery with expected Hamming error O(p²·n). This matches the information-theoretic lower bound of Ω(p²·n): the grid is 4-regular, so each node is ambiguous with probability ≈ p² (e.g., when two of its four incident edges are corrupted).

Grid-Like Graphs: Analysis

Lemma: the number of connected subgraphs S with |δ(S)| = i is at most n·3^i.

Proof sketch: it suffices to count cycles in the dual graph (also a grid): n choices for the first vertex and at most 3 choices for each successive vertex.

Result:

E[error] ≤ Σ_{i ≥ 4} Σ_{S: |δ(S)| = i} |S|·Pr[S bad] ≤ Σ_{i ≥ 4} n·3^i·p^{i/2} = O(n·p²)

(the |S| = O(i²) factor is absorbed into the geometrically decaying sum).
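As a numeric sanity check of this union bound (a sketch with illustrative parameters): the series Σ_{i ≥ 4} n·3^i·p^{i/2} is geometric with ratio 3√p, so it converges and is dominated by its i = 4 term of order n·p² whenever p < 1/9.

```python
def grid_bound(n, p, max_i=200):
    """Partial sum of sum_{i>=4} n * 3^i * p^(i/2) (geometric ratio 3*sqrt(p))."""
    return sum(n * 3**i * p**(i / 2) for i in range(4, max_i))

for p in (0.01, 0.02, 0.05):
    # The bound tracks n * p^2 up to a constant factor once 3*sqrt(p) < 1.
    print(p, grid_bound(10_000, p), 10_000 * p**2)
```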

A Subtlety and a Fix

Bug: the counting argument forgot about connected sets S with holes, whose boundary in the dual graph consists of more than one cycle.

[figure: a connected set S in G that encloses a hole]

A Subtlety and a Fix

Generalized Flipping Lemma: for a connected set S with holes, at least half the edges of its outer boundary were corrupted [else, flip all labels inside the outer boundary].

Fix: charge the errors in S to its filled-in version F(S) (S together with its holes), and sum only over filled-in sets:

E[error] ≤ Σ_{i ≥ 4} Σ_{filled-in S: |δ(S)| = i} |S|·Pr[S bad] ≤ Σ_{i ≥ 4} n·3^i·p^{i/2} = O(n·p²).

[figure: a set S with a hole in G, with corrupted edges on its outer boundary]
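Here is a hypothetical helper for the filled-in version F(S) on a grid, computed by flood-filling the complement from the grid's border; anything not reachable from the border without entering S is a hole and gets absorbed into F(S). This is my sketch, not code from the talk.

```python
from collections import deque

def fill_in(S, rows, cols):
    """F(S) for a rows x cols grid: S plus its holes, i.e., every cell that is
    not reachable from the grid border without stepping into S."""
    border = [(r, c) for r in range(rows) for c in range(cols)
              if (r in (0, rows - 1) or c in (0, cols - 1)) and (r, c) not in S]
    seen, q = set(border), deque(border)
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nbr = (r + dr, c + dc)
            if (0 <= nbr[0] < rows and 0 <= nbr[1] < cols
                    and nbr not in seen and nbr not in S):
                seen.add(nbr)
                q.append(nbr)
    return {(r, c) for r in range(rows) for c in range(cols) if (r, c) not in seen}
```

For instance, if S is the ring of eight cells around (2, 2) in a 5 × 5 grid, fill_in(S, 5, 5) returns S together with the enclosed cell (2, 2).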

More Open Questions

1. Characterize the graphs where good approximate recovery is possible (as p -> 0). Is weak expansion sufficient?
2. Computationally efficient recovery beyond planar graphs (or hardness results). Does semidefinite programming help?
3. Take advantage of noisy node labels. Major progress: [Foster/Reichman/Sridharan 16].
4. More than two labels.