The limits of adaptive sensing


Mark A. Davenport, Stanford University, Department of Statistics

Sparse Estimation. We observe noisy linear measurements y = Ax + z of a vector x that is k-sparse (at most k nonzero entries). How well can we estimate x?

Background: the Dantzig Selector. Choose a random matrix A: fill out the entries of A with i.i.d. samples from a sub-Gaussian distribution, or select rows from a random unitary matrix. If m is on the order of k log(n/k), then using ℓ1-minimization (e.g., the Dantzig selector or the LASSO) we can achieve a mean-squared error on the order of k σ² log n (under the standard normalization of A). Is this the best we can do?
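
A minimal numerical sketch of the kind of nonadaptive recovery described above, using ISTA to solve the LASSO; the dimensions, sparsity level, noise level, and regularization weight are illustrative choices, not values from the talk.

import numpy as np

def ista_lasso(A, y, lam, n_iter=500):
    # Iterative soft-thresholding for min_x 0.5*||y - A x||_2^2 + lam*||x||_1
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L              # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return x

rng = np.random.default_rng(0)
n, m, k, sigma = 256, 80, 5, 0.05
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = 1.0
A = rng.standard_normal((m, n)) / np.sqrt(m)       # sub-Gaussian matrix, unit-norm columns
y = A @ x_true + sigma * rng.standard_normal(m)
x_hat = ista_lasso(A, y, lam=sigma * np.sqrt(2 * np.log(n)))   # a standard choice of lam
print("squared error:", np.sum((x_hat - x_true) ** 2))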

Can We Do Better? Via a better choice of A? A better recovery algorithm? Assume that we have a sensing power budget on the rows a_i of A and that the rows are selected in advance, i.e., nonadaptively. Theorem: For any matrix A satisfying the power budget and any recovery procedure, the worst-case mean-squared error over k-sparse signals cannot be significantly smaller than what ℓ1-minimization already achieves. See Raskutti, Wainwright, and Yu (2009), Ye and Zhang (2010), and Candès and Davenport (2011).
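
A plausible quantitative form of this nonadaptive lower bound, based on the cited papers and stated only up to absolute constants (with noise z ~ N(0, σ²I) and power budget ||A||_F² ≤ m):

\[
\inf_{\hat{x}} \ \sup_{\|x\|_0 \le k} \ \mathbb{E}\,\|\hat{x}(y) - x\|_2^2 \ \gtrsim\ \sigma^2\, \frac{n}{m}\, k \log(n/k), \qquad y = Ax + z .
\]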

Intuition. Suppose that x is k-sparse with k much smaller than n, and think about where a nonadaptive measurement vector spends its energy.

Compressive Sensing and SNR. [Figure: a dense measurement vector compared with a sparse signal.] We are using most of our sensing power to sense entries that aren't even there! This is a tremendous SNR loss. We can potentially do much better if we can somehow concentrate our sensing power on the nonzeros.

Adaptivity to the Rescue? Think of sensing as a game of 20 questions. Simple strategy: use part of the measurement budget to find the support, and the remainder to estimate the values on that support. If the support estimate is correct, essentially all of the sensing power goes to the k nonzeros, and the error can be driven far below the nonadaptive bound.
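
A minimal sketch of this two-stage idea (illustrative parameters, not the authors' implementation): half of the budget is spent on random measurements to guess the support, and the other half is spent measuring the guessed support directly.

import numpy as np

rng = np.random.default_rng(3)
n, m, k, sigma = 256, 128, 4, 1.0
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = 5.0

# Stage 1: m/2 random unit-energy measurements; crude support estimate by correlation
m1 = m // 2
A1 = rng.standard_normal((m1, n)) / np.sqrt(n)      # each row has (roughly) unit energy
y1 = A1 @ x + sigma * rng.standard_normal(m1)
support_hat = np.argsort(np.abs(A1.T @ y1))[-k:]

# Stage 2: spend the remaining budget measuring the estimated support directly
x_hat = np.zeros(n)
reps = (m - m1) // k                                # measurements devoted to each coordinate
for j in support_hat:
    y2 = x[j] + sigma * rng.standard_normal(reps)   # direct, unit-energy measurements of x_j
    x_hat[j] = y2.mean()

print("squared error:", np.sum((x_hat - x) ** 2))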

Does Adaptivity Really Help? Sometimes. Information-based complexity says adaptivity doesn't help: assuming the signal lies in a set satisfying certain conditions and that the measurements are noise-free, adaptivity reduces the minimax error by at most a small constant factor. Nevertheless, adaptivity can still help [Indyk et al. 2011]: it reduces the number of measurements in a probabilistic setting, but still requires noise-free measurements. What about noise? Distilled sensing (Haupt, Castro, Nowak, and others): the message seems to be that adaptivity really helps in noise.

Main Result. Suppose we have a budget of m measurements of the form y_i = ⟨a_i, x⟩ + z_i, where the z_i are i.i.d. Gaussian noise and the total sensing power Σ_i ||a_i||² is constrained. The vector a_i can have an arbitrary dependence on the measurement history, i.e., on y_1, …, y_{i-1}. Theorem: For any adaptive measurement strategy and any recovery procedure, the worst-case mean-squared error over k-sparse signals remains bounded below by a quantity only logarithmically smaller than the nonadaptive bound. Thus, in general, adaptivity does not significantly help! [Arias-Castro, Candès, and Davenport 2011]
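
A plausible quantitative rendering of the theorem, following the published version of this result and stated up to absolute constants (with z_i ~ N(0, σ²) and the budget Σ_i ||a_i||₂² ≤ m):

\[
\inf_{\hat{x}} \ \sup_{\|x\|_0 \le k} \ \mathbb{E}\,\|\hat{x} - x\|_2^2 \ \gtrsim\ \sigma^2\, \frac{n}{m}\, k ,
\]

so that, compared with the nonadaptive lower bound, adaptivity can save at most a logarithmic factor.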

A Detour Down Fano's Highway. We know that feedback does not (substantially) increase the capacity of a Gaussian channel. This is very similar in flavor to our result, so can we use the same technique? We could construct a packing set of sparse signals and, via Fano's inequality, obtain a lower bound in terms of the mutual information between the signal and the measurements. But under an adaptive strategy the distribution of the measurements given the signal is potentially very nasty, and it is not clear how we could bound this mutual information.

Alternative Strategy. Step 1: Consider sparse signals with nonzeros of a fixed amplitude μ. Step 2: Show that, given a budget of m measurements, you cannot detect the support very well. Step 3: Immediately translate this into a lower bound on the MSE. To make things simpler, we will consider a Bernoulli prior instead of a uniform k-sparse prior: each coordinate is independently nonzero (with value μ) with probability k/n.
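
In symbols, one way to write the prior and the detection-to-estimation reduction that Step 3 refers to (the Bernoulli(k/n) parameterization is an assumption about the intended notation; the reduction itself is a standard fact):

\[
x_j = \mu\, b_j, \quad b_j \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(k/n), \qquad
\hat{S} = \{\, j : \hat{x}_j \ge \mu/2 \,\} \ \Longrightarrow\ \|\hat{x} - x\|_2^2 \ \ge\ \frac{\mu^2}{4}\,\bigl|\hat{S} \,\triangle\, S\bigr| ,
\]

so any estimator with small MSE induces a support estimate with few errors.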

Proof of Main Result. Let x̂ be an estimate of x obtained via any adaptive measurement strategy operating within the power budget. Lemma: Under the Bernoulli prior, unless the amplitude μ is large relative to the per-coordinate sensing power, no procedure can estimate the support well; the expected number of support errors remains comparable to k. For any estimator x̂, define the induced support estimate by thresholding at μ/2, as in the reduction above.

Proof of Main Result. Combining the lemma with the detection-to-estimation reduction and plugging in the appropriate choice of amplitude μ, this reduces to the claimed lower bound on the MSE. The hard part is proving the required lemma.

Proof of Lemma. For each coordinate j, define an indicator that equals 1 if x_j is nonzero and 0 otherwise, and write the expected number of support errors as a sum over coordinates. For each term in the sum, we can lower bound the probability of error by the Bayes risk of the optimal detector (the likelihood ratio test). Towards this end, consider, for each coordinate j, the two conditional distributions of the measurements: one with x_j = 0 and one with x_j = μ.

Likelihood Ratio Test. The likelihood ratio test (LRT) declares a coordinate nonzero when the likelihood ratio exceeds the ratio of the prior weights, and its risk is bounded below in terms of the total variation distance between the two conditional distributions. Our result then follows from a bound on this total variation distance.
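
The standard form of the Bayes risk bound being invoked here (a general fact rather than a quote from the slides): for testing P_0 against P_1 with prior weights π_0 and π_1,

\[
\int \min\{\pi_0\, p_0(y),\ \pi_1\, p_1(y)\}\, dy \ \ge\ \min\{\pi_0, \pi_1\}\,\bigl(1 - \mathrm{TV}(P_0, P_1)\bigr).
\]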

Pinsker's Inequality. Applying Pinsker's inequality (in both directions), we can bound the total variation distance by the KL divergence between the two conditional distributions of the measurement sequence. It remains to bound this KL divergence, which we do by writing out the distribution of the measurements under each of the two hypotheses (x_j = 0 and x_j = μ).
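
Pinsker's inequality in the form used here (a general fact):

\[
\mathrm{TV}(P_0, P_1) \ \le\ \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(P_0 \,\|\, P_1)} .
\]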

Bounding the KL Divergence. From the convexity of the KL divergence, we can bound the divergence between the two measurement distributions by conditioning on the values of the remaining coordinates. To calculate the resulting divergence, observe that, conditionally on the measurement history, each measurement y_i is Gaussian with the same variance under both hypotheses; only its mean shifts, by μ times the corresponding entry a_{ij} of the measurement vector.
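
The elementary formula this step relies on: for two Gaussians with a common variance,

\[
D_{\mathrm{KL}}\bigl(\mathcal{N}(\theta_0, \sigma^2)\,\big\|\,\mathcal{N}(\theta_1, \sigma^2)\bigr) \ =\ \frac{(\theta_0 - \theta_1)^2}{2\sigma^2} ,
\]

so each measurement contributes a term proportional to μ² a_{ij}² to the divergence.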

Bounding the KL Divergence. Combining all of this, the KL divergence for coordinate j is controlled by the expected sensing energy that the adaptive strategy devotes to coordinate j, namely μ² times the expected sum of a_{ij}² over all measurements.

Bounding the KL Divergence. A similar bound holds in the other direction. Recall that we originally wanted to bound the total probability of support error, summed over all coordinates. Plugging our KL bound (which holds for any coordinate j) into Pinsker's inequality and the Bayes risk bound, and then summing over j, the total sensing power budget appears, and we finally arrive at the statement of the lemma.

Adaptivity in Practice. Suppose that the signal has a single nonzero of amplitude μ. Algorithm 1 [Castro et al. 2008]: start with a random (Rademacher) measurement matrix; after each measurement, compute the posterior distribution of the location of the nonzero; re-weight subsequent measurement vectors using this posterior, so that more sensing energy goes where the nonzero is more likely to be. The posterior will gradually concentrate on the correct support, eventually leading to measurement vectors that use all their energy to directly measure the nonzero.
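
A minimal simulation in the spirit of this scheme (not the authors' code): it assumes a single nonzero of known amplitude mu, unit-variance noise, and unit sensing energy per measurement, and it re-weights a Rademacher vector by the square root of the current posterior, which is one natural way to concentrate the sensing energy.

import numpy as np

rng = np.random.default_rng(1)
n, m, mu = 512, 256, 3.0
j_true = rng.integers(n)
x = np.zeros(n)
x[j_true] = mu

posterior = np.full(n, 1.0 / n)                     # prior over the location of the nonzero
for _ in range(m):
    a = rng.choice([-1.0, 1.0], size=n) * np.sqrt(posterior)   # unit sensing energy
    y = a @ x + rng.standard_normal()                           # noisy measurement
    loglik = -0.5 * (y - mu * a) ** 2               # likelihood of y if the nonzero were at j
    posterior *= np.exp(loglik - loglik.max())
    posterior /= posterior.sum()

print("true location:", j_true, "MAP estimate:", int(posterior.argmax()))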

Adaptivity in Practice. Suppose again that the signal has a single nonzero of amplitude μ. Algorithm 2 [Iwen and Tewfik 2011]: split the measurement budget into stages; in each stage, use a batch of measurements to decide whether the nonzero lies in the left or right half of the active set; after subdividing roughly log2(n) times, return the remaining index as the support.
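
A minimal sketch of this bisection-style search (again not the authors' code), assuming a single nonzero of positive amplitude mu; each stage spends a small batch of unit-energy measurements deciding which half of the active set contains the nonzero.

import numpy as np

rng = np.random.default_rng(2)
n, mu, reps = 512, 3.0, 8                        # reps = measurements spent per stage
j_true = rng.integers(n)
x = np.zeros(n)
x[j_true] = mu

active = np.arange(n)
while len(active) > 1:
    left, right = np.array_split(active, 2)
    a = np.zeros(n)
    a[left], a[right] = 1.0, -1.0
    a /= np.linalg.norm(a)                       # unit sensing energy per measurement
    y = np.mean([a @ x + rng.standard_normal() for _ in range(reps)])
    active = left if y > 0 else right            # a positive nonzero boosts the +1 (left) half

print("true location:", j_true, "estimate:", int(active[0]))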

Phase Transition in the Posterior. [Figures: empirical phase transitions in the posterior for Algorithm 1 and Algorithm 2.] [Arias-Castro, Candès, and Davenport 2011]

Phase Transition in the MSE. [Figure: empirical phase transition in the MSE.] [Arias-Castro, Candès, and Davenport 2011]

Conclusions. Surprisingly, adaptive algorithms, no matter how intractable, cannot significantly improve over seemingly naive nonadaptive strategies. Adaptivity might still be very useful in practice. Open questions: for a given signal-to-noise ratio, how many additional measurements are required to transition from the regime where adaptivity doesn't help to the one where it does? Are there practical adaptive algorithms that achieve the minimax rate at all signal-to-noise ratios? Are there practical architectures for implementing adaptive measurements on real-world signals?