Principles of Wireless Sensor Networks. Fast-Lipschitz Optimization

Lecture 5: Fast-Lipschitz Optimization
Stockholm, October 14, 2011
Royal Institute of Technology - KTH, Stockholm, Sweden
Course page: http://www.ee.kth.se/~carlofi/teaching/pwsn-2011/wsn_course.shtml
e-mail: carlofi@kth.se

Previous lecture: non-expansive mappings, agreement, consensus.
Today: F-Lipschitz optimization.

Protocol stack: Application, Presentation, Session, Transport, Routing, MAC, Phy.

In this lecture, you will learn some of the key aspects of Fast-Lipschitz optimization:
When is a problem F-Lipschitz?
Why should I care about F-Lipschitz optimization?
The theory is illustrated by distributed estimation applications: why is F-Lipschitz optimization useful for distributed estimation? Other applications will be mentioned as well.

Optimization over networks
Three settings: centralized optimization, distributed/networked optimization, network optimization.
Networks are moving toward fewer and fewer base stations and more and more distributed architectures.
Optimization needs to be computed by fast algorithms of low complexity: networks are time-varying, there is little time to compute optimal solutions, and computations often must be distributed.
Examples: cross-layer protocol design, distributed detection, estimation, content distribution, routing, ...

Parallel and distributed computation
This is the fundamental theory for optimization over networks.
Drawback over energy-constrained wireless networks: the cost of communication is not considered.
An alternative theory is needed. In a number of cases, it is Fast-Lipschitz optimization.

Outline
Definition of Fast-Lipschitz optimization
Computation of the optimal solution
Problems in canonical form
Examples of application
Conclusions

Network optimization
We have seen that a class of distributed estimation problems needs a fast distributed computation of an optimization.
Is there a general formulation of optimization problems that can be solved quickly and efficiently in a distributed manner? In many cases, Fast-Lipschitz optimization.
C. Fischione, F-Lipschitz Optimization with Wireless Sensor Networks Applications, IEEE Transactions on Automatic Control, accepted for publication, to appear, 2011.

The Fast-Lipschitz optimization
Example: x could be a quality control action, a radio power, etc.
D is some box: a nonempty compact set containing the other constraints.

Computation of the solution
Centralized optimization: the problem is solved by a central processor.
Distributed optimization: in a network of n nodes, decision variables and constraints are associated with nodes that cooperate to compute the solution in parallel.
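The formula on this slide is not present in the text. As a sketch only, the general problem form used in the cited paper by Fischione can be written as below; the symbols $f_0$, $f_i$, $h_i$, $l$ and $n$ are assumptions taken from that formulation as best recalled, not from the slide itself.

```latex
\begin{aligned}
\max_{x} \quad & f_0(x) \\
\text{s.t.} \quad & x_i \le f_i(x), \qquad i = 1, \dots, l, \\
                  & x_i = h_i(x),  \qquad i = l+1, \dots, n, \\
                  & x \in \mathcal{D} \subset \mathbb{R}^n .
\end{aligned}
```

Here $\mathcal{D}$ is the nonempty compact box mentioned above, and each constraint is attached to one decision variable $x_i$, which is what allows node $i$ to own both.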

The F-Lipschitz optimization
[Figure: diagram of problem classes: non-convex optimization, convex optimization, F-Lipschitz optimization, interference function optimization, geometric optimization.]
Convex optimization problems require heavier solvers than F-Lipschitz problems.
F-Lipschitz optimization problems can be convex, geometric, quadratic, interference-function, ...

Pareto optimal solution

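White board notes to the previous slide
Example of Pareto optimality: compare a vector with entries (5, 3, 1) and a vector y with entries (4, 6, 0.5). Neither is better or worse than the other, since in neither are all elements the highest; hence it is a Pareto optimal solution.

Notation
Gradient.
Infinity norm of a matrix: sum along a row (the largest absolute row sum).
1-norm of a matrix: sum along a column (the largest absolute column sum).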
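The two notions in these notes can be checked with a few lines of plain Python. This is a minimal sketch: the function names are hypothetical, the two vectors are the ones read from the white board example, and the matrix A is made up for illustration.

```python
def dominates(a, b):
    """True if a is at least as large as b in every entry and strictly larger in one."""
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def matrix_norm_inf(A):
    """Infinity norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)


def matrix_norm_1(A):
    """1-norm of a matrix: the largest absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


u = [5, 3, 1]
v = [4, 6, 0.5]
# Neither vector dominates the other, so neither is better or worse.
print(dominates(u, v), dominates(v, u))

A = [[0.5, -0.25],
     [0.125, 0.0]]  # hypothetical matrix
print(matrix_norm_inf(A), matrix_norm_1(A))
```

For this A the row sums are 0.75 and 0.125 and the column sums are 0.625 and 0.25, so the two norms differ.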

F-Lipschitz qualifying properties
These are the properties a problem must have to be an F-Lipschitz optimization problem.
Criterion 1 must always be satisfied, while only one of criteria 2-4 needs to be fulfilled.
Functions may be non-convex.

White board notes to the previous slide
Examples of F-Lipschitz qualifying properties: worked instances of conditions 1b, 2a, 3a, 3c and 4a, several of them bounds of the form "< 1". [The expressions are not recoverable from the extracted text.]

Optimal solution
The Pareto optimal solution is simply given by a set of (in general nonlinear) equations.
Solving a set of equations is much easier than solving an optimization problem by traditional Lagrangian methods.

Lagrangian methods
Theorem: Consider a feasible F-Lipschitz problem. Then, the KKT conditions are necessary and sufficient.
KKT conditions: x is a scalar and lambda is a vector of Lagrangian multipliers.
With F-Lipschitz optimization, we do not need the Lagrangian method to solve the problem.
Two ways to compute the solution: Lagrangian methods or a contractive iteration.

White board notes to the previous slide
Lagrangian method:
1) Build the Lagrangian $L(x, \lambda)$.
2) Solve $\nabla_x L(x, \lambda) = 0$ and $\nabla_\lambda L(x, \lambda) = 0$.

Centralized optimization
The optimal solution is given by iterative methods for solving systems of nonlinear equations (e.g., Newton methods). The iteration uses a matrix chosen to ensure convergence and maximize its speed.
Many other methods are available, e.g., second-order methods.

Distributed optimization
In the distributed iteration, the value cannot go outside the box D (otherwise it is approximated around the max bound of the box).
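The idea of computing the optimum as the solution of the constraint equations, while keeping the iterate inside the box D, can be sketched as a projected fixed-point iteration. This is a minimal sketch under stated assumptions: the function names, the two-variable map f and the box bounds are hypothetical, and f is chosen to be increasing with a Jacobian whose row and column sums are 0.5, so the iteration contracts.

```python
def project_box(x, lower, upper):
    """Clip each component of x to the interval [lower_i, upper_i]."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def fixed_point(f, x0, lower, upper, tol=1e-10, max_iter=1000):
    """Iterate x <- project_box(f(x)) until successive iterates differ by less than tol."""
    x = project_box(x0, lower, upper)
    for k in range(1, max_iter + 1):
        x_new = project_box(f(x), lower, upper)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, k
        x = x_new
    return x, max_iter


def f(x):
    # Hypothetical constraint functions: node 1 needs only x[1], node 2 only x[0].
    return [1.0 + 0.3 * x[1], 2.0 + 0.5 * x[0]]


x_star, iterations = fixed_point(f, [0.0, 0.0], [0.0, 0.0], [10.0, 10.0])
print(x_star, iterations)
```

In a network, each component of the update would be computed by the node that owns it, using only the values received from the nodes its constraint depends on.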

F-Lipschitz optimization
Do the inequality constraints satisfy the equality at the optimum?
Yes: compute the solution by F-Lipschitz methods.
No: compute the solution by Lagrangian methods.

F-Lipschitz optimization: a class of problems for which all the constraints are active at the optimum.
Optimum: the solution to the set of equations given by the constraints.
No Lagrangian methods are needed; these are computationally expensive, particularly on wireless networks.

Problems in canonical form
Canonical form: see Bertsekas, Non Linear Programming, 2004.
An F-Lipschitz form can be obtained from the canonical form by rewriting the constraints.
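The transformation shown on this slide is not present in the text. One rewriting that is algebraically equivalent, given here as an assumption rather than as the slide's own formula, adds $x_i$ to both sides of a canonical constraint $g_i(x) \le 0$ after scaling it by a positive constant $\gamma_i$ (a hypothetical symbol):

```latex
g_i(x) \le 0
\quad \Longleftrightarrow \quad
x_i \le x_i - \gamma_i \, g_i(x) =: f_i(x),
\qquad \gamma_i > 0 .
```

The equivalence holds for any $\gamma_i > 0$; whether the resulting $f_i$ satisfy the qualifying properties depends on the gradients of $g_i$ and on the choice of $\gamma_i$.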

Example 1: from canonical to F-Lipschitz
The problem is convex, but it is also F-Lipschitz: it has off-diagonal monotonicity and diagonal dominance.
The solution is given, trivially, by the constraints at the equality.

Example 2: a hidden F-Lipschitz
The problem as stated is not F-Lipschitz. After a simple variable transformation, it is F-Lipschitz.

An F-Lipschitz Matlab Toolbox
M. Leithe, Introducing a Matlab Toolbox for F-Lipschitz optimization, Master Thesis, KTH, 2011.

Distributed estimation
We now study F-Lipschitz optimization for distributed estimation, distributed detection, and distributed radio power control.

Estimation
[Figure: two architectures in which sensors 1, 2, ..., N observe the phenomena. Centralized estimation: the sensors report to a fusion center, with no/little intelligence on nodes. Distributed estimation: no central coordination.]
A. Speranzon, C. Fischione, K. H. Johansson, A. Sangiovanni-Vincentelli, A Distributed Minimum Variance Estimator for Sensor Networks, IEEE Journal on Selected Areas in Communications, special issue on Control and Communications, Vol. 26, No. 4, pp. 609-621, May 2008.
A. Speranzon, C. Fischione, K. H. Johansson, Distributed and Collaborative Estimation over Wireless Sensor Networks, IEEE CDC 2006.

Network and signals
Nodes perform a noisy measurement of a common time-varying signal d(t).
Communication is subject to space-time varying packet losses.

Distributed estimator
Nodes exchange local measurements and estimates.
Each node computes a local estimate; together these form the global vector of the estimates.
Goal: find locally the estimation coefficients that minimize the variance of the estimation error.
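One update of an estimator of this kind can be sketched as follows, assuming each node forms a weighted sum of its neighbors' previous estimates and current measurements. The weight names k and h, the uniform weights, the three-node line network and the noiseless constant signal are all hypothetical choices for illustration; they are not the optimal coefficients discussed in the lecture.

```python
def estimator_step(prev_estimates, measurements, neighbors, k, h):
    """One synchronous update at every node i:
    x_i = sum over j in neighbors[i] of k[i][j] * x_j_prev + h[i][j] * u_j."""
    return [
        sum(k[i][j] * prev_estimates[j] + h[i][j] * measurements[j] for j in nbrs)
        for i, nbrs in enumerate(neighbors)
    ]


# Hypothetical 3-node line network; each neighbor set includes the node itself.
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
n = len(neighbors)

# Uniform weights whose total per node is 1, so a constant signal is tracked without bias.
k = [[0.0] * n for _ in range(n)]
h = [[0.0] * n for _ in range(n)]
for i, nbrs in enumerate(neighbors):
    for j in nbrs:
        k[i][j] = 0.5 / len(nbrs)
        h[i][j] = 0.5 / len(nbrs)

x = [0.0, 0.0, 0.0]
u = [1.0, 1.0, 1.0]  # noiseless measurements of a constant signal equal to 1
for _ in range(50):
    x = estimator_step(x, u, neighbors, k, h)
print(x)
```

With these weights the estimation error halves at every step, so after 50 steps every node's estimate is essentially equal to the signal.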

Estimation coefficients
Estimation error.
The estimation coefficients minimize the average estimation error under stability constraints: small bias and stable estimation error.
This is a centralized optimization problem. How to distribute the computation of the optimal solution?
1. The cost function and the first constraint are easy to distribute.
2. The second constraint is difficult to distribute.

How to distribute the second constraint
[Figure: node i and node j.]
The 1-norm and max-norm are easy to distribute, but give infeasibility.

How to distribute the second constraint
A global constraint is translated into a local one by using some thresholds.

Distributed estimator
By using the thresholds, we go from centralized to distributed optimization.

Estimation coefficients
Two questions remain: how to compute the thresholds, and what is the performance of the estimator?

How to compute the thresholds?
The higher the thresholds, the lower the estimation error.
This is a Lipschitz optimization problem; see the second part of the lecture.

Performance
The error variance of the estimator that makes a simple average of the received measurements is an upper bound on the error variance of the proposed distributed estimator.

Simulation example
Network with 30 nodes randomly deployed.
Signal to track:
Variance of the additive noise:

Simulation example (2)
[Figure: comparison of the Laplacian estimator, the instantaneous average estimator, and the proposed distributed estimator; axis: packet loss probability.]

Remarks
A class of distributed estimators for WSNs: optimal distributed estimator, stability conditions, performance analysis.
Open issues: model-based estimator; estimator for signals with spatial and temporal correlation.

Optimal thresholds
Optimal solution: the fixed point of a contraction mapping.
Each node asynchronously updates its threshold after receiving those of neighboring nodes.

Distributed binary detection
Measurements at node i; hypothesis testing with S measurements and threshold x_i; probability of false alarm; probability of misdetection.
A threshold minimizing the probability of false alarm maximizes the probability of misdetection.
How to choose the thresholds optimally when nodes exchange opinions?
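The trade-off between the two error probabilities can be illustrated numerically. This is a sketch under an assumed model that is not given on the slide: under one hypothesis the S measurements are zero-mean Gaussian noise, under the other they have a positive mean, and the node compares the sample average with the threshold. The function names and the parameter values are hypothetical.

```python
import math


def q_function(z):
    """Tail probability of a standard normal variable: P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def false_alarm_prob(threshold, num_samples, sigma=1.0):
    """Noise only: the sample average is N(0, sigma^2 / S); alarm if it exceeds the threshold."""
    return q_function(threshold * math.sqrt(num_samples) / sigma)


def misdetection_prob(threshold, num_samples, mean=1.0, sigma=1.0):
    """Signal present: the sample average is N(mean, sigma^2 / S); miss if it stays below the threshold."""
    return 1.0 - q_function((threshold - mean) * math.sqrt(num_samples) / sigma)


S = 10  # number of measurements (hypothetical value)
for x_i in (0.2, 0.5, 0.8):
    print(x_i, false_alarm_prob(x_i, S), misdetection_prob(x_i, S))
```

Raising the threshold lowers the first probability and raises the second, which is why the thresholds have to be chosen by an optimization rather than by minimizing either probability alone.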

Threshold optimization in distributed detection
This is done, for example, in cognitive radios.
How to solve the problem by parallel and distributed operations among the nodes?
The problem is convex, so Lagrangian methods (interior point methods) could be applied.
Drawback: too much message passing (Lagrangian multipliers) among nodes to compute the optimal solution iteratively.
An alternative method: F-Lipschitz optimization.

Distributed detection: F-Lipschitz vs Lagrangian methods
The same approach can also be used in estimation or radio power control.
[Figure: number of iterations and number of function evaluations for F-Lipschitz and for Lagrangian methods (interior point) on a 10-node network; bar labels: 231, 31, 36, 5.]

Interference function theory
The vector of radio powers; the interference that the radio power has to overcome.
[Figure: Tx and Rx nodes.]
Properties of the (Type-I) interference function.
Foschini, Miljanic, A simple distributed autonomous power control algorithm and its convergence, IEEE Trans. Veh. Technol., 1993.

Example: radio power control with unreliable components
Unreliable transceivers introduce intermodulation powers that are difficult to compensate.
[Figure: Tx, Rx, SINR, outages.]
A difficult optimization problem. How to distribute the computation?
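As a point of reference for the classical setting, the power control iteration in the style of the Foschini and Miljanic paper cited above can be sketched in its common discrete-time form: each transmitter scales its power by the ratio of its target SINR to its measured SINR. The link gains, noise powers and targets below are hypothetical numbers for a two-link example, and the function names are illustrative.

```python
def sinr(p, gain, noise):
    """SINR at each receiver i; gain[i][j] is the gain from transmitter j to receiver i."""
    n = len(p)
    return [
        gain[i][i] * p[i]
        / (noise[i] + sum(gain[i][j] * p[j] for j in range(n) if j != i))
        for i in range(n)
    ]


def power_control(gain, noise, target, p0, iterations=200):
    """Each link repeatedly multiplies its power by target SINR / measured SINR."""
    p = list(p0)  # initial powers must be strictly positive
    for _ in range(iterations):
        s = sinr(p, gain, noise)
        p = [target[i] / s[i] * p[i] for i in range(len(p))]
    return p


gain = [[1.0, 0.1],
        [0.2, 1.0]]
noise = [0.1, 0.1]
target = [2.0, 2.0]

p = power_control(gain, noise, target, [1.0, 1.0])
print(p, sinr(p, gain, noise))
```

Each update needs only the link's own measured SINR, so no information is exchanged between links. For these numbers the targets are feasible and the iteration settles at the powers where both SINR values equal 2.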

Power control as an F-Lipschitz problem
A change of variables and a redefinition turn the problem into F-Lipschitz form.
F-Lipschitz qualifying properties are much more general than the interference function ones. More radio power control problems can be solved than the traditional ones based on the interference function.

Conclusions
Existing methods for optimization over networks are too expensive.
We studied Fast-Lipschitz optimization and its application to distributed estimation and other cases.
F-Lipschitz optimization is a panacea for many cases, but there is still a lack of a theory for fast parallel and distributed computations.