Evolutionary Cutting Planes


To cite this version: Jérémie Decock, David L. Saint-Pierre, Olivier Teytaud. Evolutionary Cutting Planes. In Stéphane Bonnevay, Pierrick Legrand, Nicolas Monmarché, Evelyne Lutton and Marc Schoenauer (eds.), Artificial Evolution (EA 2015), Lyon, France, 2015. <hal-119454> https://hal.inria.fr/hal-119454. Submitted on 7 Sep 2015. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Evolutionary Cutting Planes

Jérémie Decock, David L. Saint-Pierre, Olivier Teytaud
TAO (Inria), LRI, UMR 8623 (CNRS - Univ. Paris-Sud), Bat 660 Claude Shannon, Univ. Paris-Sud, 91190 Gif-sur-Yvette, France
Email: {firstname.lastname}@inria.fr

Abstract. The cutting plane method is a simple and efficient method for optimizing convex functions for which subgradients are available. This paper proposes several methods for parallelizing it, in particular using a typically evolutionary method, and compares them experimentally in well-conditioned and ill-conditioned settings.

1 Introduction

Various levels of information can help in optimization: the objective function values are usually available; sometimes the gradient is also provided [3], and in some cases the Hessian is available (either directly, as in Newton's method, or approximated from successive gradients, as in quasi-Newton methods [2, 4, 6, 14]; there are also cases in which both the gradient and the Hessian are approximated from objective function values, without direct computation [13]). We here consider cases in which we have access to subgradients. A traditional method for such cases is the cutting plane method [8]: at each search point x_n, a linear approximation of the objective function f, tangent to f at x_n, is computed¹; this linear approximation is termed a cut c. The cutting plane method assumes that the objective function f is convex, so that the cuts all lie below f. The function f̃_n, the maximum of the cuts {c_1, ..., c_n} computed in the previous iterations {1, ..., n}, can therefore be used as a heuristic. Thus f̃_n is an approximate model of the objective function, and it determines the search point of the next iteration n + 1: x_{n+1} = argmin_x f̃_n(x). As f̃_n is piecewise linear, argmin_x f̃_n(x) can be computed quickly using linear programming solvers. The parallelization of optimization algorithms is critical for many applications.
To the best of our knowledge, whereas many methods using linear cuts have been parallelized (e.g. dual dynamic programming [12]), there is no published work on parallel cutting plane methods. In this paper we propose variants of parallel cutting plane methods, and we analyse these variants to validate the approach both theoretically and experimentally. Section 2 formalizes the problem. Section 3 introduces several parallel cutting plane methods. Section 4 provides experimental results.

¹ This cutting plane method is not the one used in integer linear programming, but the version for continuous non-linear optimization with gradients.

2 Problem Statement

Let f : C ⊆ R^d → R be a convex function with domain C, where d is the dimension of the problem. The generic problem at hand is min f(x), subject to x ∈ C, where C is a convex set. Assuming that we have access, for a given x, to both f(x) and a subgradient ∂f_x, the cutting plane method can be defined as follows for iterations n ∈ {0, 1, 2, ...}:

x_{n+1} = argmin_x f̃_n(x),   (1)
c_{n+1} = x ↦ c_{n+1}(x) = f(x_{n+1}) + ∂f_{x_{n+1}} · (x − x_{n+1}),   (2)
f̃_{n+1} = max(f̃_n, c_{n+1}),   (3)

where f̃_0 is some heuristic, and where Eq. (1) defines the sequence x_1, x_2, ..., x_n of search points. Eq. (2) defines the cuts, i.e. linear functions tangent to the objective function f. Eq. (3) defines the sequence of approximate models f̃_1, f̃_2, ..., f̃_n. f̃_0 can be −∞ or some heuristically defined function. However, if f̃_0 can be encoded in linear programming, then Eq. (1) can be solved by linear programming for any n. Usually x_1 is randomly drawn in C if f̃_0 is a constant function.

3 Parallel cutting plane methods

The classic cutting plane method is a sequential approach that generates only one cut per iteration n. The idea put forth in this paper is to generate several cuts per iteration. Assuming we have access to p processors, we can potentially generate p cuts per iteration. In this section, we define parallel cutting plane (PCP) methods. First, Section 3.1 describes a naive parallelization of the cutting plane method. Then, Section 3.2 explains a Gaussian approach, using random perturbations; a variant is also proposed. Third, Section 3.3 explains the billiard method for generating the cuts and also develops a variant.

3.1 Regular PCP

The idea of a naive parallelization of the cutting plane method is to evaluate several points between x_n and x_{n+1} rather than only x_{n+1}. For some constant C > 0, we define a PCP as follows for iterations n ∈ {0, 1, 2, ...}:

x_{n+1} = argmin_x f̃_n(x),   (4)
∀ i ∈ {1, ..., p},  x_{n+1,i} = x_n + (C · i / p)(x_{n+1} − x_n),   (5)
c_{n+1,i} = x ↦ c_{n+1,i}(x) = f(x_{n+1,i}) + ∂f_{x_{n+1,i}} · (x − x_{n+1,i}),   (6)
f̃_{n+1} = max(f̃_n, max_i c_{n+1,i}),   (7)
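The sequential loop of Eqs. (1)-(3) can be sketched in a few lines, assuming a box domain C and solving Eq. (1) as a linear program in (x, t), minimizing t subject to each cut lying below t. The function and oracle names below are illustrative, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane(f, subgrad, bounds, n_iter=30, rng=None):
    """Minimal sketch of the sequential cutting plane method, Eqs. (1)-(3).

    f, subgrad : objective and subgradient oracles (hypothetical names).
    bounds     : list of (low, high) pairs defining the box domain C.
    Returns the best objective value observed.
    """
    rng = np.random.default_rng(rng)
    d = len(bounds)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    # x_1 is randomly drawn in C (f~_0 constant, so its argmin is undefined).
    x = lo + rng.random(d) * (hi - lo)
    cuts = []  # each cut stored as (g_i, g_i . x_i - f(x_i))
    best = f(x)
    for _ in range(n_iter):
        g = subgrad(x)
        cuts.append((g, g @ x - f(x)))
        # Eq. (1): minimize t s.t. g_i . x - t <= g_i . x_i - f(x_i), x in C.
        A_ub = np.array([np.append(g_i, -1.0) for g_i, _ in cuts])
        b_ub = np.array([r for _, r in cuts])
        cost = np.append(np.zeros(d), 1.0)  # objective: t only
        res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                      bounds=list(bounds) + [(None, None)])
        x = res.x[:d]
        best = min(best, f(x))
    return best
```

On a 2-dimensional sphere function over [−1, 1]², the best value found drops close to the optimum 0 within a few tens of iterations.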

where f̃_0 is some heuristic. Obviously, Eq. (6) can be computed in parallel. In the worst-case scenario the only relevant cut is the one from the point x_{n+1}, in which case the method boils down to the cutting plane method. When C = 1, we evaluate cuts uniformly distributed from x_n to x_{n+1}; this is termed the regular PCP. When C > 1 (e.g. C = 2), we also evaluate cuts farther than x_{n+1}; this is termed the extended PCP.

3.2 Gaussian PCP

Evolutionary algorithms are usually not the fastest optimization methods, but they have advantages in terms of robustness and parallelization. Combining cutting planes and evolutionary mutations is a natural idea for parallelizing the cutting plane method. The naive PCP from Section 3.1 has the drawback that it generates points distributed along a line. We propose a variant of the form N(µ, σ), where cuts are generated using µ = x_{n+1} and σ = ||x_{n+1} − x_n|| / √d. Note the division by the square root of the dimension, in order to keep the distance normalized. We thus propose the following variant:

f̃_0 = some heuristic,   (8)
x_{n+1} = argmin_x f̃_n(x),   (9)
x_{n+1,1} = x_{n+1},   (10)
µ = x_{n+1},  σ = ||x_{n+1} − x_n|| / √d,   (11)
∀ i ∈ {2, ..., p},  x_{n+1,i} ∼ N(µ, σ),   (12)
∀ i ∈ {2, ..., p},  c_{n+1,i}(x) = f(x_{n+1,i}) + ∂f_{x_{n+1,i}} · (x − x_{n+1,i}),   (13)
f̃_{n+1} = max(f̃_n, max_i c_{n+1,i}).   (14)

Our aim is to have points distributed less regularly, hopefully with a better diversity in the cuts. Admittedly, a drawback of this method is that points are generated isotropically, which might be problematic for ill-conditioned problems. When the domain of f is constrained (C ⊂ R^d), Eq. (12) might draw some points outside C; another instruction is then required to filter out these points.

Special-Gaussian PCP We here propose a special variant of the Gaussian version above. The pseudo-code is the same, but µ and σ are such that the Gaussian distribution is centered on the average: µ = (x_n + x_{n+1}) / 2 and σ = ||x_{n+1} − x_n|| / (2√d).
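The mutation step of the Gaussian PCP (Eqs. (11)-(12)) and its special variant can be sketched as follows; the function name and interface are illustrative assumptions:

```python
import numpy as np

def gaussian_pcp_points(x_prev, x_next, p, special=False, rng=None):
    """Sketch of the Gaussian PCP mutation step, Eqs. (11)-(12).

    Returns the p points to evaluate in parallel: x_next itself, plus
    p - 1 isotropic Gaussian mutants. `special=True` gives the
    Special-Gaussian variant: centered on the midpoint, halved sigma.
    """
    rng = np.random.default_rng(rng)
    x_prev = np.asarray(x_prev, dtype=float)
    x_next = np.asarray(x_next, dtype=float)
    d = x_prev.size
    if special:
        mu = 0.5 * (x_prev + x_next)
        sigma = np.linalg.norm(x_next - x_prev) / (2.0 * np.sqrt(d))
    else:
        mu = x_next
        sigma = np.linalg.norm(x_next - x_prev) / np.sqrt(d)
    # x_{n+1,1} = x_{n+1}; the remaining p - 1 points are mutants of mu.
    mutants = mu + sigma * rng.standard_normal((p - 1, d))
    return np.vstack([x_next, mutants])
```

Each returned point yields one cut via Eq. (13); the subgradient evaluations are independent, so they parallelize trivially over the p processors.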
3.3 Billiard PCP

In a convex setting, each (non-null) cut provides a half-space in which the optimum lies. With multiple cuts, we have a polyhedron in which the optimum necessarily lies. The billiard algorithm is a classical approach [1, 10, 7, 5] for sampling a polyhedron. We apply it here and get p − 1 points x_{n,2}, ..., x_{n,p} approximately uniformly distributed in the polyhedron. x_{n,1} is set to x_n.
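One classical billiard scheme for approximately uniform sampling of a polyhedron {x : A x ≤ b} can be sketched as below. The step length, reflection rule, and stopping condition are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def billiard_sample(A, b, x0, n_points, step=0.5, rng=None):
    """Sketch of billiard sampling in the polyhedron {x : A x <= b}.

    A ray starts at the interior point x0 with a random direction and
    reflects specularly off the facets; positions recorded every `step`
    units of travel are approximately uniform in the polyhedron.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    direction = rng.standard_normal(x.size)
    direction /= np.linalg.norm(direction)
    points, remaining = [], step
    while len(points) < n_points:
        # Travel distance to the first facet hit along `direction`.
        slack = np.maximum(b - A @ x, 0.0)
        speed = A @ direction
        with np.errstate(divide="ignore", invalid="ignore"):
            t_hit = np.where(speed > 1e-12, slack / speed, np.inf)
        t_wall = float(t_hit.min())
        if t_wall >= remaining:
            # No wall before the next sample: advance and record a point.
            x = x + remaining * direction
            points.append(x.copy())
            remaining = step
        else:
            # Move to facet i and reflect the direction off its normal.
            i = int(np.argmin(t_hit))
            x = x + t_wall * direction
            a = A[i]
            direction = direction - 2.0 * (a @ direction) / (a @ a) * a
            remaining -= t_wall
    return np.array(points)
```

The p − 1 billiard points then each produce one cut, exactly as in the Gaussian variant.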

Fig. 1: Dimension d = 3, 15 iterations, p = 5. Comparison of Simple Regret (SR) as a function of the number of processors p; panels: (a) Sphere, (b) Cigar. Interpretation: for the sphere, s-billiard and s-Gaussian outperform the other variants when p > 3; for the cigar, billiard and s-billiard outperform the other variants when p > 3.

Special-billiard PCP The center of mass might be an interesting point; therefore, for this special billiard, we keep one point for the average of the x_{n,i} for i ∈ {2, ..., p − 1}:

x_{n,p} = (1 / (p − 2)) Σ_{i=2}^{p−1} x_{n,i}.

This method is inspired by [9, 11]. It approximates the center of mass of the polyhedron of possible optima; a computationally faster method is also proposed in [15] (using an approximation of the centre of gravity by the center of the biggest ellipsoid).

4 Experiments

In this section we compare two parallel cutting plane methods (C = 1 and C = 1.5, from Section 3.1), two Gaussian approaches (defined in Section 3.2) and two billiard methods from Section 3.3 against two baselines: sequential and cheat. The sequential method is the vanilla version, with no parallelization. The cheat method can execute each cut in parallel as if it were in the sequential scenario, thus showing the absolute best possible outcome.

Simple regret We consider in our results the Simple Regret (SR). The simple regret of an optimization run is the difference f(x̂) − inf_x f(x), where x̂ is the approximation of the optimum provided by the optimization algorithm at the end of the run.

Comparison We compare our methods as a function of the number p of processors. For the sake of comparison, we include the cheat method. The cheat method is in fact the classical cutting plane method, but with each group of p iterations considered as one iteration only. This is not a real parallel method: it cheats in the sense that processor #2 can use the result of the computation of processor #1, and so on.
It is just aimed at showing the ideal case, i.e. the performance that would be obtained if we could fully parallelize the cutting plane method, so that p iterations on one machine could be run simultaneously on a machine with p processors. For the sake of comparison, we also include the sequential method, in which we do not parallelize at all; only one processor is used, so that this is indeed the sequential cutting plane method.

We consider the sphere function f(x) = ||x||² for x ∈ [0, 1]^d, and the cigar function f(x) = x_1² + 10³ Σ_{i=2}^{d} x_i². All cutting plane methods start with a point randomly uniformly drawn on {x ∈ R^d ; ||x|| = 1}. Results are averaged over 20 runs; each point in each curve is computed independently, so that superiority over each abscissa shows statistical significance.

Fig. 2: Dimension d = 5, 5 iterations, p = 6. Comparison of Simple Regret (SR) as a function of the number of processors p; panels: (a) Sphere, (b) Cigar. Interpretation: for the sphere, Gaussian and s-Gaussian outperform the other variants when p > 4; for the cigar, billiard and s-billiard outperform the other variants when p > 5.

Figure 1 explores the different variants with a fixed dimension d = 3, a number of iterations n = 15 and a number of processors p = 5. For the sphere, s-billiard and s-Gaussian outperform the other variants when p > 3. For the cigar, billiard and s-billiard outperform the other variants when p > 3. Our variants (s-Gaussian and s-billiard) outperform their respective initial versions (Gaussian and billiard, respectively).

Figure 2 evaluates the different variants with a fixed dimension d = 5, a number of iterations n = 5 and a number of processors p = 6. For the sphere, Gaussian and s-Gaussian outperform the other variants when p > 4. For the cigar, billiard and s-billiard outperform the other variants when p > 5. Once again, our variants (s-Gaussian and s-billiard) outperform their respective initial versions (Gaussian and billiard, respectively).
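The two test functions and the simple regret can be written down directly; the conditioning constant 10³ for the cigar is the value reconstructed from the text, and both functions attain their infimum 0 at the origin:

```python
import numpy as np

def sphere(x):
    """Sphere test function: f(x) = ||x||^2."""
    x = np.asarray(x, dtype=float)
    return float(x @ x)

def sphere_grad(x):
    return 2.0 * np.asarray(x, dtype=float)

def cigar(x):
    """Cigar test function: f(x) = x_1^2 + 10^3 * sum_{i>=2} x_i^2."""
    x = np.asarray(x, dtype=float)
    return float(x[0] ** 2 + 1e3 * (x[1:] @ x[1:]))

def cigar_grad(x):
    x = np.asarray(x, dtype=float)
    g = 2e3 * x        # ill-conditioned coordinates
    g[0] = 2.0 * x[0]  # well-conditioned first coordinate
    return g

def simple_regret(f, x_hat, f_star=0.0):
    """Simple regret f(x_hat) - inf f; both test functions have inf f = 0."""
    return f(x_hat) - f_star
```

The cigar's condition number of 10³ is what separates the isotropic Gaussian variants from the billiard variants in the experiments below.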
Figure 3 evaluates the different variants with a fixed dimension d = 10, a number of iterations n = 100 and a number of processors p = 7. For the sphere, Gaussian and s-Gaussian outperform the other variants when p > 4. For the cigar, s-Gaussian and s-billiard outperform the other variants when p > 5. Once again, our variants (s-Gaussian and s-billiard) outperform their respective initial versions (Gaussian and billiard, respectively).

Fig. 3: Dimension d = 10, 100 iterations, p = 7. Comparison of Simple Regret (SR) as a function of the number of processors p; panels: (a) Sphere, (b) Cigar. Interpretation: for the sphere, Gaussian and s-Gaussian outperform the other variants when p > 4; for the cigar, s-Gaussian and s-billiard outperform the other variants when p > 5.

Figure 4 presents the impact of the number of iterations on the Simple Regret (SR) for the 10-dimensional sphere and cigar objective functions. A good variant typically requires as few iterations as possible, since each iteration represents a step. The s-Gaussian is clearly the best variant when p > 4, whatever the number of iterations.

5 Conclusion

Our conclusions are as follows:
- The s-billiard method is better than the billiard method. This is consistent with the superiority of the ellipsoid method. This might make sense even in the sequential case.
- Comparison between methods: the s-Gaussian method is the best method in the well-conditioned case (sphere). The s-billiard method is the best only in the ill-conditioned case (cigar) when the dimension is small.
- Speed-up: the speed-up is reasonably good; we obtain nearly half or a third of a linear speed-up (i.e. we need two or three times more processors than in the cheat case).

References

1. P. Baldwin. The billiard algorithm and KS entropy. Journal of Physics A: Mathematical and General, 24(16):L941, 1991.
2. C. G. Broyden. The convergence of a class of double-rank minimization algorithms 2. The new algorithm. Journal of the Institute of Mathematics and its Applications, 6:222-231, 1970.
3. A. Cauchy. Méthode générale pour la résolution des systèmes d'équations simultanées. Comptes rendus hebdomadaires des séances de l'Académie des sciences, pages 536-538, 1847.

4. R. Fletcher. A new approach to variable-metric algorithms. Computer Journal, 13:317-322, 1970.
5. T. Gensane. Dense packings of equal spheres in a cube. The Electronic Journal of Combinatorics, 11(1):R33, 2004.
6. D. Goldfarb. A family of variable-metric algorithms derived by variational means. Mathematics of Computation, 24:23-26, 1970.
7. R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines: Estimating the Bayes point in kernel space. In IJCAI Workshop on Support Vector Machines, pages 23-27, 1999.
8. J. E. Kelley. The cutting plane method for solving convex programs. Journal of the SIAM, 8:703-712, 1960.
9. A. Levin. An algorithm for the minimization of convex functions. Soviet Mathematics Doklady, 6:286-290, 1965.
10. B. D. Lubachevsky. How to simulate billiards and similar systems. Journal of Computational Physics, 94(2):255-283, 1991.
11. D. J. Newman. Location of the maximum on unimodal surfaces. Journal of the ACM, 12(3):395-398, July 1965.
12. M. V. F. Pereira and L. M. V. G. Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52(2):359-375, October 1991.
13. M. J. D. Powell. Developments of NEWUOA for minimization without derivatives. IMA Journal of Numerical Analysis, February 2008.
14. D. F. Shanno. Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation, 24:647-656, 1970.
15. S. Tarasov, L. Khachiyan, and I. Erlikh. The method of inscribed ellipsoids. Soviet Mathematics Doklady, 37(1):226-230, 1988.

Fig. 4: Dimension d = 10; 10, 40 and 100 iterations; p = 7. Comparison of Simple Regret (SR) as a function of the number of processors p; panels: (a), (c), (e) Sphere, (b), (d), (f) Cigar. Interpretation: for the sphere and the cigar, s-Gaussian outperforms the other variants when p > 4, whatever the number of iterations.