Optimization Algorithms, Part A: General Aspects and Classical Algorithms

Identificazione e Controllo Intelligente
Optimization Algorithms, Part A: General Aspects and Classical Algorithms
David Naso, A.A. 2006-2007

Search and optimization
Problem: let Q be the domain of allowable values for a vector q. Find the value of q in Q that minimizes a scalar-valued loss function L(q). Problems of this kind are encountered in all areas of engineering, in the physical sciences, in business, and in medicine. Doing the most with the least is the essence of the optimization objective.

Local vs. global optimization (1/2)
[Figure: two plots of L(θ) versus θ, each marking a local minimum θ_local and the global minimum θ_global.]

Local vs. global optimization (2/2)
Due to the limitations of almost all optimization algorithms, it is generally only possible to approach a local optimum. A local minimum may nevertheless be a fully acceptable solution given the resources available (human time, money, computer time) for the optimization. Some algorithms are sometimes able to find global solutions (e.g. stochastic approximation, simulated annealing, genetic algorithms).

Stochastic vs. deterministic optimization
An optimization problem is stochastic when:
1. there is noise in the loss function measurements, and/or
2. there is a random choice in the selection of the search direction.
These hypotheses contrast with standard deterministic optimization, which assumes perfect information about the loss function (steepest descent) and its derivatives (Newton-Raphson). In most practical problems such information is not available, and deterministic algorithms are inappropriate.

No free lunch theorems (Wolpert and Macready, 1997)
An algorithm that is effective on one class of problems is guaranteed to be ineffective on another class. The theorem applies to problems with a finite (but arbitrarily large) number of options. Just to get a feel for the theorem, consider the needle-in-a-haystack problem: no search algorithm can beat blind random search.

Direct search methods
Examples: random search, Nelder-Mead algorithm. These methods:
- require only loss function measurements
- are versatile and broadly applicable
- are easy to implement
- have a long record of practical efficiency
- are useful when only modest precision is required in the solution

Gradient-based stochastic algorithms
Examples: stochastic gradient algorithm, back-propagation.
- the loss function is assumed to be noisy and nonlinear
- noisy measurements of the loss function gradient are needed
- countless applications in the last 50 years (neural network training, image restoration)
- many possible real-world applications, but the method needs gradient measurements
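As an illustration of the stochastic gradient idea (this sketch is not from the original slides), the following minimal Python example minimizes a simple quadratic loss using only noisy gradient measurements; the loss, noise level, and decaying gain sequence are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_gradient(theta):
    """Gradient of L(theta) = 0.5*||theta - 1||^2 plus measurement noise."""
    return (theta - 1.0) + 0.1 * rng.standard_normal(theta.shape)

theta = np.zeros(3)
for k in range(200):
    a_k = 0.5 / (k + 1)          # decaying gain, standard in stochastic approximation
    theta = theta - a_k * noisy_gradient(theta)

print(theta)  # approaches the minimizer [1, 1, 1] despite the gradient noise
```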

Gradient-free stochastic algorithms
Examples: stochastic approximation, finite-difference algorithm.
- only noisy loss function measurements are available
- the gradient is not calculated but approximated from loss function measurements
- not efficient in high-dimensional problems
- simultaneous perturbation stochastic approximation (SPSA) reduces the number of loss function measurements

Global search algorithms
Examples: SPSA, annealing-type algorithms, genetic algorithms.
- capable of solving complex search problems
- performance depends on configuration parameters
The best trade-off among effectiveness, simplicity, speed of convergence, and noise immunity has to be pursued in real-world optimization problems. Better a rough answer to the right question than an exact answer to the wrong one.
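A minimal Python sketch (added here, not part of the slides) of the two gradient approximations just mentioned: two-sided finite differences, which cost 2p loss measurements per step for p parameters, and SPSA, which costs only 2 regardless of p. The test loss and perturbation size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def fd_gradient(L, theta, c=1e-3):
    """Two-sided finite-difference gradient: 2*p loss measurements."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = c
        g[i] = (L(theta + e) - L(theta - e)) / (2 * c)
    return g

def spsa_gradient(L, theta, c=1e-3):
    """SPSA estimate: only 2 loss measurements, all components perturbed at once."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # random +/-1 perturbation
    return (L(theta + c * delta) - L(theta - c * delta)) / (2 * c * delta)

L = lambda th: np.sum((th - 2.0) ** 2)   # assumed test loss
theta = np.zeros(5)
print(fd_gradient(L, theta))             # close to the true gradient 2*(theta - 2)
print(spsa_gradient(L, theta))           # noisier, but unbiased on average
```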

Classical gradient-based optimization
Classical optimization setting of interest: find θ that minimizes the differentiable loss L(θ), subject to θ satisfying the relevant constraints (θ ∈ Θ).
Standard nonlinear unconstrained optimization setting: find θ such that ∂L/∂θ = 0.
[Figure: plot of L(θ) versus θ with the minimizer θ* marked at the point where ∂L/∂θ = 0.]

Constrained vs. unconstrained
- the ∂L/∂θ = 0 setting is usually associated with unconstrained optimization
- most real problems include constraints
- many constrained problems can also be converted to the ∂L/∂θ = 0 form (penalty functions, projection methods, ad hoc methods and common sense, etc.; see the sketch below)
- considerations for constraints: hard vs. soft, explicit vs. implicit
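To illustrate the penalty-function conversion mentioned above, the sketch below (an assumed example, not from the course) turns the constraint θ1 + θ2 ≥ 1 into an additive quadratic penalty so that a standard unconstrained routine can be applied; the loss, constraint, and penalty weight are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def loss(theta):
    return np.sum(theta ** 2)                            # unconstrained minimum at the origin

def penalized_loss(theta, mu=100.0):
    violation = max(0.0, 1.0 - (theta[0] + theta[1]))    # amount by which theta1 + theta2 >= 1 is violated
    return loss(theta) + mu * violation ** 2             # quadratic penalty for infeasibility

res = minimize(penalized_loss, x0=np.zeros(2), method='BFGS')
print(res.x)   # close to [0.5, 0.5], the constrained minimizer (slightly infeasible for finite mu)
```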

Gradients and Hessians
- often used directly in deterministic methods and indirectly in stochastic methods
- exact gradients and Hessians are generally not available in stochastic optimization
- the gradient g(θ) of L(θ) is the vector of first partial derivatives: g(θ) = ∂L/∂θ
- the Hessian of L(θ) is the matrix H(θ) of second partial derivatives: H(θ) = ∂²L/(∂θ ∂θᵀ)
- the Hessian is useful for characterizing the shape of L and for providing the search direction in the (deterministic) Newton-Raphson algorithm

Rationale behind the steepest descent update direction for the i-th element of θ
If ∂L/∂θ_i > 0, decreasing θ_i decreases L; if ∂L/∂θ_i < 0, increasing θ_i decreases L. Moving each component opposite to the sign of its partial derivative therefore reduces L locally.
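A small numerical illustration of these definitions (my own example, not from the slides): for the quadratic loss L(θ) = ½ θᵀAθ − bᵀθ, the gradient is g(θ) = Aθ − b and the Hessian is H(θ) = A. The matrices below are assumed for illustration.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # symmetric positive definite -> convex loss
b = np.array([1.0, 1.0])

def L(theta):    return 0.5 * theta @ A @ theta - b @ theta
def grad(theta): return A @ theta - b        # g(theta) = dL/dtheta
def hess(theta): return A                    # H(theta) = d^2 L / (dtheta dtheta^T)

theta = np.array([2.0, -1.0])
print(grad(theta))                  # each component tells which way L increases in theta_i;
                                    # moving opposite its sign is the steepest-descent rationale
print(np.linalg.solve(hess(theta), grad(theta)))   # Newton-Raphson search direction (times -1)
```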

First-order (steepest descent) and second-order (Newton-Raphson) directions
[Figure: comparison of the first-order (steepest descent) and second-order (Newton-Raphson) search directions.]

Variants
Derivative-based search algorithms differ in how they use the Hessian and in the technique used to determine the step size once the search direction has been chosen.
[Derivative-based optimization demo]
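To make the two update rules concrete, here is a short sketch (an added example, not the course demo) comparing steepest descent with a fixed step size against the Newton-Raphson update on the same quadratic loss; the step size and iteration count are illustrative choices.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda th: A @ th - b
theta_star = np.linalg.solve(A, b)           # exact minimizer, for reference

# 1st order: theta <- theta - a * g(theta), with a fixed step size a
theta = np.zeros(2)
for _ in range(50):
    theta = theta - 0.2 * grad(theta)
print(np.linalg.norm(theta - theta_star))    # small, but it takes many iterations

# 2nd order: theta <- theta - H^{-1} g(theta)
theta = np.zeros(2)
theta = theta - np.linalg.solve(A, grad(theta))
print(np.linalg.norm(theta - theta_star))    # essentially zero: one Newton step solves a quadratic
```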

Nelder and Mead: simplex search
Simplex: a set of n+1 points in n-dimensional space (a triangle in a 2-D space, a tetrahedron in a 3-D space).
Concept of downhill simplex search:
- repeatedly replace the highest (worst) point with a lower one
- consecutive successful replacements lead to the enlargement of the simplex
- consecutive unsuccessful replacements lead to the shrinkage of the simplex

Downhill simplex search flowchart
[Figure: flowchart of the downhill simplex search.]

Steps of the nonlinear simplex algorithm
[Figure: simplex configurations showing θ_max, θ_2max, θ_min, the centroid θ_cent, and the trial points θ_refl, θ_exp, θ_cont for each operation.]
- reflection of θ_max through the centroid θ_cent to θ_refl
- expansion to θ_exp when L(θ_refl) < L(θ_min)
- outside contraction to θ_cont when L(θ_refl) < L(θ_max)
- inside contraction to θ_cont when L(θ_refl) ≥ L(θ_max)
- shrink toward θ_min after a failed contraction

Downhill simplex search example: find the minimum of the peaks function
z = f(x, y) = 3*(1-x)^2*exp(-(x^2) - (y+1)^2) - 10*(x/5 - x^3 - y^5)*exp(-x^2 - y^2) - 1/3*exp(-(x+1)^2 - y^2).
MATLAB file: go_simp.m
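The course demo go_simp.m is a MATLAB script that is not included here; a rough Python equivalent, assumed for illustration, applies SciPy's Nelder-Mead implementation to the same peaks function (the starting point is arbitrary).

```python
import numpy as np
from scipy.optimize import minimize

def peaks(v):
    x, y = v
    return (3 * (1 - x)**2 * np.exp(-x**2 - (y + 1)**2)
            - 10 * (x / 5 - x**3 - y**5) * np.exp(-x**2 - y**2)
            - (1 / 3) * np.exp(-(x + 1)**2 - y**2))

res = minimize(peaks, x0=np.array([0.0, -1.0]), method='Nelder-Mead')
print(res.x, res.fun)   # a local minimum of peaks; other starting points may reach other minima
```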

Random hillclimbing
Properties: intuitive, simple. Analogy: getting down to a valley blindfolded.
Two heuristics: reverse step, bias direction.

Random hillclimbing flowchart (transcribed as a sketch below):
- select a random step dx
- if f(x + b + dx) < f(x): set x = x + b + dx and b = 0.2 b + 0.4 dx
- else if f(x + b - dx) < f(x): set x = x + b - dx and b = b - 0.4 dx
- else: set b = 0.5 b
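The flowchart can be transcribed directly into code. The Python sketch below is an assumed implementation: the test function f, starting point, step scale, and iteration count are not specified in the slides and are chosen only to make the example runnable.

```python
import numpy as np

def random_hillclimb(f, x0, sigma=0.5, iters=500, seed=0):
    """Biased random hillclimbing following the flowchart: try x+b+dx, then x+b-dx,
    update the bias b on success, shrink it after a double failure."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    b = np.zeros_like(x)                       # bias: memory of recent successful directions
    for _ in range(iters):
        dx = sigma * rng.standard_normal(x.shape)
        if f(x + b + dx) < f(x):               # forward step succeeds
            x = x + b + dx
            b = 0.2 * b + 0.4 * dx
        elif f(x + b - dx) < f(x):             # reverse step succeeds
            x = x + b - dx
            b = b - 0.4 * dx
        else:                                  # both fail: reduce the bias
            b = 0.5 * b
    return x

f = lambda v: np.sum((v - 3.0) ** 2)           # assumed test loss
print(random_hillclimb(f, x0=[0.0, 0.0]))      # approaches [3, 3]
```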

Random search example: find the minimum of the peaks function
z = f(x, y) = 3*(1-x)^2*exp(-(x^2) - (y+1)^2) - 10*(x/5 - x^3 - y^5)*exp(-x^2 - y^2) - 1/3*exp(-(x+1)^2 - y^2).