Optimization. Industrial AI Lab.

Optimization: An important tool in 1) engineering problem solving and 2) decision science. People optimize; nature optimizes.

Optimization: People optimize (source: http://nautil.us/blog/to-save-drowning-people-ask-yourself-what-would-light-do)

Optimization: Nature optimizes (source: http://nautil.us/blog/to-save-drowning-people-ask-yourself-what-would-light-do)

Optimization: three key components: 1) objective function, 2) decision variables (unknowns), 3) constraints. Procedures: 1) identify the objective, variables, and constraints for a given problem (known as "modeling"); 2) once the model has been formulated, an optimization algorithm can be used to find its solution.

Optimization: Mathematical Model. In mathematical form, x = (x_1, ..., x_n) ∈ Rⁿ is the decision variable and f: Rⁿ → R is the objective function. The feasible region is C = {x : g_i(x) ≤ 0, i = 1, ..., m}. A point x* ∈ Rⁿ is an optimal solution if x* ∈ C and f(x*) ≤ f(x) for all x ∈ C.
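
Written out in standard form (a compact restatement of the definitions above):

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \le 0, \quad i = 1, \dots, m
\end{aligned}
```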

Optimization: Mathematical Model. Remarks: equivalent formulations.

Unconstrained vs. Constrained

Convex vs. Nonconvex

Convex Optimization

Convex Optimization: An extremely powerful subset of all optimization problems. f: Rⁿ → R is a convex function and the feasible region C is a convex set. Key property of convex optimization: all local solutions are global solutions. We will use CVX (or CVXPY) as a convex optimization solver (https://www.cvxpy.org/). Many examples later.
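
As a first taste of CVXPY, here is a minimal sketch of a convex problem stated and solved in a few lines; the objective, constraints, and numbers are illustrative choices, not taken from the slides:

```python
import numpy as np
import cvxpy as cp

# Minimize the squared distance to a target point over a simple convex set
x = cp.Variable(2)
target = np.array([1.0, 2.0])             # illustrative data
objective = cp.Minimize(cp.sum_squares(x - target))
constraints = [x >= 0, cp.sum(x) <= 2]    # a convex feasible region C
prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.value, x.value)
```

Because the problem is convex, the solution CVXPY returns is a global minimizer.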

Linear Interpolation between Two Points: z = θx + (1 - θ)y, with θ ∈ [0, 1].

Convex Function and Convex Set (images from http://web.stanford.edu/class/ee364a/lectures.html)

Solving Optimization Problems

Solving Optimization Problems: Starting with the unconstrained, one-dimensional case. To find a minimum point x*, we can look at the derivative of the function, f′(x). Any location where f′(x) = 0 is a flat (stationary) point of the function. For convex problems, this is guaranteed to be a minimum.
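
A quick way to see this on a concrete function is to solve f′(x) = 0 symbolically; the function below is an illustrative convex example, not one from the lecture:

```python
import sympy as sp

x = sp.symbols('x')
f = (x - 3)**2 + 1                        # a simple convex function (illustrative)
stationary = sp.solve(sp.diff(f, x), x)   # solve f'(x) = 0
print(stationary)                         # [3] -> the unique minimizer
```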

Solving Optimization Problems: Generalization to a multivariate function f: Rⁿ → R: the gradient of f must be zero. For f defined as above, the gradient ∇f(x) is an n-dimensional vector containing the partial derivatives with respect to each dimension. For continuously differentiable f and unconstrained optimization, an optimal point must satisfy ∇f(x*) = 0.

How Do We Find ∇f(x) = 0? Direct solution: in some cases, it is possible to analytically compute x* such that ∇f(x*) = 0.

Gradients and Matrix Derivatives
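
Two standard matrix-derivative identities, stated here for reference because they cover the cases used in the next slides:

```latex
\nabla_x \,(c^{\top} x) = c,
\qquad
\nabla_x \,(x^{\top} P x) = (P + P^{\top})\, x = 2 P x
\quad \text{when } P = P^{\top}.
```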

How Do We Find ∇f(x) = 0? Direct solution: in some cases, it is possible to analytically compute x* such that ∇f(x*) = 0.

Examples (with P = Pᵀ)

Revisit: Least-Squares Solution. Scalar objective: J = ‖Ax - y‖².
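
Setting the gradient of J = ‖Ax - y‖² to zero gives the normal equations AᵀAx = Aᵀy, so x* = (AᵀA)⁻¹Aᵀy when AᵀA is invertible. A small NumPy sketch (with made-up data) checks this against the library routine:

```python
import numpy as np

# Illustrative data: A is m x n with m > n (not from the slides)
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
y = np.array([1., 2., 2.])

# Normal equations: A^T A x = A^T y (from setting the gradient of J to zero)
x_star = np.linalg.solve(A.T @ A, A.T @ y)

# Cross-check with NumPy's least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x_star, x_lstsq)
```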

How Do We Find ∇f(x) = 0? Iterative methods: more commonly, the condition that the gradient equals zero has no analytical solution and requires iterative methods. The gradient points in the direction of steepest ascent of the function f.

Descent Direction (1D): This motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient.

Gradient Descent

Gradient Descent in High Dimension

Gradient Descent in High Dimension

Gradient Descent. Update rule: x ← x - α ∇f(x), where α > 0 is the step size.
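
A minimal NumPy sketch of this update rule; the step size, stopping rule, and the quadratic test function are assumptions for illustration:

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Repeatedly step in the direction of the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = x - alpha * grad_f(x)   # update rule: x <- x - alpha * grad f(x)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x_new

# Example: f(x) = (1/2) x^T P x - q^T x with a symmetric positive definite P
P = np.array([[3., 1.],
              [1., 2.]])
q = np.array([1., 1.])
x_min = gradient_descent(lambda x: P @ x - q, x0=[0., 0.])
print(x_min, np.linalg.solve(P, q))     # the iterate approaches the analytic solution
```

With too large an α the iterates can diverge; with too small an α convergence is slow, which is the point of the next slide on the learning rate.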

Choosing the Step Size α (learning rate)

Where Will We Converge? Random initialization; multiple trials.

Gradient Descent vs. Analytical Solution: An analytical solution exists for MSE. Gradient descent is easy to implement, very general (it can be applied to any differentiable loss function), and requires less memory and computation (for stochastic methods). Gradient descent provides a general learning framework that can be used for both classification and regression. Training neural networks: gradient descent.

Practically Solving Optimization Problems: The good news is that for many classes of optimization problems, people have already done all the hard work of developing numerical algorithms. There is a wide range of tools that can take optimization problems in natural form and compute a solution. We will use CVX (or CVXPY) as an optimization solver, but only for convex problems (download: https://www.cvxpy.org/); for neural networks/deep learning we will use gradient descent via TensorFlow.

Summary: Training Neural Networks. Optimization procedure: it is not easy to numerically compute gradients in a network in general. The good news: people have already done all the "hard work" of developing numerical solvers (or libraries). There is a wide range of tools; we will use TensorFlow.

Examples

Linear Programming: The objective function and constraints are both linear (hence convex).

Method 1: Geometric Approach (graphical solution sketched in the (x1, x2) plane)

Method 2: CVXPY. Many examples will be provided throughout the lecture.

Method 2: CVXPY
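
A CVXPY sketch of a generic linear program; the data c, A, b below are illustrative stand-ins, not the LP from the slide:

```python
import numpy as np
import cvxpy as cp

# Generic LP: minimize c^T x  subject to  A x <= b,  x >= 0
c = np.array([-1.0, -2.0])                # illustrative data
A = np.array([[1.0, 1.0],
              [1.0, 3.0]])
b = np.array([4.0, 6.0])

x = cp.Variable(2)
prob = cp.Problem(cp.Minimize(c @ x), [A @ x <= b, x >= 0])
prob.solve()
print(prob.value, x.value)
```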

Quadratic Form

Quadratic Programming: The problem can be found at http://courses.csail.mit.edu/6.867/wiki/images/e/ef/qp-quadprog.pdf

Quadratic Programming (continued)
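
The same CVXPY pattern works for quadratic programs. Below is a sketch of a generic QP; the matrices P, q, G, h are illustrative and are not the data from the linked handout:

```python
import numpy as np
import cvxpy as cp

# Generic QP: minimize (1/2) x^T P x + q^T x  subject to  G x <= h
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])                # symmetric positive definite (illustrative)
q = np.array([1.0, 1.0])
G = -np.eye(2)                            # encodes x >= 0
h = np.zeros(2)

x = cp.Variable(2)
objective = cp.Minimize(0.5 * cp.quad_form(x, P) + q @ x)
prob = cp.Problem(objective, [G @ x <= h])
prob.solve()
print(prob.value, x.value)
```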

Example: Shortest Distance. Find the best location to listen to a singer's voice.

Example: Shortest Distance
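
One way to pose this in CVXPY: find the point of a feasible (audience) region closest to a fixed location. The stage position and the box-shaped region below are assumptions made for illustration, not the geometry on the slide:

```python
import numpy as np
import cvxpy as cp

stage = np.array([0.0, 5.0])              # assumed location of the singer

x = cp.Variable(2)
# Assumed audience region: a simple axis-aligned box
constraints = [x >= np.array([1.0, 0.0]),
               x <= np.array([4.0, 3.0])]
prob = cp.Problem(cp.Minimize(cp.norm(x - stage, 2)), constraints)
prob.solve()
print(prob.value, x.value)                # shortest distance and best listening spot
```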

Example: Supply Chain Management. Find the point that minimizes the sum of the transportation costs (or distances) from this point to 3 destination points.
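
A CVXPY sketch for this example: minimizing the sum of Euclidean distances to the three destinations yields their geometric median. The coordinates below are illustrative placeholders:

```python
import numpy as np
import cvxpy as cp

# Assumed coordinates of the 3 destination points
points = np.array([[0.0, 0.0],
                   [4.0, 0.0],
                   [2.0, 3.0]])

x = cp.Variable(2)
cost = sum(cp.norm(x - p, 2) for p in points)   # sum of distances (convex)
prob = cp.Problem(cp.Minimize(cost))
prob.solve()
print(x.value)      # the geometric median of the destination points
```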