
Optimization

by Prof. Seungchul Lee, Industrial AI Lab, POSTECH
http://isystems.unist.ac.kr/

Table of Contents
1. Optimization
2. Solving Optimization Problems
3. How do we Find $x^*$ such that $\nabla_x f(x) = 0$?
4. Gradient Descent
5. Practically Solving Optimization Problems
   5.1. Gradient Descent vs. Analytical Solution
   5.2. Training Neural Networks

1. Optimization

Optimization is an important tool in 1) engineering problem solving and 2) decision science.

- People optimize

- Nature optimizes (source: http://nautil.us/blog/to-save-drowning-people-ask-yourself-what-would-light-do)

3 key components
1. objective
2. decision variable or unknown
3. constraints

Procedures
1. The process of identifying the objective, variables, and constraints for a given problem is known as "modeling".
2. Once the model has been formulated, an optimization algorithm can be used to find its solution.

In mathematical expression:

$$\min_{x} \; f(x) \quad \text{subject to} \quad g_i(x) \le 0, \; i = 1, \dots, m$$

- $x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$ is the decision variable
- $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is the objective function
- Feasible region: $C = \{x : g_i(x) \le 0, \; i = 1, \dots, m\}$
- $x^* \in \mathbb{R}^n$ is an optimal solution if $x^* \in C$ and $f(x^*) \le f(x)$ for all $x \in C$
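As a small concrete instance (illustrative only, not from the slides), take $n = 2$ and a single constraint:

$$\min_{x_1, x_2} \; (x_1 - 1)^2 + (x_2 - 2)^2 \quad \text{subject to} \quad x_1 + x_2 - 1 \le 0$$

Here the objective is $f(x) = (x_1 - 1)^2 + (x_2 - 2)^2$, the decision variable is $x = (x_1, x_2)$, the single constraint function is $g_1(x) = x_1 + x_2 - 1$, and the feasible region is the half-plane $C = \{x : x_1 + x_2 \le 1\}$.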

Remarks: equivalent forms

$$\min_{x} f(x) \iff \max_{x} -f(x)$$

$$g_i(x) \le 0 \iff -g_i(x) \ge 0$$

$$h(x) = 0 \iff \{h(x) \le 0 \;\text{and}\; h(x) \ge 0\}$$

The good news: for many classes of optimization problems, people have already done all the "hard work" of developing numerical algorithms.

2. Solving Optimization Problems

Starting with the unconstrained, one-dimensional case:

- To find the minimum point $x^*$, we can look at the derivative of the function, $f'(x)$: any location where $f'(x) = 0$ will be a "flat" point of the function.
- For convex problems, such a point is guaranteed to be a minimum.
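As a small worked instance (illustrative, not from the slides), take the convex function $f(x) = (x - 3)^2$. Its derivative is $f'(x) = 2(x - 3)$, and setting $f'(x) = 0$ gives the single flat point $x^* = 3$; since $f$ is convex, this is the global minimum. The same function, extended to two dimensions, reappears in the gradient descent example below.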

Generalization to a multivariate function $f : \mathbb{R}^n \rightarrow \mathbb{R}$: the gradient of $f$ must be zero,

$$\nabla_x f(x) = 0$$

For $f$ defined as above, the gradient is an $n$-dimensional vector containing the partial derivatives with respect to each dimension:

$$\nabla_x f(x) = \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_n} \end{bmatrix}$$

For continuously differentiable $f$ and unconstrained optimization, an optimal point $x^*$ must satisfy $\nabla_x f(x^*) = 0$.

3. How do we Find $x^*$ such that $\nabla_x f(x) = 0$?

Direct solution

- In some cases, it is possible to analytically compute $x^*$ such that $\nabla_x f(x^*) = 0$. For example:

$$f(x) = 2x_1^2 + x_2^2 + x_1 x_2 - 6x_1 - 5x_2$$

$$\nabla_x f(x) = \begin{bmatrix} 4x_1 + x_2 - 6 \\ x_1 + 2x_2 - 5 \end{bmatrix} = 0$$

$$x^* = \begin{bmatrix} 4 & 1 \\ 1 & 2 \end{bmatrix}^{-1} \begin{bmatrix} 6 \\ 5 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

(A quick numerical check of this result appears below.)

Iterative methods

- More commonly, the condition that the gradient equal zero will not have an analytical solution, and iterative methods are required.
- The gradient points in the direction of "steepest ascent" for the function $f$.
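As the promised numerical check of the direct-solution example above, a minimal NumPy sketch (not part of the original notebook) solves the linear system $\nabla_x f(x) = 0$:

import numpy as np

# Gradient of f(x) = 2*x1**2 + x2**2 + x1*x2 - 6*x1 - 5*x2, written as A x = b
A = np.array([[4., 1.],
              [1., 2.]])
b = np.array([6., 5.])

x_star = np.linalg.solve(A, b)   # solve the 2x2 linear system
print(x_star)                    # expected: [1. 2.]

Gradient descent, introduced next, handles the more common case where no such closed-form solution is available.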

4. Gradient Descent

Descent Direction (1D)

The fact that the negative gradient is a descent direction motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient:

$$x \leftarrow x - \alpha \nabla_x f(x) \quad \text{for some step size } \alpha > 0$$

Gradient Descent

Repeat: $x \leftarrow x - \alpha \nabla_x f(x)$ for some step size $\alpha > 0$
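As a minimal sketch (illustrative, not part of the original notebook), the update rule applied to the one-dimensional function $f(x) = (x - 3)^2$ from the earlier worked example looks like this:

# 1D gradient descent on f(x) = (x - 3)**2, whose derivative is f'(x) = 2*(x - 3)
x = 0.0          # initial guess
alpha = 0.2      # step size (illustrative value)

for i in range(25):
    grad = 2 * (x - 3)      # f'(x)
    x = x - alpha * grad    # gradient descent update

print(x)   # approaches the minimizer x* = 3

Each iteration shrinks the distance to $x^* = 3$ by a factor of $|1 - 2\alpha| = 0.6$, so the iterates converge geometrically.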

Gradient Descent in High Dimension

Repeat: $x \leftarrow x - \alpha \nabla_x f(x)$

Gradient Descent Example

$$\min_{x} \; (x_1 - 3)^2 + (x_2 - 3)^2 = \min_{x} \; \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18$$

Update rule

$$x^{(i+1)} = x^{(i)} - \alpha_i \nabla f\left(x^{(i)}\right)$$

In [1]: import numpy as np

In [2]: H = np.array([[2, 0],
                      [0, 2]])      # Hessian of the quadratic objective
        f = -np.array([[6],
                       [6]])        # linear term
        x = np.zeros((2, 1))        # initial point
        alpha = 0.2                 # step size (learning rate)

        for i in range(25):
            g = H.dot(x) + f        # gradient: H x + f
            x = x - alpha*g         # gradient descent update

        print(x)

        [[ 2.99999147]
         [ 2.99999147]]
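How the result depends on the step size $\alpha$ is taken up next; as a quick illustration (a hedged sketch with hypothetical step sizes, not from the notebook), rerunning the same iteration for several values of $\alpha$ shows the typical behaviors:

# Rerun the gradient descent loop above for several step sizes (illustrative values)
import numpy as np

H = np.array([[2, 0], [0, 2]])
f = -np.array([[6], [6]])

for alpha in [0.05, 0.2, 0.5, 1.1]:
    x = np.zeros((2, 1))
    for i in range(25):
        x = x - alpha * (H.dot(x) + f)   # gradient descent update
    print(alpha, x.ravel())

# small alpha (0.05): slow but steady progress toward (3, 3)
# alpha = 0.5: reaches the exact minimizer of this quadratic in one step
# alpha > 1 (here 1.1): the iterates oscillate and diverge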

Choosing Step Size α

- The step size $\alpha$ is also called the learning rate.

Where will We Converge?

5. Practically Solving Optimization Problems

- The good news: for many classes of optimization problems, people have already done all the hard work of developing numerical algorithms.
- There is a wide range of tools that can take optimization problems in natural forms and compute a solution.
- We will use CVX (or CVXPY) as an optimization solver
  - Only for convex problems
  - Download: http://cvxr.com/cvx/
- Gradient descent
  - Neural networks / deep learning
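As a minimal sketch of the solver route (assuming the Python package cvxpy; the course materials may use the MATLAB CVX toolbox instead), the quadratic example from the gradient descent section can be handed to the solver directly:

import cvxpy as cp

x = cp.Variable(2)                                # decision variable in R^2
objective = cp.Minimize(cp.sum_squares(x - 3))    # (x1 - 3)^2 + (x2 - 3)^2
prob = cp.Problem(objective)                      # no constraints in this example
prob.solve()

print(x.value)   # approximately [3. 3.]

A constrained problem in the form of Section 1 only needs a second argument, e.g. cp.Problem(objective, [x[0] + x[1] <= 4]).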

5.1. Gradient Descent vs. Analytical Solution Analytical solution for MSE Gradient descent Easy to implement Very general, can be applied to any differentiable loss functions Requires less memory and computations (for stochastic methods) Gradient descent provides a general learning framework Can be used both for classification and regression Training Neural Networks: Gradient Descent 5.2. Training Neural Networks Optimization procedure It is not easy to numerically compute gradients in network in general. The good news: people have already done all the "hard work" of developing numerical solvers (or libraries) There are a wide range of tools We will use TensorFlow In [3]: %%javascript $.getscript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc. js')