Data Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Search & Optimization Search and optimization methods deal with finding the best model(s), i.e., those for which the score function attains its minimum (or maximum) value: search for the best model structure from a set of candidates, and optimize the model parameters within a given model structure. Major problem: both the number of possible model structures and the parameter space can be enormous. How can we conduct search and optimization efficiently?

Simple search strategy for models Exhaustive search: for every model in the candidate set, find the best parameters w.r.t. the score function, then compare the scores of all the models to find the best. Pros: guaranteed to find the best model w.r.t. the score function; may be implemented in a parallel fashion. Cons: re-optimizes the parameters for each new model structure; faces a potential combinatorial explosion; highly inefficient, and most of the time infeasible!

Being smart with compromise (I) Make use of a decomposable score function: the score of a new structure is then an additive function of the score of the previous structure plus a term accounting for the change in structure. Pros: easy to obtain the score of the current model from that of the previous model. Cons: limited to particular score functions.

Being smart with compromise (II) Use an approximation for the best parameters ( incremental ): leave the existing parameters fixed at their previous values and optimize only the parameters newly added to the model. Pros: reduces the number of parameters to estimate when the model structure changes slightly; saves time, allowing more candidate model structures to be searched. Cons: provably suboptimal; errors accumulate.

Being smart with compromise (III) Heuristic search: apply heuristics to narrow down the search space of model structures. Pros: efficient under combinatorial explosion; intuitive and easy to implement. Cons: lacks mathematical validity; may be ineffective in certain unforeseen situations.

State-space formulation for model search The model search problem can be viewed as moving through a discrete set of states. State-space representation: each state corresponds to a particular model in the candidate set and can be represented as a vertex in a graph. Search operators: the operators correspond to legal moves in the search space and can be represented as edges between the state vertices in the graph.

Simple greedy search algorithm 1. Initialize: choose an initial state M_0, corresponding to a particular model structure. 2. Iterate: from the current state M_k, evaluate the score function at all possible adjacent states (as defined by the operators) and move to the best one. 3. Stopping criterion: repeat step 2 until no further improvement in the local score function can be attained. 4. Multiple restarts: repeat steps 1 through 3 from different initial starting points and choose the best solution found. The result is suboptimal (a local optimum).
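The four steps above can be sketched as follows. This is a minimal illustration: `initial_state`, `neighbors`, and `score` are placeholders for whatever model space and score function are being searched, and the toy integer state space below stands in for a set of model structures.

```python
import random

def greedy_search(initial_state, neighbors, score, n_restarts=10):
    """Greedy search with multiple random restarts (lower score is better).
    `initial_state()` samples a starting state, `neighbors(s)` lists the
    adjacent states, and `score(s)` is the score function to minimize."""
    best_state, best_score = None, float("inf")
    for _ in range(n_restarts):
        state = initial_state()
        while True:
            # Evaluate all adjacent states and move to the best one.
            candidates = neighbors(state)
            if not candidates:
                break
            nxt = min(candidates, key=score)
            if score(nxt) >= score(state):
                break  # stopping criterion: no further local improvement
            state = nxt
        if score(state) < best_score:
            best_state, best_score = state, score(state)
    return best_state, best_score

# Toy example: minimize (x - 7)^2 over the integers 0..20,
# where the operators move to x - 1 or x + 1.
random.seed(0)
state, value = greedy_search(
    initial_state=lambda: random.randint(0, 20),
    neighbors=lambda x: [v for v in (x - 1, x + 1) if 0 <= v <= 20],
    score=lambda x: (x - 7) ** 2,
)
```

On this convex toy score every restart reaches the same optimum; on a real model-search landscape the restarts are what give different local optima a chance to be compared.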

Systematic search Search tree: instead of following a single path and choosing the best move at every step, we keep track of multiple models simultaneously by traversing a search tree. Blind search: breadth-first search (consumes a huge amount of memory); depth-first search (memory-efficient).

Systematic search Traversing the search tree with heuristics. Beam search: keep track of the b best models at any point in the search; suboptimal, a trade-off for efficiency! Branch-and-bound: keep track of the best model structure found so far; analytically calculate a lower bound on the best possible score attainable from a particular branch of the search tree; if that lower bound is greater than the best score so far, prune the branch (when minimizing). It is often difficult to find a tight bound, so scalability is limited!
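Beam search as described can be sketched like this (a minimal illustration on the same kind of toy state space as before; `beam_width` is the b of the slide, and lower score is better):

```python
def beam_search(initial_states, neighbors, score, beam_width=3, n_steps=20):
    """Keep only the `beam_width` best states at every step."""
    beam = sorted(initial_states, key=score)[:beam_width]
    for _ in range(n_steps):
        # Expand every state in the beam, then keep the b best of the pool.
        pool = set(beam)
        for s in beam:
            pool.update(neighbors(s))
        new_beam = sorted(pool, key=score)[:beam_width]
        if new_beam == beam:
            break  # the beam no longer improves
        beam = new_beam
    return beam[0]

# Same toy problem: minimize (x - 7)^2 over the integers 0..20.
best = beam_search(
    initial_states=[0, 20],
    neighbors=lambda x: [v for v in (x - 1, x + 1) if 0 <= v <= 20],
    score=lambda x: (x - 7) ** 2,
)
```

Unlike the greedy search, the beam tracks several frontiers at once, which is exactly the suboptimal-but-efficient trade-off the slide mentions.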

Parameter optimization For a given model structure, we can link the parameters of that structure to the score function and optimize the score function directly. Optimization can be done by computing the minimum (or maximum) value directly. Closed-form solution: set the gradient to zero, ∇S(θ) = 0, and solve the resulting system of d equations. Otherwise: iterative optimization.

Greedy method for optimizing smooth functions 1. Initialize: (randomly) choose an initial value for the parameter vector. 2. Iterate: update the parameter vector; the direction and the amount by which to change the value are determined by local information. 3. Convergence: repeat step 2 until S appears to have attained a local minimum. 4. Multiple restarts: repeat steps 1 through 3 from different initial starting points and choose the best solution found.

Univariate Optimization (I) The Newton-Raphson method: a second-order method based on the Taylor series expansion S(θ) ≈ S(θ_k) + S'(θ_k)(θ − θ_k) + (1/2) S''(θ_k)(θ − θ_k)^2. Setting the derivative of this approximation to zero gives the update rule θ_{k+1} = θ_k − S'(θ_k) / S''(θ_k). Pros: the convergence rate is quadratic when close to the solution. Cons: may not converge at all when far from the solution.
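The update rule translates directly into code. As a toy illustration (my own, not from the slides), S(θ) = θ − ln θ has S'(θ) = 1 − 1/θ and S''(θ) = 1/θ², with its minimum at θ = 1:

```python
def newton_raphson(d_s, dd_s, theta0, tol=1e-10, max_iter=50):
    """Minimize S by Newton-Raphson: theta <- theta - S'(theta)/S''(theta)."""
    theta = theta0
    for _ in range(max_iter):
        step = d_s(theta) / dd_s(theta)
        theta -= step
        if abs(step) < tol:
            break  # updates have become negligible
    return theta

# S(t) = t - ln t, minimized at t = 1; start close enough to converge.
theta_min = newton_raphson(
    d_s=lambda t: 1 - 1 / t,
    dd_s=lambda t: 1 / t ** 2,
    theta0=0.5,
)
```

Starting far from the solution (e.g. near 0, where S'' blows up) illustrates the slide's caveat: the iteration can overshoot or diverge.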

Univariate Optimization (II) The gradient descent method: the update rule is θ_{k+1} = θ_k − λ S'(θ_k), where λ is the learning rate. Momentum-based methods: accelerate the convergence of gradient descent by adding a momentum term that takes the history of the path into account; this speeds up progress in low-curvature regions. Bracketing methods: find a bracket that provably contains the extremum of the function.
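A minimal sketch of gradient descent with a momentum term; the learning rate and momentum coefficient below are illustrative values, and the objective is the toy quadratic S(θ) = (θ − 3)²:

```python
def gd_momentum(grad, theta0, lr=0.1, beta=0.9, n_iter=500):
    """Gradient descent with momentum: the velocity accumulates a
    decaying history of past gradients and drives the update."""
    theta, velocity = theta0, 0.0
    for _ in range(n_iter):
        velocity = beta * velocity - lr * grad(theta)
        theta += velocity
    return theta

# Toy objective S(theta) = (theta - 3)^2, so S'(theta) = 2 (theta - 3).
theta = gd_momentum(grad=lambda t: 2 * (t - 3), theta0=0.0)
```

With beta = 0 this reduces to plain gradient descent; a nonzero beta lets consecutive gradients that point the same way build up speed, which is the low-curvature acceleration mentioned above.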

From 1-D to n-D (I) Two key questions: in which direction should we move from θ_i, and how far should we step in that direction? Multivariate methods: multivariate gradient descent uses the update θ_{k+1} = θ_k − λ ∇S(θ_k). If the learning rate is sufficiently small, gradient descent is guaranteed to converge to a local minimum of S.

From 1-D to n-D (II) Multivariate methods: Newton's method uses the update θ_{k+1} = θ_k − H^{-1} ∇S(θ_k), where H is the Hessian matrix of S. If S is quadratic, the step taken by Newton's method points directly at the minimum of S; S can be regarded as locally quadratic near the minimum point. The method involves inverting a matrix, which might be time-consuming.
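The multivariate Newton step can be sketched as follows. Solving the linear system H p = ∇S, rather than explicitly forming H⁻¹, is a standard way to soften the cost concern mentioned above; the quadratic test function is an illustrative assumption:

```python
import numpy as np

def newton_step(theta, grad, hess):
    """One multivariate Newton update: theta - H^{-1} grad(theta),
    implemented by solving H p = grad instead of inverting H."""
    p = np.linalg.solve(hess(theta), grad(theta))
    return theta - p

# Toy quadratic S(theta) = 0.5 theta^T A theta - b^T theta:
# a single Newton step from anywhere lands exactly on the minimum A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
theta_star = newton_step(
    theta=np.zeros(2),
    grad=lambda t: A @ t - b,
    hess=lambda t: A,
)
```

The one-step convergence on a quadratic is exactly the property the slide states; for non-quadratic S the step is only exact in the locally quadratic region near the minimum.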

From 1-D to n-D (III) Other multivariate methods. Coordinate descent: for each axis of the original space, iteratively conduct univariate gradient descent. Conjugate directions: use the principal axes to transform the data and then find search directions in the transformed space. Simplex search: use simplex reflections to find the search direction, with the size of the simplex serving as a kind of step size.
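Coordinate descent can be sketched as below. The finite-difference estimate of each partial derivative is an implementation convenience for this sketch, not part of the method as described; the separable quadratic objective is a toy example:

```python
import numpy as np

def coordinate_descent(s, theta0, lr=0.1, n_sweeps=200, eps=1e-6):
    """Cycle over the axes, taking one univariate gradient step along each;
    the partial derivative is estimated by a central finite difference."""
    theta = np.array(theta0, dtype=float)
    for _ in range(n_sweeps):
        for i in range(len(theta)):
            e = np.zeros_like(theta)
            e[i] = eps
            partial = (s(theta + e) - s(theta - e)) / (2 * eps)
            theta[i] -= lr * partial  # univariate step along axis i
    return theta

# Toy objective: (t0 - 1)^2 + 2 (t1 + 2)^2, minimized at (1, -2).
theta = coordinate_descent(
    lambda t: (t[0] - 1) ** 2 + 2 * (t[1] + 2) ** 2, [0.0, 0.0]
)
```

On objectives whose variables interact strongly, the axis-aligned steps zig-zag; that is the weakness the conjugate-directions idea above is meant to address.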

Constrained optimization What is constrained optimization? The parameters take values in a feasible region, instead of the whole space: minimize S(θ) subject to constraints. How to solve it? Introduce Lagrange multipliers, form the Lagrangian, and set its gradient equal to 0; the optimum is characterized by the KKT conditions.
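As a minimal worked example (my own, not from the slides): minimize S(θ) = θ₁² + θ₂² subject to θ₁ + θ₂ = 1. Form the Lagrangian and set its gradient to zero:

```latex
L(\theta, \lambda) = \theta_1^2 + \theta_2^2 + \lambda(\theta_1 + \theta_2 - 1),
\qquad
\nabla L = 0 \;\Rightarrow\;
2\theta_1 + \lambda = 0,\quad
2\theta_2 + \lambda = 0,\quad
\theta_1 + \theta_2 = 1
\;\Rightarrow\;
\theta_1 = \theta_2 = \tfrac{1}{2},\quad \lambda = -1 .
```

The constrained optimum θ = (1/2, 1/2) lies on the feasible line, whereas the unconstrained minimum of S would be θ = (0, 0) outside the feasible region.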

Maximizing likelihood with missing data Problem setting: Given a data set with hidden variables associated with each data point, the goal is to optimize the parameters θ w.r.t. the log-likelihood function. Challenge: both the parameters and the hidden variables are unknown! Aha, we can fix one and determine the other, then fix the other and determine the first!

The EM Algorithm The Expectation-Maximization (EM) algorithm is used for finding maximum-likelihood estimates of parameters in probabilistic models that depend on unobserved latent variables. It is an iterative procedure alternating between an Expectation step and a Maximization step. E-step: compute the expectation of the log-likelihood w.r.t. the current estimate of the conditional distribution of the hidden variables. M-step: compute the parameters that maximize the expected log-likelihood found in the E-step.
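As an illustration of the E/M alternation (a minimal sketch, not from the slides), here is EM for a two-component one-dimensional Gaussian mixture with unit variances, estimating the two means and the mixing weight:

```python
import math
import random

def em_gmm_1d(data, n_iter=100):
    """EM for a 1-D mixture of two unit-variance Gaussians:
    estimates the means and the mixing weight."""
    mu = [min(data), max(data)]  # crude initialization of the means
    pi = 0.5                     # mixing weight of component 0
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 0 for each point,
        # under the current parameter estimates.
        resp = []
        for x in data:
            p0 = pi * math.exp(-0.5 * (x - mu[0]) ** 2)
            p1 = (1 - pi) * math.exp(-0.5 * (x - mu[1]) ** 2)
            resp.append(p0 / (p0 + p1))
        # M-step: parameters that maximize the expected log-likelihood,
        # i.e. responsibility-weighted means and the average responsibility.
        n0 = sum(resp)
        mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / (len(data) - n0)
        pi = n0 / len(data)
    return mu, pi

# Synthetic data: two well-separated clusters with true means -2 and 3.
random.seed(1)
data = [random.gauss(-2, 1) for _ in range(200)] + \
       [random.gauss(3, 1) for _ in range(200)]
mu, pi_weight = em_gmm_1d(data)
```

Here the hidden variable for each point is its (unobserved) component label; the E-step replaces it with a soft responsibility, which the M-step then treats as if it were observed.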

Why does EM work? The idea behind EM: find a lower bound on the log-likelihood function. At each step, optimize the lower bound w.r.t. one set of unknowns while fixing the other set at its current values; iterate until the process converges. Maximizing the lower bound finds parameters that increase the log-likelihood function.

The basic idea of EM, with some mathematics The lower bound: Jensen's inequality. Let's check it out in detail. Iterative optimization: from the k-th round to the (k+1)-th round. Convergence: the value of the bound F increases over successive rounds, and F is bounded from above by the log-likelihood L, which has a maximum value.
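The bound can be written out as follows (a standard derivation consistent with the slide; q denotes any distribution over the hidden variables h, and F is the lower bound):

```latex
L(\theta) = \log p(x \mid \theta)
          = \log \sum_{h} q(h)\,\frac{p(x, h \mid \theta)}{q(h)}
          \;\ge\; \sum_{h} q(h) \log \frac{p(x, h \mid \theta)}{q(h)}
          \;=\; F(q, \theta)
          \quad \text{(Jensen's inequality)} .
```

```latex
\text{E-step: } q^{(k+1)} = \arg\max_{q} F\bigl(q, \theta^{(k)}\bigr)
                          = p\bigl(h \mid x, \theta^{(k)}\bigr),
\qquad
\text{M-step: } \theta^{(k+1)} = \arg\max_{\theta} F\bigl(q^{(k+1)}, \theta\bigr).
```

Each step can only increase F, and F never exceeds L, which is why the alternation converges.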

The EM cookbook To use EM, the following issues should be determined: what the log-likelihood function is (EM can be applied to any likelihood function); which are the model parameters and which are the hidden variables; what the expectation in the E-step is, and how to compute it; what is to be maximized based on that expectation, and how to compute it.

Optimizing parameters with a single scan In many online applications, data come in a stream, there is limited space for storing training data, and a quick response is needed for a given input: we can only receive one data point, make use of it, and then discard it. Stochastic approximation applies: reformulate the batch score function as an instantaneous score function that depends only on the current example, and optimize the instantaneous score function (e.g., take a gradient descent step). The average instantaneous score function should asymptotically approach the batch score function to guarantee that the approximation is good.
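A single-scan stochastic gradient sketch; the decaying learning-rate schedule lr0/√t is one common choice, not the only one. As a toy example, using the instantaneous score (θ − x)² per example makes the batch optimum the mean of the stream:

```python
import random

def sgd_single_pass(stream, grad_example, theta0, lr0=0.1):
    """Single-scan stochastic gradient descent: each example is seen once,
    used for one update, and then discarded."""
    theta = theta0
    for t, x in enumerate(stream, start=1):
        theta -= (lr0 / t ** 0.5) * grad_example(theta, x)
    return theta

# Toy instantaneous score (theta - x)^2, whose gradient is 2 (theta - x);
# the corresponding batch optimum is the stream mean (here 5.0).
random.seed(0)
stream = (random.gauss(5.0, 1.0) for _ in range(10000))
mean_est = sgd_single_pass(stream, lambda th, x: 2 * (th - x), theta0=0.0)
```

Note that `stream` is a generator: each value exists only long enough for its update, matching the one-pass, limited-storage setting described above.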

Heuristic search and optimization (I) Genetic search Genetic algorithms are a general set of heuristic search techniques based on ideas from evolutionary biology. A GA framework: represent models as chromosomes (binary strings); evolve a population of such chromosomes by selectively pairing them (according to their fitness, defined by a score function) and mutating chromosomes to create offspring. Essential ideas of GA: maintain a set of candidate models instead of a single one, allowing simultaneous exploration of the state space; create new states by combining current states, allowing jumps to different parts of the state space and helping to avoid getting stuck in local minima.
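A minimal genetic algorithm over binary strings, with fitness-proportional selection, one-point crossover, and bitwise mutation; all parameter values are illustrative. The toy fitness counts 1-bits ("one-max"), so the optimum is the all-ones string:

```python
import random

def genetic_search(score, length, pop_size=40, n_gens=60, mutation_rate=0.02):
    """Minimal GA over fixed-length binary chromosomes.
    Higher `score` (fitness) is better."""
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(n_gens):
        # Selection: parents are sampled with probability proportional
        # to fitness (small constant avoids zero total weight).
        weights = [score(c) + 1e-9 for c in pop]
        new_pop = []
        for _ in range(pop_size):
            p1, p2 = random.choices(pop, weights=weights, k=2)
            # Crossover: splice the two parents at a random cut point.
            cut = random.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [b ^ 1 if random.random() < mutation_rate else b
                     for b in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=score)

# Toy fitness: number of 1-bits; the all-ones string is optimal.
random.seed(0)
best = genetic_search(score=sum, length=20)
```

In a model-search setting the chromosome would encode a model structure (e.g., which features or edges are included) and `score` would be the model score function.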

Heuristic search and optimization (II) Simulated annealing Simulated annealing is a heuristic search technique based on ideas from physics. The framework: allow moves in the state space that decrease the score function to be minimized; also allow, with some probability, moves that increase the score function, where that probability is controlled by a temperature that gradually decreases. Key idea: a higher temperature enables large moves in the parameter space, exploring many possible regions at the beginning in the hope that a large move may lead to the deepest basin; a lower temperature reduces the probability of such moves so that the search stabilizes into local search and avoids escaping from a deep basin.
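Simulated annealing as described can be sketched like this; the geometric cooling schedule, Gaussian proposal, and multimodal test function are all illustrative assumptions:

```python
import math
import random

def simulated_annealing(score, neighbor, state, t0=5.0, cooling=0.995,
                        n_steps=5000):
    """Minimize `score`: always accept downhill moves, accept uphill moves
    with probability exp(-delta / T), and gradually lower the temperature."""
    best = state
    t = t0
    for _ in range(n_steps):
        cand = neighbor(state)
        delta = score(cand) - score(state)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            state = cand  # accepted move (possibly uphill while T is high)
        if score(state) < score(best):
            best = state
        t *= cooling  # gradually decrease the temperature
    return best

# Toy 1-D multimodal score: a quadratic bowl around x = 2 with
# sinusoidal ripples that create many local minima.
random.seed(0)
f = lambda x: (x - 2) ** 2 + 3 * math.sin(5 * x)
best = simulated_annealing(
    f, neighbor=lambda x: x + random.gauss(0, 0.5), state=-5.0
)
```

A pure greedy descent from the same starting point would get trapped in the first ripple it falls into; the early high-temperature uphill moves are what let the search cross those barriers.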

Let's move to Chapter 9