Lecture 17: Feature Subset Selection II


Exponential search methods
  Branch and Bound
  Approximate Monotonicity with Branch and Bound
  Beam Search
Randomized algorithms
  Random Generation plus Sequential Selection
  Simulated Annealing
  Genetic Algorithms
Introduction to Pattern Recognition, Ricardo Gutierrez-Osuna, Wright State University

Branch and Bound (B&B) (1)
The Branch and Bound algorithm, developed by Narendra and Fukunaga in 1977, is guaranteed to find the optimal feature subset under the monotonicity assumption.
The monotonicity assumption states that the addition of features can only increase the value of the objective function, that is,
  $J(x_{i_1}) < J(x_{i_1}, x_{i_2}) < J(x_{i_1}, x_{i_2}, x_{i_3}) < \cdots < J(x_{i_1}, x_{i_2}, \ldots, x_{i_N})$
Branch and Bound starts from the full set and removes features using a depth-first strategy.
Nodes whose objective function is lower than the current best are not explored, since the monotonicity assumption ensures that their children will not contain a better solution.
[Figure: search-space diagram, from the empty feature set to the full feature set]

Branch and Bound (2)
Algorithm
The algorithm is better explained by considering the subsets of $M' = N - M$ features already discarded, where N is the dimensionality of the state space and M is the desired number of features.
Since the order of the features is irrelevant, we will only consider an increasing ordering $i_1 < i_2 < \cdots < i_{M'}$ of the feature indices; this will avoid exploring states that differ only in the ordering of their features.
The Branch and Bound tree for N=6 and M=2 is shown below (numbers indicate features that are being removed).
Notice that at the level directly below the root we only consider removing features 1, 2 or 3, since a higher number would not allow sequences ($i_1 < i_2 < i_3 < i_4$) with four indices.
[Figure: Branch and Bound tree for N=6, M=2. The first level removes feature 1, 2 or 3; the children of each node remove a higher-numbered feature; the 15 leaves are the increasing four-index sequences, from 1-2-3-4 to 3-4-5-6.]
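
The increasing-order rule above can be sketched in a few lines of Python. This is only an illustration: the function name `removal_sequences` and its parameters are hypothetical, and features are numbered from 1 as on the slide.

```python
def removal_sequences(n_features, n_remove, start=1, prefix=()):
    """Yield every increasing sequence of removed feature indices (the tree's leaves)."""
    remaining = n_remove - len(prefix)
    if remaining == 0:
        yield prefix
        return
    # The next index cannot exceed n_features - remaining + 1; otherwise too few
    # higher indices would be left to complete the sequence.
    for f in range(start, n_features - remaining + 2):
        yield from removal_sequences(n_features, n_remove, f + 1, prefix + (f,))


leaves = list(removal_sequences(6, 4))   # N=6, M=2, so M'=4 features are removed
first_level = sorted({seq[0] for seq in leaves})
```

For N=6 and M'=4 the first removed index is limited to 1, 2 or 3, which is the restriction the slide points out for the level directly below the root.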

Branch and Bound (3)
1. Initialize: $\alpha = -\infty$, $k = 0$
2. Generate successors of the current node and store them in LIST(k)
3. Select new node
   if LIST(k) is empty, go to Step 5
   else $i_k = \arg\max_{j \in \mathrm{LIST}(k)} J(x_{i_1}, x_{i_2}, \ldots, x_{i_{k-1}}, j)$; delete $i_k$ from LIST(k)
4. Check bound
   if $J(x_{i_1}, x_{i_2}, \ldots, x_{i_k}) < \alpha$, go to Step 5
   else if $k = M'$ (we have the desired number of features), go to Step 6
   else set k = k+1 and go to Step 2
5. Backtrack to lower level
   set k = k-1
   if k = 0, terminate the algorithm
   else go to Step 3
6. Last level
   set $\alpha = J(x_{i_1}, x_{i_2}, \ldots, x_{i_{k-1}}, j)$ and $Y_M^* = \{x_{i_1}, x_{i_2}, \ldots, x_{i_k}\}$; go to Step 5
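
A minimal Python sketch of the same search follows. It is not a literal transcription of the six steps: it uses recursion instead of explicit LIST(k) bookkeeping, and the names `branch_and_bound`, `search` and the criterion argument `J` (a function of a frozenset of 0-based feature indices) are assumptions made for illustration. It does keep the three ingredients of the slide: best-first ordering of successors, the bound check against the incumbent, and increasing removal order.

```python
def branch_and_bound(n_features, m, J):
    """Return (subset, value) of the best size-m subset for a monotonic criterion J."""
    n_remove = n_features - m
    best_value, best_subset = float("-inf"), None

    def search(subset, value, start, removed):
        nonlocal best_value, best_subset
        if removed == n_remove:
            # Last level: the caller guarantees value > best_value.
            best_value, best_subset = value, subset
            return
        # Remove features in increasing index order, leaving room for the
        # removals that are still needed below this node.
        last = n_features - (n_remove - removed)
        children = sorted(
            ((J(subset - {f}), f) for f in range(start, last + 1)),
            reverse=True,  # most promising successor first
        )
        for child_value, f in children:
            if child_value <= best_value:
                # Bound check: under monotonicity no descendant can beat the
                # incumbent, and the remaining siblings are no better.
                break
            search(subset - {f}, child_value, f + 1, removed + 1)

    full = frozenset(range(n_features))
    search(full, J(full), 0, 0)
    return best_subset, best_value


# Hypothetical monotonic criterion: positive weights plus a bonus for a feature pair.
weights = [4, 1, 7, 3, 9, 2]

def J_demo(subset):
    bonus = 20 if {1, 3} <= subset else 0
    return sum(weights[f] for f in subset) + bonus

subset, value = branch_and_bound(6, 2, J_demo)
```

The pruning is only safe when J really is monotonic; with a non-monotonic criterion this sketch can discard the optimum.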

Approximate Monotonicity with B&B (AMB&B)
AMB&B is a variation of the classical Branch and Bound algorithm.
AMB&B allows non-monotonic functions to be used, typically classifiers, by relaxing the cutoff condition that terminates the search on a specific node.
Assume that we run B&B by setting a threshold error rate $\tau$ rather than a number of features M.
Under AMB&B, a given feature subset Y will be considered
  Feasible if $J(Y) \le \tau$
  Conditionally feasible if $J(Y) \le \tau(1+\Delta)$
  Unfeasible if $J(Y) > \tau(1+\Delta)$
$\Delta$ is a tolerance placed on the threshold to accommodate non-monotonic functions.
Rather than limiting the search to feasible nodes (as B&B does), AMB&B allows the search to explore conditionally feasible nodes in the hope that these nodes will lead to a feasible solution.
However, AMB&B will not return conditionally feasible nodes as solutions; it only allows the search to explore them. Otherwise it would be no different from B&B with a higher threshold of $\tau(1+\Delta)$.
[Figure: search-space diagram, from the empty feature set to the full feature set, with a "Conditionally feasible" label]
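
The three categories can be written as a small Python helper. This is a sketch under stated assumptions: J(Y) is taken to be an error rate (lower is better), and the names `classify`, `may_explore` and `may_return` are hypothetical.

```python
def classify(error_rate, tau, delta):
    """Label a subset by its error rate J(Y), threshold tau and tolerance delta."""
    if error_rate <= tau:
        return "feasible"
    if error_rate <= tau * (1 + delta):
        return "conditionally feasible"
    return "unfeasible"


def may_explore(error_rate, tau, delta):
    # The search may continue below feasible and conditionally feasible nodes.
    return classify(error_rate, tau, delta) != "unfeasible"


def may_return(error_rate, tau, delta):
    # Only feasible nodes are ever reported as solutions.
    return classify(error_rate, tau, delta) == "feasible"
```

Keeping `may_explore` and `may_return` separate is what distinguishes AMB&B from plain B&B run with the looser threshold.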

Beam Search (1)
Beam Search (BS) is a variation of best-first search with a bounded queue to limit the scope of the search.
The queue organizes states from best to worst, with the best states placed at the head of the queue.
At every iteration, BS evaluates all possible states that result from adding a feature to the feature subset, and the results are inserted into the queue in their proper locations.
It is trivial to notice that BS degenerates to exhaustive search if there is no limit on the size of the queue. Similarly, if the queue size is set to one, BS is equivalent to Sequential Forward Selection.
[Figure: search-space diagram, from the empty feature set to the full feature set]

Beam Search (2)
The example below illustrates BS for a 4-dimensional search space and a queue of size 3.
BS cannot guarantee that the optimal subset is found: in the example, the optimal subset is 2-3-4(9), which is never explored.
However, with the proper queue size, Beam Search can avoid getting trapped in local minima by preserving solutions from varying regions in the search space.
[Figure: search tree over features 1 to 4, each node labeled feature(J value), shown at four successive steps. Queue contents at each step: LIST={ }, LIST={1(5), 3(4), 2(3)}, LIST={1-2(6), 1-3(5), 3(4)}, LIST={1-2-3(7), 1-3(5), 3(4)}]
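
The example can be replayed with a short Python sketch. The J values in the table below are my reading of the flattened figure (the queue contents listed on the slide pin down most of them), and the function name `beam_search`, the tuple encoding of subsets, and the rule that children only add higher-numbered features are assumptions for illustration.

```python
# J values per subset, as read from the figure (features numbered 1 to 4).
J_TABLE = {
    (1,): 5, (2,): 3, (3,): 4, (4,): 1,
    (1, 2): 6, (1, 3): 5, (1, 4): 2, (2, 3): 8, (2, 4): 6, (3, 4): 5,
    (1, 2, 3): 7, (1, 2, 4): 3, (1, 3, 4): 5, (2, 3, 4): 9,
    (1, 2, 3, 4): 5,
}


def beam_search(n_features, width, score):
    """Best-first search with a queue bounded to `width` states."""
    queue = [()]                      # the root is the empty feature set
    best, best_score = None, float("-inf")
    history = []                      # queue contents after each expansion
    while queue:
        node = queue.pop(0)           # head of the queue is the best state
        first = node[-1] + 1 if node else 1
        children = [node + (f,) for f in range(first, n_features + 1)]
        for child in children:
            if score(child) > best_score:
                best, best_score = child, score(child)
        queue.extend(children)
        queue.sort(key=score, reverse=True)   # best to worst (stable sort)
        del queue[width:]                     # enforce the bounded queue
        history.append(list(queue))
    return best, best_score, history


best, best_score, history = beam_search(4, 3, lambda s: J_TABLE[s])
```

With a queue of size 3 the search settles on 1-2-3 with J=7, because feature 2 alone is pushed out of the queue before 2-3-4 can be reached; a queue large enough to hold every state turns the same code into an exhaustive search.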

Random Generation plus Sequential Selection
RGSS is an attempt to introduce randomness into SFS and SBS in order to escape local minima.
The algorithm is self-explanatory:
1. Repeat for a number of iterations
   1a. Generate a random feature subset
   1b. Perform SFS on this subset
   1c. Perform SBS on this subset
[Figure: search-space diagram, from the empty feature set to the full feature set]
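
One possible reading of these steps in Python is sketched below. Several details are assumptions not stated on the slide: each feature enters the random subset with probability 0.5, SFS and SBS both start from the same random subset and stop as soon as no single move improves J, and the best result over all iterations is kept. The function names are hypothetical.

```python
import random


def sfs(subset, n_features, J):
    """Sequential forward selection: add the best feature while J improves."""
    current = frozenset(subset)
    while True:
        candidates = [current | {f} for f in range(n_features) if f not in current]
        if not candidates:
            return current
        best = max(candidates, key=J)
        if J(best) <= J(current):
            return current
        current = best


def sbs(subset, J):
    """Sequential backward selection: drop the best feature while J improves."""
    current = frozenset(subset)
    while True:
        candidates = [current - {f} for f in current]
        if not candidates:
            return current
        best = max(candidates, key=J)
        if J(best) <= J(current):
            return current
        current = best


def rgss(n_features, J, iterations=50, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        start = frozenset(f for f in range(n_features) if rng.random() < 0.5)
        for candidate in (sfs(start, n_features, J), sbs(start, J)):
            if best is None or J(candidate) > J(best):
                best = candidate
    return best
```

A toy criterion such as `J = lambda s: -len(s ^ frozenset({1, 3}))` rewards subsets close to a hidden target and is enough to exercise the three functions.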

Simulated Annealing (1)
Simulated Annealing is a stochastic optimization method that derives its name from the annealing process used to re-crystallize metals.
During the annealing process in metals, the alloy is cooled down slowly to allow its atoms to reach a configuration of minimum energy (a perfectly regular crystal).
If the alloy is cooled too fast, such an organization cannot propagate throughout the material. The result will be a material with regions of regular structure separated by boundaries. These boundaries are potential fault lines where fractures are most likely to occur when the material is stressed.
The laws of thermodynamics state that, at temperature T, the probability of an increase in energy $\Delta E$ in the system is given by the expression
  $P(\Delta E) = e^{-\Delta E / (kT)}$
where k is known as Boltzmann's constant.
The algorithm is a straightforward implementation of these ideas:
1. Determine an annealing schedule T(i)
2. Create an initial solution Y(0)
3. While $T(i) > T_{MIN}$
   3a. Generate a new solution Y(i+1) which is a neighbor of Y(i)
   3b. Compute $\Delta E = -[J(Y(i+1)) - J(Y(i))]$
   3c. If $\Delta E < 0$, always accept the move from Y(i) to Y(i+1); else accept the move with probability $P = \exp(-\Delta E / T(i))$
[Figure: search-space diagram, from the empty feature set to the full feature set]
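
The loop above can be sketched in Python as follows. The sketch makes several assumptions that are not on this slide: a feature subset is a list of booleans, a neighbor is obtained by flipping one randomly chosen feature, the temperature follows a geometric schedule T(i+1) = r T(i), and the best subset seen so far is remembered. The names `simulated_annealing` and `acceptance_probability` are hypothetical.

```python
import math
import random


def acceptance_probability(delta_e, temperature):
    """Step 3c: downhill moves always pass, uphill moves pass with exp(-dE/T)."""
    return 1.0 if delta_e < 0 else math.exp(-delta_e / temperature)


def simulated_annealing(n_features, J, t0=1.0, r=0.95, t_min=1e-3, seed=0):
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]   # Y(0)
    current_value = J(current)
    best, best_value = list(current), current_value
    t = t0
    while t > t_min:
        # 3a. Neighbor: add or remove one randomly chosen feature.
        neighbor = list(current)
        flip = rng.randrange(n_features)
        neighbor[flip] = not neighbor[flip]
        neighbor_value = J(neighbor)
        # 3b. Energy change; J is maximized, so the energy is -J.
        delta_e = -(neighbor_value - current_value)
        # 3c. Accept or reject the move.
        if rng.random() < acceptance_probability(delta_e, t):
            current, current_value = neighbor, neighbor_value
            if current_value > best_value:
                best, best_value = list(current), current_value
        t *= r   # assumed geometric cooling schedule
    return best, best_value
```

Returning the best subset visited, rather than the final one, is a design choice of this sketch; the slide itself only describes the accept/reject rule.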

Simulated Annealing (2)
Simulated annealing is summarized with the following idea [Haykin, 1999]:
  When optimizing a very large and complex system (i.e., a system with many degrees of freedom), instead of always going downhill, try to go downhill most of the time.
The previous formulation of the Simulated Annealing algorithm can be used for any type of minimization problem, and it only requires specification of
  A transform to generate a local neighbor from the current solution (e.g., add a random vector). For Feature Subset Selection, the transform will consist of adding or removing features, typically implemented as a random mutation with low probability.
  An annealing schedule, typically $T(i+1) = rT(i)$, with $0 < r < 1$
  An initial temperature T(0)
Selection of the annealing schedule is critical.
  If r is chosen too large, the temperature decreases very slowly, allowing moves to higher energy states to occur more frequently. This results in slow convergence.
  If r is chosen too small, the temperature decreases very fast, and the algorithm is likely to converge to a local minimum.
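
To make the effect of r concrete, the following Python sketch counts how many iterations the schedule T(i+1) = rT(i) takes to fall to a stopping temperature. The function name `cooling_steps` and the stopping temperature `t_min` are assumptions for illustration.

```python
def cooling_steps(t0, r, t_min):
    """Number of updates T <- r*T needed before T drops to t_min or below."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    steps, t = 0, t0
    while t > t_min:
        t *= r
        steps += 1
    return steps


fast = cooling_steps(1.0, 0.5, 0.1)    # 1 -> 0.5 -> 0.25 -> 0.125 -> 0.0625
slow = cooling_steps(1.0, 0.99, 0.1)   # many more iterations at every temperature
```

A value of r near 1 gives the search many iterations at each temperature level, at the cost of run time; a small r ends the search after a handful of moves.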

Simulated Annealing (3)
A unique feature of simulated annealing is its adaptive nature.
At high temperature the algorithm is only looking at the gross features of the optimization surface, while at low temperatures the finer details of the surface start to appear.
[Figure: J(Y) versus Y, showing the optimization surface at high T and at low T]

Genetic Algorithms
Genetic algorithms are optimization techniques that mimic the evolutionary process of survival of the fittest.
Starting with an initial random population of solutions, evolve new populations by mating (crossover) pairs of solutions and mutating solutions according to their fitness (objective function).
The better solutions are more likely to be selected for the mating and mutation operations and therefore carry their genetic code from generation to generation.
For the problem of Feature Subset Selection, individual solutions are simply represented with a binary number (1 if the given feature is selected, 0 otherwise), which is the original representation proposed by Holland in 1974.
Algorithm
1. Create an initial random population
2. Evaluate the initial population
3. Repeat until convergence (or for a number of generations)
   3a. Select the fittest individuals in the population
   3b. Perform crossover on the selected individuals to create offspring
   3c. Perform mutation on the selected individuals
   3d. Create the new population from the old population and the offspring
   3e. Evaluate the new population
[Figure: search-space diagram, from the empty feature set to the full feature set]
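
A compact Python sketch of this loop is given below. It is one possible instantiation, not the lecture's implementation: it assumes truncation selection (the fitter half of the population become parents), one child per mating, a fixed number of generations, and a new population made of the best old individual plus the offspring. The function name and all parameter names are hypothetical; it needs at least 2 features and a population of at least 4.

```python
import random


def genetic_algorithm(n_features, fitness, pop_size=20, generations=30,
                      p_c=0.95, p_m=0.01, seed=0):
    rng = random.Random(seed)
    # 1. Initial random population of binary feature masks.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []   # best fitness seen in each generation
    for _ in range(generations):
        # Evaluate and select the fittest individuals (assumed: top half).
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]
        new_population = [ranked[0]]          # carry the best individual over
        while len(new_population) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < p_c:            # single-point crossover
                point = rng.randrange(1, n_features)
                child = a[:point] + b[point:]
            else:
                child = list(a)
            # Mutation: flip each bit independently with probability p_m.
            child = [bit ^ 1 if rng.random() < p_m else bit for bit in child]
            new_population.append(child)
        population = new_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best individual is copied unchanged into the next generation, the recorded best fitness can never decrease from one generation to the next.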

Genetic operators
Single-point crossover
  Select two individuals (parents) according to their fitness
  Select a crossover point
  With probability $P_C$ (0.95 is reasonable) create two offspring by combining the parents
  [Figure: crossover point selected randomly. Parent i: 01001010110, Parent j: 11010110000; after crossover, Offspring i: 11011010110, Offspring j: 01100110000]
Binary mutation
  Select an individual according to its fitness
  With probability $P_M$ (0.01 is reasonable) mutate each one of its bits
  [Figure: mutated bits. Individual: 11010110000; after mutation, Offspring i: 11001010111]
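
Both operators are easy to sketch in Python on bit strings. The function names and the `rng` parameter are hypothetical, and returning the parents unchanged when no crossover takes place is an assumption.

```python
import random


def single_point_crossover(parent_i, parent_j, point):
    """Swap the tails of two equal-length bit strings after position `point`."""
    return (parent_i[:point] + parent_j[point:],
            parent_j[:point] + parent_i[point:])


def crossover(parent_i, parent_j, p_c=0.95, rng=random):
    """With probability p_c, cross the parents at a random point."""
    if rng.random() < p_c:
        point = rng.randrange(1, len(parent_i))
        return single_point_crossover(parent_i, parent_j, point)
    return parent_i, parent_j


def mutate(individual, p_m=0.01, rng=random):
    """Flip each bit independently with probability p_m."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < p_m else bit
        for bit in individual
    )


children = single_point_crossover("01001010110", "11010110000", 4)
```

Cutting the two parent strings from the slide after the fourth bit yields 11011010110 as one of the two children, which matches the first offspring shown in the figure.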

Selection methods
The selection of individuals is based on their fitness (the value of the objective function).
We will describe a selection method called Geometric selection. Several other methods are available: Roulette Wheel, Tournament Selection, etc.
Geometric selection
  The probability of selecting the r-th best individual is given by the geometric probability mass function
    $P(r) = q(1-q)^{r-1}$
  q is the probability of selecting the best individual (0.05 is a reasonable value).
  Therefore, the geometric distribution assigns higher probability to individuals ranked better, but also allows unfit individuals to be selected.
In addition, it is typical to carry the best individual of each population to the next one. This is called the Elitist Model.
[Figure: selection probability versus individual rank (0 to 100) for q = 0.08, 0.04, 0.02 and 0.01]
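
Geometric selection can be sketched in Python as below. The function names are hypothetical. Because a real population is finite, the sketch truncates the distribution at the population size and lets `random.Random.choices` renormalize the weights; that truncation is an assumption, not something stated on the slide.

```python
import random


def geometric_pmf(rank, q):
    """P(r) = q (1 - q)^(r - 1) for the r-th best individual (rank starts at 1)."""
    return q * (1 - q) ** (rank - 1)


def select_rank(pop_size, q, rng):
    """Draw the rank of one individual; better ranks are more likely."""
    ranks = range(1, pop_size + 1)
    weights = [geometric_pmf(r, q) for r in ranks]
    return rng.choices(ranks, weights=weights, k=1)[0]


rng = random.Random(0)
picks = [select_rank(100, 0.05, rng) for _ in range(1000)]
```

With q = 0.05 the best individual is chosen with probability 0.05 before renormalization and each following rank is 5% less likely than the one before it, so poorly ranked individuals still get picked occasionally.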

GAs, parameter choices for Feature selection
The choice of crossover rate $P_C$ is not critical.
  You will want a value close to 1.0 to have a large number of offspring.
The choice of mutation rate $P_M$ is very critical.
  An optimal choice of $P_M$ will allow the GA to explore the more promising regions while avoiding getting trapped in local minima.
  A large value (e.g., $P_M > 0.25$) will not allow the search to focus on the better regions, and the GA will perform like random search.
  A small value (e.g., close to 0.0) will not allow the search to escape local minima.
The choice of q, the probability of selecting the best individual, is also critical.
  An optimal value of q will allow the GA to explore the most promising solutions and, at the same time, provide sufficient diversity to avoid early convergence of the algorithm.
In general, poorly selected control parameters will result in sub-optimal solutions due to early convergence.

Search Strategies, summary
Method     | Accuracy                            | Complexity               | Advantages                      | Disadvantages
Exhaustive | Always finds the optimal solution   | Exponential              | High accuracy                   | High complexity
Sequential | Good if no backtracking needed      | Quadratic, $O(N_{EX}^2)$ | Simple and fast                 | Cannot backtrack
Randomized | Good with proper control parameters | Generally low            | Designed to escape local minima | Difficult to choose good parameters
A highly recommended review of the material presented in these two lectures is
  Justin Doak, "An evaluation of feature selection methods and their application to Computer Security", University of California at Davis, Tech Report CSE-92-18