Lecture 17: Feature Subset Selection II
Exponential search methods
  Branch and Bound
  Approximate Monotonicity with Branch and Bound
  Beam Search
Randomized algorithms
  Random Generation plus Sequential Selection
  Simulated Annealing
  Genetic Algorithms
Introduction to Pattern Recognition, Ricardo Gutierrez-Osuna, Wright State University
Branch and Bound (B&B) (1)
The Branch and Bound algorithm, developed by Narendra and Fukunaga in 1977, is guaranteed to find the optimal feature subset under the monotonicity assumption.
The monotonicity assumption states that the addition of features can only increase the value of the objective function, that is
  $J(x_{i_1}) < J(x_{i_1}, x_{i_2}) < J(x_{i_1}, x_{i_2}, x_{i_3}) < \cdots < J(x_{i_1}, x_{i_2}, \ldots, x_{i_N})$
Branch and Bound starts from the full set and removes features using a depth-first strategy.
Nodes whose objective function is lower than the current best are not explored, since the monotonicity assumption ensures that their children will not contain a better solution.
[Figure: search space from the empty feature set to the full feature set]
Branch and Bound (2)
Algorithm
The algorithm is better explained by considering the subsets of $M' = N - M$ features already discarded, where N is the dimensionality of the state space and M is the desired number of features.
Since the order of the features is irrelevant, we will only consider an increasing ordering $i_1 < i_2 < \cdots < i_{M'}$ of the feature indices; this avoids exploring states that differ only in the ordering of their features.
The Branch and Bound tree for N=6 and M=2 is shown below (numbers indicate features that are being removed).
Notice that at the level directly below the root we only consider removing features 1, 2 or 3, since a higher number would not allow sequences ($i_1 < i_2 < i_3 < i_4$) with four indices.
[Figure: Branch and Bound tree, N=6, M=2; children are grouped by parent, left to right]
  Level 1: 1 2 3
  Level 2: (2 3 4) (3 4) (4)
  Level 3: (3 4 5) (4 5) (5) (4 5) (5) (5)
  Level 4: (4 5 6) (5 6) (6) (5 6) (6) (6) (5 6) (6) (6) (6)
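As a small sanity check of the ordering argument, the sketch below enumerates the increasing removal sequences for the N=6, M=2 example. It uses only the standard `itertools.combinations`; the variable names are illustrative and are not part of the lecture.

```python
from itertools import combinations

N, M = 6, 2          # values from the example tree
n_remove = N - M     # M' = number of features to discard

# combinations() yields index tuples in increasing order, so each tuple
# corresponds to one root-to-leaf path of the Branch and Bound tree.
removal_sequences = list(combinations(range(1, N + 1), n_remove))

# Features that can appear at the level directly below the root.
first_level = sorted({seq[0] for seq in removal_sequences})

print(len(removal_sequences))
print(first_level)
```

For these values the enumeration gives 15 leaves, and only features 1, 2 and 3 appear at the first level, which agrees with the tree.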
Branch and Bound (3)
1. Initialize: $\alpha = -\infty$, $k = 0$
2. Generate successors of the current node and store them in LIST(k)
3. Select new node
   if LIST(k) is empty, go to Step 5
   else $i_k = \arg\max_{j \in LIST(k)} J(x_{i_1}, x_{i_2}, \ldots, x_{i_{k-1}}, j)$ and delete $i_k$ from LIST(k)
4. Check bound
   if $J(x_{i_1}, x_{i_2}, \ldots, x_{i_k}) < \alpha$, go to Step 5
   else if $k = M'$ (we have the desired number of features), go to Step 6
   else set $k = k+1$ and go to Step 2
5. Backtrack to lower level
   set $k = k-1$
   if $k = 0$ terminate the algorithm, else go to Step 3
6. Last level
   Set $\alpha = J(x_{i_1}, x_{i_2}, \ldots, x_{i_k})$ and $Y^*_M = \{x_{i_1}, x_{i_2}, \ldots, x_{i_k}\}$
   go to Step 5
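The steps above can be sketched recursively, with the call stack playing the role of the LIST(k) bookkeeping and the backtracking step. This is a minimal illustration rather than the lecture's implementation: the function name, the 0-based feature indices and the toy objective (a sum of made-up per-feature weights, which is monotonic) are all assumptions.

```python
def branch_and_bound(n_features, m_keep, J):
    """Depth-first Branch and Bound over increasing sequences of removed features.

    J takes a tuple of kept feature indices (0-based) and is assumed to be
    monotonic: removing a feature never increases J.
    """
    n_remove = n_features - m_keep
    state = {"alpha": float("-inf"), "subset": None, "visited": 0}

    def explore(kept, n_removed, start):
        state["visited"] += 1
        score = J(kept)
        if score < state["alpha"]:
            return  # bound check: no descendant can beat the current best
        if n_removed == n_remove:
            state["alpha"], state["subset"] = score, kept  # last level
            return
        # The next removed index must leave room for the remaining removals.
        last = n_features - (n_remove - n_removed)
        successors = [(j, tuple(f for f in kept if f != j))
                      for j in range(start, last + 1)]
        # Visit the most promising successor first (the argmax in Step 3).
        successors.sort(key=lambda item: J(item[1]), reverse=True)
        for j, child in successors:
            explore(child, n_removed + 1, j + 1)

    explore(tuple(range(n_features)), 0, 0)
    return state


# Hypothetical monotonic objective: sum of toy per-feature weights.
weights = [0.9, 0.1, 0.5, 0.7, 0.3, 0.2]
result = branch_and_bound(6, 2, lambda kept: sum(weights[f] for f in kept))
print(result["subset"], result["alpha"], result["visited"])
```

With this toy objective the search keeps the two heaviest features (indices 0 and 3). The `visited` counter can be compared against the 35 nodes of the full tree to see how much the bound prunes.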
Approximate Monotonicity with B&B (AMB&B)
AMB&B is a variation of the classical Branch and Bound algorithm.
AMB&B allows non-monotonic functions to be used, typically classifiers, by relaxing the cutoff condition that terminates the search on a specific node.
Assume that we run B&B by setting a threshold error rate $\tau$ rather than a number of features M.
Under AMB&B, a given feature subset Y will be considered
  Feasible if $J(Y) \le \tau$
  Conditionally feasible if $J(Y) \le \tau(1+\Delta)$
  Unfeasible if $J(Y) > \tau(1+\Delta)$
$\Delta$ is a tolerance placed on the threshold to accommodate non-monotonic functions.
Rather than limiting the search to feasible nodes (as B&B does), AMB&B allows the search to explore conditionally feasible nodes, in the hope that these nodes will lead to a feasible solution.
However, AMB&B will not return conditionally feasible nodes as solutions; it only allows the search to explore them. Otherwise it would be no different from B&B with a higher threshold of $\tau(1+\Delta)$.
[Figure: search space from the empty feature set to the full feature set, with the conditionally feasible region marked]
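The three-way test can be written as a small helper. This is a sketch under stated assumptions: J(Y) is an error rate (lower is better), the tolerance is called `delta`, and the function name and numeric values are hypothetical.

```python
def classify_subset(error_rate, tau, delta):
    """Classify a feature subset by its error rate J(Y) under AMB&B."""
    if error_rate <= tau:
        return "feasible"                  # may be returned as a solution
    if error_rate <= tau * (1 + delta):
        return "conditionally feasible"    # may be explored, never returned
    return "unfeasible"                    # search is cut off at this node


# Hypothetical threshold of 10% error with a 20% tolerance (cutoff at 12%).
labels = [classify_subset(e, tau=0.10, delta=0.2) for e in (0.08, 0.11, 0.15)]
print(labels)
```

Only subsets labelled "feasible" are candidates for the final answer; the middle label merely keeps a branch alive during the search.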
Beam Search (1)
Beam Search (BS) is a variation of best-first search with a bounded queue to limit the scope of the search.
The queue organizes states from best to worst, with the best states placed at the head of the queue.
At every iteration, BS evaluates all possible states that result from adding a feature to the feature subset, and the results are inserted into the queue in their proper locations.
BS degenerates to exhaustive search if there is no limit on the size of the queue. Similarly, if the queue size is set to one, BS is equivalent to Sequential Forward Selection.
[Figure: search space from the empty feature set to the full feature set]
Beam Search (2)
The example below illustrates BS for a 4-dimensional search space and a queue of size 3.
BS cannot guarantee that the optimal subset is found: in the example, the optimal subset is 2-3-4(9), which is never explored.
However, with the proper queue size, Beam Search can avoid getting trapped in local minima by preserving solutions from varying regions in the search space.
[Figure: search tree; each node is written as subset(score)]
  Level 1: 1(5) 2(3) 3(4) 4(1)
  Level 2: 1-2(6) 1-3(5) 1-4(2) 2-3(8) 2-4(6) 3-4(5)
  Level 3: 1-2-3(7) 1-2-4(3) 1-3-4(5) 2-3-4(9)
  Level 4: 1-2-3-4(5)
Queue contents after each step:
  LIST={ }
  LIST={1(5), 3(4), 2(3)}
  LIST={1-2(6), 1-3(5), 3(4)}
  LIST={1-2-3(7), 1-3(5), 3(4)}
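The example can be replayed with a short sketch. The scores are a best-effort transcription of the figure, and the successor rule (only add a feature with a higher index than the last one, as the tree is drawn) is an assumption about how the example was built; the function and variable names are hypothetical.

```python
# Scores J(subset) as read from the example tree (best-effort transcription).
SCORES = {
    (1,): 5, (2,): 3, (3,): 4, (4,): 1,
    (1, 2): 6, (1, 3): 5, (1, 4): 2, (2, 3): 8, (2, 4): 6, (3, 4): 5,
    (1, 2, 3): 7, (1, 2, 4): 3, (1, 3, 4): 5, (2, 3, 4): 9,
    (1, 2, 3, 4): 5,
}


def beam_search(n_features, J, queue_size):
    """Best-first search with a bounded queue, starting from the empty subset."""
    queue = [()]
    evaluated = {}   # every subset whose score was computed
    history = []     # queue contents after each iteration
    while queue:
        state = queue.pop(0)                  # head of the queue = best state
        first = state[-1] + 1 if state else 1
        for feature in range(first, n_features + 1):
            child = state + (feature,)
            evaluated[child] = J(child)
            queue.append(child)
        queue.sort(key=lambda s: evaluated[s], reverse=True)  # best first
        del queue[queue_size:]                # enforce the queue bound
        history.append(list(queue))
    best = max(evaluated, key=evaluated.get)
    return best, evaluated, history


best, evaluated, history = beam_search(4, SCORES.__getitem__, 3)
print(history[:3])
print(best, (2, 3, 4) in evaluated)
```

With a queue of size 3 the first three queue states match the LIST contents of the example, and the subset 2-3-4 is never evaluated because feature 2 is pushed out of the queue before it is expanded.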
Random Generation plus Sequential Selection
RGSS is an attempt to introduce randomness into SFS and SBS in order to escape local minima.
The algorithm is self-explanatory:
1. Repeat for a number of iterations
   1a. Generate a random feature subset
   1b. Perform SFS on this subset
   1c. Perform SBS on this subset
[Figure: search space from the empty feature set to the full feature set]
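One possible reading of the loop is sketched below. Several choices are assumptions rather than part of the slide: SFS and SBS are both started from the same random subset, each stops as soon as no single addition or removal improves J, and the toy objective rewards a hypothetical set of relevant features while penalizing the others.

```python
import random


def sfs(start, n_features, J):
    """Sequential Forward Selection: add the best feature while J improves."""
    subset = set(start)
    while True:
        candidates = [f for f in range(n_features) if f not in subset]
        if not candidates:
            return subset
        best = max(candidates, key=lambda f: J(subset | {f}))
        if J(subset | {best}) <= J(subset):
            return subset
        subset.add(best)


def sbs(start, J):
    """Sequential Backward Selection: remove the best feature while J improves."""
    subset = set(start)
    while subset:
        best = max(sorted(subset), key=lambda f: J(subset - {f}))
        if J(subset - {best}) <= J(subset):
            return subset
        subset.remove(best)
    return subset


def rgss(n_features, J, n_iterations, seed=0):
    """Run SFS and SBS from random starting subsets and keep the best result."""
    rng = random.Random(seed)
    best_subset, best_score = set(), J(set())
    for _ in range(n_iterations):
        start = {f for f in range(n_features) if rng.random() < 0.5}
        for candidate in (sfs(start, n_features, J), sbs(start, J)):
            score = J(candidate)
            if score > best_score:
                best_subset, best_score = candidate, score
    return best_subset, best_score


# Hypothetical objective: +1 per relevant feature, -0.5 per irrelevant one.
RELEVANT = {0, 2, 3}


def toy_J(subset):
    return len(subset & RELEVANT) - 0.5 * len(subset - RELEVANT)


subset, score = rgss(6, toy_J, n_iterations=20)
print(sorted(subset), score)
```

Because each restart begins from a different region of the search space, the best result over all restarts is less dependent on any single greedy path than plain SFS or SBS.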
Simulated Annealing (1)
Simulated Annealing is a stochastic optimization method that derives its name from the annealing process used to re-crystallize metals.
During the annealing process in metals, the alloy is cooled down slowly to allow its atoms to reach a configuration of minimum energy (a perfectly regular crystal).
If the alloy is annealed too fast, such an organization cannot propagate throughout the material. The result will be a material with regions of regular structure separated by boundaries. These boundaries are potential fault lines where fractures are most likely to occur when the material is stressed.
The laws of thermodynamics state that, at temperature T, the probability of an increase in energy $\Delta E$ in the system is given by the expression
  $P(\Delta E) = e^{-\Delta E / (kT)}$
where k is known as Boltzmann's constant.
The algorithm is a straightforward implementation of these ideas:
1. Determine an annealing schedule T(i)
2. Create an initial solution Y(0)
3. While $T(i) > T_{MIN}$
   3a. Generate a new solution Y(i+1) which is a neighbor of Y(i)
   3b. Compute $\Delta E = -[J(Y(i+1)) - J(Y(i))]$
   3c. If $\Delta E < 0$ then always accept the move from Y(i) to Y(i+1); else accept the move with probability $P = \exp(-\Delta E / T(i))$
[Figure: search space from the empty feature set to the full feature set]
Simulated Annealing (2)
Simulated annealing is summarized with the following idea [Haykin, 1999]:
  When optimizing a very large and complex system (i.e., a system with many degrees of freedom), instead of always going downhill, try to go downhill most of the time.
The previous formulation of the Simulated Annealing algorithm can be used for any type of minimization problem, and it only requires specification of
  A transform to generate a local neighbor from the current solution (e.g., add a random vector)
    For Feature Subset Selection, the transform will consist of adding or removing features, typically implemented as a random mutation with low probability
  An annealing schedule, typically $T(i+1) = rT(i)$, with $0.0 < r < 1.0$
  An initial temperature T(0)
Selection of the annealing schedule is critical
  If r is chosen too large (close to 1.0), the temperature decreases very slowly, allowing moves to higher energy states to occur more frequently. This results in slow convergence
  If r is chosen too small, the temperature decreases very fast, and the algorithm is likely to converge to a local minimum
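A minimal sketch of the algorithm for feature subsets follows. The neighbor transform (flip one randomly chosen feature), the default schedule parameters and the toy objective are assumptions made for illustration; the lecture only fixes the acceptance rule and the geometric schedule T(i+1)=rT(i).

```python
import math
import random


def acceptance_probability(delta_e, temperature):
    """Probability of accepting a move that changes the energy by delta_e."""
    if delta_e < 0:
        return 1.0                        # improvements are always accepted
    return math.exp(-delta_e / temperature)


def simulated_annealing(n_features, J, t0=1.0, r=0.9, t_min=0.01, seed=0):
    """Maximize J over binary feature masks with a geometric cooling schedule."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    best = list(current)
    temperature = t0
    while temperature > t_min:
        # Neighbor: add or remove one randomly chosen feature.
        neighbor = list(current)
        flip = rng.randrange(n_features)
        neighbor[flip] = not neighbor[flip]
        delta_e = -(J(neighbor) - J(current))   # energy is the negative of J
        if rng.random() < acceptance_probability(delta_e, temperature):
            current = neighbor
            if J(current) > J(best):
                best = list(current)
        temperature *= r                        # T(i+1) = r T(i)
    return best, J(best)


# Hypothetical objective: +1 per relevant feature, -0.5 per irrelevant one.
RELEVANT = {0, 2, 3}


def toy_J(mask):
    return sum((1.0 if f in RELEVANT else -0.5) for f, on in enumerate(mask) if on)


mask, score = simulated_annealing(6, toy_J)
print(mask, score)
```

Raising `r` toward 1.0 lengthens the run and keeps uphill moves likely for longer; lowering it shortens the run and makes the search behave more like plain hill climbing.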
Simulated Annealing (3)
A unique feature of simulated annealing is its adaptive nature.
At high temperature the algorithm is only looking at the gross features of the optimization surface, while at low temperatures the finer details of the surface start to appear.
[Figure: objective J(Y) versus Y, as seen at high T and at low T]
Genetic Algorithms
Genetic algorithms are optimization techniques that mimic the evolutionary process of "survival of the fittest".
Starting with an initial random population of solutions, evolve new populations by mating (crossover) pairs of solutions and mutating solutions according to their fitness (objective function).
The better solutions are more likely to be selected for the mating and mutation operations and therefore carry their "genetic code" from generation to generation.
For the problem of Feature Subset Selection, individual solutions are simply represented with a binary number (1 if the given feature is selected, 0 otherwise), which is the original representation proposed by Holland in 1974.
Algorithm
1. Create an initial random population
2. Evaluate the initial population
3. Repeat until convergence (or a number of generations)
   3a. Select the fittest individuals in the population
   3b. Perform crossover on the selected individuals to create offspring
   3c. Perform mutation on the selected individuals
   3d. Create the new population from the old population and the offspring
   3e. Evaluate the new population
[Figure: search space from the empty feature set to the full feature set]
Genetic operators
Single-point crossover
  Select two individuals (parents) according to their fitness
  Select a crossover point (selected randomly)
  With probability $P_C$ (0.95 is reasonable) create two offspring by combining the parents
  Example (crossover point after the fourth bit):
    Parent i:    01001010110
    Parent j:    11010110000
    Offspring i: 11011010110
    Offspring j: 01000110000
Binary mutation
  Select an individual according to its fitness
  With probability $P_M$ (0.01 is reasonable) mutate each one of its bits
  Example (bits 4, 5, 10 and 11 mutated):
    Individual:  11010110000
    Offspring i: 11001010111
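Both operators are short enough to sketch directly on bit strings. The function names and the use of Python strings for individuals are assumptions for illustration; the parent strings are the ones from the crossover example, cut after the fourth bit.

```python
import random


def single_point_crossover(parent_i, parent_j, point):
    """Swap the tails of two equal-length bit strings after position `point`."""
    return (parent_i[:point] + parent_j[point:],
            parent_j[:point] + parent_i[point:])


def mate(parent_i, parent_j, p_c, rng):
    """With probability p_c cross the parents at a random point, else copy them."""
    if rng.random() < p_c:
        point = rng.randrange(1, len(parent_i))
        return single_point_crossover(parent_i, parent_j, point)
    return parent_i, parent_j


def binary_mutation(individual, p_m, rng):
    """Flip each bit independently with probability p_m."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < p_m else b for b in individual)


rng = random.Random(0)
offspring = single_point_crossover("01001010110", "11010110000", 4)
mutant = binary_mutation("11010110000", 0.01, rng)
print(offspring, mutant)
```

Note that a position where both parents carry the same bit is never changed by crossover; only mutation can introduce a value that is absent from both parents.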
Selection methods
The selection of individuals is based on their fitness (the value of the objective function).
We will describe a selection method called Geometric selection. Several other methods are available: Roulette Wheel, Tournament Selection, etc.
Geometric selection
  The probability of selecting the r-th best individual is given by the geometric probability mass function
    $P(r) = q(1-q)^{r-1}$
  q is the probability of selecting the best individual (0.05 is a reasonable value)
  Therefore, the geometric distribution assigns higher probability to individuals ranked better, but also allows unfit individuals to be selected
In addition, it is typical to carry the best individual of each population to the next one. This is called the Elitist Model.
[Figure: selection probability versus individual rank (0 to 100) for q=0.08, q=0.04, q=0.02 and q=0.01]
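The selection rule can be sketched with the standard `random` module. The function names are hypothetical, and the population is assumed to be already sorted from best to worst; `choices` renormalizes the weights, which matters because the geometric probabilities over a finite population sum to slightly less than one.

```python
import random


def geometric_pmf(rank, q):
    """Probability of selecting the individual with the given rank (1 = best)."""
    return q * (1 - q) ** (rank - 1)


def select_by_rank(ranked_population, q, rng):
    """Draw one individual from a population sorted from best to worst."""
    weights = [geometric_pmf(r, q) for r in range(1, len(ranked_population) + 1)]
    return rng.choices(ranked_population, weights=weights, k=1)[0]


ranked = ["best", "second", "third", "worst"]   # hypothetical population
probabilities = [geometric_pmf(r, 0.05) for r in range(1, 5)]
picked = select_by_rank(ranked, 0.05, random.Random(0))
print(probabilities, picked)
```

Larger values of q concentrate the selection on the top-ranked individuals, while smaller values spread it more evenly across the population.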
GAs, parameter choices for Feature selection
The choice of crossover rate $P_C$ is not critical
  You will want a value close to 1.0 to have a large number of offspring
The choice of mutation rate $P_M$ is very critical
  An optimal choice of $P_M$ will allow the GA to explore the more promising regions while avoiding getting trapped in local minima
  A large value (e.g., $P_M > 0.25$) will not allow the search to focus on the better regions, and the GA will perform like random search
  A small value (e.g., close to 0.0) will not allow the search to escape local minima
The choice of q, the probability of selecting the best individual, is also critical
  An optimal value of q will allow the GA to explore the most promising solution and, at the same time, provide sufficient diversity to avoid early convergence of the algorithm
In general, poorly selected control parameters will result in sub-optimal solutions due to early convergence
Search Strategies, summary
Exhaustive
  Accuracy: always finds the optimal solution
  Complexity: exponential
  Advantages: high accuracy
  Disadvantages: high complexity
Sequential
  Accuracy: good if no backtracking needed
  Complexity: quadratic, $O(N_{EX}^2)$
  Advantages: simple and fast
  Disadvantages: cannot backtrack
Randomized
  Accuracy: good with proper control parameters
  Complexity: generally low
  Advantages: designed to escape local minima
  Disadvantages: difficult to choose good parameters
A highly recommended review of the material presented in these two lectures is
  Justin Doak, "An evaluation of feature selection methods and their application to Computer Security", University of California at Davis, Tech Report CSE-92-18