GARCON: Genetic Algorithm for Rectangular Cuts OptimizatioN
Salavat Abdullin (FNAL), Alexey Drozdetskiy (University of Florida)
ACAT07, 22-27 April '07

How to optimize cuts

Introduction

GARCON
- program description and user's manual available in hep-ph/0605143
- web page: http://drozdets.home.cern.ch/drozdets/home/genetic/

A few recent examples of usage: High Energy Physics analyses of the Compact Muon Solenoid (CMS) collaboration, published in the CMS Physics TDR v.2 (J. Phys. G: Nucl. Part. Phys. 34 (2007) 995-1579) and in public CMS Notes
- H→ZZ→4μ Standard Model Higgs boson search (CMS Note 2006/122)
- SUSY discovery potential (e.g. CMS Note 2006/134)
- single-top event selection (CMS Note 2006/084)

In short
- A genetic algorithm (GA) is one of a family of algorithms inspired by natural selection: individuals are created randomly, mutate, inherit qualities from their parents, and evolve. GAs are useful for finding extrema in problems with a large number of discrete solutions.
- A typical High Energy Physics (HEP) analysis has quite a few selection criteria (cuts) to optimize, for example to maximize the significance of an excess of signal over background events.
- In such cases a simple scan over the multi-dimensional cut space (especially when done on top of a scan over a theoretical parameter space, as for SUSY) leads to CPU time demands ranging from days to many years.

GARCON features

Fast and efficient
- automatically performs the optimization and a verification of results stability
- effectively tries ~10^50 permutations of cut parameters/values on millions of input events within hours

Transparent optimization output
- efficient cut variables
- optimal cut values

More features
- many significance functions available to the user (S: signal, B: background), starting from $S_1 = S/\sqrt{B}$ and including e.g. $S_{cL} = \sqrt{2(S+B)\ln(1+S/B) - 2S}$, with the possibility of a user-defined optimization function
- user-defined optimization precision
- additional user-defined requirements:
  - minimum number of signal/background events that must survive the final cuts
  - variables/processes to be used in a particular optimization run
  - number of optimizations inside one run, to ensure that the optimization finds not just a local maximum but the global one (in the case of a complicated phase space)
- automatic verification of results stability

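The two significance estimators named on this slide can be written down in a few lines. The sketch below is only an illustration, not GARCON's own code: it assumes the square-root forms $S_1 = S/\sqrt{B}$ and $S_{cL} = \sqrt{2(S+B)\ln(1+S/B) - 2S}$, the function names `s1` and `s_cl` are hypothetical, and the yields are made up. Both functions assume B > 0.

```python
import math


def s1(s, b):
    """Simple significance S1 = S / sqrt(B); assumes b > 0."""
    return s / math.sqrt(b)


def s_cl(s, b):
    """Log-likelihood-ratio significance ScL = sqrt(2(S+B)ln(1+S/B) - 2S); assumes b > 0."""
    return math.sqrt(2.0 * (s + b) * math.log(1.0 + s / b) - 2.0 * s)


# Made-up yields after cuts: 25 signal and 16 background events.
print(s1(25.0, 16.0))    # 6.25
print(s_cl(25.0, 16.0))  # smaller than S1 when S is not small compared to B
```

For S much smaller than B the two estimators nearly coincide; they differ when the signal is comparable to the background, which is typical of tightly cut searches.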
GARCON: genetic algorithm

Definitions/algorithm
- individual: a set of qualities to be optimized (min/max cut values on kinematical variables in a HEP analysis)
- environment: a set of requirements on the evolutionary process
- quality function (significance): shows how good the cuts (individuals) are at separating signal from background
- community: the individuals involved in the evolution
- breeding of new individuals: combining the best qualities (cuts) of two parent individuals into a new one
- breeding with possible mutation: one of the mechanisms that lets new individuals try new parts of the phase space, so that the community's evolution is not confined to one local maximum of the quality function
- death of individuals: the better the quality of an individual, the longer it lives and produces new individuals, thus improving the community as a whole
- cataclysmic update: restarting the community from scratch (another way for the GA to look outside a particular local maximum of the quality function)
- the very best individual after a number of years of evolution is the output of GARCON: optimized cut values for the HEP analysis

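The vocabulary above (individual, community, quality, breeding, mutation, death) can be illustrated with a deliberately small toy. This is a sketch under stated assumptions, not GARCON's implementation: the Gaussian toy events, the `S/sqrt(B)` quality with a minimum-signal requirement, the "worst half dies each year" rule, and every name in the code are hypothetical. The cataclysmic update is omitted for brevity.

```python
import math
import random

random.seed(1)
N_VARS = 2            # number of kinematic variables (toy choice)
LO, HI = -4.0, 6.0    # allowed range of every cut value (toy choice)


def make_events(n, mean):
    """Toy events: N_VARS Gaussian-distributed variables per event."""
    return [[random.gauss(mean, 1.0) for _ in range(N_VARS)] for _ in range(n)]


SIGNAL = make_events(200, 2.0)
BACKGROUND = make_events(2000, 0.0)


def n_pass(events, cuts):
    """Count events passing all rectangular cuts lo <= x <= hi."""
    return sum(all(lo <= x <= hi for x, (lo, hi) in zip(ev, cuts)) for ev in events)


def quality(cuts, min_signal=10):
    """Quality function: S/sqrt(B), zero if too few signal events survive."""
    s = n_pass(SIGNAL, cuts)
    b = n_pass(BACKGROUND, cuts)
    if s < min_signal:
        return 0.0
    return s / math.sqrt(max(b, 1.0))  # floor B at 1 to avoid dividing by zero


def random_individual():
    """An individual is one (min, max) cut pair per variable."""
    return [tuple(sorted((random.uniform(LO, HI), random.uniform(LO, HI))))
            for _ in range(N_VARS)]


def breed(mother, father, p_mutate=0.2):
    """Take each variable's cuts from one parent; sometimes mutate one variable."""
    child = [random.choice(pair) for pair in zip(mother, father)]
    if random.random() < p_mutate:
        i = random.randrange(N_VARS)
        child[i] = random_individual()[i]
    return child


def evolve(years=20, size=20):
    """Evolve a community; start with one 'no cuts' individual plus random ones."""
    community = [[(LO, HI)] * N_VARS] + [random_individual() for _ in range(size - 1)]
    for _ in range(years):
        community.sort(key=quality, reverse=True)
        survivors = community[: size // 2]  # simplified 'death': worst half dies
        children = [breed(*random.sample(survivors, 2))
                    for _ in range(size - len(survivors))]
        community = survivors + children
    return max(community, key=quality)


best = evolve()
print("best cuts:", best)
print("quality:", quality(best), "vs. no cuts:", quality([(LO, HI)] * N_VARS))
```

Because the best individuals always survive into the next year, the best quality in the community never decreases; mutation is what lets the search leave a local maximum.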
Toy analysis

mSUGRA (constrained MSSM supersymmetry model) generator-level analysis
- one particular point in the mSUGRA parameter space chosen as the signal; details in hep-ph/0605143
- characteristic qualities of SUSY events, following from the signal Feynman diagrams: large MET (mainly due to massive, stable, invisible SUSY particles, the LSP) and large jet E_T (due to cascade decays of heavy SUSY particles)
- backgrounds considered: QCD, W/Z+jets, double weak-boson production, tt̄
- statistics divided into two parts: one for cut optimization, the other to verify the stability of the results

Optimization
- 10 cut parameters (20 values) used for optimization
- 2.5% precision for each variable (40 possible values for each variable)
- ~0.5×10^6 events used in the optimization, with the goal of separating signal events from background events with the best possible significance

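The optimization setup above combines two simple ideas: a discretized set of allowed cut values (2.5% precision gives 40 values per cut) and a split of the sample into an optimization part and a verification part. The sketch below illustrates both. The helper names, the even 50/50 split, the choice to exclude the upper endpoint from the grid, and the 0 to 1000 range are assumptions made for illustration only.

```python
import random


def cut_grid(lo, hi, precision=0.025):
    """Allowed cut values from lo towards hi in steps of precision * (hi - lo)."""
    n_steps = round(1.0 / precision)   # 2.5% precision -> 40 values
    step = (hi - lo) * precision
    return [lo + i * step for i in range(n_steps)]


def split_sample(events, seed=0):
    """Shuffle a copy of the events and split it into two halves."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (optimization, verification)


grid = cut_grid(0.0, 1000.0)     # hypothetical range for one variable
print(len(grid))                 # 40

optimization, verification = split_sample(range(10))
print(len(optimization), len(verification))   # 5 5
```

Cuts would be tuned on the optimization half only; comparing the significance they give on the verification half is one way to check that the result is not a statistical fluctuation.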
Performance

- after ~50 years of evolution (~3.5 hours), a result close to the best one is already achieved
- GARCON has thus effectively tried 40^20 permutations of possible cut values (20 values to optimize, 40 possible values for each) on 0.5 million events

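To put the 40^20 figure in perspective, the arithmetic below estimates what an exhaustive scan of the same grid would cost. The throughput of 10^9 event evaluations per second is an arbitrary assumption chosen only to make the scale visible; it is not a measured number.

```python
n_values = 40        # possible values per cut value (2.5% precision)
n_cuts = 20          # cut values to optimize (min and max for 10 variables)
n_events = 500_000   # events evaluated for every cut combination

combinations = n_values ** n_cuts
print(f"{combinations:.2e}")   # 1.10e+32

# ASSUMPTION (illustrative only): 1e9 event evaluations per second.
rate = 1e9
seconds = combinations * n_events / rate
years = seconds / (3600 * 24 * 365)
print(f"{years:.1e} years for an exhaustive scan")
```

Even with a far more optimistic throughput the exhaustive scan stays out of reach, which is the motivation for a guided search such as a genetic algorithm.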
Optimized cut values example

[Figure: missing transverse energy distribution, comparing a cut value chosen independently by eye (not optimized) with the value optimized by GARCON.] A really discriminating variable.

[Figure: transverse energy of the leading jet, before cuts and after cuts, comparing a cut value chosen independently by eye (not optimized) with the value optimized by GARCON.] Not a really discriminating variable (compare the distributions before and after the other cuts).

Optimized vs. constant cuts

An example of results with GARCON and with the classical approach, in full-simulation analyses done by two different universities: Standard Model Higgs boson search in the H→ZZ→4μ channel.

[Figure: CMS H→4μ, 2006]

Gain in significance in probabilistic terms, e.g. for m_H = 130 (at the edge of 5σ discovery for 30 fb^-1 of integrated luminosity):
- significance: 5.1σ vs. 6σ
- 55% vs. 80% chance of observing an excess greater than 5σ

Source: CMS Physics TDR, v2 (J. Phys. G: Nucl. Part. Phys. 34 (2007) 995-1579)

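One rough way to see how an expected significance translates into a chance of observing more than 5σ is to assume that the observed significance fluctuates as a unit-width Gaussian around the expected value. This is an assumption made here for illustration, not the procedure used in the CMS Physics TDR, and the function name is hypothetical. Under it, expected significances of 5.1σ and 6σ give roughly 54% and 84%, in the same range as the 55% and 80% quoted on the slide.

```python
import math


def prob_above(threshold, expected):
    """P(observed > threshold) for observed ~ Normal(expected, 1)."""
    return 0.5 * (1.0 + math.erf((expected - threshold) / math.sqrt(2.0)))


for expected in (5.1, 6.0):
    print(expected, round(prob_above(5.0, expected), 2))
```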
Stability of results

An example of results with GARCON and with the classical approach, in full-simulation analyses done by two different universities: Standard Model Higgs boson search in the H→ZZ→4μ channel.
- stable results even with the small statistics available for the analysis (CMS Public Note 2006/122)
- a good example of a complicated analysis, in which not only many variables and cut values are analyzed, but many possible signal points are studied as well

Summary

- We presented the GARCON program and illustrated its functionality with a simple HEP analysis example.
- Much more complicated examples are described, for example, in:
  - CMS Physics Technical Design Report, v2, 2006
  - CMS Public Notes
- The program automatically performs rectangular cut optimization and stability verification in a multidimensional phase space.
- All in all, it is a simple yet powerful, ready-to-use, publicly available tool with a flexible and transparent setup of optimization and verification parameters.