
Application of Neural Networks to b-quark jet detection in Z → bb̄

Stephen Poprocki
REU 2005, Department of Physics, The College of Wooster, Wooster, Ohio 44691
Advisors: Gustaaf Brooijmans, Andy Haas
Nevis Laboratories, Irvington, NY 10533
August 5, 2005

Abstract

A multilayer perceptron (MLP) artificial neural network (NN) was trained with Monte Carlo data to detect b-quark jets. A variety of NNs were tested to maximize performance, and the best NN was run on data produced with different reconstruction options. A simple MLP NN with 6 input variables and 6 hidden neurons was found to perform better than a decay length significance cut alone for detecting b-jets. The NN was then applied to data from the DØ experiment to measure the mass of the Z boson from Z → bb̄ decays. Simple background subtraction techniques yielded a shifted Z peak at 73.3 ± 3.4 GeV.

1 Introduction

The Standard Model (SM) has been largely successful in explaining almost all observed particle phenomena to date [1, 2]. However, the mechanism of electroweak symmetry breaking is not yet understood. Many theories predict at least one Higgs field, which gives mass to the W± and Z weak bosons, along with one or more Higgs bosons. The most promising channels for Higgs boson detection at the Tevatron are pp̄ → WH → ℓνbb̄, pp̄ → ZH → ℓ⁺ℓ⁻bb̄, and pp̄ → ZH → νν̄bb̄ [3]. Because the chance of detecting these events is extremely small, improved b-jet detection is useful for a possible Higgs discovery. In this paper we focus on the Z → bb̄ channel at the DØ experiment. Since the mass of the Z boson is well known, measuring it in Z → bb̄ decays provides a means of calibrating the calorimeter response to b-quark jets.

2 Purity & Efficiency

Only taggable MC jets were used to measure the b-tagging efficiency and purity; that is, each jet must be matched to 2 tracks within ΔR = √((Δη)² + (Δφ)²) < 1.0. Further, we use only MC jets with at least one reconstructed secondary vertex and treat jets with no secondary vertex (nv = 0) as untagged. To correct for these zero-vertex jets, we use the following definitions:

    efficiency = (correctly tagged b-jets, nv > 0) / [(b-jets, nv > 0) + (b-jets, nv = 0)],

    purity = 1 − (incorrectly tagged non-b jets, nv > 0) / [(non-b jets, nv > 0) + (non-b jets, nv = 0)].

Purity versus efficiency curves are obtained by varying a cut on the tagging discriminant.

3 Neural Networks

The NNs were implemented in ROOT 4.04/02 with the TMultiLayerPerceptron class. An MLP is a simple feed-forward network; Figure 1 shows a simple model of an MLP NN. Each neuron is connected to all neurons in the layer below it. We use only one output neuron, whose value is interpreted as the probability that the jet contains a b quark. The output of a neuron is a weighted linear combination of the outputs of the neurons in the previous layer, passed through a sigmoid function that normalizes the output to between 0 and 1. Multiple hidden layers are possible but typically only complicate matters.
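For illustration, a network of this kind can be set up with a short ROOT macro along the following lines. This is only a sketch: the file, tree, and branch names (mcjets.root, jetTree, isb) are placeholders, not the names used in the actual analysis.

    // Minimal sketch: a 6-input, 6-hidden-neuron MLP using ROOT's TMultiLayerPerceptron.
    // File, tree, and branch names are illustrative placeholders.
    #include "TFile.h"
    #include "TTree.h"
    #include "TMultiLayerPerceptron.h"

    void build_mlp() {
       TFile f("mcjets.root");                    // hypothetical MC jet file
       TTree *tree = (TTree*)f.Get("jetTree");    // hypothetical jet tree

       // Layout string: input branches : hidden neurons : output branch.
       // "isb" is the training target (1 for b-jets, 0 otherwise).
       TMultiLayerPerceptron mlp("sdls3d,dr,mass,dl2d,chi2,mult:6:isb", tree);

       // Use even entries for training and odd entries for testing,
       // i.e. half of the sample for each, as described in the text.
       mlp.SetTrainingDataSet("Entry$%2==0");
       mlp.SetTestDataSet("Entry$%2==1");
    }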

Figure 1: Simple multilayer perceptron neural network (input layer, one hidden layer, and output layer connected by weight matrices).

3.1 Training

Training is the process of minimizing the error of the network output. Out of a total of 263 767 jets, half were used for training and the other half for testing. The TMultiLayerPerceptron class implements six training methods:

1. Stochastic minimization
2. Steepest descent with fixed step size (batch learning)
3. Steepest descent algorithm
4. Conjugate gradients with the Polak-Ribiere updating formula
5. Conjugate gradients with the Fletcher-Reeves updating formula
6. Broyden-Fletcher-Goldfarb-Shanno (BFGS) method

The training methods adjust the weights based on the error in the output. One weight update is called an epoch, cycle, or iteration. Too few epochs result in an under-trained NN, while too many result in an over-trained NN; an over-trained NN is undesirable because it has learned features specific to the training data. Figure 2 shows the error as a function of epoch for each training method, Figure 3 shows the signal and background outputs of the NN for the different training methods, and Figure 4 shows the corresponding purity versus efficiency. The stochastic minimization method was chosen for its simplicity, speed, and slightly superior performance.
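In the same illustrative setup, a learning method is selected and the network trained as sketched below; the 600-epoch count is only indicative of the range shown in Figure 2.

    // Sketch: train an already-constructed TMultiLayerPerceptron with the
    // stochastic method. The epoch count is illustrative.
    #include "TMultiLayerPerceptron.h"

    void train_mlp(TMultiLayerPerceptron &mlp) {
       mlp.SetLearningMethod(TMultiLayerPerceptron::kStochastic);
       // Alternatives: kBatch, kSteepestDescent, kRibierePolak,
       // kFletcherReeves, kBFGS.
       mlp.Train(600, "text,graph,update=10");  // report training/test error during training
    }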

Figure 2: Training error for the various training methods as a function of the number of epochs (training and test samples shown for each method).

3.2 Variable Sets

Initially, the NNs were trained with many combinations of variables. Two variable sets were then chosen for comparison:

simple:
- 3D decay length significance (sdls3d)
- ΔR (dr)
- mass
- 2D decay length (dl2d)
- χ² (chi2)
- multiplicity (mult)

fancy: all of simple plus:
- number of vertices (nv)
- tertiary vertex 3D decay length significance (sdls3d2)

- tertiary vertex ΔR (dr2)
- tertiary vertex mass (mass2)
- tertiary vertex χ² (chi22)
- tertiary vertex multiplicity (mult2)

The purpose is to see whether adding the tertiary-vertex variables yields an improvement. Figures 5 and 6 show no improvement in performance from the tertiary-vertex variables.

Figure 3: Signal and background outputs of the NN for the different training methods.

3.3 Hidden Neurons

The NNs were trained with varying numbers of hidden neurons. Six hidden neurons for the simple set and 12 for the fancy set performed best, i.e., the same number of hidden neurons as input variables.
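A scan of this kind can be coded as a simple loop, as sketched below; the tree and branch names are again placeholders, and the test-sample error is used as one possible figure of merit.

    // Sketch: compare the test-sample error for different hidden-layer sizes.
    // Branch names are illustrative placeholders.
    #include <cstdio>
    #include "TString.h"
    #include "TTree.h"
    #include "TMultiLayerPerceptron.h"

    void scan_hidden(TTree *tree) {
       const char *inputs = "sdls3d,dr,mass,dl2d,chi2,mult";
       for (int nHidden = 2; nHidden <= 12; nHidden += 2) {
          TString layout = TString::Format("%s:%d:isb", inputs, nHidden);
          TMultiLayerPerceptron mlp(layout.Data(), tree);
          mlp.Train(600, "text");
          printf("hidden neurons = %d, test error = %g\n",
                 nHidden, mlp.GetError(TMultiLayerPerceptron::kTest));
       }
    }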

Figure 4: Purity versus efficiency for the different training methods (Stochastic, Batch, Steepest Descent, Ribiere-Polak, Fletcher-Reeves, BFGS).

3.4 Reconstruction Options

MC data from a total of five different reconstruction options were compared:

- all of Andy Haas's options for reconstruction (cab3);
- default secondary vertexing options, such as the 2D track impact parameter significance (cab_default_sv);
- default trackjet options (cab_default-tj);
- default primary vertex options (cab_noadapt);
- no merging of overlapping trackjets (cab_no-tj-merge).

All options except cab_default_sv gave the same performance; cab_default_sv performed poorly. Figure 7 compares their performance.

3.5 Neural Network Results

For b-jet detection, NNs provide an improvement over cutting on the 3D decay length significance (sdls3d) alone, because the NN is able to learn the significance of the minor variables as well. Figure 8 compares the NN with the 3D decay length significance cut method.
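Once trained, the network output used for the purity-versus-efficiency scan can be evaluated jet by jet. A minimal sketch, assuming the six simple-set inputs in the same order as in the layout string:

    // Sketch: evaluate the trained network for a single jet. Inputs must be
    // given in the same order as in the layout string; names are placeholders.
    #include "TMultiLayerPerceptron.h"

    double nn_output(TMultiLayerPerceptron &mlp,
                     double sdls3d, double dr, double mass,
                     double dl2d, double chi2, double mult) {
       Double_t in[6] = {sdls3d, dr, mass, dl2d, chi2, mult};
       return mlp.Evaluate(0, in);  // output neuron 0: b-jet probability
    }

Varying a threshold on this output, rather than on sdls3d alone, traces out the neural-network curve in Figure 8.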

Figure 5: Signal and background outputs of the NN for the simple and fancy variable sets.

4 Z → bb̄ Analysis

Using the neural network technique described above, the invariant mass of the Z boson was obtained from Z → bb̄ events in approximately 1 pb⁻¹ of Run II data with the cab3 reconstruction options. The tagging cut was set to yield an efficiency of 50% and a purity of 99% in the MC data.

4.1 Background Subtraction

For this analysis, an understanding of the background in the double b-tagged data is essential. The background consists mainly of heavy-flavor dijet production and mistagged gluon/light-quark jet production, and these events cannot be accurately simulated with current techniques [4]. If the background shape can be accurately derived, it can simply be subtracted from the data. The method used here is similar to that of [4] and makes use of a tag-rate function (TRF) to estimate the amount of double b-tagged background. Figure 9 shows the TRF derived from the single-tagged data. The TRF is then used to estimate the double-tagged background (Fig. 10), and this estimated background is subtracted from the double-tagged data to yield Figure 11.
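In outline, a TRF-based estimate amounts to a weighted histogram fill, as sketched below. The histogram interface, the variable on which the TRF is parametrized, and the per-event weighting are assumptions made for illustration; they are not the exact DØ procedure of [4].

    // Sketch: predict the double-tag background from single-tagged events.
    // Each single-tagged event is weighted by the TRF probability that its
    // untagged jet would also be tagged, and the weighted dijet mass fills
    // the expected background. Details are illustrative assumptions.
    #include "TH1D.h"

    void fill_trf_background(TH1D *hBkg, double dijetMass,
                             double untaggedJetVar, const TH1D *hTRF) {
       int bin = hTRF->FindFixBin(untaggedJetVar);
       double tagProb = hTRF->GetBinContent(bin);  // per-jet tag rate from the TRF
       hBkg->Fill(dijetMass, tagProb);             // weighted fill of the background estimate
    }

Summed over the single-tagged sample, hBkg gives the expected double-tagged background that is subtracted from the data.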

Figure 6: Purity versus efficiency for the simple and fancy variable sets.

4.2 1 Correction

There is one important correction we have made. The double-tagged data contain more heavy-flavor jets than the single-tagged data to which the TRF is applied; this difference in heavy-flavor content is called the 1 shift. The shift is measured by comparing untagged data to the single-tagged data, and the resulting 1 correction is subtracted from the expected double-tagged background. Figure 12 compares the uncorrected and 1-corrected Z peaks.

5 Conclusion

We have shown that NNs yield better b-jet detection performance than a decay length significance cut alone. It was also found that the tertiary-vertex NN variables don't yield an improvement, but the MC reconstruction options can make a difference. We note that there is much room for improvement in the background subtraction technique; additional background corrections could be applied as in [4], but due to the short time span of this REU these correction techniques were not studied.

Figure 7: Purity versus efficiency for the different reconstruction options (cab3, cab_default_sv, cab_default-tj, cab_noadapt, cab_no-tj-merge).

References

[1] J. Erler and P. Langacker, in Proceedings of the 5th International WEIN Symposium: A Conference on Physics Beyond the Standard Model, Santa Fe, NM, 1998; e-print hep-ph/9809352.

[2] G. Altarelli, CERN report CERN-TH/97-278; e-print hep-ph/9710434.

[3] P. C. Bhat, R. Gilmartin, and H. B. Prosper, "A Strategy for Discovering a Low-Mass Higgs Boson at the Tevatron," e-print hep-ph/0001152.

[4] A. Jenkins, P. Jonsson, G. Davies, and A. Haas, "Observation of the Z → bb̄ Decay at DØ," DØ note in preparation.

Figure 8: Purity versus efficiency for the neural network and the decay length significance cut method.

Figure 9: The TRF derived from the single b-tagged data, used to estimate the double-tagged background.

Figure 10: Comparison between the double b-tagged data (points) and the expected background before any background corrections. The distribution from the MC is shown for comparison.

Figure 11: The Z → bb̄ peak derived from the data with no corrections applied. The distribution shape from the MC is shown for comparison.

Figure 12: The Z → bb̄ peak derived from the data after the 1 correction. The distribution shape from the MC is shown for comparison.