Identification of the correct hard-scatter vertex at the Large Hadron Collider


Pratik Kumar, Neel Mani Singh
pratikk@stanford.edu, neelmani@stanford.edu
Under the guidance of Prof. Ariel Schwartzman (sch@slac.stanford.edu)

I. INTRODUCTION

The Large Hadron Collider (LHC) is the largest and most powerful particle collider ever built. Every 25 ns, bunches of protons collide at the highest energies, producing 100-megapixel pictures of collision events that may contain signatures of new subatomic particles or forces. One key challenge for the analysis of LHC events is pile-up, i.e. multiple overlapping proton-proton collisions in a single event picture. Under high-luminosity conditions, each event picture at the LHC will contain, on average, 200 overlapping interactions, of which about 60 are reconstructed. Only one interaction is of interest; all others are noise that contaminates the event.

One important problem for physics event reconstruction is the identification of the correct hard-scatter primary vertex among all the overlapping pile-up interactions on an event-by-event basis. Selecting the correct interaction is necessary to reconstruct the full event characteristics correctly.

The total momentum of a particle can be decomposed into two components: longitudinal, along the beam direction (z), and transverse (x-y). Since the beams move along z, one cannot know what fraction of the proton momentum the colliding constituent quarks and gluons carry along the z axis, so the resulting particles can have any total momentum along z. On the other hand, the quarks and gluons have essentially no momentum in the x-y plane, since they move only along z. The transverse plane is therefore where conservation can be exploited: one starts with zero transverse momentum and ends with zero transverse momentum. For this reason, the quantity of interest at the LHC is the transverse momentum, p_T.

Energy is approximately equal to momentum for particles that travel close to the speed of light. At the LHC one uses transverse momentum as the measure of the total energy of the collision. In this text we use the momentum and the energy of a particle interchangeably.

II. PROBLEM - INPUT AND OUTPUT

Figure 1 [1]: An event with 25 interactions

We propose a supervised machine learning algorithm to identify the correct primary vertex and distinguish it from all the overlapping pile-up vertices. The inputs are attributes of the proton-proton collision: the energies of the particles after the collision, the angles these particles make with the transverse plane, the missing energy of the collision, and the sum of the energies of the particles created in the collision.
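For reference, the kinematic statements made in the introduction can be written out explicitly. This is a standard relativistic sketch in natural units (c = 1); the Cartesian momentum components p_x, p_y, the particle mass m and the index i over final-state particles are symbols introduced here for illustration and do not appear in the original text.

```latex
p_T = \sqrt{p_x^2 + p_y^2},
\qquad
E = \sqrt{|\vec{p}|^2 + m^2} \;\approx\; |\vec{p}| \quad \text{for } |\vec{p}| \gg m,
\qquad
\sum_i \vec{p}_{T,i} = \vec{0}
```

The last relation expresses that the transverse momenta of all particles produced in one collision sum to zero as vectors, which is what makes an imbalance among the detected tracks informative.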

III. CURRENT VERTEX SELECTION ALGORITHM

The current technique [6] for identifying the primary vertex selects the vertex with the highest total energy, computed as the scalar sum over all particle tracks associated with the vertex. This method performs poorly when the number of pile-up interactions is large, selecting the wrong vertex about 40% of the time.

Figure 2: Vertex selection efficiency vs pile-up density for the current algorithm

IV. DATA

The data consist of computer-simulated events of Higgs bosons. Each event picture consists of a list of vertices (60 on average), and each vertex consists of a list of particle tracks. Each track is represented by a direction in 3D space, an origin (given by the vertex it belongs to), and its energy. These are used to generate features for our classifier.

V. FEATURES

The information from all tracks in each vertex is used to define features. We define the following features:

- sumpT: each vertex is made of many tracks (the colored lines in Figure 1). Each track is a particle, and each particle has a transverse momentum p_T, which is essentially its energy. sumpT is the scalar sum of the track p_T of the vertex.
- sumpT_w: the weighted sum [2] of track p_T^2.
- MET: the missing transverse energy, computed from the vector sum of the track p_T of each vertex. In a pile-up vertex, we expect energy to be conserved in the transverse plane, so MET should on average be 0. For signal, since the Higgs boson decays to particles that escape the detector without being detected, there should be a significant imbalance of energy, so MET should peak at a positive value.
- pt1, pt2, pt3: the transverse momenta of the three most energetic particle tracks.
- eta1, eta2, eta3: the angles between the three most energetic particle tracks and the transverse (x-y) plane.

Each vertex has a truth label that says whether it is the correct hard-scatter primary vertex or a pile-up vertex.
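The feature definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each track is taken to be a hypothetical `(pt, eta, phi)` tuple, where the azimuthal angle `phi` is not named in the text but is needed to form the vector sum; the weighting in `sumpT_w` is assumed to be a plain sum of p_T^2; and vertices with fewer than three tracks are padded with zeros.

```python
import math


def vertex_features(tracks):
    """Compute the nine per-vertex features from a list of tracks.

    Each track is a (pt, eta, phi) tuple. phi, the azimuthal angle in the
    x-y plane, is an assumed input needed for the vector sum.
    """
    # Order tracks from highest to lowest transverse momentum.
    ordered = sorted(tracks, key=lambda t: t[0], reverse=True)

    sum_pt = sum(pt for pt, _, _ in ordered)          # scalar sum of pT
    sum_pt_w = sum(pt ** 2 for pt, _, _ in ordered)   # assumed weighting: sum of pT^2

    # MET: magnitude of the vector sum of track pT in the transverse plane.
    px = sum(pt * math.cos(phi) for pt, _, phi in ordered)
    py = sum(pt * math.sin(phi) for pt, _, phi in ordered)
    met = math.hypot(px, py)

    # Pad with zeros when a vertex has fewer than three tracks (assumption).
    top = ordered[:3] + [(0.0, 0.0, 0.0)] * (3 - len(ordered[:3]))

    features = {"sumpT": sum_pt, "sumpT_w": sum_pt_w, "MET": met}
    for rank, (pt, eta, _) in enumerate(top, start=1):
        features[f"pt{rank}"] = pt
        features[f"eta{rank}"] = eta
    return features


# Two back-to-back tracks of equal pT balance each other, so MET is ~0.
example = vertex_features([(10.0, 0.5, 0.0), (10.0, -0.3, math.pi)])
```

In this toy vertex the scalar sum is 20 while the vector sum nearly vanishes, which is the behaviour the text expects for a pile-up vertex.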
Class 1, the positive class, denotes the primary vertex, whereas Class 0, the negative class, denotes a pile-up background vertex.

VI. UNBALANCED DATA AND METRICS

The data are inherently unbalanced: in each experiment about 60 interactions are reconstructed on average, and only one of them is of interest. Several techniques can be used to handle unbalanced data:

- Undersampling reduces the majority class, in the simplest case by random downsampling. A more targeted example uses Tomek links [3], which are pairs of instances of opposite classes that are each other's nearest neighbors. In other words, they are pairs of opposing instances that lie very close together. Tomek's algorithm looks for such pairs and removes the majority instance of each pair.
- Oversampling randomly replicates minority instances to increase their population. Another way of oversampling is to synthesize new instances by interpolating between existing minority examples. For example, SMOTE [4] creates new instances on the line segment between a minority instance and one of its nearest minority-class neighbours.

- Balanced Bagging [5] combines bagging with undersampling or oversampling. The trick is to create a balanced subset of the training set for each model in the bagging ensemble.

VII. METHODS

We used two different methods of evaluation. In this section and going forward we use the following notation for data points:

$x^{(i)}$: the $i$-th data point, $i \in \{1, 2, \ldots, m\}$

The first method evaluates each vertex $x^{(i)}$ on its own and classifies it as a primary vertex or a pile-up background vertex, one at a time. This is the typical machine learning way of classifying an input data point. We used the following classification techniques:

- Logistic Regression (LR) is a type of regression in which the response variable depends on a linear combination of the predictor variables. The hypothesis function is $h_\theta(x) = \sigma(\theta^T x)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function.
- Balanced Bagging with Logistic Regression (BBLR). Bagging, or bootstrap aggregating, is an ensemble meta-algorithm used in statistical classification and regression. Given a training set $X$, bagging creates $k$ new training sets $X_i$ by sampling from $X$ uniformly with replacement. The $k$ models are fitted on these $k$ bootstrap samples with some classification algorithm and combined by averaging the outputs (for regression) or voting (for classification). Balanced Bagging is the same as bagging except in how it creates the new training sets: it draws bootstrap samples such that each class has approximately the same number of data points.
- Artificial Neural Network (NN) is a model of computation that consists of layers of artificial neurons and the connections among them. We used the simplest kind, in which the neurons in one layer are connected only to the next layer. An artificial neuron is a computational unit like an LR: each one takes some input features, and its output is defined by the hypothesis function $h_\theta(x) = g(\theta^T x)$, where $g(z)$ is the activation function. We used ReLU, $g(z) = \max(0, z)$, for the hidden layer and sigmoid for the output layer.

The second method is closer to the experiment. In the first method, we treated each vertex as a single data input. For our original problem, however, we need to select one vertex from the group of vertices in an experiment. We therefore now treat a single experiment as a single data point: we evaluate our model on all vertices of an experiment and choose the vertex with the highest probability.

$E = \{e^{(i)}\}$: the set of experiments
$e^{(i)} = \{x^{(i)}\}$: the set of vertices in the $i$-th experiment
$v^{(i)}$: the vertex selected from the $i$-th experiment,
$v^{(i)} = \arg\max_{x \in e^{(i)}} h_\theta(x)$

VIII. METRICS

Accuracy, precision and recall do not give a correct estimate of a classifier's performance on unbalanced data, because a small change in the classification of the majority class causes a big change in precision and accuracy, whereas a big change in the classification of the minority class has no major impact on accuracy and precision. We use the weighted F1-score (weighted by support) and the AUC_ROC score to evaluate our classifiers for the first method. For the second method we use an accuracy score based on the vertex selection criterion: the fraction of experiments in which the correct vertex is picked. We also calculate this fraction as a function of pile-up density.
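The second evaluation method and its accuracy score can be illustrated with a short Python sketch. The data layout is an assumption made for this example: each experiment is a hypothetical `(scores, true_index)` pair, where `scores` holds the classifier output h_theta(x) for every vertex in that experiment and `true_index` marks the true primary vertex. The numbers are made up.

```python
def select_vertex(scores):
    """Return the index of the vertex with the highest classifier score."""
    return max(range(len(scores)), key=lambda j: scores[j])


def selection_efficiency(events):
    """Fraction of events whose selected vertex is the true primary vertex.

    `events` is a list of (scores, true_index) pairs (hypothetical layout).
    """
    correct = sum(1 for scores, true_index in events
                  if select_vertex(scores) == true_index)
    return correct / len(events)


# Toy scores for four experiments with three vertices each.
events = [
    ([0.10, 0.80, 0.30], 1),  # highest score is the true vertex
    ([0.60, 0.20, 0.55], 2),  # highest score is a pile-up vertex
    ([0.05, 0.02, 0.90], 2),
    ([0.70, 0.10, 0.20], 0),
]
efficiency = selection_efficiency(events)  # 3 of the 4 experiments are correct
```

Only the ranking of vertices inside one experiment matters here, so a classifier can score well on this measure even if its per-vertex probabilities are poorly calibrated.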

IX. EXPERIMENTS AND RESULTS

We started our experimentation with Logistic Regression with L2 regularization. The model achieved an accuracy of 98.85%, a precision of 82.65% and a recall of 37.92%. The confusion matrix revealed that most of the misclassification occurred in the positive class. Even though the accuracy is high, the precision and recall are much lower, which is due to the class imbalance in our data.

CONFUSION MATRIX: LOGISTIC REGRESSION
                Predicted Class 0   Predicted Class 1
True Class 0    349366              465
True Class 1    3627                2215

The model learned the negative class well but not the positive class. The data are unbalanced, and logistic regression is not able to model the imbalanced data properly.

We then tried Bagging with Logistic Regression, using 50 models, each trained on 0.02% of the data. The model achieved an accuracy of 86.63%, a precision of 0.18% and a recall of 1.3%.

CONFUSION MATRIX: BAGGING WITH LOGISTIC REGRESSION (L2)
                Predicted Class 0   Predicted Class 1
True Class 0    308065              41766
True Class 1    5766                76

We tried Balanced Bagging with Logistic Regression and L2 regularization, again using 50 models, each trained on 0.02% of the data. The model achieved an accuracy of 93.17%, a precision of 17% and a recall of 81.41%. The precision is disproportionately low because even a small misclassification rate in the negative class yields a number of false positives that is large compared to the total number of positive-class samples.

CONFUSION MATRIX: BALANCED BAGGING LOGISTIC REGRESSION
                Predicted Class 0   Predicted Class 1
True Class 0    326611              23220
True Class 1    1086                4756

We also tried Neural Networks, and the results were similar to BBLR. The NN we used had one hidden layer with 100 neurons and a regularization parameter α = 0.001.

CONFUSION MATRIX: NEURAL NETWORK
                Predicted Class 0   Predicted Class 1
True Class 0    329953              19878
True Class 1    1183                4659

Based on the above results, we see that metrics like precision and accuracy do not work well for imbalanced data. We therefore used the weighted F1-score and the AUC_ROC score, as described in the Metrics section.

METRICS FOR OUR METHODS
Model   F1 (train, %)   F1 (test, %)   AUC_ROC (train, %)   AUC_ROC (test, %)
LR      98.62           98.63          68.90                68.89
BBLR    96.37           96.32          87.49                87.39
NN      95.79           95.82          87.17                87.03

Based on the AUC_ROC score, BBLR performs best, as it takes the unbalanced nature of the data into account by fitting multiple models on balanced subsets of the data. The NN performs similarly to BBLR. LR performs poorly because of the unbalanced nature of the data.

Figure 3: ROC Curve for BBLR
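The balanced bootstrap sampling behind BBLR can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation: the base learner is passed in as a hypothetical `fit(X, y)` callable that returns a function mapping a data point to P(class 1), and `n_per_class` is an assumed parameter, since the text gives only the number of models (50) and the overall sample fraction.

```python
import random


def balanced_bootstrap(labels, n_per_class, rng):
    """Draw indices with replacement so both classes are equally represented."""
    by_class = {0: [], 1: []}
    for index, label in enumerate(labels):
        by_class[label].append(index)
    sample = []
    for label in (0, 1):
        sample += rng.choices(by_class[label], k=n_per_class)
    rng.shuffle(sample)
    return sample


def balanced_bagging(fit, features, labels, n_models=50, n_per_class=100, seed=0):
    """Fit `n_models` base classifiers, each on its own balanced bootstrap sample.

    `fit(X, y)` is a hypothetical callable returning a function x -> P(class 1).
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = balanced_bootstrap(labels, n_per_class, rng)
        models.append(fit([features[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagged_probability(models, x):
    """Average the positive-class probability over all fitted models."""
    return sum(model(x) for model in models) / len(models)
```

Because every base model sees as many positive as negative examples, none of them can reach a low training loss simply by predicting the majority class, which is the failure mode seen with plain LR and plain bagging.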

Let us now move to the second method of evaluation, i.e. vertex selection. Vertex Selection Efficiency is the fraction of experiments in which the correct vertex was selected as the primary vertex.

COMPARISON OF VERTEX SELECTION EFFICIENCY
Model            Vertex Selection Efficiency (%)
LR               40.41
BBLR             78.24
NN               63.00
Current Method   57.90

Based on these results, LR performs worse than the current technique. BBLR and NN perform better than the current technique, with BBLR having an edge over NN. Figure 4 shows how the vertex selection efficiency varies with pile-up density; the results are similar to the overall vertex selection efficiency.

Figure 4: Comparison of vertex selection efficiency vs pile-up density for the different approaches

Feature Selection: Since there are only 512 different combinations of the 9 features and each one did not take long to train, we performed an exhaustive search over the feature space for BBLR. We found that the subset eta1, eta2, pt1, pt2 gave the best result.

CONFUSION MATRIX: BALANCED BAGGING LOGISTIC REGRESSION (FEATURE SUBSET)
                Predicted Class 0   Predicted Class 1
True Class 0    334380              15451
True Class 1    1072                4770

Figure 5: BBLR ROC curve with the feature subset

X. FUTURE WORK

We were expecting a better result from the neural network because of its capability to learn the complex nature of the data. We would like to tune parameters such as the regularization coefficient, the number of hidden layers and the number of neurons in the hidden layers. We would also like to explore other types of NN, such as convolutional NNs and LSTMs. We would like to explore feature selection further to see how it performs with the vertex selection algorithm. We also tried various other models, such as Gaussian naive Bayes, linear GDA, quadratic GDA and k-nearest-neighbour classifiers, but did not have time to analyze each of them, which we would like to do.
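The exhaustive search over feature subsets reported for BBLR can be sketched as follows. The enumeration is plain standard-library Python; the `score` argument is a hypothetical callable standing in for "train and evaluate a model restricted to this subset", which the text does not specify further.

```python
from itertools import combinations

FEATURES = ["sumpT", "sumpT_w", "MET", "pt1", "pt2", "pt3", "eta1", "eta2", "eta3"]


def all_subsets(features):
    """Yield every subset of `features`, including the empty one."""
    for size in range(len(features) + 1):
        yield from combinations(features, size)


def best_subset(features, score):
    """Return the non-empty subset with the highest score.

    `score(subset)` is a hypothetical callable that trains and evaluates a
    model restricted to `subset` and returns a number (higher is better).
    """
    candidates = (s for s in all_subsets(features) if s)
    return max(candidates, key=score)


n_subsets = sum(1 for _ in all_subsets(FEATURES))  # 2**9 = 512
```

With nine features the search space is small enough to enumerate fully; each additional feature doubles it, so this approach stops being practical quickly if the feature set grows.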

CONTRIBUTIONS

Both of us worked together collaboratively most of the time, so there is no clear distinction in the contributions.

REFERENCES

[1] https://atlas.cern/sites/atlas-public.web.cern.ch/files/field/image/2012_highpileup.png
[2] Taken from slides of Prof. Ariel Schwartzman.
[3] Debashree Devi, Saroj Kr. Biswas and Biswajit Purkayastha, "Redundancy-driven modified Tomek-link based undersampling: A solution to class imbalance", 2016.
[4] Kevin W. Bowyer, Nitesh V. Chawla, Lawrence O. Hall and W. Philip Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique", CoRR, 2011.
[5] https://svds.com/learning-imbalanced-classes/
[6] Are Strandlie et al., "Track and vertex reconstruction: From classical to adaptive methods", Rev. Mod. Phys. 82 (2010) 1419-1458.