Machine Learning Software ROOT/TMVA

Machine Learning Software ROOT/TMVA LIP Data Science School / 12-14 March 2018

ROOT
- ROOT is a software toolkit that provides building blocks for data processing, data analysis, data visualisation and data storage.
- ROOT is written mainly in C++ (C++11 standard); bindings for Python are provided.
- http://root.cern.ch
- Adopted in High Energy Physics and other sciences (and also in industry).
- ~250 petabytes of data in ROOT format on the LHC Computing Grid.
- Fits and parameter estimation for discoveries (e.g. the Higgs).
- Thousands of ROOT plots in scientific publications.

TMVA
- ROOT machine learning tools are provided in the package TMVA (Toolkit for MultiVariate Analysis).
- Provides a set of algorithms for standard HEP usage.
- Used in LHC experiment production and in several analyses (e.g. Higgs studies).
- Easy interface for beginners, powerful for experts.
- Several active contributors and several features added recently (e.g. deep learning).

TMVA
TMVA is not only a collection of multivariate methods. It is a common interface to different methods:
- common interface for classification and regression
- easy training and testing of different methods on the same dataset
- consistent evaluation and comparison
- same data pre-processing
- several tools provided for pre-processing
- embedded in ROOT
- complete and understandable Users Guide

TMVA Methods
The available methods are:
- Rectangular cut optimisation
- Projective likelihood estimation (PDE approach)
- Multidimensional probability density estimation (PDE range-search approach)
- Multidimensional k-nearest neighbour classifier
- Linear discriminant analysis (H-Matrix and Fisher discriminants)
- Function discriminant analysis (FDA)
- Artificial neural networks (various implementations)
- Boosted/bagged decision trees
- Predictive learning via rule ensembles (RuleFit)
- Support Vector Machine (SVM)

New Features
New features added since 2016:
- Deep learning: support for parallel training on CPU and GPU (with CUDA and OpenCL)
- Cross-validation and hyper-parameter optimisation
- Improved loss functions for regression
- Interactive training and visualisation for Jupyter notebooks
- New pre-processing features (variance threshold)

Using TMVA

Workflow in TMVA
- Reading input data
- Selecting input features and preprocessing
- Training: find optimal classification or regression parameters using data with known labels (e.g. signal and background MC events)
- Testing: evaluate the performance of the classifier on an independent test sample; compare different methods
- Application: apply the classifier/regressor to real data where labels are not known
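The train / test / apply sequence can be illustrated without TMVA at all. The following is a minimal sketch in plain Python, not TMVA code: the one-feature toy sample, the 700/300 split and the "signal if x > cut" classifier are all assumptions made for illustration.

```python
import random

random.seed(1)

# Hypothetical labelled sample: one feature per event, label 1 = signal, 0 = background.
events = [(random.gauss(1.0, 1.0), 1) for _ in range(500)]
events += [(random.gauss(-1.0, 1.0), 0) for _ in range(500)]
random.shuffle(events)

# Split into independent training and test samples.
train, test = events[:700], events[700:]

def accuracy(sample, cut):
    """Fraction of events classified correctly by the rule 'signal if x > cut'."""
    return sum((x > cut) == bool(label) for x, label in sample) / len(sample)

# Training: scan candidate cuts and keep the one that does best on the training sample.
candidates = [c / 10.0 for c in range(-30, 31)]
best_cut = max(candidates, key=lambda c: accuracy(train, c))

# Testing: evaluate on events that were not used in training.
test_accuracy = accuracy(test, best_cut)

# Application: classify values whose labels are not known.
unlabelled = [2.3, -1.7, 0.4]
predictions = [int(x > best_cut) for x in unlabelled]
print(best_cut, test_accuracy, predictions)
```

Keeping the test sample out of the cut scan is what makes the quoted accuracy an unbiased estimate of the performance on new data.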

TMVA Customizations and Features
TMVA supports:
- ROOT Tree input data (or ASCII, e.g. csv); HSF support might come soon
- pre-selection cuts on input data
- event weights (negative weights for some methods)
- various methods for splitting training/test samples
- k-fold cross-validation
- variable importance
- hyper-parameter optimisation

TMVA Session
This is best seen with a real example (e.g. the TMVAClassification.C tutorial). [E. v. Toerne]

TMVA Toy Example
4 Gaussian variables with linear correlations: {x_1 = v_1 + v_2, x_2 = v_1 - v_2, x_3 = v_3, x_4 = v_4}, where {v_1, ..., v_4} are normal variables.

Pre-processing of the Input Variables
Example: decorrelating the variables before training can be useful. Several other pre-processing options are available (see the Users Guide).

Available Preprocessing
This is the list of pre-processing transformations available in TMVA:
- Normalization
- Decorrelation (using Cholesky decomposition)
- Principal Component Analysis
- Uniformization
- Gaussianization
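To make the decorrelation item concrete, here is a minimal sketch of Cholesky-based decorrelation for two variables in plain Python. It illustrates the technique named above and is not TMVA's implementation, whose details may differ; the toy sample and all variable names are assumptions.

```python
import math
import random

random.seed(7)

# Hypothetical correlated pair: y shares a component with x.
xs = [random.gauss(0.0, 1.0) for _ in range(2000)]
ys = [0.8 * x + 0.6 * random.gauss(0.0, 1.0) for x in xs]

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

# Sample covariance matrix C = [[cxx, cxy], [cxy, cyy]].
cxx, cxy, cyy = covariance(xs, xs), covariance(xs, ys), covariance(ys, ys)

# Cholesky factor L (lower triangular) such that C = L L^T.
l11 = math.sqrt(cxx)
l21 = cxy / l11
l22 = math.sqrt(cyy - l21 * l21)

# Apply L^-1 to the centred data by forward substitution.
mx, my = mean(xs), mean(ys)
us = [(x - mx) / l11 for x in xs]
ws = [((y - my) - l21 * u) / l22 for y, u in zip(ys, us)]

# Covariance before and after the transformation.
print(covariance(xs, ys), covariance(us, ws))
```

After the transformation the two variables have unit variance and zero linear correlation on the sample used to build the covariance matrix. Non-linear dependence between variables is not removed by this kind of transformation.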

TMVA GUI
At the end of the training and test phase, TMVA produces an output file that can be examined with a dedicated GUI (TMVAGui).

ROC Curve in TMVA
From the GUI one can obtain, for example, a ROC curve for each method trained and tested on an independent data set.
ROC curve: comparison of several methods.
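For readers unfamiliar with how such a curve is built, the sketch below computes ROC points as background rejection versus signal efficiency from classifier scores and integrates the area under the curve. It is plain Python for illustration, not the TMVA GUI code; the function names and the toy scores are hypothetical.

```python
def roc_points(signal_scores, background_scores):
    """Return (signal efficiency, background rejection) pairs, one per cut value."""
    cuts = sorted(set(signal_scores) | set(background_scores))
    points = [(1.0, 0.0)]  # cut below every score: all events accepted
    for cut in cuts:
        eff = sum(s > cut for s in signal_scores) / len(signal_scores)
        rej = sum(b <= cut for b in background_scores) / len(background_scores)
        points.append((eff, rej))
    return points

def area_under_curve(points):
    """Trapezoidal integral of background rejection over signal efficiency."""
    # Sort by efficiency; at equal efficiency visit the higher rejection first.
    pts = sorted(points, key=lambda p: (p[0], -p[1]))
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical classifier outputs on an independent test sample.
signal = [0.9, 0.8, 0.7, 0.4]
background = [0.6, 0.3, 0.2, 0.1]
auc = area_under_curve(roc_points(signal, background))
print(auc)
```

An area of 1 corresponds to perfect separation and 0.5 to random guessing, which is why the integral is a convenient single number for comparing methods.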

TMVA Regression GUI
A dedicated GUI exists for regression (TMVARegGui).

Regression in TMVA
New regression features: a choice of loss function
- Huber (default)
- Least Squares
- Absolute Deviation
- Custom Function
The choice of loss function is important for regression performance. (Plot label: higher is better.)
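The three named loss functions can be written down in a few lines. The sketch below uses the textbook definitions as a function of the residual (prediction minus target); the threshold parameter `delta` and the factor 0.5 in the quadratic terms are conventions assumed here, and TMVA's own definitions may differ in such details.

```python
def least_squares(residual):
    """Quadratic loss: penalises large residuals strongly."""
    return 0.5 * residual ** 2

def absolute_deviation(residual):
    """Linear loss: less sensitive to outliers."""
    return abs(residual)

def huber(residual, delta=1.0):
    """Quadratic for |residual| <= delta, linear beyond it."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * (r - 0.5 * delta)

# Compare the three losses for a small and a large residual.
for r in (0.5, 3.0):
    print(r, least_squares(r), absolute_deviation(r), huber(r))
```

The Huber loss behaves like least squares for small residuals and like absolute deviation for large ones, so a few badly mismeasured targets pull the fit less than they would with a purely quadratic loss.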

Cross Validation in TMVA
- TMVA supports k-fold cross-validation
- Hyper-parameter tuning: find optimised parameters (BDT, SVM)
- Support for parallel execution: multi-process/multi-thread, and on a cluster using Spark or MPI
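As a reminder of what k-fold cross-validation does with the data, the sketch below splits event indices into k folds and uses each fold once as the test sample. It is a plain-Python illustration, not the TMVA interface; the helper name `k_fold_indices` is hypothetical.

```python
import random

def k_fold_indices(n_events, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print(sorted(test), len(train))
```

Each event is used for testing exactly once and for training k-1 times, so the performance averaged over folds uses the whole sample while each model is still evaluated on events it has not seen.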

TMVA Interfaces
External tools are available as additional methods in TMVA, and they can be trained and evaluated like any internal method.
- RMVA: interface to machine learning methods in R (C50, xgboost, RSNNS, e1071); see http://oproject.org/rmva
- PyMVA: Python interface to scikit-learn (with RandomForest, Gradient Tree Boost, Ada Boost); see http://oproject.org/pymva
- Keras (Theano + TensorFlow): supports model definition in Python; see https://indico.cern.ch/event/565647/contributions/2308668/attachments/1345527/2028480/29sep2016_iml_keras.pdf
- Input data are copied internally from TMVA to NumPy arrays

Jupyter Integration
- New Python package for using TMVA in Jupyter notebooks (jsmva)
- Improved Python API for TMVA functions
- Visualisation of BDT and DNN
- Enhanced output and plots (e.g. ROC plots)
- Improved interactivity (e.g. pause/resume/stop of training)
- See the example in the SWAN gallery: https://swan.web.cern.ch/content/machine-learning
L. Moneta, Data Science School in (Astro) Particle Physics, Lisbon, 12-14 March 2018

TMVA Tutorial

Run the tutorial in a notebook:
- use SWAN: go to swan.cern.ch
- or run local notebooks with: root --notebook
If you don't have a CERN account for using SWAN, please contact me; some temporary accounts can be made available.

Starting SWAN
(Screenshot annotation: click here to start.)

Starting a Terminal in SWAN
- After login, the CERNBox home directory will be visible
- Start a terminal window

Getting the Notebooks
- Clone the git repository, checking out the Lisbon-tutorial-2018 branch directly:
  git clone --branch Lisbon-tutorial-2018 https://github.com/lmoneta/tmva-tutorial.git
- Go back to the Home page and select the directory tmva_tutorial/tutorial_lisbon
- Start using the notebooks

Notebooks

TMVA Classification