Exploratory Machine Learning studies for disruption prediction on DIII-D
by C. Rea, R.S. Granetz, MIT Plasma Science and Fusion Center, Cambridge, MA, USA
Presented at the 2nd IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis, MIT Samberg Center, Cambridge, MA, USA, May 30th - June 2nd, 2017

- the successful avoidance of disruption events is extremely relevant for fusion devices, and in particular for ITER
- the complexity of the problem has motivated recent efforts to develop data-driven predictors and Machine Learning studies to successfully predict disruption events with sufficient warning time
- an SQL database was created, gathering time series of relevant plasma parameters: similar tables are available on DIII-D, EAST and C-Mod, enabling cross-device analysis
- exploratory studies to gain further insight into the DIII-D dataset before addressing the problem of a disruption-warning algorithm:
  - binary and multi-label classification analysis through Machine Learning algorithms
  - variable importance ranking
  - accuracy metrics and comparison between different models

we chose a subset of features and samples for ML applications to the DIII-D database for disruption prediction
- 10 features out of ~40 available parameters (cf. Granetz's talk for a full database description), mainly dimensionless or machine-independent parameters: li, Ip_error_fraction, Vloop, q95, radiated_fraction, n1amp, beta_p, dwmhd_dt, n/ng, Te_HWHM
- focus on flattop disruptions: 195 flattop disruptions, complemented by an analogous number of discharges that did not disrupt, extracted where possible from the same experiments (similar operational space); 394 discharges, ~70,000 samples for each of the 10 chosen input variables
- a reliable knowledge base capable of highlighting the underlying physics: validated signals and EFIT reconstructions; intentional disruptions avoided; hardware-related disruptions avoided by specifically checking for feedback control on plasma current or UFO events
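As a rough, hypothetical illustration of how such a dataset might be assembled (not the authors' actual workflow), the sketch below builds a feature matrix and a binary label from a flat export of the database; the file name, the "disrupted" label column and the sanitized feature names are assumptions that only mirror the parameters listed above.

```python
# Hedged sketch: assembling the ML dataset from a flat export of the
# disruption-warning database. All column and file names are illustrative.
import pandas as pd

FEATURES = ["li", "ip_error_fraction", "Vloop", "q95", "radiated_fraction",
            "n1amp", "beta_p", "dwmhd_dt", "n_over_ng", "Te_HWHM"]

df = pd.read_csv("d3d_flattop_slices.csv")   # hypothetical export of the SQL tables
df = df.dropna(subset=FEATURES)              # keep only validated, complete samples

# Binary label: 1 for time slices belonging to (flattop) disruptive shots,
# 0 for shots that did not disrupt.
y = df["disrupted"].astype(int)
X = df[FEATURES]
print(X.shape)   # expected order of magnitude: ~70,000 samples x 10 features
```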

a plethora of ML algorithms is available in the literature, already tested on other devices for disruption prediction
- supervised and unsupervised ML algorithms, mainly developed at JET and also studied in real-time environments: Artificial Neural Networks [1], multi-tiered Support Vector Machines [2], Manifolds and Generative Topographic Maps [3]
- to better understand the dataset and gain further insights: a white-box approach, whose inner components and logic are available for inspection and in which the importance of individual features can be determined
- Random Forests [4]: a large collection of randomized, de-correlated decision trees that can be used to solve both classification and regression problems

[1] B. Cannas et al., Nuclear Fusion 44 (2004) 68-76
[2] J. Vega et al., Fusion Engineering and Design 88 (2013)
[3] B. Cannas et al., Nuclear Fusion 57 (2013) 093023
[4] L. Breiman, "Random Forests", Machine Learning 45(1), 5-32, 2001

decision trees are hierarchical data structures implementing the divide-and-conquer strategy: a 2D classification example
- CART (Classification and Regression Trees) algorithms repeatedly partition the input space, implementing a test function at each split (node), to build a tree whose nodes are as pure as possible
- 2 features (x1, x2) and 2 classes (red, blue): the algorithm selects a splitting value to partition the dataset by minimizing an impurity measure
- [Figure: successive splits (at the values -0.55 and 0.3) partition the (x1, x2) plane into regions R1, R2, R3, each containing purely red or purely blue data]
- the resulting set of rules can be visualized as a tree structure
- this set of rules is then used to classify a new, unseen (test) sample

(example from E. Alpaydin, Introduction to Machine Learning, 2nd edition, MIT Press)
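As a minimal, self-contained illustration of the CART procedure described above (not the authors' code), a decision tree can be grown on synthetic 2D data and its learned rules printed; scikit-learn is assumed here, and the split values will not reproduce the slide's figure exactly.

```python
# Minimal 2D illustration of CART-style recursive partitioning, analogous to
# the (x1, x2) red/blue example above. Data are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = ((X[:, 0] > -0.55) & (X[:, 1] > 0.3)).astype(int)   # two classes

clf = DecisionTreeClassifier(criterion="gini")           # impurity measure
clf.fit(X, y)
print(export_text(clf, feature_names=["x1", "x2"]))      # the learned set of rules
print(clf.predict([[0.1, 0.5]]))                         # classify a new, unseen sample
```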

recursive binary trees have a key feature, interpretability, but they tend to overfit, so pruning is needed
like other ML algorithms, decision trees have advantages and limitations; for classification purposes:

[Table: comparison of Random Forests, single trees, Neural Nets and SVMs, rated good / fair / poor on: overfitting; intrinsic feature selection and robustness to outliers; parameter tuning; non-parametric models (no a-priori assumptions); interpretability; natural handling of mixed-type data; prediction accuracy; time handling. Considerations valid for classification purposes only.]

Random Forests seems to be a promising algorithm

Random Forests is an ensemble method, leveraging bootstrapping and averaging techniques; the main steps of the algorithm are:
- grow many i.i.d. trees (e.g., 500 trees) on bootstrap samples of the original training set (random sampling with replacement from the training set)
- minimize bias by fully growing the trees (no pruning needed)
- reduce the variance of the noisy but unbiased (fully grown) trees by averaging (regression) or majority voting (classification) for the final decisions on test samples
- keep the between-tree correlation low, which maximizes the variance reduction obtained by averaging: bootstrap data for each tree, random sub-selection of input variables at each split, best split chosen by minimizing an impurity measure (as for CART models)
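The slides do not name a software package; assuming scikit-learn, the steps listed above map roughly onto the RandomForestClassifier hyperparameters as follows (a sketch, not the authors' configuration).

```python
# Sketch: Random Forests configuration mirroring the listed algorithm steps.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=500,       # grow many trees on bootstrap samples
    bootstrap=True,         # random sampling with replacement from the training set
    max_depth=None,         # fully grown trees -> low bias, no pruning
    max_features="sqrt",    # random sub-selection of input variables at each split
    criterion="gini",       # best split chosen by minimizing an impurity measure
    n_jobs=-1,
    random_state=0,
)
```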

binary classification problem: disrupted / non-disrupted
- from a 2D space with few points to a 10D space with many thousands of points: the 10 input variables are li, Ip_error_fraction, Vloop, q95, radiated_fraction, n1amp, beta_p, dwmhd_dt, n/ng, Te_HWHM
- 10 variables, ~70,000 time slices, 394 discharges, 195 flattop disruptions, 500 decision trees
- [Figure: graphical depiction of a single tree in a Random Forest]

graphical depiction of a single tree in a Random Forest, in a disrupted/non-disrupted classification scheme (first three layers of the tree shown):
- features are not scaled a-priori
- at each node, random sub-selection of features
- impurity minimization (Gini index) by choosing the best split
- fully grown tree
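For reference (a standard definition, not taken from the slides): the Gini index used as the impurity measure at a node with class proportions p_k, and the impurity decrease that the best split maximizes, are

```latex
% Gini impurity of a node, and the decrease achieved by a candidate split
% of n samples into left/right children of sizes n_L and n_R:
G = \sum_k p_k \,(1 - p_k) = 1 - \sum_k p_k^2 ,
\qquad
\Delta G = G_{\mathrm{parent}} - \frac{n_L}{n}\, G_L - \frac{n_R}{n}\, G_R .
```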

binary classification problem: disrupted / non-disrupted, no time dependency
- leaf nodes: complete separation of the training samples into the two classes
- the original dataset is split into training and test subsets
- 500 trees are built on the training data to define a set of rules
- this set of rules is used to decide whether new, unseen samples in our test set belong to one of the two classes
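Continuing the hypothetical scikit-learn sketch above (X, y from the dataset sketch, rf from the configuration sketch), the training stage could look as follows; a plain random split over time slices is shown, whereas splitting by discharge would be a stricter alternative.

```python
# Sketch of the training stage: split the labelled time slices into training
# and test subsets and build the 500-tree forest on the training data.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

rf.fit(X_train, y_train)   # rf as configured in the previous sketch
```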

how to classify a new sample belonging to the test subset with a Random Forest, and how to assess the classifier's accuracy:
- the new observation is passed through all the trees in the forest
- the final class is chosen through majority vote or by averaging the per-tree probabilities

(source: O. Dürr, ZHAW School of Engineering)
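Continuing the same sketch, the classification of the test samples, and the equivalence between the forest output and an explicit average over the per-tree probabilities, can be checked as follows.

```python
# Sketch: classifying test samples; the forest's prediction is the average
# of the per-tree class probabilities (a soft majority vote).
import numpy as np

proba_forest = rf.predict_proba(X_test)[:, 1]          # averaged over all 500 trees
per_tree = [t.predict_proba(X_test.to_numpy())[:, 1] for t in rf.estimators_]
proba_trees = np.mean(per_tree, axis=0)                # explicit averaging
print(np.max(np.abs(proba_forest - proba_trees)))      # ~0: the forest averages its trees

y_pred = (proba_forest > 0.5).astype(int)              # final binary decision on the test set
```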

the confusion matrix is used as an accuracy metric to assess the model's capability to discriminate between class labels (binary classification, no time dependency, 15 ms black window before the disruption event)

Confusion matrix (percentages normalized with respect to each true class label):

True \ Predicted     non-disrupted    disrupted
non-disrupted        94.2%            5.8%   (false positive alarms)
disrupted            5.0%             95.0%  (5.0% are missed detections / false negatives)

- the diagonal entries are the successfully predicted non-disrupted (94.2%) and disrupted (95.0%) samples
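Continuing the sketch, the per-true-class normalization used in the slide corresponds to scikit-learn's normalize="true" option.

```python
# Sketch: confusion matrix normalized over each true class label
# (rows sum to 100%, as in the slide).
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred, normalize="true")
print(cm)   # [[TN rate, FP rate], [FN rate (missed detections), TP rate]]
```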

relative importance ranking extracted from the Random Forests, binary classification case, no time dependency
- relative variable importance with respect to label predictability is defined as the mean decrease in impurity
- possibility to implement the permutation importance metric as an alternative
- q95 is the relatively most important variable
- [Figure: relative importance of the 10 input variables, with q95 ranked highest; blue: non-disrupted discharges (flattop only), red: disruptions during flattop]
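Continuing the sketch, the mean-decrease-impurity ranking and the alternative permutation importance mentioned above could be computed as follows (FEATURES as defined in the dataset sketch); which variables actually rank highest of course depends on the real data.

```python
# Sketch: the two importance metrics mentioned above. feature_importances_
# is the mean decrease in (Gini) impurity; permutation_importance shuffles
# each feature on held-out data instead.
import pandas as pd
from sklearn.inspection import permutation_importance

mdi = pd.Series(rf.feature_importances_, index=FEATURES).sort_values(ascending=False)
print(mdi)

perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print(pd.Series(perm.importances_mean, index=FEATURES).sort_values(ascending=False))
```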

in the multi-label classification, the time dependency is introduced through the definition of different class labels:
- non-disrupted: time slices from non-disruptive shots
- far from disr: time_until_disrupt > 1 s
- close to disr: time_until_disrupt < 0.1 s

Confusion matrix (percentages normalized with respect to each true class label):

True \ Predicted     non-disrupted    far from disr    close to disr
non-disrupted        91.6%            6.5%             1.9%
far from disr        18.3%            81.4%            0.3%
close to disr        2.6%             0.1%             97.3%
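A hedged sketch of how the three class labels could be constructed from a time_until_disrupt column; the column names are illustrative, and the handling of intermediate slices (between 0.1 s and 1 s) is an assumption, since the slide does not specify it.

```python
# Sketch: three-class labelling following the thresholds quoted on the slide.
def make_label(row):
    if not row["disrupted"]:
        return "non-disrupted"              # slices from non-disruptive shots
    if row["time_until_disrupt"] > 1.0:     # seconds
        return "far from disr"
    if row["time_until_disrupt"] < 0.1:
        return "close to disr"
    return None                             # intermediate slices: assumed unused here

df["label"] = df.apply(make_label, axis=1)
```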

relative importance ranking extracted from the Random Forests, multi-label classification case
- the class labels carry an induced time dependency (for the human eye)
- similar results with respect to the binary classification case
- [Figure: relative importance ranking: q95, n/ng, betap, li, Te_HWHM, dwmhd_dt, ip_error_frac, Vloop, prad_frac, n1amp]

ROC curves illustrate the relative trade-off between benefits (true positives) and costs (false positives) of binary classifiers
- RF (blue) maximizes the AUC (Area Under the Curve) when compared to a Multi-Layer Perceptron (MLP) or a Support Vector Machine (SVM)
- RF catches a higher number of correct classifications for a lower number of false positives
- RF: AUC 0.98, model score 0.94; MLP: AUC 0.95, model score 0.89; SVM: AUC 0.93, model score 0.85
- [Figure: ROC curves (full range and zoomed in at top left) for RF, MLP and SVM, with the random-guess diagonal shown for reference]
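Continuing the sketch, the RF curve and its AUC could be obtained as follows; the MLP and SVM curves in the figure would be computed the same way from their own predicted scores.

```python
# Sketch: ROC curve and AUC for the binary Random Forests classifier.
from sklearn.metrics import roc_curve, roc_auc_score

scores = rf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("RF AUC = %.2f" % roc_auc_score(y_test, scores))
```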

summary and conclusions
- ML classification gives promising results, with optimal performance (Random Forests) and the possibility of gaining further insights into the dataset
- better features and a better definition of the target variable are needed to address the disruption-proximity problem: how much time until the discharge is going to disrupt
- evaluate different algorithms that could perform best for regression problems
- real-time integration with the PCS
- dimensionless and machine-independent features enable cross-device analysis: comparison with EAST and C-Mod data