Tutorial on Machine Learning Tools

Similar documents
WEKA homepage.

Keras: Handwritten Digit Recognition using MNIST Dataset

Code Mania Artificial Intelligence: a. Module - 1: Introduction to Artificial intelligence and Python:

Keras: Handwritten Digit Recognition using MNIST Dataset

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,

ECE 5470 Classification, Machine Learning, and Neural Network Review

Data Mining With Weka A Short Tutorial

Facial Expression Classification with Random Filters Feature Extraction

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

CS 179 Lecture 16. Logistic Regression & Parallel SGD

Data Mining: STATISTICA

Machine Learning Software ROOT/TMVA

Classifying Depositional Environments in Satellite Images

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation

Outline. Prepare the data Classification and regression Clustering Association rules Graphic user interface

Deep Learning and Its Applications

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

A Comparative Study of Locality Preserving Projection and Principle Component Analysis on Classification Performance Using Logistic Regression

Using Machine Learning to Identify Security Issues in Open-Source Libraries. Asankhaya Sharma Yaqin Zhou SourceClear

MoonRiver: Deep Neural Network in C++

Tutorials (M. Biehl)

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center

Hand Written Digit Recognition Using Tensorflow and Python

Polytechnic University of Tirana

Evaluating Classifiers

Lecture 20: Neural Networks for NLP. Zubin Pahuja

Machine Learning in Action

scikit-learn (Machine Learning in Python)

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE

Summary. RapidMiner Project 12/13/2011 RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

Vulnerability of machine learning models to adversarial examples

Lecture 25: Review I

Classification using Weka (Brain, Computation, and Neural Learning)

WEKA Explorer User Guide for Version 3-4

CNN Basics. Chongruo Wu

WEKA KnowledgeFlow Tutorial for Version 3-5-6

Advanced Machine Learning

More on Neural Networks. Read Chapter 5 in the text by Bishop, except omit Sections 5.3.3, 5.3.4, 5.4, 5.5.4, 5.5.5, 5.5.6, 5.5.7, and 5.

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity

Scaled Machine Learning at Matroid

On Classification: An Empirical Study of Existing Algorithms Based on Two Kaggle Competitions

Predict the Likelihood of Responding to Direct Mail Campaign in Consumer Lending Industry

Data Science Bootcamp Curriculum. NYC Data Science Academy

Convolutional Neural Networks (CNN)

The Explorer. chapter Getting started

Deep Model Compression

The exam is closed book, closed notes except your one-page cheat sheet.

Network Traffic Measurements and Analysis

Kaggle Data Science Bowl 2017 Technical Report

Computing a Gain Chart. Comparing the computation time of data mining tools on a large dataset under Linux.

Sensor-based Semantic-level Human Activity Recognition using Temporal Classification

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Weka ( )

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

Deep Learning with Tensorflow AlexNet

Dynamic Routing Between Capsules

Bayes Risk. Classifiers for Recognition Reading: Chapter 22 (skip 22.3) Discriminative vs Generative Models. Loss functions in classifiers

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

Short instructions on using Weka

Classification of objects from Video Data (Group 30)

What s New in MATLAB May 16, 2017

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa

Machine Learning 13. week

1 Introduction. 3 Data Preprocessing. 2 Literature Review

Classifiers for Recognition Reading: Chapter 22 (skip 22.3)

Recurrent Neural Nets II

Character Recognition from Google Street View Images

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017

Mini-project 2 CMPSCI 689 Spring 2015 Due: Tuesday, April 07, in class

A Quick Guide on Training a neural network using Keras.

Machine Learning. The Breadth of ML Neural Networks & Deep Learning. Marc Toussaint. Duy Nguyen-Tuong. University of Stuttgart

ML 프로그래밍 ( 보충 ) Scikit-Learn

CIS581: Computer Vision and Computational Photography Project 4, Part B: Convolutional Neural Networks (CNNs) Due: Dec.11, 2017 at 11:59 pm

Evaluating Classifiers

Multinomial Regression and the Softmax Activation Function. Gary Cottrell!

Introduction to Automated Text Analysis. bit.ly/poir599

Spatial Localization and Detection. Lecture 8-1

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

MACHINE LEARNING CLASSIFIERS ADVANTAGES AND CHALLENGES OF SELECTED METHODS

Applying Supervised Learning

Machine Learning for. Artem Lind & Aleskandr Tkachenko

Practical Methodology. Lecture slides for Chapter 11 of Deep Learning Ian Goodfellow

LSTM: An Image Classification Model Based on Fashion-MNIST Dataset

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

Deep Learning Cook Book

Music Genre Classification

SVM multiclass classification in 10 steps 17/32

BAYESIAN GLOBAL OPTIMIZATION

Machine Learning with MATLAB --classification

Practical Guidance for Machine Learning Applications

ML4Bio Lecture #1: Introduc3on. February 24 th, 2016 Quaid Morris

WEKA Explorer User Guide for Version 3-5-5

Deep Convolutional Neural Network using Triplet of Faces, Deep Ensemble, and Scorelevel Fusion for Face Recognition

Deep Learning for Computer Vision

Transcription:

Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht

Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI

Outline Matlab Apps Weka 3 UI TensorFlow

Outline Matlab Apps Introduction App for Classification (Classification Learner) App for Regression (Regression Learner) App for Time Series (Neural Net Time Series) Weka 3 UI TensorFlow

Matlab Apps - Introduction What are Matlab apps? A set of apps that can complete basic tasks related to machine learning For machine learning models: Classification Regression Clustering Time series For Applications: Signal processing, biomedical computing, marketing, etc

Classification Learner An app including multiple classical classifiers: Logistic regression, decision tree, discriminant analysis, SVM, knn, ensembling methods, etc We can launch classification learner by clicking

Step 1: importing data From a matrix in current workspace, OR From an external file (.xls,.txt,.csv, etc)

Step 2: Specify the target (response) Specify the validation (K-fold cross validation or x% holdout validation)

Step 3: feature reduction Feature selection PCA Directly remove certain features Specify the proportion of information preserved (explained variance), OR Specify the number of components preserved

Step 4: model selection Step 5: start training By clicking

Step 6: plotting Confusion matrix ROC Curve

Step 7: export model To workspace, OR To workspace w/o training data, OR Generate code

Regression Learner An app including multiple classical regression models: Linear regression, SVM, Gaussian process, regression tree, ensembling trees, etc We can launch regression learner by clicking

Step 1 ~ 5 are the same Step 6: plotting Response plot True & predicted for each instance Predicted vs actual plot Predicted for each true value Residuals plot (True predicted) for each true value

Neural Net Time Series What is time series? In time series, data instances are time dependent In classification & regression Output of current instance <= input of current instance In time series Output of current instance <= input of current instance & output of previous instances & input of previous instances We can launch neural net time series by clicking

Step 1: model selection NARX (recommended) NAR NIO Both input and output dependent Only output dependent Only input dependent

Step 2: data loading From workspace, OR From file Step 3: set feature format Cell (scalar) Column vector Row vector

Step 4: set validation Step 5: set architecture # of hidden units # of delay # of dependent previous instances

Step 6: select algorithm Step 7: start training By clicking

Step 8: result visualization Error histogram Distribution of errors Response True and predicted for each instance Step 9: generate function

Outline Matlab Apps Weka 3 UI Introduction Weka Explorer Weka KnowledgeFlow TensorFlow

Weka - Introduction What is Weka? A set of Java APIs for machine learning and data mining with several GUIs For machine learning models: Classification Regression Clustering Rule-based models

Weka Explorer A more user-friendly GUI Multiple data source (local, URL, JDBC, etc) Complex data preprocessing techniques Machine learning models (SVM, regression, tree, instance-based, rulebased, Bayesian, etc) We can launch explorer by clicking

Step 1: data loading Step 2: feature selection Remove selected features

Step 3: filtering Conversion (e.g. numeric to nominal) Essential to classification Normalization Discretization PCA

Click for advanced settings Hover over the name for description You can use multiple filters consequently

Step 4: model selection Gray items are unavailable Click for advanced settings

Step 5: set validation K-fold or x% split Step 6: select target

Step 7: output Tree structure (if any) Error rate Confusion matrix Step 8: visualization True and predicted in 2-d feature space Tree structure (if any) Marginal distribution ROC

Weka KnowledgeFlow Disadvantage of Weka explorer Must load the whole dataset into memory Cannot control the workflow among different models Weka KnowledgeFlow is a GUI for building medium-large projects on large datasets We can launch knowledgeflow by clicking

Step 1: data loading Pick a loader for your data Click to draw it on the panel Double click for advanced settings Select the data file

Step 2: target settings In Filters Convert target to categorical In Evaluation Set target attribute Set positive class Step 3: add other filters

Step 4: validation settings In Evaluation Step 5: model selection You must manually check the availability of the model

Step 6: evaluation In Evaluation Generate error rate, confusion matrix, ROC, etc Step 7: visualization In Visualization Model structure Statistics in texts ROC and other plots

Step 8: connections Right click each module and select an output type For data source: Load the whole data, OR Load by instance

For cross validation: Output both training and testing set to the model For model: Output graph for visualizing the structure Output batchclassifier for evaluation For evaluation Output text for error rates and confusion matrix Output visualizableerror for ROC and other plots

Step 9: model building Run the flow Right click all the three viewers for results

Outline Matlab Apps Weka 3 UI TensorFlow Introduction Example Linear Regression Example - CNN

TensorFlow - Introduction A multi-language (mainly for Python 3 64bit) API for the newest deep learning algorithms Convolutional Neural Network Recurrent Neural Network Long-Short Term Memory Autoencoder * For details of functions, please refer to TensorFlow API and NumPy API. We do not have enough time explaining each line of the codes in this tutorial

File New Project Use TensorFlow in PyCharm Select Inherit global site-packages

Example Linear Regression Generate 100 random scalars as x, and y=0.1x+0.3 Set w (starting from random) and b (starting from 0) Set mean squard error Set gradient descent (with step length of 0.5)

Example Linear Regression Set initializer Start the workflow Run for 201 steps and print per 20 steps Output:

Example - CNN Recall CNN in previous slides Convolution layer Activation function Pooling layer Fully connected layer In this example, we use MNIST data 784 features (28*28*1 gray-scale image) 10 classes (digit 0 ~ digit 9)

CNN - Preprocessing Import MNIST data Set dimension # of x and class # of y Reshape x into 28*28*1

CNN Convolution Layer Set weight & bias Size 5*5*1, 32 kernels Set activation function Rectified Linear Unit (ReLU) 28*28*32 features Set pooling 2*2 => 1*1 14*14*32 features

CNN Fully Connected Layer Flatten all the convolution features Set weight & bias & activation function 14*14*32 => 1024 features No convolution

CNN Output Layer Set weight & bias 10 outputs Set predictions Softmax function

CNN Model Settings Set generalization error Cross entropy Set training algorithm Adam (step length 1e-4) Run for 501 steps and report accuracy per 50 steps

Questions?