Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers


Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers James P. Biagioni Piotr M. Szczurek Peter C. Nelson, Ph.D. Abolfazl Mohammadian, Ph.D.

Agenda Background Data-Mining (Un-) Conditional Classifiers Implementation Data Performance Measures Experimental Results Conclusions

Background Mode choice modeling is an integral part of the 4-step travel demand forecasting procedure Process: estimating the distribution of mode choices given a set of trip attributes Input: a set of attributes related to the trip, person, and household Output: a probability distribution across the set of mode choices

Background Discrete choice models (e.g. multinomial logit) have historically dominated this area of research A major limitation of discrete choice models is their predictive capability Increasing attention is being paid to data-mining techniques borrowed from the artificial intelligence and machine learning communities Historically, these techniques have shown competitive performance

Background However, most data-mining approaches have treated trips within a tour as independent An exception is Miller et al. (2005), who built an agent-based mode-choice model that explicitly treats the dependence between trips Our approach follows in the vein of Miller et al., but avoids developing an explicit framework

Data-Mining The process of extracting hidden patterns from data Example uses: marketing, fraud detection and scientific discovery Classifiers map attributes to labels (here, the mode) Algorithms used: Decision Trees, Naïve Bayes, Simple Logistic, Support Vector Machines, and an Ensemble Method

Decision Trees Repeated attribute partitioning to maximize class homogeneity Splits are chosen by a heuristic function, e.g. information gain Partitions form If-Then rules with a high degree of interpretability Outlook = Rain ∧ Windy = False => Play Outlook = Sunny ∧ Humidity > 70 => Don't Play
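A minimal sketch of the information-gain heuristic named above, in plain Java; the class counts ("Play" vs. "Don't Play") are illustrative, not taken from the paper:

```java
// Information gain = parent entropy minus weighted child entropy.
public class InfoGain {

    // Shannon entropy of a class-count vector, in bits.
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Gain of splitting the parent node into the given child partitions.
    static double infoGain(int[] parent, int[][] children) {
        int total = 0;
        for (int c : parent) total += c;
        double childEntropy = 0.0;
        for (int[] child : children) {
            int n = 0;
            for (int c : child) n += c;
            childEntropy += ((double) n / total) * entropy(child);
        }
        return entropy(parent) - childEntropy;
    }

    public static void main(String[] args) {
        int[] parent = {9, 5};            // e.g. 9 Play / 5 Don't Play
        int[][] split = {{6, 1}, {3, 4}}; // partition by a candidate attribute
        System.out.printf("gain = %.3f bits%n", infoGain(parent, split));
    }
}
```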

Naïve Bayes A purely probabilistic approach: estimate class posterior probabilities For an example d (a vector of attribute values), compute Pr(C = c_j | d = <A_1 = a_1, A_2 = a_2, ..., A_n = a_n>) for all classes c_j By Bayes' rule this is proportional to Pr(C = c_j) · Π_i Pr(A_i = a_i | C = c_j) Pr(C = c_j) and Pr(A_i = a_i | C = c_j) can be estimated from the data by occurrence counts Select the class with the highest probability
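The posterior computation can be sketched directly from occurrence counts. The counts, attribute layout, and Laplace smoothing below are illustrative assumptions, not the paper's actual estimates:

```java
// Naive Bayes scoring from occurrence counts, with Laplace smoothing.
public class NaiveBayesSketch {
    public static void main(String[] args) {
        // classCount[j]      = #examples with class c_j (illustrative numbers)
        // attrCount[j][i][v] = #examples with class c_j where attribute A_i = v
        int[] classCount = {40, 60};
        int[][][] attrCount = {
            {{30, 10}, {25, 15}},   // class 0: counts for A_1 and A_2 (binary values)
            {{20, 40}, {10, 50}}    // class 1
        };
        int[] example = {1, 0};     // the instance d = <A_1 = 1, A_2 = 0>

        int total = classCount[0] + classCount[1];
        double best = -1; int bestClass = -1;
        for (int j = 0; j < classCount.length; j++) {
            // Pr(C = c_j) * prod_i Pr(A_i = a_i | C = c_j), Laplace-smoothed
            double score = (double) classCount[j] / total;
            for (int i = 0; i < example.length; i++) {
                int numValues = attrCount[j][i].length;
                score *= (attrCount[j][i][example[i]] + 1.0)
                       / (classCount[j] + numValues);
            }
            if (score > best) { best = score; bestClass = j; }
        }
        System.out.println("predicted class: c_" + bestClass);
    }
}
```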

Simple Logistic Builds logistic regression models from simple linear regression functions Fitted using the LogitBoost algorithm, which fits a succession of models, each successive model learning from the previous one's classification mistakes Model parameters are fine-tuned to find the best (least-error) fit The best attributes are selected automatically using cross-validation
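Since the paper's implementation is built on the Weka toolkit (see Implementation below), a minimal usage sketch of Weka's SimpleLogistic classifier follows; the tours.arff training file is a hypothetical placeholder, not the paper's dataset:

```java
import weka.classifiers.functions.SimpleLogistic;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SimpleLogisticDemo {
    public static void main(String[] args) throws Exception {
        // Load a (hypothetical) training set with the mode as the last attribute.
        Instances data = new DataSource("tours.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        SimpleLogistic model = new SimpleLogistic();
        model.buildClassifier(data);  // attribute selection happens internally via cross-validation
        System.out.println(model);    // prints the fitted logistic models
    }
}
```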

Support Vector Machines A linear learner and inherently binary classifier Finds the maximum-margin hyperplane that separates two classes Soft margins allow for non-linearly-separable data

Support Vector Machines (cont.) Kernel functions can be used to allow for non-linear decision boundaries Transformation into a higher-dimensional feature space Idea: non-linearly separable data will become linearly separable φ : X → F, x ↦ φ(x)
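A minimal sketch of a kernelized SVM in Weka: SMO with an RBF kernel; the file name and parameter values are assumptions, not the paper's settings:

```java
import weka.classifiers.functions.SMO;
import weka.classifiers.functions.supportVector.RBFKernel;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SvmDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("tours.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        SMO svm = new SMO();
        RBFKernel rbf = new RBFKernel(); // the kernel applies phi implicitly
        rbf.setGamma(0.01);              // assumed kernel width parameter
        svm.setKernel(rbf);
        svm.setC(1.0);                   // soft-margin penalty for non-separable data
        svm.buildClassifier(data);       // multi-class handled via pairwise binary SVMs
    }
}
```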

Ensemble Method Build multiple classifiers and use their outputs as a form of voting for final class selection AdaBoost Trains a sequence of classifiers, each one dependent on the previous classifier The dataset is re-weighted in order to focus on the previous classifier's errors Final classification is performed by passing each instance through the set of classifiers and combining their weighted outputs
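A minimal sketch of this setup in Weka: AdaBoostM1 wrapping J48 (Weka's C4.5 implementation); the iteration count and file name are assumptions:

```java
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BoostingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("tours.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        AdaBoostM1 booster = new AdaBoostM1();
        booster.setClassifier(new J48()); // each round re-weights the data toward prior errors
        booster.setNumIterations(10);     // length of the boosted classifier sequence (assumed)
        booster.buildClassifier(data);    // prediction combines the weighted member votes
    }
}
```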

(Un-) Conditional Classifiers The notion of an anchor mode is used in this study: the mode selected when departing from an anchor point (e.g. home) [Diagram: an example tour connecting Home, Work, and Store]

(Un-) Conditional Classifiers Un-conditional classifier: for the first trip on a tour Calculates P(mode = anchor mode | attributes) Conditional classifier: for each subsequent trip Calculates P(mode = i | attributes, anchor mode = j) Classifier outputs are combined probabilistically: P(mode = i) = Σ_j P(mode = i | attributes, anchor mode = j) · P(anchor mode = j)
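A minimal sketch of this probabilistic combination (the law of total probability over anchor modes), assuming the two classifiers have already produced their distributions; all probability values are illustrative:

```java
public class CombineClassifiers {
    public static void main(String[] args) {
        // P(anchor mode = j), from the un-conditional classifier (illustrative)
        double[] anchorProb = {0.7, 0.2, 0.1};
        // condProb[j][i] = P(mode = i | attributes, anchor mode = j),
        // from the conditional classifier (illustrative)
        double[][] condProb = {
            {0.8, 0.1, 0.1},
            {0.2, 0.7, 0.1},
            {0.3, 0.2, 0.5}
        };

        int numModes = condProb[0].length;
        double[] modeProb = new double[numModes];
        for (int j = 0; j < anchorProb.length; j++)   // sum over anchor modes j
            for (int i = 0; i < numModes; i++)
                modeProb[i] += condProb[j][i] * anchorProb[j];

        for (int i = 0; i < numModes; i++)
            System.out.printf("P(mode = %d) = %.3f%n", i, modeProb[i]);
    }
}
```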

Implementation Data-Mining classifiers: A Java application was developed to perform (un-) conditional classification The Weka Data Mining Toolkit API was leveraged for implementations of all data-mining algorithms Discrete Choice Model: The Biogeme modeling software was used to develop (un-) conditional multinomial logit (MNL) models An experimental framework was developed in Java to evaluate the MNL models in an identical manner
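A minimal sketch of the kind of Weka training-and-evaluation pipeline such a Java application might use; the file name and the 10-fold cross-validation setup are assumptions, not the paper's exact protocol:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaPipeline {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("trips.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // mode is the class attribute

        NaiveBayes nb = new NaiveBayes();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1)); // 10-fold CV (assumed)
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString()); // per-class precision/recall
    }
}
```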

Data Models were developed using the Chicago Travel Tracker Survey (2007-2008) data The survey consists of 1- and 2-day activity diaries from 32,118 people in 14,315 households across the 11-county Chicago region The data used for experimentation contained 19,118 tours decomposed into 116,666 trip links

Performance Measures Three metrics from the information-retrieval literature are leveraged: Mean Precision Mean Recall Accuracy Precision and recall are used when interest centers on performance for particular classes Accuracy complements precision and recall with aggregate performance across all classes

Performance Measures For each mode (class): Precision = TP / (TP + FP) Recall = TP / (TP + FN) Accuracy = correctly classified instances / total instances Mean precision and mean recall average the per-class values across all modes
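A minimal sketch computing all three metrics from a confusion matrix c[actual][predicted]; the matrix values are illustrative:

```java
public class Metrics {
    public static void main(String[] args) {
        int[][] c = {{50, 5, 5}, {10, 30, 10}, {5, 5, 40}}; // illustrative counts
        int k = c.length;
        double meanPrecision = 0, meanRecall = 0;
        int correct = 0, total = 0;

        for (int i = 0; i < k; i++) {
            int rowSum = 0, colSum = 0;
            for (int j = 0; j < k; j++) {
                rowSum += c[i][j]; // all instances whose true class is i
                colSum += c[j][i]; // all instances predicted as class i
                total  += c[i][j];
            }
            correct += c[i][i];
            meanPrecision += (double) c[i][i] / colSum / k; // average over classes
            meanRecall    += (double) c[i][i] / rowSum / k;
        }
        System.out.printf("mean precision=%.3f mean recall=%.3f accuracy=%.3f%n",
                meanPrecision, meanRecall, (double) correct / total);
    }
}
```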

Performance Measures For the purposes of evaluating mode choice prediction, recall is the most important metric Mode choice is not so much a classification task as a problem of distribution estimation Mean recall captures the sum, over modes, of each mode's deviation from the real distribution

Experimental Results To test the usefulness of the anchor mode attribute, classifiers were built both with and without knowledge of the anchor mode While the anchor mode will never be known with 100% certainty, these tests provided an upper bound on any expected performance gain The classifiers tested were: C4.5 decision trees, Naïve Bayes, Simple Logistic and SVM

Experimental Results [Table: classifier performance with and without the anchor mode attribute]

Experimental Results The anchor mode improves classification performance A second stage of testing was performed using the (un-) conditional models The best performance was achieved using different algorithms for the conditional and un-conditional models

Experimental Results [Table: performance of the (un-) conditional classifier combinations]

Experimental Results The AdaBoost-NaiveBayes un-conditional / AdaBoost-C4.5 conditional model (AB-NB/AB-C4.5) is considered the best performing It has marginally lower recall than the best-recall model, but much higher precision and better accuracy Achieving high accuracy and high recall simultaneously makes it the best overall classifier

Experimental Results Conditional and un-conditional MNL models were built and evaluated Attribute selection was based on t-test significance Adjusted rho-squared (ρ²) values were 0.684 and 0.691 for the un-conditional and conditional models, respectively

Experimental Results [Table: performance of the (un-) conditional MNL models]

Conclusions The AB-NB/AB-C4.5 combination of classifiers achieved a high level of accuracy, precision and recall, outperforming the MNL models Importantly, its recall performance is higher by a large margin The performance advantage over MNL is greater than previously thought It may be advantageous to use both techniques as complementary tools

Contributions Demonstrating the superiority of data-mining models over MNL Introducing the use of the anchor mode with (un-) conditional classifiers Arguing for mean recall as the best evaluation metric Showing that the AB-NB/AB-C4.5 combination has the best overall performance

Thank You! Questions?