Parameter Estimation with MOCK algorithm

Similar documents
Machine Learning in the Process Industry. Anders Hedlund Analytics Specialist

Package breakdown. June 14, 2018

Modeling Wine Preferences from Physicochemical Properties using Fuzzy Techniques

Lecture 13: Model selection and regularization

Ranking Between the Lines

Regularization and model selection

Machine Learning Techniques for Data Mining

10601 Machine Learning. Model and feature selection

A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York

Feature Selection. Department Biosysteme Karsten Borgwardt Data Mining Course Basel Fall Semester / 262

Business Club. Decision Trees

ARTICLE; BIOINFORMATICS Clustering performance comparison using K-means and expectation maximization algorithms

Slides for Data Mining by I. H. Witten and E. Frank

Network Traffic Measurements and Analysis

Allstate Insurance Claims Severity: A Machine Learning Approach

Statistical Matching using Fractional Imputation

Simple Model Selection Cross Validation Regularization Neural Networks

IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER

Machine Learning / Jan 27, 2010

The exam is closed book, closed notes except your one-page cheat sheet.

Cost-sensitive C4.5 with post-pruning and competition

Model selection and validation 1: Cross-validation

Louis Fourrier Fabien Gaie Thomas Rolf

Chapter 8 The C 4.5*stat algorithm

5 Learning hypothesis classes (16 points)

Linear methods for supervised learning

CS6220: DATA MINING TECHNIQUES

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

Lecture 25: Review I

CS 229 Midterm Review

SUPERVISED LEARNING METHODS. Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University Helix Science Engagement Programs 2018

LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS

Algorithms for LTS regression

Data Mining in Bioinformatics Day 1: Classification

Density estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate

Lecture 16: High-dimensional regression, non-linear regression

Support Vector Machines.

Variable Selection 6.783, Biomedical Decision Support

Knowledge Discovery and Data Mining. Neural Nets. A simple NN as a Mathematical Formula. Notes. Lecture 13 - Neural Nets. Tom Kelsey.

Knowledge Discovery and Data Mining

A Systematic Overview of Data Mining Algorithms

Divide and Conquer Kernel Ridge Regression

CS229 Final Project: Predicting Expected Response Times

Optimization Plugin for RapidMiner. Venkatesh Umaashankar Sangkyun Lee. Technical Report 04/2012. technische universität dortmund

Prediction of Dialysis Length. Adrian Loy, Antje Schubotz 2 February 2017

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13

MCMC Methods for data modeling

Machine Learning in Python. Rohith Mohan GradQuant Spring 2018

Boosting Simple Model Selection Cross Validation Regularization. October 3 rd, 2007 Carlos Guestrin [Schapire, 1989]

Boosting Simple Model Selection Cross Validation Regularization

Lecture 9: Support Vector Machines

Package sgd. August 29, 2016

Unsupervised Learning: Clustering

Performance Evaluation of Various Classification Algorithms

Bagging for One-Class Learning

1 Methods for Posterior Simulation

Algorithms: Decision Trees

Robust Regression. Robust Data Mining Techniques By Boonyakorn Jantaranuson

Multivariate Data Analysis and Machine Learning in High Energy Physics (V)

MULTICORE LEARNING ALGORITHM

Mini-project 2 CMPSCI 689 Spring 2015 Due: Tuesday, April 07, in class

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Density estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate

6.034 Quiz 2, Spring 2005

Topics in Machine Learning

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

More on Neural Networks. Read Chapter 5 in the text by Bishop, except omit Sections 5.3.3, 5.3.4, 5.4, 5.5.4, 5.5.5, 5.5.6, 5.5.7, and 5.

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CSE 158 Lecture 2. Web Mining and Recommender Systems. Supervised learning Regression

Mining Gradual Dependencies based on Fuzzy Rank Correlation

PV211: Introduction to Information Retrieval

Monte Carlo for Spatial Models

Model validation T , , Heli Hiisilä

Knowledge Discovery and Data Mining

Supervised Learning Classification Algorithms Comparison

Introduction to Automated Text Analysis. bit.ly/poir599

Cover Page. The handle holds various files of this Leiden University dissertation.

Preface 5. Introduction: Explanation & Prediction 6. Tools You Already Have 7. The Loss Function 10. Regularization 12. Contents. Some Terminology 7

CSEP 573: Artificial Intelligence

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018

MLCC 2018 Local Methods and Bias Variance Trade-Off. Lorenzo Rosasco UNIGE-MIT-IIT

Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford

CSE446: Linear Regression. Spring 2017

Machine Learning for Signal Processing Clustering. Bhiksha Raj Class Oct 2016

TECHNICAL REPORT NO Random Forests and Adaptive Nearest Neighbors 1

Recent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will

Final Report: Kaggle Soil Property Prediction Challenge

Support Vector Machines + Classification for IR

ISyE 6416 Basic Statistical Methods - Spring 2016 Bonus Project: Big Data Analytics Final Report. Team Member Names: Xi Yang, Yi Wen, Xue Zhang

CSE 573: Artificial Intelligence Autumn 2010

Markov chain Monte Carlo methods

Data mining techniques for actuaries: an overview

Assessing the Quality of the Natural Cubic Spline Approximation

Machine Learning in Telecommunications

Predicting Popular Xbox games based on Search Queries of Users

Features: representation, normalization, selection. Chapter e-9

STA 414/2104 S: February Administration

Study on Classifiers using Genetic Algorithm and Class based Rules Generation

Transcription:

Parameter Estimation with MOCK algorithm Bowen Deng (bdeng2) November 28, 2015 1 Introduction Wine quality has been considered a difficult task that only few human expert could evaluate. However, it is infeasible to give all wine produced a quality label by one of those experts. In this project, we focus on the task of predicting wine quality from physicochemical properties via data mining. We propose a new algorithm for this task, called MOnte Carlo Kernel (MOCK), and compared its performance with ordinary linear square (OLS). 1.1 Related work [2] proposed a neural network approach for predicting human wine taste preferences. [4] proposed a regression tree model for the same problem. Both models obtained pretty good accuracy: 84.7% by neural net, and 88.29% by regression tree. Regression tree seems a better fit not only because the accuracy is higher, but also because neural network is prone to overfitting given only thousands of examples. On the contrarary, regression tree is not that sensitive to noise, and it can avoid overfitting by cross validation and pruning. However, both models are painful to train, and computationally expensive. 1.2 Data Overview The data we use is Wine Quality Data Set from [1]. The dataset has 4898 examples, with 11 features for each example. The output feature is quality, which is an integer score between 0 and 10. The input features Feature Min Max Mean Fixed acidity (g(tartaric acid)/dm 3 ) 3.8 14.2 6.9 Volatile acidity (g(acetic acid)/dm 3 ) 0.1 1.1 0.3 Citric acid (g/dm 3 ) 0.0 1.7 0.3 Residual sugar (g/dm 3 ) 0.6 65.8 6.4 Free sulfur dioxide (mg/dm 3 ) 2 289 35 Total sulfur dioxide (mg/dm 3 ) 9 440 138 Density (g/cm 3 ) 0.987 1.039 0.994 ph 2.7 3.8 3.1 Sulphates ((potassium sulphate)/dm 3 ) 0.2 1.1 0.5 Alcohol (vol.%) 8.0 14.2 10.4 Table 1: The physicochemical data statistics for input features are fixed acidity, volatile acidity, citric acid, residual sugar, free sulfur dioxide, total sulfur dioxide, density, ph, sulphates, and alcohol. All values are numeric. No missing values is present. 1

Notice the range of the score is large, we view this task as regression rather than classification. The correlation matrix for the dataset is shown in figure 1. Figure 1: Heatmap for correlation matrix of wine quality data 2 Methods We first present the general MOCK algorithm, then focus on the wine quality prediction problem. Take Θ a subset of Θ, and a kernel function K τ : R m R m R. Algorithm 1 MOnte Carlo Kernel algorithm for j = 1,, B do Sample θ(j) Unif(Θ ) for i = 1,, m do z i f θ(j) (x (i) ) end for w(j) K τ ( end for ˆθ = B j=1 w(j)θ(j) B j=1 w(j) y 1. y m, z 1. z m ) Theorem 1. Assume β [ M/2, M/2] n, for each x (i), y (i) sampled from N(β T x (i), σ 2 ). And we use a symmetric kernel function K τ (y, z) = K( y z τ ) satisfying K(x) < 2

then as M, m, ˆβ β The proof will be presented in supplementary material. Remark. These properties hold for common kernel functions including Gaussian kernel and uniform kernel. For linear model, which we will be using for the regression on wine quality, the algorithm will have the following degenerations. f θ(j) (x (i) ) = θ(j) T x (i) K τ (y, z) = exp ( y z 2 /τ)(gaussian kernel) We split the data into 80% of training examples (3918 examples) and 20% of training examples (980 examples), and we use root of mean square error (RMSE) as the evaluation criteria. 3 Results Set M = 2.0, τ = m. We implement the MOCK algorithm in R. The β we obtained by MOCK algorithm is: Intercept fixed.acidity volatile.acidity citric.acid residual.sugar free.sulfur.dioxide 5.89-0.03 0.01-0.06 0.02-0.01 total.sulfur.dioxide density ph suphates alcohol -0.00 0.14 0.12 0.00-0.02 Table 2: β obtained by MOCK algorithm with τ = m = 3918, M = 2.0. 3.1 Comparison We compare the RMSE and runtime of MOCK with those of OLS. To prove the accuracy is not by pure chance. We apply permutation test[3]: shuffle the order Figure 2: Comparison of RMSE (left) and runtime (right) for textbfmock and OLS of values for training data, train the parameter, and predict with the perturbed parameter. The result is also shown on the figure. 3

The runnng time of MOCK versus number of iterations is also shown. The running time of OLS is fixed as we are using the built-in stats::lm function. For the task of predicting wine quality, the loss of MOCK is less than that of OLS when iteration number is more than 4000. On the other hand, MOCK has an almost linear runtime, which exceeds the time for OLS when number of iteration is more than 7500. There is a clear tradeoff between the accuracy (RMSE) and the runtime. However, 6000 is a sweet spot where accuracy is almost converged and it s 18% lower than that of OLS, and the runtime is 25% smaller than that of OLS. By looking at the error for perturbed dataset, the RMSE obtained by both MOCK and OLS are significantly lower than that of perturbed dataset. Hence, we can safely draw the conclusion that both algorithms are capturing the signal rather than noise. We don t have the detailed accuracy result of neural net and regression tree. Therefore, we cannot compare the results with that of theirs. However, the focus of this project is to evaluate the MOCK algorithm rather than proposing models that beat the existing work. We believe MOCK also work if we modify the model to neural network model, though with more parameter tuning. 4 Conclusion 4.1 Discussion The MOCK algorithm is easy to implement. It converges quickly, and the accuracy is relatively high. The drawback of MOCK compared with OLS, is that it requires more parameter tuning. In the wine quality example, we tuned τ (the bandwidth), M (the bound of parameters) for several times to obtain good training error. Another issue is that MOCK doesn t provide a p value indicating the significance of each variable as OLS does. 4.2 Future work In this project, we only focus on parameter estimation with linear model. And due to this limitation, we only compared the MOCK estimator with OLS estimator. Another extension is to study heuristics to setup τ and M, so users don t have to tune the parameter for a long time. One possible way is to use cross validation. This approach isn t used in this project due to the limited size of the training examples. As mentioned in Methods section, this algorithm is valid for a very general class of problems, including classification, confidence interval estimation, etc. As a future work, we can apply the MOCK algorithm on other interesting datasets, and compare the performance with other approach, including Naive Bayes, logistic regression, SVM, even neural network. On the theoretical side, we only proved the consistency for linear model, and have not provide probability bound which gives the minimum m required to obtain ε-accuracy. Apart from giving an exact bounds for m, we can also exploit the mathematical proof for the consistency for classification models. 5 Supplementary Material Proof for theorem 1. 4

Proof. By law of large numbers, K(z(i), y(i)) E[K(z, y)] m Similarly, K(z(i), y(i)) ˆβ(i) m EK(z, y) ˆβ = EK(z, y)β + X 1 EK(z, y)[x ˆβ Xβ] = EK(z, y)β + X 1 EK(X ˆβ, Xβ + ε)[x ˆβ Xβ] EK(z, y)β + X 1 E[(X ˆβ Xβ)K(X ˆβ, Xβ) + (X ˆβ Xβ)K 2(X ˆβ, Xβ)ε] = EK(z, y)β + X 1 E[(X ˆβ Xβ)K(X ˆβ, Xβ)] = EK(z, y)β + X 1 s=x ˆβ X[ M,M] n (s Xβ)K(s Xβ, 0) = EK(z, y)β X 1 s/ X[ M,M] n (s Xβ)K(s Xβ, 0) The last equality holds because (s t)k(s t, 0) = sk(s, 0) = ( s)k( s, 0) = sk(s, 0) = 0 s R m s R m s R m s The integral diminishes as K <, and by Cauchy s rule. ˆβ = = β m EK(z, y)β EK(z, y) K(z(i), y(i)) ˆβ(i)/m K(z(i), y(i))/m References [1] A. Asuncion, D. Newman, UCI Machine Learning Repository, University of California, Irvine, 2007 http://archive.ics.uci.edu/ml/datasets/wine+quality [2] P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, Elsevier, 47(4):547-553, 2009. [3] Fisher, R.A. The Design of Experiments. New York: Hafner. 1935. [4] Michal Horak. Prediction of wine quality from physicochemical properties. Data mining, Czech Technical University in Prague. 2009/10. 5