A Proposal of Regression Hybrid Modeling for Combining Random Forest and X-Means Methods


Total Quality Science, Vol. [...], No. [...]

Yuma Ueno*, Yasushi Nagata
Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo, 169-8555, Japan
*Contact author's address: uen-yum@tokiwasedajp

Abstract: To derive useful information from complicated data, many hybrid modeling strategies that combine nonparametric and parametric methods have been proposed. In this study, we propose a new hybrid modeling strategy that combines the random forest and the X-means methods using linear regression analysis. This strategy is referred to as XR regression. This study has three purposes: to improve the performance of a hybrid modeling strategy by using the random forest method, to determine an optimal class automatically using the X-means method, and to compare the prediction accuracy of this method with that of other existing methods. To determine the characteristics of XR regression, we compare its prediction accuracy with that of the existing methods using Monte Carlo simulations. The simulation results show that XR regression achieves high performance in all of the settings examined, especially for data sets that include interaction effects.

Keywords: Parametric model, linear regression analysis, interaction, tree topology, error dispersion

1 Introduction

Linear regression analysis is widely popular as a tool for data analysis and is used frequently to grasp and predict data structures. Here, we refer to linear regression analysis as a parametric method in a broad sense, because its model assumes a specific distribution. However, when data become large and complicated, the parametric method alone does not suffice for obtaining all useful information. Therefore, the nonparametric method, which does not assume a specific distribution, becomes necessary. However, the nonparametric method has a few disadvantages, such as overlearning (overfitting). Thus, even the nonparametric method cannot yield all useful information. Therefore, in previous studies, the semi-parametric method (Robinson (1988), Sakamoto and Shirahata (1996)) and the hybrid model (Kadowaki et al (a, b)) were proposed. The semi-parametric method assumes a specific distribution as a part of the model, and the hybrid model combines the nonparametric method with the parametric method.

A hybrid model using classification and regression tree (CART) analysis was proposed in previous studies (Kadowaki et al (a, b)), and we call this model the Kadowaki hybrid model (Kadowaki HM). However, other combinations of machine learning methods were not considered in previous studies, so we believe it is possible to propose a new hybrid model with higher predictability. As pilot studies, we investigated the performances of several hybrid models that combined cluster analysis, the k-means method, and the X-means method with machine learning methods such as the random forest method, the support vector machine (SVM), and the neural network. Since we found that the hybrid model that combines the X-means method and the random forest method had the highest performance, we propose this combination in this study.

[DOI:99/tqs] Copyright Journal of the Japanese Society for Quality Control. All rights reserved.

Furthermore, we evaluate the performance of the proposed hybrid model quantitatively using Monte Carlo simulations.

This paper is organized as follows. In Section 2, we explain the random forest and the X-means methods. In Section 3, we describe the proposed hybrid method. In Section 4, we compare the accuracy of the proposed method with that of the previous study using a real data set. In Section 5, we conduct a simulation study to evaluate the performance of the proposed method. In Section 6, we give conclusions.

2 The X-means method and the random forest method

2.1 The X-means method

The X-means method was named by Pelleg and Moore (2000) and is one of the methods that determine the number of clusters automatically. It starts from a division into a sufficiently small number of clusters and then repeatedly divides each cluster in two for as long as the division is judged to be suitable. In this study, we use the improved X-means method, which was proposed by Ishioka (, 6). This method proceeds as follows:

1. Determine an initial parameter $k_0$ (the default value is 2) for the number of clusters, chosen to be sufficiently small.
2. Apply the k-means method with $k = k_0$ (here, $k$ denotes the number of clusters) to divide the whole data set, and let the clusters after the division be $C_1, C_2, \ldots, C_{k_0}$.
3. Repeat steps 4 and 5 for $i = 1, 2, \ldots, k_0$.
4. Apply the k-means method to the cluster $C_i$ with $k = 2$. Let the clusters after the division be $C_i^{(1)}$ and $C_i^{(2)}$.
5. Compare the Bayesian information criterion after the division (BIC') with the same criterion before the division (BIC). Divide the cluster if BIC > BIC', and stop dividing it if not.
6. Finish dividing when there is no cluster left to divide further.

2.2 The random forest method

The random forest method is one of the machine learning methods. It repeatedly constructs decision trees using different bootstrap samples from the data. The algorithm is as follows (Liaw and Wiener (2002)):

1. Draw bootstrap samples from the original data.
2. For each of the bootstrap samples, grow an unpruned classification or regression tree with the following modification: at each node, rather than choosing the best split among all predictors, select a random sample of predictors and choose the best split from among these variables.
3. Predict new data by aggregating the predictions of the trees.

The random forest method has been applied to various areas. For example, Ishioka () applied it to a national test, and Niitsuma and Saito (9) applied it to music classification.

3 Proposed hybrid model in this study (XR regression)

In this study, we propose a hybrid model called XR regression. We determine several classes of learning data automatically using the X-means method. Using the random forest method, we identify to which class each data set for prediction belongs. Then, we add class dummy variables as explanatory variables and execute a linear regression analysis. XR regression is a method that is intended to enhance prediction accuracy. We assume that a learning data set is at hand, and each of the data sets for prediction is predicted using the learning data set.

The detailed procedures are as follows. Assume that a learning data set that has $p$ items and $n$ samples exists. Let the explanatory variables be $x_i$ ($i = 1, \ldots, p$) and the objective variable be $y$.

Procedure 1: Divide the learning data set into $q$ classes $C_j$ ($j = 1, \ldots, q$) using the X-means method.
Procedure 2: Add the class labels $C_j$ ($j = 1, \ldots, q$) to the learning data set as dummy variables.

Procedure 3: Estimate the class of each data set for prediction using the random forest method, and add the estimated classes to the data sets for prediction as dummy variables.
Procedure 4: Construct regression model (3.1), which includes the dummy variables described in Procedure 2:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \gamma_1 C_1 + \cdots + \gamma_q C_q + \varepsilon \quad (3.1)$$

where $\beta_i$ ($i = 1, \ldots, p$) are the regression coefficients for the existing explanatory variables, $\gamma_j$ ($j = 1, \ldots, q$) are the regression coefficients for the dummy variables of the classes, and $\varepsilon$ is the error term.
Procedure 5: Apply the data sets for prediction to the estimated regression model provided by Procedure 4 and predict them.

4 Real data analysis

4.1 Analytical procedure

In this section, we analyze real data and compare the prediction accuracy of the proposed method with those of the previous methods. The existing methods we compare in this section are linear regression and the Kadowaki HM. The number of repetitions is [...]. We use the average absolute error (we call it the prediction error (PE) here) as the evaluation index:

$$\mathrm{PE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \quad (4.1)$$

4.2 Boston housing price data

We use housing price data for Boston. These data are included in the MASS package of the statistical analysis software R. The Boston data set has non-linear and interactive structures and includes 506 samples and 14 variables. We divide the 506 samples into two groups of equal size at random. We use one group as the learning data set and the other group as the data set for prediction. This data frame contains the following variables:

crim: Per capita crime rate by town
zn: Proportion of residential land zoned for lots over 25,000 sq ft
indus: Proportion of non-retail business acres per town
chas: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
nox: Nitrogen oxides concentration (parts per 10 million)
rm: Average number of rooms per dwelling
age: Proportion of owner-occupied units built prior to 1940
dis: Weighted mean of distances to five Boston employment centers
rad: Index of accessibility to radial highways
tax: Full-value property-tax rate per $10,000
ptratio: Pupil-teacher ratio by town
black: The proportion of black residents by town
lstat: Percentage of the population that is lower status
medv: Median value of owner-occupied homes in $1000s
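As an illustration of Procedures 2 to 5 and of the PE index, the following Python sketch may help. It is not the authors' implementation. The class labels are assumed to be supplied already (by the X-means method for the learning data and by a random forest classifier for the data for prediction), all function names are hypothetical, and the model is fitted without a separate intercept so that the full set of class dummies stays identifiable.

```python
def add_class_dummies(rows, labels, n_classes):
    """Append one 0/1 indicator per class to each row (Procedures 2 and 3)."""
    return [list(row) + [1.0 if label == j else 0.0 for j in range(n_classes)]
            for row, label in zip(rows, labels)]


def fit_least_squares(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(X, coef):
    """Apply the fitted coefficients to each row (Procedure 5)."""
    return [sum(c * v for c, v in zip(coef, row)) for row in X]


def prediction_error(y_true, y_pred):
    """Average absolute error (PE)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy learning data (made up for illustration): one explanatory variable,
# two classes whose levels differ by a constant shift.
x_learn = [[0.0], [1.0], [2.0], [0.0], [1.0], [2.0]]
labels_learn = [0, 0, 0, 1, 1, 1]          # assumed output of the clustering step
y_learn = [3.0, 5.0, 7.0, 10.0, 12.0, 14.0]

coef = fit_least_squares(add_class_dummies(x_learn, labels_learn, 2), y_learn)

# Toy data for prediction, with class labels assumed to come from a classifier.
x_new = [[3.0], [3.0]]
labels_new = [0, 1]
y_new = [9.0, 16.0]
y_hat = predict(add_class_dummies(x_new, labels_new, 2), coef)
pe = prediction_error(y_new, y_hat)
```

In this toy case the class dummies absorb the level shift between the two groups, which a single regression line without class information could not do.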

Figure 1 Accuracy comparison for Boston data

Figure 1 shows the accuracies of the three methods for the Boston data. We can see that XR regression has a better accuracy than linear regression and the Kadowaki HM. However, it cannot be inferred that the difference is meaningful. Hence, we conducted Monte Carlo simulations to confirm the kinds of data features for which the proposed method performs well.

5 Performance evaluation of the hybrid model by simulation

5.1 Outline of the simulations

We conducted Monte Carlo simulations to examine what kinds of data features are best suited to the proposed hybrid model. In this study, to produce data for simulation, we added the tree topology structure and the interaction structure to each of the linear and non-linear models, and we changed the error dispersion. The methods compared were linear regression, the Kadowaki HM, and XR regression. The detailed settings in the simulation study were as follows:

1. The number of simulations was set to [...].
2. The sample size was [...].
3. We assumed the error term followed $N(0, [...])$.
4. We used the average absolute error (PE) as the evaluation index.

5.2 Linear model

First, we added the tree topology structure, the interaction structure, and the change in the error dispersion to the linear model to produce data and compare the accuracy.

5.2.1 Linear model data with a tree topology structure

We executed this simulation based on linear model data with a tree topology structure. We produced the data according to formula (5.1). The number of explanatory variables was five, and we assumed that all of them followed the uniform distribution $U([...], [...])$. We used function (5.2) to add the complexity of branching, with reference to a function called f(tree), which Miyataka () used to break the linear structure, because a tree topology model has the feature of treating variables as non-continuous.

[Equation (5.1): y is given by a linear function of the five explanatory variables plus f(tree); the coefficients are not recoverable from the source.]

[Equation (5.2): f(tree) is a piecewise function with eight branches, each expressed in terms of treevalue; the branch conditions and multipliers are not recoverable from the source.]

Figure 2 shows the simulation results of the linear model plus the tree topology structure. The number of clusters is four in the XR regression. We can see from Figure 2 that the PE of the XR regression is the best. These simulation results suggest that XR regression, whose cluster analysis uses all variables at the same time, grasps the tree topology structure better than CART, which splits on one variable at a time.

Figure 2 Accuracy comparison under the linear model + the tree topology structure

5.2.2 Linear model data with an interaction structure

We executed this simulation based on linear model data with an interaction structure. We produced data according to formula (5.3). There were five explanatory variables, and all of them were quantitative variables. We assigned standard values called $a_1$ and $a_2$, each taking a value of [...], to [...] and [...]. Then, $a_1$ and $a_2$ affected $y$ according to rule (5.4). We used a function called g(interaction), which Miyataka () used, to produce the interaction. Interactionvalue was a fixed number, and we changed it from [...] to [...]. Three of the explanatory variables (including $x_4$) were quantitative variables that followed the uniform distribution $U([...], [...])$, and the remaining two were quantitative variables that followed the normal distributions described in Table 1.

[Equation (5.3): y is given by a linear function of the five explanatory variables plus g(interaction); the coefficients are not recoverable from the source.]

$$g(\mathrm{interaction}) = \begin{cases} \mathrm{interactionvalue} & \text{if } a_1, a_2 \text{ satisfy } [...] \\ 0 & \text{otherwise} \end{cases} \quad (5.4)$$

[Table 1: Distributions that the standard values $a_1$ and $a_2$ follow. The table lists four normal distributions $N([...], [...])$; the parameters are not recoverable from the source.]

Figure 3 shows the simulation results of the linear model plus the interaction structure. The number of clusters is two in the XR regression. We can see that the PE of the XR regression is the best. The interaction between explanatory variables cannot be detected well by the Kadowaki HM using CART, but it can be detected well by XR regression using clustering. We think that this is because it is hard for CART to detect an interaction using only one variable at a time, whereas it is easy for cluster analysis to detect the interaction using all of the variables. The PE decreases suddenly from a certain point, and it can be said that the larger the interaction, the greater the usefulness of the XR regression.

Figure 3 Accuracy comparison under the linear model + the interaction structure
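To make the role of g(interaction) concrete, here is a small Python sketch of how data of this kind could be generated. It is only an illustration under stated assumptions: the condition `a1 > 0 and a2 > 0`, the unit regression coefficients, the U(0, 1) and N(0, 1) distributions, the value 5.0, and the function names are all hypothetical stand-ins and are not the settings used in the study.

```python
import random


def g_interaction(a1, a2, interaction_value):
    """Return interaction_value when a joint condition on a1 and a2 holds, else 0.

    The condition used here (both values positive) is a hypothetical stand-in.
    """
    return interaction_value if (a1 > 0 and a2 > 0) else 0.0


def generate_sample(interaction_value, rng):
    """Generate one observation with three uniform and two normal variables."""
    uniform_part = [rng.uniform(0.0, 1.0) for _ in range(3)]  # assumed U(0, 1)
    a1 = rng.gauss(0.0, 1.0)                                   # assumed N(0, 1)
    a2 = rng.gauss(0.0, 1.0)                                   # assumed N(0, 1)
    x = uniform_part + [a1, a2]
    noise = rng.gauss(0.0, 1.0)                                # assumed error term
    # Linear part with unit coefficients (an assumption) plus the interaction jump
    y = sum(x) + g_interaction(a1, a2, interaction_value) + noise
    return x, y


rng = random.Random(0)  # fixed seed so the toy data set is reproducible
data = [generate_sample(interaction_value=5.0, rng=rng) for _ in range(200)]
```

Because the jump in y depends jointly on a1 and a2, no split on a single variable isolates it, which mirrors the difficulty described above for CART.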

5.2.3 Linear model data with changes in the error dispersion

The influence of the error dispersion is sometimes large in real data. Thus, in this subsection, we varied the error variance in the simulation to check how the error dispersion influences accuracy. We produced data according to formula (5.5). The number of explanatory variables was five, and all of them followed the uniform distribution $U([...], [...])$. Up to now we had assumed $\varepsilon \sim N(0, [...])$ for the error term, but in this subsection we assumed $\varepsilon \sim N(0, \sigma^2)$. We changed the error value, which means the value of the error dispersion $\sigma$, from [...] to [...], and we ran the simulation for each value.

[Equation (5.5): y is given by a linear function of the five explanatory variables plus the error term; the coefficients are not recoverable from the source.]

Figure 4 Accuracy comparison under the linear model + error change

Figure 4 shows the simulation results of the linear model with changes in the error. The number of clusters is six in the XR regression. We can see that the accuracy of the XR regression is the highest in a linear model with a large error dispersion. The accuracy of the Kadowaki HM is lower than that of linear regression, so the influence of the error dispersion is large for the Kadowaki HM.

5.3 Non-linear model

Generally, it is rare that real data are based on a perfectly linear model. Most data partly include some non-linear structures. Thus, in this section, we assumed a multiplicative expression as a non-linear model, added the tree topology structure and the interaction structure to the non-linear model, and changed the error dispersion to produce data for simulation.

5.3.1 Non-linear model data with a tree topology structure

We executed this simulation based on non-linear model data with a tree topology structure. We produced data according to formula (5.6). The number of explanatory variables was five, and all of them followed the uniform distribution $U([...], [...])$. We used function (5.2) as f(tree).

[Equation (5.6): y is given by a multiplicative (non-linear) expression of the explanatory variables plus f(tree); the exact form is not recoverable from the source.]

Figure 5 Accuracy comparison under the non-linear model + the tree topology structure

Figure 5 shows the simulation results of the non-linear model with the tree topology structure. The number of clusters is nine in the XR regression. We can see that the PE of the XR regression is the best. On the other hand, the Kadowaki HM could not detect the tree topology most of the time. It can be said that XR regression handles the influence of the tree topology well by clustering. The tree topology structure becomes more difficult to grasp in the case of a non-linear model, and the accuracy becomes worse. However, the accuracy is relatively stable for the XR regression, which uses all variables.

5.3.2 Non-linear model data with an interaction structure

We executed this simulation based on non-linear model data with an interaction structure. We produced data according to formula (5.7). There were five explanatory variables, and all of them were quantitative. We used function (5.4) as g(interaction). Three of the explanatory variables (including $x_4$) were quantitative variables that followed the uniform distribution $U([...], [...])$, and the remaining two were quantitative variables as described in Table 1.

[Equation (5.7): y is given by a multiplicative (non-linear) expression of the explanatory variables plus g(interaction); the exact form is not recoverable from the source.]

Figure 6 Accuracy comparison under the non-linear model + the interaction structure

Figure 6 shows the simulation results of the non-linear model with the interaction structure. The number of clusters is four in the XR regression. Because the influence of the interaction is small until interactionvalue reaches [...], the interaction effect cannot be detected well; thus, the Kadowaki HM mostly maintains the best accuracy. However, the influence of the interaction grows larger after the interaction value exceeds [...], and the accuracies of all of the methods except XR regression become worse. Only XR regression maintains a good accuracy. That is, we find that the Kadowaki HM is effective when the interaction value is small and XR regression is effective when the interaction value is large. We find that XR regression can detect the influence of the interaction. However, for non-linear models in which the value of a variable itself varies greatly even without the interaction, CART, which uses one variable at a time, detects the structure better.

5.3.3 Non-linear model data with changes in the error dispersion

This time, we produced data according to formula (5.8). The number of explanatory variables was five, and all of them followed the uniform distribution $U([...], [...])$.

[Equation (5.8): y is given by a multiplicative (non-linear) expression of the explanatory variables plus the error term; the exact form is not recoverable from the source.]

Figure 7 shows the simulation results of the non-linear model with error changes. The number of clusters is three in the XR regression. We can see that the accuracy of XR regression is the highest even for a non-linear model with a large error dispersion. In the case of the Kadowaki HM, we find that the accuracy is low, similar to the result with the linear model.

Figure 7 Accuracy comparison under the non-linear model + error change

6 Conclusion

We proposed a new hybrid model that combines the random forest and X-means methods. First, in order to verify the accuracy of the proposed method, we used the Boston housing price data. The Boston data had non-linearity and interaction structures, and the accuracy of the XR regression was slightly better than that of the Kadowaki HM. We then conducted Monte Carlo simulations to verify for which kinds of data features the XR regression performed well. When the influence of an interaction was small in a non-linear model, the Kadowaki HM showed good accuracy. However, the Kadowaki HM was not so effective in the other simulation settings. On the other hand, XR regression maintained good accuracy in basically all situations, and we found it to be a well-balanced method overall.

There are three future challenges. First, because methods for automatically determining the number of clusters exist other than the X-means method, which we used for XR regression, we should compare the accuracy using those methods as well. Second, in this study, we executed the simulation only for particular data in the linear and non-linear models. Thus, we should execute other simulations using data with many variables and with correlations between explanatory variables. Third, there might be new discoveries if we verified what happens to the hybrid effect when using a regression method other than linear regression analysis.

References

Ishioka, T. (), Extended K-means with an Efficient Estimation of the Number of Clusters, Japanese Journal of Applied Statistics, Vol9, No, pp4-49
Ishioka, T. (6), An Expansion of X-means -Progressive Iteration of K-means and Merging of the Clusters-, Japanese Society of Computational Statistics, Vol8, No, pp-
Ishioka, T. (), Data Imputation by Random Forest-The Principle and Its Application for National Center Test in Japan, Japanese Journal of Applied Statistics, Vol4, No, pp9-9
Kadowaki, T., Suzuki, N., Suzuki, T. and Otaki, A. (a), Application of Hybrid Modeling to POS Data Analysis, Japanese Journal of Quality, Vol, No4, pp9-
Kadowaki, T. and Otaki, A. (b), Application of Hybrid Modeling to Air Quality Data by Combining CART Analysis with Regression Model, Memoirs of the Institute of Science and Technology, Meiji University, Vol9, No9, pp69-
Liaw, A. and Wiener, M. (2002), Classification and Regression by randomForest, R News, ISSN 1609-3631
Miyataka, T. (), Study about a Hybrid Model Combined a Regression Model and a Tree Topology Model, Master's thesis, Graduate School at Waseda University
Niitsuma, M. and Saito, H. (9), Music Genre Classification Using Random Forest, Information Processing Society of Japan, Vol, No, pp9-9
Pelleg, D. and Moore, A. (2000), X-means: Extending K-means with Efficient Estimation of Clusters, ICML
Robinson, P. M. (1988), Root-N-Consistent Semiparametric Regression, Econometrica, Vol. 56, No. 4, pp. 931-954
Sakamoto, W. and Shirahata, S. (1996), Spline Smoothing on Semiparametric Regression Problem, Japanese Society of Computational Statistics, Vol9, No, pp-

Acknowledgement: We would like to thank the anonymous referees for their valuable comments. This work was partly supported by JSPS Grants-in-Aid for Scientific Research Grant Number K6.

Authors' biographical notes

Yuma Ueno is a graduate student in the Department of Industrial and Management System Engineering of the Graduate School of Creative Science and Engineering at Waseda University.

Yasushi Nagata is a professor in the Department of Industrial and Management System Engineering of the School of Creative Science and Engineering at Waseda University.

Received: March, 6 Revised: November, 6 Accepted: March,


More information

Global Journal of Engineering Science and Research Management

Global Journal of Engineering Science and Research Management A NOVEL HYBRID APPROACH FOR PREDICTION OF MISSING VALUES IN NUMERIC DATASET V.B.Kamble* 1, S.N.Deshmukh 2 * 1 Department of Computer Science and Engineering, P.E.S. College of Engineering, Aurangabad.

More information

Machine Learning: An Applied Econometric Approach Online Appendix

Machine Learning: An Applied Econometric Approach Online Appendix Machine Learning: An Applied Econometric Approach Online Appendix Sendhil Mullainathan mullain@fas.harvard.edu Jann Spiess jspiess@fas.harvard.edu April 2017 A How We Predict In this section, we detail

More information

Network. Department of Statistics. University of California, Berkeley. January, Abstract

Network. Department of Statistics. University of California, Berkeley. January, Abstract Parallelizing CART Using a Workstation Network Phil Spector Leo Breiman Department of Statistics University of California, Berkeley January, 1995 Abstract The CART (Classication and Regression Trees) program,

More information

Functions. Introduction CHAPTER OUTLINE

Functions. Introduction CHAPTER OUTLINE Functions,00 P,000 00 0 y 970 97 980 98 990 99 000 00 00 Figure Standard and Poor s Inde with dividends reinvested (credit "bull": modification of work by Prayitno Hadinata; credit "graph": modification

More information

Lesson 21: Comparing Linear and Exponential Functions Again

Lesson 21: Comparing Linear and Exponential Functions Again Lesson M Lesson : Comparing Linear and Eponential Functions Again Student Outcomes Students create models and understand the differences between linear and eponential models that are represented in different

More information

Reliability Verification of Search Engines Hit Counts: How to Select a Reliable Hit Count for a Query

Reliability Verification of Search Engines Hit Counts: How to Select a Reliable Hit Count for a Query Reliability Verification of Search Engines Hit Counts: How to Select a Reliable Hit Count for a Query Takuya Funahashi and Hayato Yamana Computer Science and Engineering Div., Waseda University, 3-4-1

More information

Implementing Access Lists and Prefix Lists on Cisco ASR 9000 Series Routers

Implementing Access Lists and Prefix Lists on Cisco ASR 9000 Series Routers Implementing Access Lists and Prefix Lists on Cisco ASR 9000 Series Routers An access control list (ACL) consists of one me access control entries (ACE) that collectively define the netwk traffic profile.

More information

A Hybrid Intelligent System for Fault Detection in Power Systems

A Hybrid Intelligent System for Fault Detection in Power Systems A Hybrid Intelligent System for Fault Detection in Power Systems Hiroyuki Mori Hikaru Aoyama Dept. of Electrical and Electronics Eng. Meii University Tama-ku, Kawasaki 14-8571 Japan Toshiyuki Yamanaka

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

1 Lab 1. Graphics and Checking Residuals

1 Lab 1. Graphics and Checking Residuals R is an object oriented language. We will use R for statistical analysis in FIN 504/ORF 504. To download R, go to CRAN (the Comprehensive R Archive Network) at http://cran.r-project.org Versions for Windows

More information

Computational Methods in Statistics with Applications A Numerical Point of View. Large Data Sets. L. Eldén. March 2016

Computational Methods in Statistics with Applications A Numerical Point of View. Large Data Sets. L. Eldén. March 2016 Computational Methods in Statistics with Applications A Numerical Point of View L. Eldén SeSe March 2016 Large Data Sets IDA Machine Learning Seminars, September 17, 2014. Sequential Decision Making: Experiment

More information

Using CODEQ to Train Feed-forward Neural Networks

Using CODEQ to Train Feed-forward Neural Networks Using CODEQ to Train Feed-forward Neural Networks Mahamed G. H. Omran 1 and Faisal al-adwani 2 1 Department of Computer Science, Gulf University for Science and Technology, Kuwait, Kuwait omran.m@gust.edu.kw

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

Random Forests and Boosting

Random Forests and Boosting Random Forests and Boosting Tree-based methods are simple and useful for interpretation. However they typically are not competitive with the best supervised learning approaches in terms of prediction accuracy.

More information

CSC 411: Lecture 02: Linear Regression

CSC 411: Lecture 02: Linear Regression CSC 411: Lecture 02: Linear Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 16, 2015 Urtasun & Zemel (UofT) CSC 411: 02-Regression Sep 16, 2015 1 / 16 Today Linear regression problem continuous

More information

Non-linear models. Basis expansion. Overfitting. Regularization.

Non-linear models. Basis expansion. Overfitting. Regularization. Non-linear models. Basis epansion. Overfitting. Regularization. Petr Pošík Czech Technical Universit in Prague Facult of Electrical Engineering Dept. of Cbernetics Non-linear models Basis epansion.....................................................................................................

More information

Efficient Acquisition of Human Existence Priors from Motion Trajectories

Efficient Acquisition of Human Existence Priors from Motion Trajectories Efficient Acquisition of Human Existence Priors from Motion Trajectories Hitoshi Habe Hidehito Nakagawa Masatsugu Kidode Graduate School of Information Science, Nara Institute of Science and Technology

More information

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms. Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey of Clustering

More information

Local Minima in Regression with Optimal Scaling Transformations

Local Minima in Regression with Optimal Scaling Transformations Chapter 2 Local Minima in Regression with Optimal Scaling Transformations CATREG is a program for categorical multiple regression, applying optimal scaling methodology to quantify categorical variables,

More information

machine learning framework for Mathematica Version 1.5 What's New

machine learning framework for Mathematica Version 1.5 What's New machine learning framework for Mathematica Version 1.5 What's New What's New in mlf 1.5 Multi-platform support The most important improvement of version 1.5 of the machine learning framework for Mathematica

More information

Small area estimation by model calibration and "hybrid" calibration. Risto Lehtonen, University of Helsinki Ari Veijanen, Statistics Finland

Small area estimation by model calibration and hybrid calibration. Risto Lehtonen, University of Helsinki Ari Veijanen, Statistics Finland Small area estimation by model calibration and "hybrid" calibration Risto Lehtonen, University of Helsinki Ari Veijanen, Statistics Finland NTTS Conference, Brussels, 10-12 March 2015 Lehtonen R. and Veijanen

More information

RESAMPLING METHODS. Chapter 05

RESAMPLING METHODS. Chapter 05 1 RESAMPLING METHODS Chapter 05 2 Outline Cross Validation The Validation Set Approach Leave-One-Out Cross Validation K-fold Cross Validation Bias-Variance Trade-off for k-fold Cross Validation Cross Validation

More information

Machine Learning. Unsupervised Learning. Manfred Huber

Machine Learning. Unsupervised Learning. Manfred Huber Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training

More information

Nonparametric Approaches to Regression

Nonparametric Approaches to Regression Nonparametric Approaches to Regression In traditional nonparametric regression, we assume very little about the functional form of the mean response function. In particular, we assume the model where m(xi)

More information

Check Skills You ll Need (For help, go to Lesson 1-2.) Evaluate each expression for the given value of x.

Check Skills You ll Need (For help, go to Lesson 1-2.) Evaluate each expression for the given value of x. A_3eSE_00X 0/6/005 :3 AM Page - Eploring Eponential Models Lesson Preview What You ll Learn To model eponential growth To model eponential deca... And Wh To model a car s depreciation, as in Eample 6 Check

More information

2.4. Families of Polynomial Functions

2.4. Families of Polynomial Functions 2. Families of Polnomial Functions Crstal pieces for a large chandelier are to be cut according to the design shown. The graph shows how the design is created using polnomial functions. What do all the

More information

1.2. Characteristics of Polynomial Functions. What are the key features of the graphs of polynomial functions?

1.2. Characteristics of Polynomial Functions. What are the key features of the graphs of polynomial functions? 1.2 Characteristics of Polnomial Functions In Section 1.1, ou eplored the features of power functions, which are single-term polnomial functions. Man polnomial functions that arise from real-world applications

More information

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors (Section 5.4) What? Consequences of homoskedasticity Implication for computing standard errors What do these two terms

More information

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Bootstrapping Method for  14 June 2016 R. Russell Rhinehart. Bootstrapping Bootstrapping Method for www.r3eda.com 14 June 2016 R. Russell Rhinehart Bootstrapping This is extracted from the book, Nonlinear Regression Modeling for Engineering Applications: Modeling, Model Validation,

More information

PatternRank: A Software-Pattern Search System Based on Mutual Reference Importance

PatternRank: A Software-Pattern Search System Based on Mutual Reference Importance PatternRank: A Software-Pattern Search System Based on Mutual Reference Importance Atsuto Kubo, Hiroyuki Nakayama, Hironori Washizaki, Yoshiaki Fukazawa Waseda University Department of Computer Science

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning

More information

Note: In the presentation I should have said "baby registry" instead of "bridal registry," see

Note: In the presentation I should have said baby registry instead of bridal registry, see Q-and-A from the Data-Mining Webinar Note: In the presentation I should have said "baby registry" instead of "bridal registry," see http://www.target.com/babyregistryportalview Q: You mentioned the 'Big

More information

Stat 342 Exam 3 Fall 2014

Stat 342 Exam 3 Fall 2014 Stat 34 Exam 3 Fall 04 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed There are questions on the following 6 pages. Do as many of them as you can

More information

Clustering for Load Balancing and Fail Over

Clustering for Load Balancing and Fail Over Clustering f Balancing and Fail Over Target: Lightstreamer Server v. 7.0 greater Last updated: 16/04/2018 Table of contents 1 Introduction...3 2 HTTP-Based Scenarios...5 2.1 Leverage LB Stickiness Options

More information

Subject-specific study and examination regulations for the M.Sc. Computer Science degree programme

Subject-specific study and examination regulations for the M.Sc. Computer Science degree programme Faculty of Computer Science and Mathematics Subject-specific study and examination regulations f the M.Sc. Computer Science degree programme of 27 April 2016 Imptant notice: Only the German text, as published

More information

Efficient Mining Algorithms for Large-scale Graphs

Efficient Mining Algorithms for Large-scale Graphs Efficient Mining Algorithms for Large-scale Graphs Yasunari Kishimoto, Hiroaki Shiokawa, Yasuhiro Fujiwara, and Makoto Onizuka Abstract This article describes efficient graph mining algorithms designed

More information

Cover Page. The handle holds various files of this Leiden University dissertation.

Cover Page. The handle   holds various files of this Leiden University dissertation. Cover Page The handle http://hdl.handle.net/1887/22055 holds various files of this Leiden University dissertation. Author: Koch, Patrick Title: Efficient tuning in supervised machine learning Issue Date:

More information

[2006] IEEE. Reprinted, with permission, from [Wenjing Jia, Gaussian Weighted Histogram Intersection for License Plate Classification, Pattern

[2006] IEEE. Reprinted, with permission, from [Wenjing Jia, Gaussian Weighted Histogram Intersection for License Plate Classification, Pattern [6] IEEE. Reprinted, with permission, from [Wening Jia, Gaussian Weighted Histogram Intersection for License Plate Classification, Pattern Recognition, 6. ICPR 6. 8th International Conference on (Volume:3

More information

A NOVEL APPROACH FOR TEST SUITE PRIORITIZATION

A NOVEL APPROACH FOR TEST SUITE PRIORITIZATION Journal of Computer Science 10 (1): 138-142, 2014 ISSN: 1549-3636 2014 doi:10.3844/jcssp.2014.138.142 Published Online 10 (1) 2014 (http://www.thescipub.com/jcs.toc) A NOVEL APPROACH FOR TEST SUITE PRIORITIZATION

More information

Limits and Derivatives (Review of Math 249 or 251)

Limits and Derivatives (Review of Math 249 or 251) Chapter 3 Limits and Derivatives (Review of Math 249 or 251) 3.1 Overview This is the first of two chapters reviewing material from calculus; its and derivatives are discussed in this chapter, and integrals

More information

Measures of Central Tendency

Measures of Central Tendency Page of 6 Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean The sum of all data values divided by the number of

More information

Data Mining: Models and Methods

Data Mining: Models and Methods Data Mining: Models and Methods Author, Kirill Goltsman A White Paper July 2017 --------------------------------------------------- www.datascience.foundation Copyright 2016-2017 What is Data Mining? Data

More information

Scholz, Hill and Rambaldi: Weekly Hedonic House Price Indexes Discussion

Scholz, Hill and Rambaldi: Weekly Hedonic House Price Indexes Discussion Scholz, Hill and Rambaldi: Weekly Hedonic House Price Indexes Discussion Dr Jens Mehrhoff*, Head of Section Business Cycle, Price and Property Market Statistics * Jens This Mehrhoff, presentation Deutsche

More information

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,

More information

Nonparametric Mixed-Effects Models for Longitudinal Data

Nonparametric Mixed-Effects Models for Longitudinal Data Nonparametric Mixed-Effects Models for Longitudinal Data Zhang Jin-Ting Dept of Stat & Appl Prob National University of Sinagpore University of Seoul, South Korea, 7 p.1/26 OUTLINE The Motivating Data

More information

Practical Design of Experiments: Considerations for Iterative Developmental Testing

Practical Design of Experiments: Considerations for Iterative Developmental Testing Practical Design of Experiments: Considerations for Iterative Developmental Testing Best Practice Authored by: Michael Harman 29 January 2018 The goal of the STAT COE is to assist in developing rigorous,

More information

Predicting User Ratings Using Status Models on Amazon.com

Predicting User Ratings Using Status Models on Amazon.com Predicting User Ratings Using Status Models on Amazon.com Borui Wang Stanford University borui@stanford.edu Guan (Bell) Wang Stanford University guanw@stanford.edu Group 19 Zhemin Li Stanford University

More information

International Journal of Software and Web Sciences (IJSWS)

International Journal of Software and Web Sciences (IJSWS) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International

More information

Speeding Up the Wrapper Feature Subset Selection in Regression by Mutual Information Relevance and Redundancy Analysis

Speeding Up the Wrapper Feature Subset Selection in Regression by Mutual Information Relevance and Redundancy Analysis Speeding Up the Wrapper Feature Subset Selection in Regression by Mutual Information Relevance and Redundancy Analysis Gert Van Dijck, Marc M. Van Hulle Computational Neuroscience Research Group, Laboratorium

More information

Section 2: Operations on Functions

Section 2: Operations on Functions Chapter Review Applied Calculus 9 Section : Operations on Functions Composition of Functions Suppose we wanted to calculate how much it costs to heat a house on a particular day of the year. The cost to

More information

Tree-based methods for classification and regression

Tree-based methods for classification and regression Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting

More information

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation IJCSNS International Journal of Computer Science and Network Security, VOL.13 No.11, November 2013 1 Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Multicollinearity and Validation CIVL 7012/8012

Multicollinearity and Validation CIVL 7012/8012 Multicollinearity and Validation CIVL 7012/8012 2 In Today s Class Recap Multicollinearity Model Validation MULTICOLLINEARITY 1. Perfect Multicollinearity 2. Consequences of Perfect Multicollinearity 3.

More information

Automatic Drawing for Tokyo Metro Map

Automatic Drawing for Tokyo Metro Map Automatic Drawing for Tokyo Metro Map Masahiro Onda 1, Masaki Moriguchi 2, and Keiko Imai 3 1 Graduate School of Science and Engineering, Chuo University monda@imai-lab.ise.chuo-u.ac.jp 2 Meiji Institute

More information

Conditional Volatility Estimation by. Conditional Quantile Autoregression

Conditional Volatility Estimation by. Conditional Quantile Autoregression International Journal of Mathematical Analysis Vol. 8, 2014, no. 41, 2033-2046 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ijma.2014.47210 Conditional Volatility Estimation by Conditional Quantile

More information

Chapter 7 CONCLUSION

Chapter 7 CONCLUSION 97 Chapter 7 CONCLUSION 7.1. Introduction A Mobile Ad-hoc Network (MANET) could be considered as network of mobile nodes which communicate with each other without any fixed infrastructure. The nodes in

More information

Table of Contents POSTGRESQL DATABASE OBJECT MANAGEMENT 4. POSTGRESQL SCHEMAS 5 PostgreSQL Schema Designer 7. Editing PostgreSQL Schema General 8

Table of Contents POSTGRESQL DATABASE OBJECT MANAGEMENT 4. POSTGRESQL SCHEMAS 5 PostgreSQL Schema Designer 7. Editing PostgreSQL Schema General 8 PostgreSQL Database Object Management 1 Table of Contents POSTGRESQL DATABASE OBJECT MANAGEMENT 4 POSTGRESQL SCHEMAS 5 PostgreSQL Schema Designer 7 Editing PostgreSQL Schema General 8 PostgreSQL Tables

More information

ANALYSIS OF USER TRAJECTORIES BASED ON DATA DISTRIBUTION AND STATE TRANSITION: A CASE STUDY WITH A MASSIVELY MULTIPLAYER ONLINE GAME ANGEL LOVE ONLINE

ANALYSIS OF USER TRAJECTORIES BASED ON DATA DISTRIBUTION AND STATE TRANSITION: A CASE STUDY WITH A MASSIVELY MULTIPLAYER ONLINE GAME ANGEL LOVE ONLINE ANALYSIS OF USER TRAJECTORIES BASED ON DATA DISTRIBUTION AND STATE TRANSITION: A CASE STUDY WITH A MASSIVELY MULTIPLAYER ONLINE GAME ANGEL LOVE ONLINE Ruck Thawonmas, Junichi Oda, and Kuan-Ta Chen Intelligent

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Distribution-Free Learning of Bayesian Network Structure in Continuous Domains

Distribution-Free Learning of Bayesian Network Structure in Continuous Domains In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI), Pittsburgh, PA, July 25 Distribution-Free Learning of Bayesian Network Structure in Continuous Domains Dimitris Margaritis

More information

An overview for regression tree

An overview for regression tree An overview for regression tree Abstract PhD (C.) Adem Meta University Ismail Qemali Vlore, Albania Classification and regression tree is a non-parametric methodology. CART is a methodology that divides

More information

Traveling Salesman Problem. Java Genetic Algorithm Solution

Traveling Salesman Problem. Java Genetic Algorithm Solution Traveling Salesman Problem Java Genetic Algorithm Solution author: Dušan Saiko 23.08.2005 Index Introduction...2 Genetic algorithms...2 Different approaches...5 Application description...10 Summary...15

More information

Predictor Selection Algorithm for Bayesian Lasso

Predictor Selection Algorithm for Bayesian Lasso Predictor Selection Algorithm for Baesian Lasso Quan Zhang Ma 16, 2014 1 Introduction The Lasso [1] is a method in regression model for coefficients shrinkage and model selection. It is often used in the

More information

Cost-based Pricing for Multicast Streaming Services

Cost-based Pricing for Multicast Streaming Services Cost-based Pricing for Multicast Streaming Services Eiji TAKAHASHI, Takaaki OHARA, Takumi MIYOSHI,, and Yoshiaki TANAKA Global Information and Telecommunication Institute, Waseda Unviersity 29-7 Bldg.,

More information

MA 180 Lecture Chapter 7 College Algebra and Calculus by Larson/Hodgkins Limits and Derivatives

MA 180 Lecture Chapter 7 College Algebra and Calculus by Larson/Hodgkins Limits and Derivatives MA 180 Lecture Chapter 7 College Algebra and Calculus by Larson/Hodgkins Limits and Derivatives 7.1) Limits An important concept in the study of mathematics is that of a it. It is often one of the harder

More information