Automated Selection of Training Data and Base Models for Data Stream Mining Using Naïve Bayes Ensemble Classification

Size: px
Start display at page:

Download "Automated Selection of Training Data and Base Models for Data Stream Mining Using Naïve Bayes Ensemble Classification"

Transcription

1 Proceedngs of the World Congress on Engneerng 2017 Vol II, July 5-7, 2017, London, U.K. Automated Selecton of Tranng Data and Base Models for Data Stream Mnng Usng Naïve Bayes Ensemble Classfcaton Patrca E.N. Lutu Abstract A data stream s a contnuous, tme-ordered sequence of data tems. Data stream mnng s the process of applyng data mnng methods to a data stream n real-tme n order to create descrptve or predctve models for the process that generates the data stream. The characterstcs of a data stream typcally change wth tme. These changes gve rse to the need to contnuously make revsons to, or completely rebuld predctve models when the changes are detected. Rebuldng and changng the predctve models needs to be a fast process because stream data may arrve at a hgh speed. Manual labelng of tranng data before creatng new models may not be able to cope wth the speed at whch model revson needs to be performed. Ths paper presents expermental results for the performance of methods for the automated selecton of hgh qualty tranng data for predctve model revson for predctve modelng usng Naïve Bayes ensemble classfcaton. The expermental results demonstrate that the use of two base model combnaton algorthms for the ensembles, results n a hgh level of confdence n the predctons. Secondly, the perodc revson of the ensemble usng the base models created from the automatcally selected tranng data produces revsed ensemble models wth hgh predctve performance. Index Terms data mnng, data stream mnng, Naïve Bayes classfcaton, ensemble classfcaton, nstance labelng A I. INTRODUCTION data stream s a contnuous, tme-ordered sequence of data tems [1]. Data stream mnng s the process of applyng data mnng methods to a data stream n real-tme n order to create descrptve or predctve models for the process that generates the data stream [1], [2], [3]. The characterstcs of a data stream typcally change wth tme. The changes are categorsed as dstrbuton changes and concept changes [2], [4]. These changes gve rse to the need to contnuously make revsons to, or completely rebuld predctve models when the changes are detected. Rebuldng and changng the predctve models needs to be a fast process because stream data may arrve at a hgh speed. Tradtonal methods of manually labelng tranng data before creatng new models may not be able to cope wth the speed at whch model revson needs to be performed. Lutu [5] has proposed a predctve modelng framework Manuscrpt receved February 03, 2017; revsed March 06, Patrca. E. N. Lutu s a Senor Lecturer n the Department of Computer Scence, Unversty of Pretora, Pretora 0002, Republc of South Afrca, phone: ; fax: ; e-mal: Patrca.Lutu@up.ac.za; web: ISSN: (Prnt); ISSN: (Onlne) for automated nstance labelng, n stream mnng. The purpose of ths paper s to report on extensons of the work reported by Lutu [5]. Expermental studes are reported on methods for the perodc revson of classfcaton ensemble models n order to ensure that the stream nstances that are classfed by ensemble models are assgned class labels wth a hgh level of confdence. Ths s necessary n order to ensure that hgh qualty automatcally labeled and selected tranng data s used for the creaton of new base models whch can be used for ensemble revson. The expermental results demonstrate that the use of two base model combnaton algorthms for the ensembles, results n a hgh level of confdence n the predctons. Secondly, the perodc revson of the ensemble usng the base models created from the automatcally labeled and selected tranng data produces revsed ensemble models wth hgh predctve performance. The rest of the paper s organsed as follows: Secton II provdes background to the reported research. Secton III presents the proposed ensemble model revson approach. Sectons IV and V respectvely provde the expermental methods and experment results. Secton VI concludes the paper. II. BACKGROUND A. Modelng actvtes for predctve data stream mnng Predctve data stream mnng nvolves the creaton of classfcaton or regresson models. For classfcaton modelng, tranng data wth class labels s used to create the model. A real-lfe data stream does not have a fnte sze or statc behavour. Ths means that a data stream cannot be stored n ts entrety, and the nstance classes cannot be correctly predcted by a model that s created as a once-off actvty. Ths makes t necessary to contnuously or perodcally revse the predctve model usng tranng data that s reasonably recent. Two maor approaches that are used for predctve data stream mnng are the sldng wndow approach and the ensemble approach. An ensemble model conssts of several base models whose predctons are combned nto one fnal predcton usng a combnaton algorthm [6]. The ntal modelng actvtes for ether approach nvolve selectng tranng data nstances, selectng relevant predctve features, creatng the ntal model, and testng the model. Thereafter, the model s used to provde predctons for data stream nstances as they arrve, and the predctve performance s montored to determne when the model should be revsed or

2 Proceedngs of the World Congress on Engneerng 2017 Vol II, July 5-7, 2017, London, U.K. replaced n ts entrety. Data streams typcally exhbt the phenomena of concept change and dstrbuton change. Concept change refers to changes n the descrpton of the class varable values and may be classfed as concept drft whch s a gradual change of the concept, or concept shft whch s a more abrupt change of the concept [1], [2], [3], [7]. Dstrbuton change, on the other hand, refers to changes n the data dstrbuton of the data stream. Concept and dstrbuton change strateges typcally nvolve contnuous montorng of model performance and data characterstcs [7]. For predctve data stream mnng, concept drft and dstrbuton changes are typcally handled through contnuous or perodc revson of the predctve model. Concept shft (sudden concept change) s handled through complete replacement of the current model. A parallel actvty to model usage and montorng s the labelng of new tranng data whch s then used to test the model performance, and to create new models for model revson or model change. Masud et. al [3], Zhu et. al [8] and Zhang et. al [9] have all observed that, for predctve stream mnng, manual labelng of data s a costly and tme consumng exercse. In practce t s not possble to manually label all stream data, especally when the data arrves at a hgh speed. It s therefore common practce to label only a small fracton of the data for tranng and testng purposes. Masud et. al [3] have proposed the use of ensemble classfcaton models based on sem-supervsed clusterng, so that only a small fracton (5) of the data needs to be labeled for the clusterng algorthm. Zhu et. al [8] have proposed an actve learnng framework for solvng the nstance labelng problem. Actve learnng ams to dentfy the most nformatve nstances that should be manually labeled n order to acheve the hghest accuracy. B. Ensemble models for stream mnng Several ensemble classfcaton methods for stream mnng have been reported n the lterature. Examples of ensemble frameworks that have been reported n the lterature are the streamng ensemble algorthm (SEA) [10], the accuracy-weghted ensemble (AWE) [11], and the dynamcally weghted maorty (DWM) ensemble [12]. Lutu [5] has proposed an ensemble framework for stream mnng based on Naïve Bayes ensemble classfcaton [13],[14]. The framework conssts of an onlne component and an offlne component. The onlne component uses Naïve Bayes ensemble base models to make predctons for newly arrved data stream nstances. The offlne component conssts of algorthms to combne base model predctons, determne the relablty of the ensemble predctons, select tranng data for new base models, create new base models, and determne whether the current onlne base models need to be replaced. Two obectves of ths framework are (1) to remove the need for manual labelng of tranng nstances, and (2) to determne when model revson should be performed. C. Nave Bayes ensemble classfcaton Naïve Bayes classfcaton has been reported n the lterature as one of the deal algorthms for stream mnng, due to ts ncremental nature [15]. The Naïve Bayes classfer assgns posteror class probabltes for the query nstance x based on Bayes theorem. The tranng dataset for a classfer s charactersed by d predctor varables X 1,...,X d and a class varable C. The tranng dataset conssts of n tranng nstances, and each nstance may be denoted as ( x,c ) where x = x,...,x ) are the values of ( 1 d the predctor varables, and c { c,...,c } s the class 1 label. Gven a new query nstance x = x,...,x ) Naïve K ( 1 d Bayes classfcaton nvolves the computaton of the probablty Pr( c x ) of the nstance belongng to each class c as [13], [14] Pr( c x ) = P r( c )P r( x c ) (1) P ( x ) where Pr( x c ) = Pr( x c ) and Pr( x ) = K = 1 r P ( Π r c )P r( x c ) The values of contnuous-valued varables are typcally converted nto categores through the process of dscretsaton. The quanttes Pr( c ) for the classes, and Pr( x c ), for the predctor varables, are then estmated from the tranng data. For zero-one loss classfcaton, the class c wth the hghest probablty Pr( c x ) s selected as the predcted class for nstance x. Feature selecton nvolves the dentfcaton of features (predctor varables) that are relevant and not redundant for a gven predcton task [16]. Lu and Motoda [16], Kohav [17], John et al. [18], and Lutu [19] have dscussed the merts of conductng feature selecton for Naïve Bayes classfcaton. One straght-forward method of feature selecton, s the use the Symmetrcal Uncertanty ( SU ) coeffcent as dscussed by Lutu [19]. The SU coeffcent for a predctor varable X and the class varable C s defned n terms of the entropy functon as [16], [20] SU = 2. 0( E( X ) + E( C) E( X,C) /( E( X ) + E( C)). (2) I ( 2 = 1 where E X ) = Pr ( x ) log Pr ( x ) J ( 2 = 1 and E C ) = Pr ( c ) log Pr ( c ) are respectvely the entropy for X and C and I E ( X,C ) = Pr ( x,c ) log 2 = 1 J = 1 Pr ( x,c ). s the ont entropy for X and C [16], [20]. The SU coeffcent takes on values n the nterval [0,1] and has the same nterpretaton as Pearson s product moment correlaton coeffcent for quanttatve varables [16]. Whte and Lu [20], and Lutu [19] have observed that the entropy functons and the ont entropy functon n (2) can be computed from a contngency table. A 2-dmensonal contngency table s a cross-tabulaton whch gves the frequences of co-occurrence of the values of two categorcal varables X and Y. For Naïve Bayes classfcaton and feature selecton, X s the feature and the ISSN: (Prnt); ISSN: (Onlne)

3 Proceedngs of the World Congress on Engneerng 2017 Vol II, July 5-7, 2017, London, U.K. second varable s C, whch s the class varable. Varous statstcal measures can be derved from a contngency table. The man reason why Naïve Bayes (NB) classfcaton was selected for the framework reported by Lutu [5] s because model creaton s a smple and fast actvty. For each predctve feature, a sngle contngency table of counts for the feature values and class labels provdes all the data needed for the computaton of the quanttes for the terms n (1) and (2). Addtonally, the class entropy measure for makng decsons on base model selecton, as dscussed n Secton III, can be easly computed from the contngency tables. III. PROPOSED APPROACH TO ENSEMBLE REVISION A. Automated nstance labelng and selecton for Naïve Bayes classfcaton The framework proposed by Lutu [5] uses three measures for assessng the performance of the ensemble base models. These measures are: Certanty, Relablty, and Incoherence. Certanty measures the frequency that all base models n the ensemble have predcted the same class. Relablty measures the frequency that one class s predcted by the maorty of base models n the ensemble. Incoherence measures the frequency that each base model n the ensemble has predcted a dfferent class from the other base models. The studes reported n ths paper are an extenson of the studes reported by Lutu [5]. The extensons nvolve refnements to the predcton categores and the crtera for makng decsons on ensemble model revson. For the studes reported by Lutu [5], the ensemble predctons were based on the maorty vote by the base models. A maor refnement s to use two methods to determne the class label for a query nstance. Gven a query nstance x, each base model of the ensemble provdes a probablstc score for each class. The scores provded by the base models are typcally stored n a decson matrx wth one row for each base model and one column for each class [6]. Varous combnaton methods are avalable for determnng the ensemble predcton. For the studes reported n ths paper, the mean score for each class s computed and the class selected based on the score s the one wth the hghest mean score value. Addtonally, each base model provdes a predcton based on the class scores. Ths s the class wth the hghest score. The class wth the maorty vote (by the base models) s also used to determne the predcton category. A predcton s treated as vald f the class selected by the mean score value s the same as the one selected by the maorty vote. The nformaton provded by the score-based predctons and by the maorty vote predctons, for the vald predctons, s used as a bass for determnng the predcton categores. Table I provdes a summary of the revsed predcton categores, whch are: Certan, Relable, WeaklyRelable, and NotRelable. Only those predctons that are vald are assgned the categores Certan, Relable or WeaklyRelable. For the proposed automated nstance labelng approach, the nstances that are assgned the categores Certan and Relable are selected as the tranng data for buldng NB base models that can be used to replace the current base models when the need arses. The contngency tables for the new base model can be ncrementally created off-lne as the tranng data s beng generated. At the end of a tme perod two maor actvtes are conducted. The frst actvty s to determne f the newly created base model has any predctve value. Ths assessment s based on the class entropy of the tranng data used to create the contngency tables for the model, and the number of selected (relevant) features. The second actvty s to assess the predctve performance of the current ensemble on the data for the tme perod, and then make a decson on whether the new base model should replace one of the current base models. TABLE I PREDICTION CATEGORIES Mean score range for predcted class Vote for predcted class category 0.8 to out of 3 Certan (score >= 0.8) and 2 out of 3 Certan (score <= 1.0) 1 out of 3 NotRelable 0.6 to 0.8 (score >= 0.6) and (score < 0.8) 0.5 to 0.6 (score >= 0.5) and (score < 0.6) 0.0 to 0.5 (score >= 0.0) and (score < 0.5) B. Base model selecton for ensemble revson Varous measures of ensemble performance are defned n terms of the nstance counts for the predcton categores, and the total number of predctons for a gven tme perod. The measures are defned as follows: Measures of ensemble performance: Surrogate Accuracy = ( count (Certan) + count(relable) ) / ( TotalPredctons) Certanty = count (Certan) / TotalPredctons Relablty = count (Relable) / TotalPredctons WeakRelablty = count (Weakly Relable) / TotalPredctons NonRelablty Agreement 3 out of 3 Relable 2 out of 3 Relable 1 out of 3 NotRelable 3 out of 3 WeaklyRelable 2 out of 3 WeaklyRelable 1 out of 3 NotRelable 3 out of 3 NotRelable 2 out of 3 NotRelable 1 out of 3 NotRelable = count (Not Relable) / TotalPredctons = count( predctons where base modelpredcton s the same as ensemble predcton) The proposed approach handles dstrbuton changes, concept drft and sudden concept change proactvely. Addtonally, the approach ams to ensure that frequent model revsons result n hgh confdence predctons. A maor tme perod T (e.g. after 30,000 nstances are receved) s used for makng decsons about ensemble model revson to handle possble dstrbuton changes and concept drft. Addtonally, a mnor tme perod t (e.g. after 3,000 nstances) s used to montor the possblty of sudden concept change. Rule 1 s used at the end of each mnor tme perod, to check for sudden concept change. ISSN: (Prnt); ISSN: (Onlne)

4 Proceedngs of the World Congress on Engneerng 2017 Vol II, July 5-7, 2017, London, U.K. When sudden concept change s detected, then the current ensemble should be abandoned, and a new ensemble should be created usng manually labeled data. Ensemble performance s assessed at the end of each maor tme perod T, to determne whether the most recently created base Rule 1: If (Certanty < specfed value) AND (Surrogate Accuracy < specfed value) AND (WeakRelablty > specfed value) AND (Agreement for at least one base model < specfed value) then flag sudden concept change model should replace one of the current base models. If the new base model has zero class entropy (.e. all tranng nstances are of the same class) then no replacement decson s made, otherwse, Rule 2 s used to determne the need for base model replacement. Rule 2: If (class entropy for the new base mode s greater than 0) then If (base model has the lowest Agreement) AND (base model has lowest no. of selected features) AND (base model has lower class entropy than new base model) AND (base model has fewer or equal number of selected features as new base model) then replace base model wth the new base model. IV. EXPERIMENTAL METHODS The KDD Cup 1999 dataset avalable from the UCI KDD archve [21] was used for the studes. The dataset conssts of a tranng dataset and a test dataset. The 10 verson of the tranng dataset was used for the experments. Ths dataset conssts of 494,021 nstances, 41 features and 23 classes. The 23 classes may be grouped nto fve categores: NORMAL, DOS, PROBE, R2L and U2R whch can then be used as the classes [22]. A new feature (called ID) was added to the dataset wth values n the range [1, ] as a pseudo tmestamp. Fg. 1 shows a plot of the class dstrbuton for ths data stream. The data stream exhbts an extreme mbalance of the class dstrbuton over tme. It s clear from Fg. 1 that the classes DOS and NORMAL are the maorty classes. Instance count K-30K 30K-60K 60K-90K 90K-120K Class dstrbuton for KDDCup99 120K-150K 150K-180K 180K-210K 210K-240K 240K-270K The algorthms for feature selecton, Naïve Bayes classfcaton, and base model predcton combnaton, were mplemented as a GNU C++ applcaton. Detals of the 270K-300K Tme perod Fg. 1. Class dstrbuton for the KDD Cup 1999 data stream (adopted from [5] ) ISSN: (Prnt); ISSN: (Onlne) 300K-330K 330K-360K 360K-390K 390K-420K 420K-450K 450K-480K 480K-494K DOS NORMAL PROBE R2L U2R requred data structures have been presented by Lutu [19]. The decsons on concept change detecton and model revson were made through manual nspecton of the model performance measures followed by the applcaton of Rule 1 or Rule 2 presented n Secton III. These decsons can easly be automated n C++ program code. V. EXPERIMENTS FOR STREAM MINING A. Obectves of the experments Exploratory experments were conducted to answer the followng questons: (1) Do the Certan and Relable categores lead to the selecton of a hgh percentage of correctly labeled tranng data? (2) Are the Certanty and Relablty measures good estmators of predctve accuracy? (3) Do the proposed methods for ensemble model revsons result n mprovements n Certan and Relable predctons? (4) Are the Certanty, Relablty, WeakRelablty, NonRelablty measures useful for forecastng possble concept change? B. Creaton of the ntal base models The top 90,000 nstances of the KDD Cup 1999 data stream were used for the creaton of the three base models MW1, MW2 and MW3 for the ntal ensemble. The nstances were dvded nto three batches of tranng nstances for each Naïve Bayes base model. The base models were tested on the data for the tme perod W4 (90K- 120K). Table II shows the propertes of the tranng data, and the test results for the ntal ensemble base models. TABLE II PROPERTIES OF THE TRAINING DATA FOR THE INITIAL ENSEMBLE BASE MODELS C. Expermental results The ntal ensemble model {MW1, MW2, MW3} was used to provde predctons for the data stream nstances startng at tme perod W4 (90K-120K). For each predcton perod W, the nstances that were assgned the Certan or Tme perod Relable category were selected as tranng data and the base model MW was created from ths tranng data. Tranng set sze Relevant features Model Class Entropy MW1 0K-30K 30, MW2 30K-60K 30, MW3 60K-90K 30, Testng accuracy on W4 data Addtonally, a decson was made whether or not to replace one of the current base models, usng the logc of Rule 2. Tables III and IV show the descrpton of the ensemble models and the ensemble performance, for the tme perods W4,,W17 (90K to 494K). For the tme perods W6 to W11, the base models are exactly the same. Ths s because all the nstances for these tme perods belong to the DOS class. The entres n column 3 of Table III show that base model replacements were done at the begnnng of tme perods W5, W6 and W13. The results of columns 3, 4 and 5 of Table IV show that, for the perods W5 to W15, the values for Certanty, Surrogate Accuracy and real Accuracy are consstently hgh. Addtonally, the Surrogate

5 Proceedngs of the World Congress on Engneerng 2017 Vol II, July 5-7, 2017, London, U.K. Accuracy values are generally very close to the (real) Accuracy values. For the perod W16, the values for Certanty, Surrogate Accuracy and real Accuracy suddenly plummet to much lower levels. Ths s an ndcaton of sudden concept change. Based on the above observatons, the answer to the queston: Are the Certanty and Relablty measures good estmators of predctve accuracy? s yes. The reader wll recall that Surrogate Accuracy s defned n terms of Certanty and Relablty. TABLE III DESCRIPTION OF THE CONTINUOUSLY REVISED ENSEMBLE Ensemble Predcton Wndow Predcton tme perod base models W4 90K-120K MW1, MW2, MW3 E1 W5 120K-150K MW2, MW3, MW4 E2 W6,..,W11 150K-330K MW2, MW4, MW5 E3 W12 330K-360K MW2, MW4, MW5 E3 W13 360K-390K MW2, MW5, MW12 E4 W14 390K-420K MW2, MW5, MW12 E4 W15 420K-450K MW2, MW5, MW12 E4 W16 450K-480K MW2, MW5, MW12 E4 W17 480K-494K MW2, MW5, MW12 E4 TABLE IV PREDICTIVE PERFORMANCE OF THE CONTINUOUSLY REVISED ENSEMBLE Predcton wndow ensemble Certanty Surrogate Accuracy real Accuracy W4 E W5 E W6,..,W11 E W12 E W13 E W14 E W15 E W16 E W17 E Table V shows the values of the performance measures for the tme perods W5 to W17. The values for the Certan and Relable categores for the perods W5 to W15 (120K to 450K clearly ndcate that the class labels assgned to nstances n these categores are generally correct (can be trusted). Ths mples that the automatcally labeled and selected tranng data for the ensemble s of good qualty. Predcton perod TABLE V ASSESSMENT OF ENSEMBLE PREDICTION CATEGORIES Certan Relable Certanty Correct Relablty Correct W W6,..,W W W W W W W However, the values for the perod W16 (450K-480K) ndcate that the assgned class labels for the Relable category cannot be trusted as they have a very low level of correctness and so, the nstances for ths tme perod should not be selected as tranng data. Ths stuaton s detected as concept change whch s dscussed below. So, do the Certan and Relable categores lead to the selecton of a hgh percentage of correctly labeled tranng data? Based on the foregong observatons, the answer to ths queston s yes. The percentage of correctly labeled tranng data s n the regon of 95, whch mples a nose level n the regon of 5. Table VI shows the propertes and predcton behavour of all the base models that were used n the perodcally revsed ensemble. It can be deduced from Table III that at the end of W4, the decson was made to replace MW1 wth MW4, to obtan the ensemble {MW2, MW3, MW4}. Based on the values n Table VI and Rule 2 of Secton III, the reason for the replacement s because MW1 has the smallest number of selected features and lowest Agreement level. Addtonally, MW1 has lower class entropy than MW4, and fewer selected features than MW4. At the end of W5, the decson was made to replace MW3 wth MW5, to obtan the ensemble {MW2, MW4, MW5}. At the end of W12, the decson was made to replace MW4 wth MW12, to obtan the ensemble {MW2, MW5, MW12}. The reasons for these replacements can be easly deduced from the values n Table VI and Rule 2. Model Name TABLE VI PROPERTIES OF THE BASE MODELS USED IN THE PERIODICALLY REVISED ENSEMBLE Agreement wth ensemble predcton n tme perod: Tme perod for tranng data Tranng set sze Class Entropy Relevant features W4 W5 W12 MW1 0K-30K 30, MW2 30K- 60K MW3 60K- 90K MW4 90K- 120K MW5 120K- 150K MW12 330K- 360K 30, , , , , Table VII provdes a comparson of the ntal ensemble and the perodcally revsed ensemble n terms of Certanty and Relablty. The results of columns 2 and 4 ndcate that Certanty s much hgher for the perodcally revsed ensemble, startng wth the frst revson n W5 (120K- 150K). Based on ths observaton, the answer to the queston: Do the proposed methods for ensemble model revsons result n mprovements n certan and relable predctons?, s yes. Addtonal analyss was conducted on the predcton results of the ensemble {MW2, MW5, MW12} for the W16 (450K-480K) predcton perod. Tables VIII and IX show the detals of the ncremental performance of the ensemble ISSN: (Prnt); ISSN: (Onlne)

6 Proceedngs of the World Congress on Engneerng 2017 Vol II, July 5-7, 2017, London, U.K. for mnor tme perods consstng of 3,000 nstances each. Results are shown for the frst sx mnor tme perods. The results show that (1) Certanty, Surrogate Accuracy and real Accuracy decrease steadly untl concept change occurs n the mnor tme perod 459K-462K, (2) WeakRelablty suddenly ncreases when concept change occurs, and (3) Agreement for one of the base models (MW2) suddenly decreases sgnfcantly when concept change occurs. So, to answer the queston: Are the Certanty, Relablty, WeakRelablty, NonRelablty measures good at forecastng possble concept change?, the answer s yes. The above observatons support the clam that the measures are good ndcators of concept change. TABLE VII COMPARISON OF INITIAL AND PERIODICALLY REVISED ENSEMBLES Predcton perod TABLE VIII INCREMENTAL PERFORMANCE FOR W16 PREDICTIONS: CERTAINTY AND RELIABILITY Predcton tme perod Intal ensemble performance Certanty Certanty Relablty Relablty VI. CONCLUSIONS Contnuously revsed ensemble performance Certanty Weakly Relably Relablty W W W6,..,W W W W W W W Non Relablty 450K-453K K-456K K-459K K-462K K-465K K-468K TABLE IX INCREMENTAL PERFORMANCE FOR W16 PREDICTIONS: ACCURACY AND AGREEMENT Accuracy Agreement Predcton tme perod Surrogate Accuracy MW2 MW5 MW12 450K-453K K-456K K-459K K-462K K-465K K-468K The obectves of the research reported n ths paper were to assess the usefulness of the proposed methods for automated selecton of hgh qualty tranng data for Naïve Bayes base model creaton and ensemble revson for predctve data stream mnng. The expermental results reported n Secton V have demonstrated that, for the KDD Cup 1999 dataset, the use of two base model predcton combnaton algorthms for the ensembles, results n a hgh level of confdence n the class labels for the automatcally selected tranng data. Secondly, the perodc revson of the ensemble usng the base models created from the selected tranng data produces revsed ensemble models wth hgh predctve performance. Thrdly, the proposed measure of Surrogate Accuracy provdes meanngful nformaton about ensemble model predctve performance. REFERENCES [1] C.C. Aggarwal, Data Streams: Models and Algorthms, Boston: Kluwer Academc Publshers, [2] J. Gao, W. Fan and J. Han, On approprate assumptons to mne data streams: analyss and practce, n Proc.7 th IEEE Int. Conf. on Data Mnng, IEEE Computer Socety, 2007, pp [3] M.M. Masud, Q. Chen, J. Gao, L. Khan, J. Han, and B. Thurasngham, Classfcaton and novel class detecton of data streams n a dynamc feature space, n Proc. ECML PKDD 2010, LNAI, Sprnger-Verlag, 2010, pp [4] J. Gao, W. Fan, J. Han and P.S. Yu, A general framework for mnng concept-drftng data streams wth skewed dstrbutons, n Proc. SDM Conference, [5] P.E.N. Lutu, Naïve Bayes classfcaton ensembles to support modelng decsons n data stream mnng, n Proc. IEEE SSCI 2015, Cape Town, South Afrca, pp [6] L. Kuncheva, Combnng Pattern Classfers: Methods and Algorthms, Hoboken, New Jersey: John Wley & sons [7] A. Bfet, G. Holmes, R. Krkby and B. Pfahrnger, Data Stream Mnng: a Practcal Approach, Unversty of Wakato, Avalable: treammnng.pdf [8] X. Zhu, P. Zhang, X. Ln and Y. Sh, Actve learnng from data streams, n Proc. 7 th IEEE Int. Conf. on Data Mnng, 2007, pp [9] P. Zhang, X. Zhu and L. Guo, Mnng data streams wth labeled and unlabeled tranng examples, n Proc. 9th IEEE Internatonal Conference on Data Mnng, 2009, pp [10] W.N. Street and Y. Km, A streamng ensemble algorthm (SEA) for large-scale classfcaton, n Proc. 7 th ACM Int. Conf. on Knowledge Dscovery and Data Mnng,ACM Press,New York,2001,pp [11] H. Wang, W. Fan, P.S. Yu and J. Han, Mnng concept-drftng data streams usng ensemble classfers, n Proc. ACM SIGKDD, Washngton DC, 2003, pp [12] J.Z. Kolter and M.A. Maloof, Dynamc weghted maorty: an ensemble method for drftng concepts, Journal of Machne Learnng Research, vol. 8, pp , [13] T. M. Mtchell, Machne Learnng, Burr Rdge, IL:WCB/McGraw- Hll, [14] P. Gudc, Appled Data Mnng: Statstcal Methods for Busness and Industry, Chchester: John Wley and Sons, [15] R. Munro and S. Chawla, An Integrated Approach to Mnng Data Streams, Techncal Report TR-548, School of Informaton Technologes, Unversty of Sydney, Avalable: [16] H. Lu and H. Motoda, Feature Selecton for Knowledge Dscovery and Data Mnng, Boston:Kluwer Academc Publshers, [17] R. Kohav, Scalng up the accuracy of Naïve Bayes classfers: a decson tree hybrd, n Proc. KDD 1996, pp [18] G.H. John, R. Kohav and K. Pleger, Irrelevant features and the subset selecton problem, n Proc. 11th Int. Conf. on Machne Learnng, 1994, pp [19] P.E.N. Lutu, Fast feature selecton for Naïve Bayes classfcaton n data stream mnng, n Proc. World Congress on Engneerng, London, U.K., July 2013, vol. III, pp [20] A.P. Whte and W.Z. Lu, Bas n nformaton-based measures n decson tree nducton, Machne Learnng, vol. 15, pp Boston : Kluwer Academc Publcatons, [21] S.D. Bay, D. Kbler, M.J. Pazzan and P. Smyth, The UCI KDD archve of large data sets for data mnng research and expermentaton, ACM SIGKDD, vol. 2, no. 2, pp , [22] S. W. Shn and C. H. Lee, Usng attack-specfc feature subsets for network ntruson detecton, n Proc. 19th Australan conference on Artfcal Intellgence, Hobart, Australa, ISSN: (Prnt); ISSN: (Onlne)

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 Vol 1, No 2, September 2017, pp. 6-12 6 Implementaton Naïve Bayes Algorthm for Student Classfcaton Based on Graduaton Status

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Yan et al. / J Zhejiang Univ-Sci C (Comput & Electron) in press 1. Improving Naive Bayes classifier by dividing its decision regions *

Yan et al. / J Zhejiang Univ-Sci C (Comput & Electron) in press 1. Improving Naive Bayes classifier by dividing its decision regions * Yan et al. / J Zhejang Unv-Sc C (Comput & Electron) n press 1 Journal of Zhejang Unversty-SCIENCE C (Computers & Electroncs) ISSN 1869-1951 (Prnt); ISSN 1869-196X (Onlne) www.zju.edu.cn/jzus; www.sprngerlnk.com

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset

Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset Under-Samplng Approaches for Improvng Predcton of the Mnorty Class n an Imbalanced Dataset Show-Jane Yen and Yue-Sh Lee Department of Computer Scence and Informaton Engneerng, Mng Chuan Unversty 5 The-Mng

More information

Fuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System

Fuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System Fuzzy Modelng of the Complexty vs. Accuracy Trade-off n a Sequental Two-Stage Mult-Classfer System MARK LAST 1 Department of Informaton Systems Engneerng Ben-Guron Unversty of the Negev Beer-Sheva 84105

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Study of Data Stream Clustering Based on Bio-inspired Model

Study of Data Stream Clustering Based on Bio-inspired Model , pp.412-418 http://dx.do.org/10.14257/astl.2014.53.86 Study of Data Stream lusterng Based on Bo-nspred Model Yngme L, Mn L, Jngbo Shao, Gaoyang Wang ollege of omputer Scence and Informaton Engneerng,

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

A Lazy Ensemble Learning Method to Classification

A Lazy Ensemble Learning Method to Classification IJCSI Internatonal Journal of Computer Scence Issues, Vol. 7, Issue 5, September 2010 ISSN (Onlne): 1694-0814 344 A Lazy Ensemble Learnng Method to Classfcaton Haleh Homayoun 1, Sattar Hashem 2 and Al

More information

SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB

SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB V. Hotař, A. Hotař Techncal Unversty of Lberec, Department of Glass Producng Machnes and Robotcs, Department of Materal

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers Journal of Convergence Informaton Technology Volume 5, Number 2, Aprl 2010 Investgatng the Performance of Naïve- Bayes Classfers and K- Nearest Neghbor Classfers Mohammed J. Islam *, Q. M. Jonathan Wu,

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

An Improvement to Naive Bayes for Text Classification

An Improvement to Naive Bayes for Text Classification Avalable onlne at www.scencedrect.com Proceda Engneerng 15 (2011) 2160 2164 Advancen Control Engneerngand Informaton Scence An Improvement to Nave Bayes for Text Classfcaton We Zhang a, Feng Gao a, a*

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method Internatonal Journal of Computatonal and Appled Mathematcs. ISSN 89-4966 Volume, Number (07), pp. 33-4 Research Inda Publcatons http://www.rpublcaton.com An Accurate Evaluaton of Integrals n Convex and

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

A Self-Learning Network Anomaly Detection System using Majority Voting

A Self-Learning Network Anomaly Detection System using Majority Voting A Self-Learnng Network Anomaly Detecton System usng Majorty Votng D.Hock and M.Kappes Department of Computer Scence and Engneerng, Unversty of Appled Scences, Frankfurt am Man, Germany e-mal: {dehock kappes}@fb2.fh-frankfurt.de

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

A Multivariate Analysis of Static Code Attributes for Defect Prediction

A Multivariate Analysis of Static Code Attributes for Defect Prediction Research Paper) A Multvarate Analyss of Statc Code Attrbutes for Defect Predcton Burak Turhan, Ayşe Bener Department of Computer Engneerng, Bogazc Unversty 3434, Bebek, Istanbul, Turkey {turhanb, bener}@boun.edu.tr

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Classifier Ensemble Design using Artificial Bee Colony based Feature Selection

Classifier Ensemble Design using Artificial Bee Colony based Feature Selection IJCSI Internatonal Journal of Computer Scence Issues, Vol. 9, Issue 3, No 2, May 2012 ISSN (Onlne): 1694-0814 www.ijcsi.org 522 Classfer Ensemble Desgn usng Artfcal Bee Colony based Feature Selecton Shunmugaprya

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Parameter estimation for incomplete bivariate longitudinal data in clinical trials

Parameter estimation for incomplete bivariate longitudinal data in clinical trials Parameter estmaton for ncomplete bvarate longtudnal data n clncal trals Naum M. Khutoryansky Novo Nordsk Pharmaceutcals, Inc., Prnceton, NJ ABSTRACT Bvarate models are useful when analyzng longtudnal data

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS P.G. Demdov Yaroslavl State Unversty Anatoly Ntn, Vladmr Khryashchev, Olga Stepanova, Igor Kostern EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS Yaroslavl, 2015 Eye

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 2 Sofa 2016 Prnt ISSN: 1311-9702; Onlne ISSN: 1314-4081 DOI: 10.1515/cat-2016-0017 Hybrdzaton of Expectaton-Maxmzaton

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007 Syntheszer 1.0 A Varyng Coeffcent Meta Meta-Analytc nalytc Tool Employng Mcrosoft Excel 007.38.17.5 User s Gude Z. Krzan 009 Table of Contents 1. Introducton and Acknowledgments 3. Operatonal Functons

More information

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images Internatonal Journal of Informaton and Electroncs Engneerng Vol. 5 No. 6 November 015 Usng Fuzzy Logc to Enhance the Large Sze Remote Sensng Images Trung Nguyen Tu Huy Ngo Hoang and Thoa Vu Van Abstract

More information

A Similarity-Based Prognostics Approach for Remaining Useful Life Estimation of Engineered Systems

A Similarity-Based Prognostics Approach for Remaining Useful Life Estimation of Engineered Systems 2008 INTERNATIONAL CONFERENCE ON PROGNOSTICS AND HEALTH MANAGEMENT A Smlarty-Based Prognostcs Approach for Remanng Useful Lfe Estmaton of Engneered Systems Tany Wang, Janbo Yu, Davd Segel, and Jay Lee

More information

Feature Selection as an Improving Step for Decision Tree Construction

Feature Selection as an Improving Step for Decision Tree Construction 2009 Internatonal Conference on Machne Learnng and Computng IPCSIT vol.3 (2011) (2011) IACSIT Press, Sngapore Feature Selecton as an Improvng Step for Decson Tree Constructon Mahd Esmael 1, Fazekas Gabor

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp Lfe Tables (Tmes) Summary... 1 Data Input... 2 Analyss Summary... 3 Survval Functon... 5 Log Survval Functon... 6 Cumulatve Hazard Functon... 7 Percentles... 7 Group Comparsons... 8 Summary The Lfe Tables

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

CLASSIFICATION OF ULTRASONIC SIGNALS

CLASSIFICATION OF ULTRASONIC SIGNALS The 8 th Internatonal Conference of the Slovenan Socety for Non-Destructve Testng»Applcaton of Contemporary Non-Destructve Testng n Engneerng«September -3, 5, Portorož, Slovena, pp. 7-33 CLASSIFICATION

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Journal of Engineering Science and Technology Review 7 (3) (2014) Research Article

Journal of Engineering Science and Technology Review 7 (3) (2014) Research Article Jestr Journal of Engneerng cence and Technology Revew 7 (3) (2014) 151 157 Research Artcle JOURAL OF Engneerng cence and Technology Revew www.estr.org Traffc Classfcaton Method by Combnaton of Host Behavour

More information

Classification Methods

Classification Methods 1 Classfcaton Methods Ajun An York Unversty, Canada C INTRODUCTION Generally speakng, classfcaton s the acton of assgnng an object to a category accordng to the characterstcs of the object. In data mnng,

More information

Title: A Novel Protocol for Accuracy Assessment in Classification of Very High Resolution Images

Title: A Novel Protocol for Accuracy Assessment in Classification of Very High Resolution Images 2009 IEEE. Personal use of ths materal s permtted. Permsson from IEEE must be obtaned for all other uses, n any current or future meda, ncludng reprntng/republshng ths materal for advertsng or promotonal

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Learning-based License Plate Detection on Edge Features

Learning-based License Plate Detection on Edge Features Learnng-based Lcense Plate Detecton on Edge Features Wng Teng Ho, Woo Hen Yap, Yong Haur Tay Computer Vson and Intellgent Systems (CVIS) Group Unverst Tunku Abdul Rahman, Malaysa wngteng_h@yahoo.com, woohen@yahoo.com,

More information

An Evolvable Clustering Based Algorithm to Learn Distance Function for Supervised Environment

An Evolvable Clustering Based Algorithm to Learn Distance Function for Supervised Environment IJCSI Internatonal Journal of Computer Scence Issues, Vol. 7, Issue 5, September 2010 ISSN (Onlne): 1694-0814 www.ijcsi.org 374 An Evolvable Clusterng Based Algorthm to Learn Dstance Functon for Supervsed

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Dynamic Integration of Regression Models

Dynamic Integration of Regression Models Dynamc Integraton of Regresson Models Nall Rooney 1, Davd Patterson 1, Sarab Anand 1, Alexey Tsymbal 2 1 NIKEL, Faculty of Engneerng,16J27 Unversty Of Ulster at Jordanstown Newtonabbey, BT37 OQB, Unted

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

A classification scheme for applications with ambiguous data

A classification scheme for applications with ambiguous data A classfcaton scheme for applcatons wth ambguous data Thomas P. Trappenberg Centre for Cogntve Neuroscence Department of Psychology Unversty of Oxford Oxford OX1 3UD, England Thomas.Trappenberg@psy.ox.ac.uk

More information

Petri Net Based Software Dependability Engineering

Petri Net Based Software Dependability Engineering Proc. RELECTRONIC 95, Budapest, pp. 181-186; October 1995 Petr Net Based Software Dependablty Engneerng Monka Hener Brandenburg Unversty of Technology Cottbus Computer Scence Insttute Postbox 101344 D-03013

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

D. Barbuzzi* and G. Pirlo

D. Barbuzzi* and G. Pirlo Int. J. Sgnal and Imagng Systems Engneerng, Vol. 7, No. 4, 2014 245 bout retranng rule n mult-expert ntellgent system for sem-supervsed learnng usng SVM classfers D. Barbuzz* and G. Prlo Department of

More information

Fast Sparse Gaussian Processes Learning for Man-Made Structure Classification

Fast Sparse Gaussian Processes Learning for Man-Made Structure Classification Fast Sparse Gaussan Processes Learnng for Man-Made Structure Classfcaton Hang Zhou Insttute for Vson Systems Engneerng, Dept Elec. & Comp. Syst. Eng. PO Box 35, Monash Unversty, Clayton, VIC 3800, Australa

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information