Decision Support Systems


Decision Support Systems 50 (2010)

The data complexity index to construct an efficient cross-validation method

Der-Chiang Li a,*, Yao-Hwei Fang b, Y.M. Frank Fang c

a Department of Industrial and Information Management, National Cheng Kung University, Taiwan
b Division of Biostatistics and Bioinformatics, National Health Research Institutes, Taiwan
c Geographic Information System Research Center, Feng Chia University, Taiwan

Article history: Received 20 January 2009; received in revised form 31 March 2010; accepted 1 July 2010; available online 3 July 2010.

Keywords: Binary classification problem; Cross-validation; Data complexity

Abstract

Cross-validation is a widely used model evaluation method in data mining applications. However, it usually takes considerable effort to determine appropriate parameter values, such as the training data size and the number of experiment runs, to implement a valid evaluation. This study develops an efficient cross-validation method called Complexity-based Efficient (CBE) cross-validation for binary classification problems. CBE cross-validation establishes a complexity index, called the CBE index, by exploring the geometric structure and noise of data. The CBE index is used to calculate the optimal training data size and the number of experiment runs, which reduces model evaluation time when dealing with computationally expensive classification data sets. One simulated and three real data sets are employed to validate the performance of the proposed method, and the validation methods compared are repeated random sub-sampling validation and K-fold cross-validation. The results show that CBE cross-validation, repeated random sub-sampling validation and K-fold cross-validation have similar validation performance, except that the training time required for CBE cross-validation is lower than that for the other two methods.

© 2010 Elsevier B.V. All rights reserved.
* Corresponding author. E-mail addresses: lidc@mail.ncku.edu.tw (D.-C. Li), yhfang@nhri.org.tw (Y.-H. Fang), frankfang@gis.tw (Y.M.F. Fang).

1. Introduction

In data mining applications, researchers generally use cross-validation to evaluate the learned classification model [11]. However, this usually incurs considerable computational cost. With K-fold cross-validation, for example, the number of experiment runs increases with the parameter K, making the training computationally expensive [1]. Specifically, ((K − 1)/K) × 100% of the data are needed as training data for learning a classification model in each run, and when the data size is very large this makes computation expensive [1]. In another common scenario, repeated random sub-sampling validation is usually repeated 30 or 50 times for model evaluation [3]. However, if the data structure is simple or uniform, this number of repetitions is far more than what is needed, and the procedure is therefore inefficient.

Our research develops an efficient cross-validation procedure, called Complexity-based Efficient (CBE) cross-validation, for binary classification problems. The CBE cross-validation method can be used to calculate the optimal training data size and the number of experiment runs in order to reduce model validation time. The procedure systematically establishes a non-linear data complexity index (defined in Section 3), called the CBE index, by exploring the geometric structure and noise of data. The density-based clustering algorithm (DBSCAN) is used to discover the geometric structure and noise, while the between-distance and within-distance of the clusters found are used as the factors of the CBE index. Based on this index, this research develops an efficient CBE cross-validation procedure to calculate the optimal training data size and number of experiment runs.

The rest of this paper is organized as follows. The literature review is given in Section 2, while the detailed procedure of the proposed method is described in Section 3.
One simulated and three real data sets are used to illustrate the CBE cross-validation model in Section 4, and Section 5 contains the conclusion and discussion of our research.

2. Literature review

In this section we review the concept of linear data complexity (the definition is explained in Section 3), the geometric structure and noise of data, and existing cross-validation methods.

2.1. Linear data complexity

For linear data complexity, the index used to detect the level of data complexity is Fisher's discriminant ratio f [1,10]:

$$f = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2} \tag{1}$$

where $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$ are the means and variances of the two classes in a data set, respectively. f is specific to the case of one feature dimension.
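As a minimal sketch of Eq. (1), the ratio can be computed per feature with the Python standard library. The function names are illustrative, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not say which is meant.

```python
from statistics import mean, pvariance


def fisher_ratio(pos, neg):
    """Fisher's discriminant ratio for one feature: (mu1 - mu2)^2 / (var1 + var2).

    Assumption: population variance is used; the paper does not specify.
    """
    return (mean(pos) - mean(neg)) ** 2 / (pvariance(pos) + pvariance(neg))


def max_fisher_ratio(pos_rows, neg_rows):
    """Largest per-feature ratio over all feature dimensions of two classes."""
    n_features = len(pos_rows[0])
    return max(
        fisher_ratio([row[j] for row in pos_rows], [row[j] for row in neg_rows])
        for j in range(n_features)
    )
```

A larger value means that the class means are far apart relative to the class spreads along at least one feature, which is the linear notion of "easy to separate".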

For a multidimensional problem, the maximum f over all the feature dimensions is used to describe the problem. For problems with multidimensional features, Li and Fang proposed a Purity Level (PL) to measure linear data complexity [15]. The parameters of the index are defined as follows:

n: the number of data points.
k: the number of dimensions of the data (k ≥ 2).
$A_{ij}^{+}$, $A_{ij}^{-}$: the value of the j-th dimension of the i-th data point in the positive and negative classes, respectively.
$\bar{A}_{j}^{+}$, $\bar{A}_{j}^{-}$: the average value of the j-th dimension of the data in the positive and negative classes, respectively.
$A_{j}^{\max}$, $A_{j}^{\min}$: the maximum and the minimum values of the j-th dimension, respectively.

Using the parameters listed above, the Purity Level is set as:

$$\text{Purity Level} = \frac{\displaystyle\sum_{i=1}^{n}\left(\sqrt{\frac{\sum_{j=1}^{k}\left(\frac{A_{ij}^{+}-\bar{A}_{j}^{-}}{A_{j}^{\max}-A_{j}^{\min}}\right)^{2}}{k-1}}+\sqrt{\frac{\sum_{j=1}^{k}\left(\frac{A_{ij}^{-}-\bar{A}_{j}^{+}}{A_{j}^{\max}-A_{j}^{\min}}\right)^{2}}{k-1}}\right)}{\displaystyle\sum_{i=1}^{n}\left(\sqrt{\frac{\sum_{j=1}^{k}\left(\frac{A_{ij}^{+}-\bar{A}_{j}^{+}}{A_{j}^{\max}-A_{j}^{\min}}\right)^{2}}{k-1}}+\sqrt{\frac{\sum_{j=1}^{k}\left(\frac{A_{ij}^{-}-\bar{A}_{j}^{-}}{A_{j}^{\max}-A_{j}^{\min}}\right)^{2}}{k-1}}\right)} \tag{2}$$

where the numerator is the sum of the between-class distances over the whole data set, and the denominator is the sum of the within-class distances over the whole data set. The results show that the smaller the PL value, the higher the linear data complexity, and vice versa. However, neither Fisher's discriminant ratio nor PL considers the geometric structure and noise of data.
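The Purity Level can be sketched as follows, under one reading of Eq. (2): each point's range-normalized distance to the mean of the other class goes into the numerator, and its distance to the mean of its own class goes into the denominator. The function name and the handling of the two class sizes (each class is summed over its own points) are assumptions made for illustration.

```python
import math


def purity_level(pos, neg):
    """Between-class over within-class distance, in the spirit of Eq. (2).

    pos, neg: lists of equal-length tuples (k >= 2 dimensions).
    Assumes every dimension has a non-zero range over the whole data set.
    """
    k = len(pos[0])
    everything = pos + neg
    # Range of each dimension over the whole data set (A_j^max - A_j^min).
    span = [max(p[j] for p in everything) - min(p[j] for p in everything)
            for j in range(k)]
    mean_pos = [sum(p[j] for p in pos) / len(pos) for j in range(k)]
    mean_neg = [sum(p[j] for p in neg) / len(neg) for j in range(k)]

    def dist(point, center):
        # Range-normalized distance, scaled by 1/(k - 1) inside the root.
        return math.sqrt(
            sum(((point[j] - center[j]) / span[j]) ** 2 for j in range(k)) / (k - 1)
        )

    between = (sum(dist(p, mean_neg) for p in pos)
               + sum(dist(q, mean_pos) for q in neg))
    within = (sum(dist(p, mean_pos) for p in pos)
              + sum(dist(q, mean_neg) for q in neg))
    return between / within
```

Because the factor 1/(k − 1) appears in every term, it cancels in the ratio; it is kept only to mirror the equation.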
2.2. The concept of geometric structure and noise of data

Rubinov [1] discussed the relationship between classes and clusters in data sets, and examined the distribution of classes within the obtained clusters. He found that some characteristics link data points more strongly than the classes they belong to. We thus believe that the geometric structure of data is an essential characteristic for classifying data sets.

In a study on the effect of noise in data processing, Lee et al. [14] combined the fuzzy adaptive resonance theory and the general regression neural network into a hybrid model, which assisted the removal of noise embedded in training data in order to improve the classification ability. Han et al. [9] proposed a revised Expectation-Maximization (EM) algorithm to discover and remove noise to improve the one-against-the-rest method in binary text classification. Cao et al. [] proposed a data preprocessing method for training data to remove noise or outliers, and used the remaining data to obtain the decision function. However, the drawback of this method is that it is difficult to remove noise and outliers without the assistance of problem domain knowledge.

2.3. Common types of cross-validation method

Cross-validation is a model evaluation method that is better than residual analysis. The weakness of residual evaluation is that it gives no indication of how well the learner will do when it is used to make predictions for unseen data. One way to overcome this problem is to leave out part of the data points from the data set when training a classifier, so that when training is finished the removed data are used to test the performance of the model. This is the basic idea of the model evaluation method called cross-validation [4]. Two widely used such methods, repeated random sub-sampling validation and K-fold cross-validation, are described below.

2.3.1. Repeated random sub-sampling validation

This method randomly splits a data set into training and validation data sets and then repeats this procedure several times.
For each split, the classifier is trained with the training data and validated with the validation data. The results from the splits can then be averaged. This method is usually applied in small-sample learning cases that use a small amount of training data to learn the model and a large amount of validation data to validate it [16,17].

2.3.2. K-fold cross-validation

In K-fold cross-validation, the original sample is partitioned into K partitions. One partition is used as the validation data for testing the model, and the remaining K − 1 partitions are used as the training data. The process is then repeated K times, with each of the K partitions used as the validation data exactly once. The K results from the folds can be averaged to produce a single estimate [4]. The advantage of this method over repeated random sub-sampling validation is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used by researchers.

3. Proposed method

With binary classification problems, data complexity is defined as the level of difficulty of separating the data into classes. When the data complexity is high, the data are hard to classify. Complexity can be subdivided into linear and non-linear cases: linear data complexity is the level of difficulty of separating the data using a linear hyperplane, while non-linear data complexity is the level of difficulty of separating the data using a non-linear hyperplane. Taking the XOR problem as an example, we usually use a non-linear hyperplane rather than a linear one to separate the data. This research focuses on finding an effective way to classify data by calculating the non-linear data complexity for high-dimensional classification problems. We develop the CBE index by improving the Purity Level (PL) method [15], and consider the geometric structure and noise of data to precisely measure the level of non-linear separability.
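A small illustration of why a linear index struggles with the XOR problem: in the hypothetical XOR-style data below, the two classes occupy opposite corners of the unit square, so their per-class means coincide and Fisher's ratio is zero on every feature even though the classes do not overlap. The data points and function names are made up for this sketch.

```python
from statistics import mean, pvariance

# Hypothetical XOR-style data: positive class near (0, 0) and (1, 1),
# negative class near (0, 1) and (1, 0).
POS = [(0.0, 0.0), (0.25, 0.0), (1.0, 1.0), (0.75, 1.0)]
NEG = [(0.0, 1.0), (0.25, 1.0), (1.0, 0.0), (0.75, 0.0)]


def centroid(rows):
    """Mean vector of a list of points."""
    return tuple(mean(r[j] for r in rows) for j in range(len(rows[0])))


def fisher_ratio(a, b):
    """Fisher's discriminant ratio for one feature (population variance assumed)."""
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))


def xor_demo():
    """Return both class centroids and the per-feature Fisher ratios."""
    ratios = [
        fisher_ratio([p[j] for p in POS], [q[j] for q in NEG]) for j in range(2)
    ]
    return centroid(POS), centroid(NEG), ratios
```

Both centroids fall at the center of the square, where neither class has any data, which is the situation that motivates using several cluster centers per class.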
We then use the CBE index to form a sample size determination method, and from it develop an efficient CBE cross-validation method that improves computational efficiency. The proposed Complexity-based Efficient (CBE) index is described in detail in subsection 3.1, and the proposed CBE cross-validation is described in subsection 3.2.

3.1. CBE index

Research on pattern recognition suffers from uncertainty concerning the match between knowledge and a problem, due to the strong dependence of classification performance on the available data. In other words, the accuracy of a classifier is highly dependent on the data characteristics [10]. Unfortunately, this uncertainty often remains because of a lack of understanding of the full data characteristics [1], and this situation also occurs in model validation. Therefore, in this work we consider more descriptors, such as the geometric structure and noise of data, to further understand the data characteristics with the goal of improving validation efficiency.

The CBE index relies heavily on the realization of the data's geometric structure because, in our experience, when the center of the data belonging to a class is not located in the data cluster (such as with the XOR problem in Fig. 1), it is not reasonable to use a linear index, such as an F-test statistic or the purity level, to measure the data complexity. We thus develop the non-linear CBE index to find multiple centers according to the geometric structures of data. In

that we calculate the centers of data clusters and let the centers be located in the data. Note that the linear index concept is a special case of the non-linear one in which each class has only one cluster.

Fig. 1. The structure of an XOR problem.

To discover the geometric structure and noise of data, researchers usually rely on prior knowledge, although this is experience oriented and inconclusive []. This research thus proposes a non-linear data complexity index, the CBE index, to systematically and precisely reflect the geometric structure and noise of data. This study uses the density-based clustering (DBSCAN) algorithm to discover the geometric structure and noise of data, and from them the complexity level of separating the data into classes, as explained below.

3.1.1. DBSCAN algorithm

DBSCAN is a clustering algorithm suitable for large data sets with high dimensionality [7]. DBSCAN gathers high-density data into clusters, and the clusters may have arbitrary shapes. The algorithm finds the clusters and then deletes data that do not belong to any of them. It searches for clusters by checking the surroundings of each data point within a scope called the ε-neighborhood. If the ε-neighborhood of a data point contains more than a certain pre-defined number (MinPts) of other data points, a cluster with this point (called the core object) is created; otherwise, the point is treated as noise, which will eventually be deleted. DBSCAN iteratively collects directly density-reachable data (data within the ε-neighborhood of a core object) until no new data can be added to any cluster, and this may involve merging some clusters. We apply the DBSCAN algorithm to each class to detect the geometric structure and noise of data in binary classification. Table 1 shows the DBSCAN algorithm pseudo code.

Table 1. The pseudo code of the DBSCAN algorithm.

Consider the radius of a default ε, obtained by considering the fraction of objects to be selected (k/m) and the volume V [6].
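Since the pseudo code of Table 1 is not reproduced in this text, the following is a minimal textbook-style sketch of DBSCAN, not the authors' listing. It also includes a default radius built from the idea stated above: choose ε so that an n-dimensional ball of radius ε holds the fraction k/m of the data volume V. The function names are illustrative, and two conventions are assumptions: a point counts as part of its own neighborhood, and a point needs at least `min_pts` neighbors to be a core object.

```python
import math

NOISE = -1


def default_eps(points, k):
    """Radius whose n-ball volume equals the fraction k/m of the data volume V."""
    m, n = len(points), len(points[0])
    volume = math.prod(
        max(p[j] for p in points) - min(p[j] for p in points) for j in range(n)
    )
    # Volume of an n-ball of radius eps: pi^(n/2) * eps^n / Gamma(n/2 + 1).
    return ((k / m) * volume * math.gamma(n / 2 + 1)
            / math.sqrt(math.pi ** n)) ** (1 / n)


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is itself a core object
                queue.extend(j_neighbors)
    return labels
```

In the setting of this paper the routine would be run once per class, with the points labeled −1 treated as noise and removed before the index is computed.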
We extend this concept to binary classification and suppose that n is the dimension of the data, k is the number of MinPts, Γ is the gamma function, $m^{+}$ and $m^{-}$ are the amounts of data in the positive and negative classes, respectively, and $V^{+} = \prod_{j} \operatorname{range}(x_j^{+})$ and $V^{-} = \prod_{j} \operatorname{range}(x_j^{-})$ for $j = 1, \ldots, n$ are the data ranges in the positive and negative classes, respectively. The following are the formulas for $\varepsilon^{+}$ and $\varepsilon^{-}$ for the positive and negative classes, respectively:

$$\varepsilon^{+} = \sqrt[n]{\frac{(k/m^{+})\,V^{+}\,\Gamma(n/2+1)}{\sqrt{\pi^{n}}}} \tag{3}$$

$$\varepsilon^{-} = \sqrt[n]{\frac{(k/m^{-})\,V^{-}\,\Gamma(n/2+1)}{\sqrt{\pi^{n}}}} \tag{4}$$

Daszykowski et al. proposed a default MinPts calculation formula [5]. We extend this formula to binary classification and define:

$$\mathrm{MinPts}^{+} = \operatorname{integer}\!\left(\frac{m^{+}}{25}\right), \text{ for the positive class} \tag{5}$$

$$\mathrm{MinPts}^{-} = \operatorname{integer}\!\left(\frac{m^{-}}{25}\right), \text{ for the negative class} \tag{6}$$

For a data set with numerous data points in the positive and negative classes ($m^{+}$ or $m^{-}$), we suggest that $\mathrm{MinPts}^{+}$ or $\mathrm{MinPts}^{-}$ be equal to […].

3.1.2. The calculation of the CBE index

This research uses the CBE index to depict the level of non-linear data complexity. The CBE index of binary classification can be regarded as the relative distance of the clusters discovered by the DBSCAN algorithm for each class, and it is found as follows.

Let $X = \{X_1, \ldots, X_N\}$ be a data set that includes positive samples ${}^{+}X = \{{}^{+}X_1, \ldots, {}^{+}X_{n^{+}}\}$ and negative samples ${}^{-}X = \{{}^{-}X_1, \ldots, {}^{-}X_{n^{-}}\}$, where $n^{+} + n^{-} = N$. Let ${}^{+}C = \{{}^{+}C_1, \ldots, {}^{+}C_{|{}^{+}C|}\}$ be a set that consists of $|{}^{+}C|$ positive clusters, ${}^{-}C = \{{}^{-}C_1, \ldots, {}^{-}C_{|{}^{-}C|}\}$ be a set consisting of $|{}^{-}C|$ negative clusters, $d(X_i, X_j)$ be the distance between $X_i$ and $X_j$, and ${}^{+}C_i = \{{}^{+}X^{i}_1, \ldots, {}^{+}X^{i}_{{}^{+}m_i}\}$ be the i-th positive cluster, where ${}^{+}m_i$ is the number of positive samples in the i-th cluster, and $i = 1, \ldots, |{}^{+}C|$.
Similarly, let ${}^{-}C_i = \{{}^{-}X^{i}_1, \ldots, {}^{-}X^{i}_{{}^{-}m_i}\}$ be the i-th negative cluster, where ${}^{-}m_i$ is the number of negative samples in the i-th cluster, and $i = 1, \ldots, |{}^{-}C|$. We first calculate the minimum average distance between a pair of clusters which belong to different classes as Min_Bet:

$$\mathrm{Min\_Bet} = \min_{\substack{k=1,\ldots,|{}^{+}C| \\ l=1,\ldots,|{}^{-}C|}} \left\{ \frac{\sum_{i=1}^{{}^{+}m_k}\sum_{j=1}^{{}^{-}m_l} d\left({}^{+}X^{k}_i,\, {}^{-}X^{l}_j\right)}{{}^{+}m_k \cdot {}^{-}m_l} \right\} \tag{7}$$

A large value of Min_Bet indicates that the data are widely scattered and easy to classify.

We then calculate the average distance within all clusters of the positive class as:

$$\mathrm{Within}^{+} = \sum_{k=1}^{|{}^{+}C|} \frac{\sum_{i=1}^{{}^{+}m_k}\sum_{j=1}^{{}^{+}m_k} d\left({}^{+}X^{k}_i,\, {}^{+}X^{k}_j\right)}{{}^{+}m_k\left({}^{+}m_k - 1\right)} \tag{8}$$

and for all clusters of the negative class as:

$$\mathrm{Within}^{-} = \sum_{k=1}^{|{}^{-}C|} \frac{\sum_{i=1}^{{}^{-}m_k}\sum_{j=1}^{{}^{-}m_k} d\left({}^{-}X^{k}_i,\, {}^{-}X^{k}_j\right)}{{}^{-}m_k\left({}^{-}m_k - 1\right)} \tag{9}$$

If the average distance within all clusters of a class ($\mathrm{Within}^{+}$ and $\mathrm{Within}^{-}$) is small, the data in these clusters are tightly grouped. The CBE index is defined as follows:

$$\text{CBE index} = \frac{\mathrm{Min\_Bet}}{\left(\mathrm{Within}^{+} + \mathrm{Within}^{-}\right) / \left(|{}^{+}C| + |{}^{-}C|\right)} \tag{10}$$

The determination of the CBE index takes three steps:

Step 1: Normalize the data. Because the dimensions may be measured in different units, the data are normalized before calculating the CBE index.
Step 2: Discover the geometric structure and noise of data. Use the DBSCAN algorithm in each of the two classes with the suggested parameter settings, $\varepsilon^{+}$, $\varepsilon^{-}$, $\mathrm{MinPts}^{+}$, and $\mathrm{MinPts}^{-}$, to detect the geometric structure and remove the data noise.
Step 3: Calculate the CBE index. Calculate Min_Bet, $\mathrm{Within}^{+}$, and $\mathrm{Within}^{-}$ to obtain the CBE index.

The CBE index has the following properties:

(1) 0 ≤ CBE index < ∞.
(2) The smaller the CBE index is, the higher the data complexity is.
(3) The larger the CBE index is, the lower the data complexity is.

3.2. CBE cross-validation method

We apply the CBE index to develop the CBE cross-validation method, where we first randomly select a certain small proportion (for example, 5%) of the samples as the training data and calculate the CBE index. This process is repeated 30 times to calculate the average $\bar{X}_{CBE}$ and the standard deviation $S_{CBE}$.
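The index that this procedure evaluates repeatedly, Eqs. (7) to (10), can be sketched as follows for clusters that have already been found (for example by DBSCAN, with noise removed). The function names are illustrative, Euclidean distance is assumed for d, and every cluster is assumed to contain at least two points so that the within-cluster average is defined.

```python
import math
from itertools import product


def average_between(cluster_a, cluster_b):
    """Average distance over all pairs with one point from each cluster."""
    total = sum(math.dist(p, q) for p, q in product(cluster_a, cluster_b))
    return total / (len(cluster_a) * len(cluster_b))


def min_bet(pos_clusters, neg_clusters):
    """Eq. (7): smallest average distance between clusters of opposite classes."""
    return min(average_between(a, b) for a, b in product(pos_clusters, neg_clusters))


def within(clusters):
    """Eqs. (8)/(9): sum over clusters of the average pairwise distance."""
    total = 0.0
    for cluster in clusters:
        m = len(cluster)  # assumed m >= 2
        pair_sum = sum(math.dist(p, q) for p in cluster for q in cluster)
        total += pair_sum / (m * (m - 1))
    return total


def cbe_index(pos_clusters, neg_clusters):
    """Eq. (10): Min_Bet divided by the mean within-cluster distance."""
    n_clusters = len(pos_clusters) + len(neg_clusters)
    mean_within = (within(pos_clusters) + within(neg_clusters)) / n_clusters
    return min_bet(pos_clusters, neg_clusters) / mean_within
```

Tight clusters that lie far from the nearest cluster of the other class give a large index, which matches properties (2) and (3) above.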
In order to achieve a stable CBE index for the optimal training data size N, this process is iterated while increasing the proportion of the training data and checking the difference of $\bar{X}_{CBE}$ as:

$$\text{WHEN } \bar{X}^{n\%}_{CBE} - \bar{X}^{(n+1)\%}_{CBE} < 0.01; \text{ THEN } N = \max\{\text{no. of } n\% \text{ samples},\ \text{no. of } 10\% \text{ samples}\} \tag{11}$$

When the difference decreases to a level smaller than 0.01, we consider the structure of the training data to be stable, and use this training data size as the optimal one. Here 0.01 is only an empirical suggestion, and 10% is likewise an empirical safe lower limit on the sample size.

For the number of experiment runs, we repeat the process 30 times to calculate the average and standard deviation of CBE. Note that the sampling distribution of the CBE index will converge to a normal distribution according to the Central Limit Theorem (CLT) [3], and the average $\bar{X}^{n\%}_{CBE}$ and standard deviation $S^{n\%}_{CBE}$ at the optimal training data size are used to calculate the number of experiment runs. The number of experiment runs K is determined as:

$$\text{CALCULATE } \left(\frac{Z_{\alpha/2}\, S^{n\%}_{CBE}}{0.05\, \bar{X}^{n\%}_{CBE}}\right)^{2} = k; \text{ THEN } \max\{k, 5\} = K \tag{12}$$

where α is the significance level, $Z_{\alpha/2}$ is the value with probability α/2 in the upper tail of the standard normal distribution, and $0.05\, \bar{X}^{n\%}_{CBE}$ is set as the desired margin of error. The lower limit of 5 runs is again our suggestion.

4. Experiment

In this section, we use one simulated and three real data sets to verify the performance of the Complexity-based Efficient (CBE) cross-validation method. In the simulation experiments, a support vector machine (SVM) [1], a Back-propagation Network (BPN) [,0], and a Naive Bayes Classifier (NBC) [4] are used as the classification tools, while for the three real data sets only SVM is used.

To find the relationship between the CBE index and classification accuracy, we randomly select 10% of the total samples and calculate the CBE index with the $\varepsilon^{+}$, $\varepsilon^{-}$, $\mathrm{MinPts}^{+}$, and $\mathrm{MinPts}^{-}$ suggested in Section 3 to measure the relationship for all data sets.
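The two rules of Section 3.2 that the experiments apply, Eq. (11) for the training size and Eq. (12) for the number of runs, can be sketched as follows. The function names are illustrative. Three choices are assumptions: the absolute difference is used in the stopping test, only percentages that are exactly one point apart are compared, and k is rounded to the nearest integer before the lower limit of 5 is applied.

```python
from statistics import NormalDist


def optimal_training_percent(mean_cbe_by_percent, tol=0.01, floor_percent=10):
    """Eq. (11): first n% whose mean CBE index differs from (n+1)% by < tol.

    mean_cbe_by_percent: dict mapping training percentage -> average CBE index.
    Returns the percentage (never below floor_percent), or None if not stable.
    """
    percents = sorted(mean_cbe_by_percent)
    for p, q in zip(percents, percents[1:]):
        stable = abs(mean_cbe_by_percent[p] - mean_cbe_by_percent[q]) < tol
        if q - p == 1 and stable:
            return max(p, floor_percent)
    return None


def number_of_runs(mean_cbe, sd_cbe, alpha=0.05, rel_margin=0.05, min_runs=5):
    """Eq. (12): k = (Z_{alpha/2} * S / (rel_margin * mean))^2, K = max(k, min_runs)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    k = (z * sd_cbe / (rel_margin * mean_cbe)) ** 2
    return k, max(round(k), min_runs)
```

Eq. (12) has the form of the usual sample-size formula for estimating a mean to within a given margin of error, here 5% of the average CBE index.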
This process is repeated 10 times, where SVM, BPN, and NBC are used as the classifiers with the resubstitution method (all available data are used for training and testing) [13].

To implement the CBE cross-validation, we randomly select a small proportion of the data as the training set (such as 5%) and calculate the CBE indexes. This procedure is repeated 30 times. The training data size is then gradually increased, and at each size we calculate the average and the standard deviation of the CBE index in order to find the optimal training data size and the number of experiment runs.

4.1. Simulated data experiments

This research uses the Parametric Equation of a Hypersphere [16], briefly introduced below, to generate simulated data. The n-hypersphere (often simply called the n-sphere) is a generalization of an object with n dimensions in $\mathbb{R}^{n}$ (the circle and sphere are called the two-sphere and three-sphere, respectively). The n-sphere centered at the origin can therefore be defined as the set of points $(x_1, x_2, \ldots, x_n)$ such that:

$$x_1^2 + x_2^2 + \cdots + x_n^2 = r^2 \tag{13}$$

Table 2. The CBE index and classification accuracies of the three classifiers for the simulated data sets with 1% noise, where "Average no. of noisy samples found" is the average number of noise points found and "Average no. of clusters found" is the average number of clusters found by the DBSCAN algorithm.
CBE index: […]
Average no. of noisy samples found: […]
Average no. of clusters found (pos., neg.): (2, 2) (2, 2) (2, 2) (2, 2) (2, 2) (2, 2) (2, 2) (2, 2) (2, 2) (2, 2)
Accuracy of SVM: […]
Accuracy of BPN: […]
Accuracy of NBC: […]
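A point on the n-sphere of Eq. (13) can be generated from n − 1 angles with the standard hyperspherical parametrization, sketched below together with a helper that flips a fixed number of labels per class to imitate label noise. The function names, the ±1 label encoding, and the optional `center` offset are assumptions for illustration; how the angles are sampled is left to the caller.

```python
import math
import random


def sphere_point(angles, radius=1.0, center=None):
    """Map angles theta_1..theta_{n-1} to a point on an n-sphere.

    Uses x_n = r cos(t1), x_{n-1} = r sin(t1) cos(t2), ...,
    x_1 = r sin(t1) ... sin(t_{n-1}); `center` optionally shifts the sphere.
    """
    n = len(angles) + 1
    x = [0.0] * n
    sin_product = radius
    for idx, theta in enumerate(angles):
        x[n - 1 - idx] = sin_product * math.cos(theta)
        sin_product *= math.sin(theta)
    x[0] = sin_product
    if center is not None:
        x = [c + v for c, v in zip(center, x)]
    return x


def add_label_noise(labels, per_class, rng):
    """Flip `per_class` randomly chosen labels in each class (labels are +1/-1)."""
    noisy = list(labels)
    for target in (1, -1):
        members = [i for i, y in enumerate(labels) if y == target]
        for i in rng.sample(members, per_class):
            noisy[i] = -target
    return noisy
```

For instance, `add_label_noise([1] * 404 + [-1] * 404, 4, random.Random(0))` flips four labels in each class of a balanced 808-sample set, which corresponds to roughly 1% noise per class.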

The hypersphere can be specified by parametric equations as:

$$\begin{cases} x_1 = r\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\sin\theta_{n-1} \\ x_2 = r\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\cos\theta_{n-1} \\ x_3 = r\sin\theta_1\sin\theta_2\cdots\cos\theta_{n-2} \\ x_4 = r\sin\theta_1\sin\theta_2\cdots\cos\theta_{n-3} \\ \quad\vdots \\ x_{n-1} = r\sin\theta_1\cos\theta_2 \\ x_n = r\cos\theta_1 \end{cases} \tag{14}$$

where r is the radius and $\theta_1, \theta_2, \ldots, \theta_{n-1} \in [0, \pi]$ are the angles of the hypersphere. The form of the parametric equations is not unique, but it must satisfy the identity $x_1^2 + x_2^2 + \cdots + x_n^2 = 1$ for unit radius.

We consider the two-cluster condition in each class and insert noise into the data. We generate 808 five-dimensional data points (404 positive and 404 negative samples) following the Parametric Equation of a Hypersphere [16]. In the positive class, the data are generated in two clusters, given by Eqs. (15) and (16); in the negative class, the data are also generated in two clusters, given by Eqs. (17) and (18). All four clusters share the form

$$\begin{cases} x_1 = c_1 + \sin\theta_1\sin\theta_2\sin\theta_3\sin\theta_4 \\ x_2 = c_2 + \sin\theta_1\sin\theta_2\sin\theta_3\cos\theta_4 \\ x_3 = c_3 + \sin\theta_1\sin\theta_2\cos\theta_3 \\ x_4 = c_4 + \sin\theta_1\cos\theta_2 \\ x_5 = c_5 + \cos\theta_1 \end{cases}, \quad 0 \le \theta \le \pi \tag{15–18}$$

where each offset $c_j$ has magnitude 0.7 and the four clusters differ in the signs of the offsets ([…]).

We then add 1% noise to each class by randomly selecting 4 samples and changing their class labels. Table 2 and Fig. 2 show the results of using the CBE index with the simulated data sets.

Table 3. The averages and standard deviations (SDs) of CBE indexes with increasing size of the training data sets for the simulated data set. (Bold value means the optimal data size.)
Training data: 40 (5%), 81 (10%), 121 (15%), 161 (20%), 202 (25%), 242 (30%), 283 (35%), 291 (36%), 299 (37%), 307 (38%)
Average: […]
SD: […]

Fig. 2. The relationship between classification accuracy and the CBE index with 1% noise.

Fig. 3. Relationship between training size and the CBE index with the simulated data set.

From the table and figure above we can see that when the value of CBE increases, the classification accuracies of SVM, BPN, and NBC also rise. There is thus a highly positive correlation between the CBE index and classification accuracy for the simulated data sets.

To find the optimal training data size, we calculate CBE indexes for increasing training set sizes. Table 3 and Fig. 3 show the results of using the CBE index with various simulated data sets.

$$\text{WHEN } \bar{X}^{37\%}_{CBE} - \bar{X}^{38\%}_{CBE} < 0.01; \text{ THEN } \max\{\text{no. of } 37\% \text{ samples},\ \text{no. of } 10\% \text{ samples}\} = 37\% \times 808 = 299 \tag{19}$$

We determine that the optimal training data size is 299, the size at which $\bar{X}_{CBE}$ decreases by less than 0.01, and consider the geometric structure of the optimal training data to be stable. To find the optimal number of experiment runs for the simulated data set, we use the statistics at the optimal sample size:

$$\text{CALCULATE } \left(\frac{Z_{\alpha/2} \times 0.183}{0.05 \times 2.819}\right)^{2} = 6.46; \text{ THEN } \max\{6.456, 5\} = 6 \tag{20}$$

In the simulated data set, with a significance level α = 0.05 and a margin of error of […], the optimal number of training data is 299, and the optimal number of experiment runs is six.

We use repeated random sub-sampling validation (with 533 (66%) training data points, 275 (34%) testing data points, experiment repeated 30 times) to validate that our CBE cross-validation (with 299 (37%) training data points, 509 (63%) testing data points, experiment repeated six times) is efficient. The average and standard deviation of the SVM accuracy with the repeated random sub-sampling validation are 7.36 and 1.044, respectively; and those of the CBE cross-validation are […] and […]. The performances of the two cross-validation methods thus have insignificant differences (the P-value is 0.15,

using the independent t-test). The average training time of the repeated random sub-sampling validation is […] = 6.7 s, and that of the CBE cross-validation is […] = 3.1 s.

We also use five-fold cross-validation to validate that our CBE cross-validation is efficient. The average and standard deviation of the SVM accuracy with five-fold cross-validation are […] and 1.141, respectively. The performances of the cross-validation methods have insignificant differences (the P-value is 0.13, using the independent t-test), and the average training time of the five-fold cross-validation is […] = 5.94 s.

In addition, when we use 10% of the total data (the lower bound of the training data size) as the training data, and five experiment runs (the lower bound of the experiment runs), the average and standard deviation of the SVM accuracy are […] and […].563, which is significantly lower than with CBE cross-validation (P-value << 0.01, using the independent t-test). The average training time is 0.32 × 5 = 1.6 s. Since validation effectiveness is the basic concern of researchers, the CBE cross-validation is considered to be better than the cross-validation using the lower bounds of the training data size and experiment runs, and so it is an efficient and effective method.

4.2. Real data experiment

Table 4. Properties of the three data sets.
Data set | No. of dimensions | No. of samples | No. of classes
Pima Indians diabetes | 8 | 768 | 2
Haberman's survival | 3 | 306 | 2
Australian credit approval | 14 | 690 | 2

This research uses two medical data sets, Pima Indians Diabetes and Haberman's Survival, and one business data set, Australian Credit Approval, in the experiment. The Pima Indians Diabetes data set consists of 768 data points with eight numeric dimensions (attributes), and it is a two-class data set with target values denoted by 0 and 1. The class value 1 means tested positive for diabetes, and the class value 0 means tested negative. The Haberman's Survival data set consists of 306 data points with three numeric dimensions, and it is a two-class data set that records the survival status of breast cancer patients.
The Australian Credit Approval data set consists of 690 data points with 14 dimensions, six numerical and eight categorical, and it is a two-class data set. Table 4 summarizes the characteristics of the three data sets, which are all downloaded from the UCI repository. The results of the experiment for the three data sets are shown in the following subsections.

4.2.1. The Pima data set

The relationship between the CBE indexes and classification accuracies is shown in Table 5 and Fig. 4. From the table and figure we can see that when the value of CBE decreases, the classification accuracy of the SVM also falls. There is thus a highly positive correlation between the CBE index and classification accuracy for the Pima data set.

Fig. 4. Relationship between CBE indexes and accuracies with the Pima data set (correlation coefficient = 0.773).

Table 6 and Fig. 5 show the experimental results of CBE cross-validation for the Pima data set.

Table 6. The averages and standard deviations (SDs) of CBE indexes with increasing size of the training data sets for the Pima data set. (Bold value means the optimal data size.)
Training data: 38 (5%), 46 (6%), 54 (7%), 61 (8%), 70 (9%), 77 (10%), 84 (11%), 92 (12%), 100 (13%), 108 (14%)
Average: […]
SD: […]

$$\text{WHEN } \bar{X}^{13\%}_{CBE} - \bar{X}^{14\%}_{CBE} < 0.01; \text{ THEN } \max\{\text{no. of } 13\% \text{ samples},\ \text{no. of } 10\% \text{ samples}\} = 13\% \times 768 = 100 \tag{21}$$

We thus determine the optimal training data size to be 100, and consider the geometric structure of the optimal training data to be stable. We use the statistics at the optimal sample size to calculate the optimal number of experiment runs for the Pima data set as:

$$\text{CALCULATE } \left(\frac{Z_{\alpha/2} \times 0.026}{0.05 \times 1.128}\right)^{2} = 0.815; \text{ THEN } \max\{0.815, 5\} = 5 \tag{22}$$

where α = 0.05 is the significance level, and 0.05 × 1.128 = 0.0564 is the desired margin of error. We thus determine the optimal number of experiment runs to be five.
Table 5. Pima data set with 77 selected samples as the training data (default MinPts = 3). Columns: Accuracy, CBE index. […]

We then use repeated random sub-sampling validation (with 507 (66%) training data points, 261 (34%) testing data points, experiment repeated 30 times) to validate that our CBE cross-validation (with 100 (13%) training data points, 668 (87%) testing data points, experiment repeated five times) is efficient. The average and standard deviation of the SVM accuracy with the repeated random sub-sampling validation are […] and 1.743, respectively; and those of the CBE cross-validation are […] and […].044. The performances of the two cross-validation methods have insignificant differences (the P-value is 0.043, using the independent t-test). The average training time of the repeated random sub-sampling validation is […] = 7.3 s and the average training time of the CBE cross-validation is 1.9 × 5 = 6. s.

We also use five-fold cross-validation to validate that our CBE cross-validation is efficient. The average and standard deviation of the SVM accuracy with the five-fold cross-validation are 75.4 and 1.74, respectively. The performances of the two cross-validation methods have insignificant differences (the P-value is 0.05, using the independent t-test). The average training time of the five-fold cross-validation is […] = 15. s.

In addition, when we use 10% of the total data (the lower bound of the training data size) as the training data with five experiment runs

(the lower bound of the experiment runs), the average and standard deviation of the SVM accuracy are […] and […].94, which differs significantly from CBE cross-validation (the P-value = 0.055, using the independent t-test). The average training time is […] = .4 s. The CBE cross-validation is better than the cross-validation using the lower bounds of the training data size and experiment runs. Therefore, CBE cross-validation is considered an efficient and effective method.

Fig. 5. Relationship between training size and CBE index with the Pima data set.

4.2.2. The Haberman data set

The relationship between the CBE indexes and classification accuracies is shown in Table 7 and Fig. 6. From the table and figure we can see that when the value of CBE decreases, the classification accuracy of the SVM also falls. There is thus a highly positive correlation between the CBE index and classification accuracy for this data set. Table 8 and Fig. 7 show the results of CBE cross-validation for the Haberman data set.

$$\text{WHEN } \bar{X}^{33\%}_{CBE} - \bar{X}^{34\%}_{CBE} < 0.01; \text{ THEN } \max\{\text{no. of } 33\% \text{ samples},\ \text{no. of } 10\% \text{ samples}\} = 33\% \times 306 = 101 \tag{23}$$

Using the above equation, we determine the optimal training data size to be 101, and consider the geometric structure of the optimal training data to be stable. By a similar procedure, the optimal number of experiment runs is:

$$\text{CALCULATE } \left(\frac{Z_{\alpha/2} \times 0.019}{0.05 \times 1.132}\right)^{2} = 9.973; \text{ THEN } \max\{9.973, 5\} = 10 \tag{24}$$

where α = 0.05 is the significance level, and 0.05 × 1.132 = 0.0566 is the desired margin of error. We thus determine the optimal number of experiment runs to be 10.

We then use repeated random sub-sampling validation (with 204 (66%) training data points, 104 (34%) testing data points, experiment repeated 30 times) to validate that our CBE cross-validation (with 101 (33%) training data points, 205 (67%) testing data points, experiment repeated 10 times) is efficient.
The average and standard deviation of the SVM accuracy with the repeated random sub-sampling validation are […] and 3.19, respectively, and those with the CBE cross-validation are […] and […].04. The performances of the two cross-validations have insignificant differences (the P-value is 0.379, using the independent t-test). The average training time of the repeated random sub-sampling validation is […] = 9.9 s, while that of the CBE cross-validation is […] = .3 s.

Fig. 6. Relationship between CBE indexes and accuracies with the Haberman data set (correlation coefficient = 0.7).

We then use 10-fold cross-validation to validate that our CBE cross-validation is efficient. The average and standard deviation of the SVM accuracy with the 10-fold cross-validation are […] and .16, respectively. The performances of the two cross-validation methods have insignificant differences (the P-value is 0.075, using the independent t-test). The average training time of the 10-fold cross-validation is […] = 5.1 s.

In addition, when we use 10% of the total data (the lower bound of the training data size) as the training data with five experiment runs, the average and standard deviation of the SVM accuracy are […] and 3.641, which differs significantly from the CBE cross-validation (P-value << 0.01, using the independent t-test). The average training time is 0.18 × 5 = 0.9 s. Considering validation effectiveness, the CBE cross-validation is thus again considered better than the cross-validation using the lower bounds of training data size and experiment runs. Therefore, CBE cross-validation is an efficient and effective method.

4.2.3. The Australian credit approval data set

First, because the analysis uses numerical independent variables, we delete the categorical independent variables $X_1$, $X_4$, $X_8$, $X_9$, $X_{11}$, and $X_{12}$, and delete the data that have missing values. The relationship between the CBE indexes and classification accuracies is shown in Table 9 and Fig. 8. From the table and figure we can see a highly positive correlation between the CBE index and classification accuracy.
Table 7 Haberma data set with 31 samples selected as the traiig data (Default MiPt=). Accuracy CBE idex

Table 8 The averages and standard deviations (SD) of CBE indexes with increasing size of the training data set for the Haberman data set. (Bold value means the optimal data size.)
Training data: 42 (14%), 52 (17%), 61 (20%), 70 (23%), 73 (24%), 77 (25%), 80 (26%), 83 (27%)
Average:
SD:
Training data: 86 (28%), 89 (29%), 92 (30%), 95 (31%), 97 (32%), 101 (33%), 104 (34%)
Average:
SD:

Fig. 7. Relationship between training size and the CBE index with the Haberman data set.

Table 9 Australian data set with 1,90 samples selected as training data (Default MinPt=3).
Accuracy | CBE index

Table 10 and Fig. 9 show the results of CBE cross-validation for the Australian data set.

$$\text{WHEN } \left|\mathrm{CBE}_{X=42\%} - \mathrm{CBE}_{X=43\%}\right| < 0.01,\ \text{THEN data size} = \max\{\text{no. of 37\% samples},\ \text{no. of 10\% samples}\} = 42\% \times 690 = 290 \tag{5}$$

By a similar procedure, we determine the optimal number of training data points to be 290, and measure the optimal number of experiment runs as:

$$\text{CALCULATE } \left(\frac{0.03 \times Z_{\alpha/2}}{0.05 \times .31}\right)^{2} = .17,\ \text{THEN } \max\{.17,\ 5\} = 5 \tag{6}$$

where α = 0.05 is the significance level, and (0.05 × .31) = 0.1116 is the desired margin of error. We determine the optimal number of experiment runs to be five. Again, we use repeated random sub-sampling validation (with 455 (66%) training data points, 235 (34%) testing data points, experiment repeated 30 times) to validate that our CBE cross-validation (with 290 (42%) training data points, 400 (58%) testing data points, experiment repeated 5 times) is efficient. The average and standard deviation of the SVM with the repeated random sub-sampling validation are and 1.30, respectively, and the average and standard deviation of the SVM with the CBE cross-validation are and . The performances of the two cross-validations do not differ significantly (the P-value is 0.305, using the independent t-test). The average training time of the repeated random sub-sampling validation is =54.9 s, and that of the CBE cross-validation is 1.84 × 5 = 9.2 s.
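The size-selection rule used for both data sets can be illustrated as follows. This is a minimal sketch, assuming that "stable" means the average CBE index changes by less than the threshold between X% and (X+1)% of the data; the function name `optimal_training_size` and the CBE averages in the example are hypothetical and are not the values of Table 8 or Table 10.

```python
def optimal_training_size(cbe_by_percent, n_samples, threshold=0.01, min_percent=10):
    """Return (percent, size) for the first X% at which the CBE index is stable.

    Assumption: stability means |CBE(X%) - CBE((X+1)%)| < threshold.
    The size is floored at min_percent of the data (the empirical lower bound).
    """
    floor_size = round(min_percent * n_samples / 100)
    for x in sorted(cbe_by_percent):
        nxt = cbe_by_percent.get(x + 1)
        if nxt is not None and abs(cbe_by_percent[x] - nxt) < threshold:
            return x, max(round(x * n_samples / 100), floor_size)
    return None  # no stable point found in the supplied range


# Hypothetical CBE averages keyed by training percentage.
averages = {30: 0.62, 31: 0.66, 32: 0.69, 33: 0.705, 34: 0.709}
print(optimal_training_size(averages, n_samples=306))  # (33, 101)
```

Scanning from the smallest percentage upward returns the smallest training set that satisfies the criterion, which is what keeps the evaluation time low.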
When we use five-fold cross-validation to validate CBE cross-validation, the average and standard deviation of the SVM with the five-fold cross-validation are 79. and 1.351, respectively. Thus, the performances of the two cross-validation methods do not differ significantly (the P-value is 0.333, using the independent t-test). The average training time of the five-fold cross-validation is 1.98 × 5 = 9.9 s. In addition, using 10% of the total data (the lower bound of the training data size) as training data with five experiment runs, the average and standard deviation of the SVM are and .169, showing significant differences from the CBE cross-validation (the P-value << 0.01, using the independent t-test). The average training time is =7.5 s. Similarly, the CBE cross-validation is better than the cross-validation using the lower bounds of the training data size and experiment runs. Therefore, CBE cross-validation is an efficient and effective method.

Fig. 8. Relationship between CBE indexes and accuracies with the Australian data set (correlation coefficient = 0.9).

Discussion of CBE index for various data characteristics

In this subsection, we apply sensitivity analysis to the calculation of the CBE index, using unbalanced classes, dimensions, and sample sizes of a data set as the attributes.

Unbalanced class

Nguyen and Yonggwan proposed that the accuracy of classifiers goes down as the unbalanced level increases. Specifically, they used

SVM as the classification tool and found that it was affected by the unbalanced effect [19]. In our experiments, we first consider the unbalanced class characteristic of a data set with the same data structure. We generate data sets by fixing the positive sample size and increasing the negative sample size, and the results are shown in Table 11. Table 11 shows that the higher the unbalanced level, the higher the data complexity and the lower the CBE index.

Table 10 The averages and standard deviations (SD) of CBE indexes with increasing size of the training data set for the Australian data set. (Bold value means the optimal data size.)
Training data: 3 (10%), 138 (20%), 173 (25%), 207 (30%), 242 (35%)
Average:
SD:
Training data: 276 (40%), 283 (41%), 290 (42%), 296 (43%)
Average:
SD:

Dimensions

For a fixed sample size, adding dimensions will degrade the performance of a classifier (high data complexity) if the number of training data points is small relative to the number of dimensions [4]. For the second characteristic, a fixed sample size of 50 is used. We increase the number of dimensions while keeping the same data structure, so that the number of training data points is smaller than the number of dimensions in the experiments; the results are shown in Table 12. Table 12 shows that when the number of dimensions is high, the data complexity is also high, while the CBE index is low.

Sample size

For the third characteristic in our experiments, we use the same sample sizes for both classes, and increase them while keeping the same structure. The results are shown in Table 13. Table 13 shows that when the samples of both classes increase, the data complexity stays the same, as does the CBE index.

5. Conclusion and discussions

Our research develops an efficient and effective cross-validation method called Complexity-based Efficient (CBE) cross-validation.
The CBE cross-validation uses the CBE index (calculated by exploring the data's geometric structure and noise) to discover the data's characteristics and its non-linear complexity, in order to help understand the data set. We also employ the CBE index to calculate the optimal training data size and number of experiment runs. CBE cross-validation aims to reduce model evaluation time when a complex and computationally expensive classifier is used. We expect that when CBE cross-validation is applied to real binary data sets, the proposed method can find the optimal training data and the number of experiment runs, helping researchers to develop more precise classification tools with less evaluation time. Thus this work can assist researchers in developing new classification tools.

Table 11 Sensitivity analysis of the CBE index for unbalanced data sets.
Case | Positive samples | Negative samples | MinPts | CBE index

Table 12 Sensitivity analysis of the CBE index for various data dimensions.
Case | No. of dimensions | MinPts | CBE index

Table 13 Sensitivity analysis of the CBE index for various sample sizes of both classes.
Case | Positive samples | Negative samples | MinPts | CBE index

The threshold criterion on CBE_{X%} and CBE_{X+1%}, the lower bound sample size of 0.01, and the lower bound of five experiment runs are empirical values, for which we hope to find theoretical values in future studies. With regard to the setting of the lower bound, we consider that when the number of data points is large, we do not want to use too few data for the analysis, even though the data are easy to classify, because the information lost could be significant, and it would thus be very difficult to convince decision makers intuitively. Besides, when we use these lower limits, about 40% of the whole data have the chance to be selected as the training data (1 − (1 − 10%)^5 ≈ 40%).
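A quick check of the roughly 40% figure, assuming each of the five runs independently draws 10% of the data at random (this sampling assumption and the function name `chance_selected` are ours, for illustration only):

```python
def chance_selected(fraction, runs):
    """Probability that a given point enters the training set at least once,
    assuming each run draws `fraction` of the data independently at random."""
    return 1 - (1 - fraction) ** runs


# Lower limits discussed above: 10% of the data, five runs.
print(f"{chance_selected(0.10, 5):.1%}")  # 41.0%
```

With the lower limits, about two in five data points are expected to appear in at least one training set.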
Fig. 9. Relationship between training size and CBE index of the Australian data set.

As to the experiment being repeated 30 times, we consider that the CBE distribution will normally converge to a normal distribution when n is large. As a matter of convenience, we thus use 30 repetitions to approximate a normal distribution. In practice, one may need to use a Q-Q plot to check whether the statistic (accuracy) does in fact follow a normal distribution.
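The Q-Q check mentioned above can be approximated numerically without plotting. The sketch below pairs sorted observations with standard-normal quantiles and reports their correlation; a value close to 1 suggests approximate normality. The plotting positions (i + 0.5)/n and the function names are our assumptions, and the sample values are synthetic rather than accuracies from the paper.

```python
from statistics import NormalDist


def qq_points(values):
    """Pair each standard-normal quantile with the matching sorted observation."""
    xs = sorted(values)
    n = len(xs)
    nd = NormalDist()
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    return [(nd.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]


def qq_correlation(values):
    """Pearson correlation between theoretical and observed quantiles."""
    pts = qq_points(values)
    n = len(pts)
    mt = sum(t for t, _ in pts) / n
    mx = sum(x for _, x in pts) / n
    cov = sum((t - mt) * (x - mx) for t, x in pts)
    var_t = sum((t - mt) ** 2 for t, _ in pts)
    var_x = sum((x - mx) ** 2 for _, x in pts)
    return cov / (var_t * var_x) ** 0.5


# Synthetic accuracies from a seeded normal generator (30 runs).
sample = NormalDist(mu=75.0, sigma=2.0).samples(30, seed=0)
print(round(qq_correlation(sample), 3))
```

A strongly skewed set of accuracies would give a visibly lower correlation, in which case the t-test comparisons should be interpreted with care.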

CBE cross-validation is a binary classification validation method. However, multi-class classification problems are very common in both studies and real-world applications. Therefore, the study of CBE cross-validation with multiple classes is also considered as one direction for future research.

References

[1] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[2] L.J. Cao, H.P. Lee, W.K. Chong, Modified support vector novelty detector using training data with outliers, Pattern Recognition Letters 4 (2003).
[3] G. Casella, R.L. Berger, Statistical Inference, second edition, Duxbury, 00.
[4] R. Clarke, H.W. Ressom, A. Wang, J. Xuan, M.C. Liu, E.A. Gehan, Y. Wang, The properties of high-dimensional data spaces: implications for exploring gene and protein expression data, Nature Reviews. Cancer (1) (00).
[5] M. Daszykowski, B. Walczak, D.L. Massart, Looking for natural patterns in data: part 1. Density-based approach, Chemometrics and Intelligent Laboratory Systems 56 () (2001) 3 9.
[6] M. Daszykowski, B. Walczak, D.L. Massart, Representative subset selection, Analytica Chimica Acta 46 (00).
[7] M. Ester, H.P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noisy, Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining, Portland, 1996, pp.
[8] M.T. Hagan, H.B. Demuth, M. Beale, Neural Network Design, Thomson, Singapore.
[9] H. Han, Y. Ko, J. Seo, Using the revised EM algorithm to remove noisy for improving the one-against-the-rest method in binary text classification, Information Processing and Management 43 (5) (2007).
[10] T.K. Ho, A data complexity analysis of comparative advantages of decision forest constructors, Pattern Analysis and Applications 5 (00).
[11] M.Y. Hu, M. Shanker, G.P. Zhang, M.S. Hung, Modeling consumer situational choice of long distance communication with neural networks, Decision Support Systems 44 (4) (00).
[12] V.N. Vapnik, The Nature of Statistical Learning Theory, second edition, Springer, New York, 2000.
[13] M. Kantardzic, Data Mining: Concept, Model, Method, and Algorithms, Wiley-Interscience, 2003.
[14] E.W.M. Lee, Y.Y. Lee, C.P. Lim, C.Y. Tang, Application of a noisy classification technique to determine the occurrence of flashover in compartment fires, Advanced Engineering Informatics 0 (2006) 13.
[15] D.C. Li, Y.H. Fang, An algorithm to cluster data for efficient classification of support vector machines, Expert Systems with Applications 34 (00).
[16] D.C. Li, Y.H. Fang, A non-linearly virtual sample generation technique using cluster discovery and parametric equations of hypersphere, Expert Systems with Applications 36 (2009).
[17] D.C. Li, C.W. Yeh, T.I. Tsai, Y.H. Fang, Susan C. Hu, Acquiring knowledge with limited experience, Expert Systems 4 (3) (2007).
[18] E.B. Mansilla, On classifier domains of competence, Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), 2004.
[19] H.V. Nguyen, W. Yonggwan, Classification of unbalanced medical data with weighted Regularized Least Squares, Proceedings of the Frontiers in the Convergence of Bioscience and Information Technologies (IEEE), 2007, pp.
[20] S. Piramuthu, M.J. Shaw, J.A. Gentry, A classification approach using multi-layered neural networks, Decision Support Systems 11 (5) (1994).
[21] A.M. Rubinov, N.V. Soukhorkova, J. Ugon, Classes and clusters in data analysis, European Journal of Operational Research 173 (2006).
[22] C. Schaffer, Technical note: selecting a classification method by cross-validation, Machine Learning 13 (1993).
[23] P.N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, 1st edition, Pearson Addison Wesley, Boston, 2006.
[24] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, second edition, Morgan Kaufmann, Amsterdam, 2005.

Der-Chiang Li is a Distinguished Professor in the Department of Industrial and Information Management, National Cheng Kung University, Taiwan. He received his Ph.D. degree at the Department of Industrial Engineering at Lamar University, Beaumont, Texas, USA, in 195. As a research professor, his current interest concentrates on learning with small data sets.

Yao-Hwei Fang is a postdoctoral fellow in the Division of Biostatistics and Bioinformatics, National Health Research Institutes. He is working at the laboratory for statistical analysis of human genetics. He received his Ph.D. at the Department of Industrial and Information Management at National Cheng Kung University, Taiwan, in 2009.

Y.M. Frank Fang obtained his PhD degree from the Department of Civil and Hydraulic Engineering, Feng Chia University, in 2006. Before he joined the Department of Civil and Hydraulic Engineering of Feng Chia University (FCU) in 2006, he worked as a postdoctoral researcher in the Geographic Information Systems Research Center, Feng Chia University. Currently, Assistant Professor Fang is Chief Researcher of the Geographic Information Systems Research Center, FCU. His research interests include disaster monitoring and civil engineering.

3D Model Retrieval Method Based on Sample Prediction

3D Model Retrieval Method Based on Sample Prediction 20 Iteratioal Coferece o Computer Commuicatio ad Maagemet Proc.of CSIT vol.5 (20) (20) IACSIT Press, Sigapore 3D Model Retrieval Method Based o Sample Predictio Qigche Zhag, Ya Tag* School of Computer

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Sprig 2017 A secod course i data miig http://www.it.uu.se/edu/course/homepage/ifoutv2/vt17/ Kjell Orsbor Uppsala Database Laboratory Departmet of Iformatio Techology, Uppsala Uiversity,

More information

Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c

Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c Advaces i Egieerig Research (AER), volume 131 3rd Aual Iteratioal Coferece o Electroics, Electrical Egieerig ad Iformatio Sciece (EEEIS 2017) Pruig ad Summarizig the Discovered Time Series Associatio Rules

More information

Ones Assignment Method for Solving Traveling Salesman Problem

Ones Assignment Method for Solving Traveling Salesman Problem Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:

More information

Image Segmentation EEE 508

Image Segmentation EEE 508 Image Segmetatio Objective: to determie (etract) object boudaries. It is a process of partitioig a image ito distict regios by groupig together eighborig piels based o some predefied similarity criterio.

More information

Administrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today

Administrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today Admiistrative Fial project No office hours today UNSUPERVISED LEARNING David Kauchak CS 451 Fall 2013 Supervised learig Usupervised learig label label 1 label 3 model/ predictor label 4 label 5 Supervised

More information

Euclidean Distance Based Feature Selection for Fault Detection Prediction Model in Semiconductor Manufacturing Process

Euclidean Distance Based Feature Selection for Fault Detection Prediction Model in Semiconductor Manufacturing Process Vol.133 (Iformatio Techology ad Computer Sciece 016), pp.85-89 http://dx.doi.org/10.1457/astl.016. Euclidea Distace Based Feature Selectio for Fault Detectio Predictio Model i Semicoductor Maufacturig

More information

Pattern Recognition Systems Lab 1 Least Mean Squares

Pattern Recognition Systems Lab 1 Least Mean Squares Patter Recogitio Systems Lab 1 Least Mea Squares 1. Objectives This laboratory work itroduces the OpeCV-based framework used throughout the course. I this assigmet a lie is fitted to a set of poits usig

More information

Normal Distributions

Normal Distributions Normal Distributios Stacey Hacock Look at these three differet data sets Each histogram is overlaid with a curve : A B C A) Weights (g) of ewly bor lab rat pups B) Mea aual temperatures ( F ) i A Arbor,

More information

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve Advaces i Computer, Sigals ad Systems (2018) 2: 19-25 Clausius Scietific Press, Caada Aalysis of Server Resource Cosumptio of Meteorological Satellite Applicatio System Based o Cotour Curve Xiagag Zhao

More information

Improving Template Based Spike Detection

Improving Template Based Spike Detection Improvig Template Based Spike Detectio Kirk Smith, Member - IEEE Portlad State Uiversity petra@ee.pdx.edu Abstract Template matchig algorithms like SSE, Covolutio ad Maximum Likelihood are well kow for

More information

Fundamentals of Media Processing. Shin'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dinh Le

Fundamentals of Media Processing. Shin'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dinh Le Fudametals of Media Processig Shi'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dih Le Today's topics Noparametric Methods Parze Widow k-nearest Neighbor Estimatio Clusterig Techiques k-meas Agglomerative Hierarchical

More information

Evaluation scheme for Tracking in AMI

Evaluation scheme for Tracking in AMI A M I C o m m u i c a t i o A U G M E N T E D M U L T I - P A R T Y I N T E R A C T I O N http://www.amiproject.org/ Evaluatio scheme for Trackig i AMI S. Schreiber a D. Gatica-Perez b AMI WP4 Trackig:

More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

Dynamic Programming and Curve Fitting Based Road Boundary Detection

Dynamic Programming and Curve Fitting Based Road Boundary Detection Dyamic Programmig ad Curve Fittig Based Road Boudary Detectio SHYAM PRASAD ADHIKARI, HYONGSUK KIM, Divisio of Electroics ad Iformatio Egieerig Chobuk Natioal Uiversity 664-4 Ga Deokji-Dog Jeoju-City Jeobuk

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL

EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL Iteratioal Joural of Iformatio Techology ad Kowledge Maagemet July-December 2012, Volume 5, No. 2, pp. 371-375 EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL

More information

Journal of Chemical and Pharmaceutical Research, 2013, 5(12): Research Article

Journal of Chemical and Pharmaceutical Research, 2013, 5(12): Research Article Available olie www.jocpr.com Joural of Chemical ad Pharmaceutical Research, 2013, 5(12):745-749 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 K-meas algorithm i the optimal iitial cetroids based

More information

( n+1 2 ) , position=(7+1)/2 =4,(median is observation #4) Median=10lb

( n+1 2 ) , position=(7+1)/2 =4,(median is observation #4) Median=10lb Chapter 3 Descriptive Measures Measures of Ceter (Cetral Tedecy) These measures will tell us where is the ceter of our data or where most typical value of a data set lies Mode the value that occurs most

More information

New HSL Distance Based Colour Clustering Algorithm

New HSL Distance Based Colour Clustering Algorithm The 4th Midwest Artificial Itelligece ad Cogitive Scieces Coferece (MAICS 03 pp 85-9 New Albay Idiaa USA April 3-4 03 New HSL Distace Based Colour Clusterig Algorithm Vasile Patrascu Departemet of Iformatics

More information

The Closest Line to a Data Set in the Plane. David Gurney Southeastern Louisiana University Hammond, Louisiana

The Closest Line to a Data Set in the Plane. David Gurney Southeastern Louisiana University Hammond, Louisiana The Closest Lie to a Data Set i the Plae David Gurey Southeaster Louisiaa Uiversity Hammod, Louisiaa ABSTRACT This paper looks at three differet measures of distace betwee a lie ad a data set i the plae:

More information

CSCI 5090/7090- Machine Learning. Spring Mehdi Allahyari Georgia Southern University

CSCI 5090/7090- Machine Learning. Spring Mehdi Allahyari Georgia Southern University CSCI 5090/7090- Machie Learig Sprig 018 Mehdi Allahyari Georgia Souther Uiversity Clusterig (slides borrowed from Tom Mitchell, Maria Floria Balca, Ali Borji, Ke Che) 1 Clusterig, Iformal Goals Goal: Automatically

More information

New Fuzzy Color Clustering Algorithm Based on hsl Similarity

New Fuzzy Color Clustering Algorithm Based on hsl Similarity IFSA-EUSFLAT 009 New Fuzzy Color Clusterig Algorithm Based o hsl Similarity Vasile Ptracu Departmet of Iformatics Techology Tarom Compay Bucharest Romaia Email: patrascu.v@gmail.com Abstract I this paper

More information

Octahedral Graph Scaling

Octahedral Graph Scaling Octahedral Graph Scalig Peter Russell Jauary 1, 2015 Abstract There is presetly o strog iterpretatio for the otio of -vertex graph scalig. This paper presets a ew defiitio for the term i the cotext of

More information

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem A Improved Shuffled Frog-Leapig Algorithm for Kapsack Problem Zhoufag Li, Ya Zhou, ad Peg Cheg School of Iformatio Sciece ad Egieerig Hea Uiversity of Techology ZhegZhou, Chia lzhf1978@126.com Abstract.

More information

Designing a learning system

Designing a learning system CS 75 Machie Learig Lecture Desigig a learig system Milos Hauskrecht milos@cs.pitt.edu 539 Seott Square, x-5 people.cs.pitt.edu/~milos/courses/cs75/ Admiistrivia No homework assigmet this week Please try

More information

Text Feature Selection based on Feature Dispersion Degree and Feature Concentration Degree

Text Feature Selection based on Feature Dispersion Degree and Feature Concentration Degree Available olie at www.ijpe-olie.com vol. 13, o. 7, November 017, pp. 1159-1164 DOI: 10.3940/ijpe.17.07.p19.11591164 Text Feature Selectio based o Feature Dispersio Degree ad Feature Cocetratio Degree Zhifeg

More information

A new algorithm to build feed forward neural networks.

A new algorithm to build feed forward neural networks. A ew algorithm to build feed forward eural etworks. Amit Thombre Cetre of Excellece, Software Techologies ad Kowledge Maagemet, Tech Mahidra, Pue, Idia Abstract The paper presets a ew algorithm to build

More information

A Novel Feature Extraction Algorithm for Haar Local Binary Pattern Texture Based on Human Vision System

A Novel Feature Extraction Algorithm for Haar Local Binary Pattern Texture Based on Human Vision System A Novel Feature Extractio Algorithm for Haar Local Biary Patter Texture Based o Huma Visio System Liu Tao 1,* 1 Departmet of Electroic Egieerig Shaaxi Eergy Istitute Xiayag, Shaaxi, Chia Abstract The locality

More information

Harris Corner Detection Algorithm at Sub-pixel Level and Its Application Yuanfeng Han a, Peijiang Chen b * and Tian Meng c

Harris Corner Detection Algorithm at Sub-pixel Level and Its Application Yuanfeng Han a, Peijiang Chen b * and Tian Meng c Iteratioal Coferece o Computatioal Sciece ad Egieerig (ICCSE 015) Harris Corer Detectio Algorithm at Sub-pixel Level ad Its Applicatio Yuafeg Ha a, Peijiag Che b * ad Tia Meg c School of Automobile, Liyi

More information

Lecture 13: Validation

Lecture 13: Validation Lecture 3: Validatio Resampli methods Holdout Cross Validatio Radom Subsampli -Fold Cross-Validatio Leave-oe-out The Bootstrap Bias ad variace estimatio Three-way data partitioi Itroductio to Patter Recoitio

More information

Criterion in selecting the clustering algorithm in Radial Basis Functional Link Nets

Criterion in selecting the clustering algorithm in Radial Basis Functional Link Nets WSEAS TRANSACTIONS o SYSTEMS Ag Sau Loog, Og Hog Choo, Low Heg Chi Criterio i selectig the clusterig algorithm i Radial Basis Fuctioal Lik Nets ANG SAU LOONG 1, ONG HONG CHOON 2 & LOW HENG CHIN 3 Departmet

More information

Lower Bounds for Sorting

Lower Bounds for Sorting Liear Sortig Topics Covered: Lower Bouds for Sortig Coutig Sort Radix Sort Bucket Sort Lower Bouds for Sortig Compariso vs. o-compariso sortig Decisio tree model Worst case lower boud Compariso Sortig

More information

Optimization for framework design of new product introduction management system Ma Ying, Wu Hongcui

Optimization for framework design of new product introduction management system Ma Ying, Wu Hongcui 2d Iteratioal Coferece o Electrical, Computer Egieerig ad Electroics (ICECEE 2015) Optimizatio for framework desig of ew product itroductio maagemet system Ma Yig, Wu Hogcui Tiaji Electroic Iformatio Vocatioal

More information

Descriptive Statistics Summary Lists

Descriptive Statistics Summary Lists Chapter 209 Descriptive Statistics Summary Lists Itroductio This procedure is used to summarize cotiuous data. Large volumes of such data may be easily summarized i statistical lists of meas, couts, stadard

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

The isoperimetric problem on the hypercube

The isoperimetric problem on the hypercube The isoperimetric problem o the hypercube Prepared by: Steve Butler November 2, 2005 1 The isoperimetric problem We will cosider the -dimesioal hypercube Q Recall that the hypercube Q is a graph whose

More information

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence _9.qxd // : AM Page Chapter 9 Sequeces, Series, ad Probability 9. Sequeces ad Series What you should lear Use sequece otatio to write the terms of sequeces. Use factorial otatio. Use summatio otatio to

More information

Our second algorithm. Comp 135 Machine Learning Computer Science Tufts University. Decision Trees. Decision Trees. Decision Trees.

Our second algorithm. Comp 135 Machine Learning Computer Science Tufts University. Decision Trees. Decision Trees. Decision Trees. Comp 135 Machie Learig Computer Sciece Tufts Uiversity Fall 2017 Roi Khardo Some of these slides were adapted from previous slides by Carla Brodley Our secod algorithm Let s look at a simple dataset for

More information

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers *

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers * Load balaced Parallel Prime umber Geerator with Sieve of Eratosthees o luster omputers * Soowook Hwag*, Kyusik hug**, ad Dogseug Kim* *Departmet of Electrical Egieerig Korea Uiversity Seoul, -, Rep. of

More information

Study on effective detection method for specific data of large database LI Jin-feng

Study on effective detection method for specific data of large database LI Jin-feng Iteratioal Coferece o Automatio, Mechaical Cotrol ad Computatioal Egieerig (AMCCE 205) Study o effective detectio method for specific data of large database LI Ji-feg (Vocatioal College of DogYig, Shadog

More information

Web Text Feature Extraction with Particle Swarm Optimization

Web Text Feature Extraction with Particle Swarm Optimization 32 IJCSNS Iteratioal Joural of Computer Sciece ad Network Security, VOL.7 No.6, Jue 2007 Web Text Feature Extractio with Particle Swarm Optimizatio Sog Liagtu,, Zhag Xiaomig Istitute of Itelliget Machies,

More information

UNIT 4 Section 8 Estimating Population Parameters using Confidence Intervals

UNIT 4 Section 8 Estimating Population Parameters using Confidence Intervals UNIT 4 Sectio 8 Estimatig Populatio Parameters usig Cofidece Itervals To make ifereces about a populatio that caot be surveyed etirely, sample statistics ca be take from a SRS of the populatio ad used

More information

Performance Plus Software Parameter Definitions

Performance Plus Software Parameter Definitions Performace Plus+ Software Parameter Defiitios/ Performace Plus Software Parameter Defiitios Chapma Techical Note-TG-5 paramete.doc ev-0-03 Performace Plus+ Software Parameter Defiitios/2 Backgroud ad Defiitios

More information

Intro to Scientific Computing: Solutions

Intro to Scientific Computing: Solutions Itro to Scietific Computig: Solutios Dr. David M. Goulet. How may steps does it take to separate 3 objects ito groups of 4? We start with 5 objects ad apply 3 steps of the algorithm to reduce the pile

More information

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems

More information

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation Improvemet of the Orthogoal Code Covolutio Capabilities Usig FPGA Implemetatio Naima Kaabouch, Member, IEEE, Apara Dhirde, Member, IEEE, Saleh Faruque, Member, IEEE Departmet of Electrical Egieerig, Uiversity

More information

Chapter 3 Classification of FFT Processor Algorithms

Chapter 3 Classification of FFT Processor Algorithms Chapter Classificatio of FFT Processor Algorithms The computatioal complexity of the Discrete Fourier trasform (DFT) is very high. It requires () 2 complex multiplicatios ad () complex additios [5]. As

More information

Protected points in ordered trees

Protected points in ordered trees Applied Mathematics Letters 008 56 50 www.elsevier.com/locate/aml Protected poits i ordered trees Gi-Sag Cheo a, Louis W. Shapiro b, a Departmet of Mathematics, Sugkyukwa Uiversity, Suwo 440-746, Republic

More information

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov Sortig i Liear Time Data Structures ad Algorithms Adrei Bulatov Algorithms Sortig i Liear Time 7-2 Compariso Sorts The oly test that all the algorithms we have cosidered so far is compariso The oly iformatio

More information

Stone Images Retrieval Based on Color Histogram

Stone Images Retrieval Based on Color Histogram Stoe Images Retrieval Based o Color Histogram Qiag Zhao, Jie Yag, Jigyi Yag, Hogxig Liu School of Iformatio Egieerig, Wuha Uiversity of Techology Wuha, Chia Abstract Stoe images color features are chose

More information

ANN WHICH COVERS MLP AND RBF

ANN WHICH COVERS MLP AND RBF ANN WHICH COVERS MLP AND RBF Josef Boští, Jaromír Kual Faculty of Nuclear Scieces ad Physical Egieerig, CTU i Prague Departmet of Software Egieerig Abstract Two basic types of artificial eural etwors Multi

More information

Using The Central Limit Theorem for Belief Network Learning

Using The Central Limit Theorem for Belief Network Learning Usig The Cetral Limit Theorem for Belief Network Learig Ia Davidso, Mioo Amiia Computer Sciece Dept, SUNY Albay Albay, NY, USA,. davidso@cs.albay.edu Abstract. Learig the parameters (coditioal ad margial

More information

x x 2 x Input layer = quantity of classification mode X T = transposition matrix The core of such conditional probability estimating method is calculating the

COMPARATIVE RESEARCHES ON PROBABILISTIC NEURAL NETWORKS AND MULTI-LAYER PERCEPTRON NETWORKS FOR REMOTE SENSING IMAGE SEGMENTATION Liu Gang a, b, * a School of Electronic Information, Wuhan University, 430079,

Big-O Analysis. Asymptotics

Big-O Analysis 1 Definition: Suppose that f(n) and g(n) are nonnegative functions of n. Then we say that f(n) is O(g(n)) provided that there are constants C > 0 and N > 0 such that for all n > N, f(n) ≤ C g(n). Big-O expresses

Analysis of Different Similarity Measure Functions and their Impacts on Shared Nearest Neighbor Clustering Approach

Analysis of Different Similarity Measure Functions and their Impacts on Shared Nearest Neighbor Clustering Approach Anil Kumar Patidar School of IT, Rajiv Gandhi Technical University, Bhopal (M.P.), India Jitendra

Algorithms for Disk Covering Problems with the Most Points

Algorithms for Disk Covering Problems with the Most Points Bin Xiao Department of Computing Hong Kong Polytechnic University Hung Hom, Kowloon, Hong Kong csbxiao@comp.polyu.edu.hk Qingfeng Zhuge, Yi He, Zili Shao, Edwin

Learning to Shoot a Goal Lecture 8: Learning Models and Skills

Learning to Shoot a Goal Lecture 8: Learning Models and Skills How do we acquire skill at shooting goals? CS 344R/393R: Robotics Benjamin Kuipers Learning to Shoot a Goal The robot needs to shoot the ball in the goal.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1 Introduction to Computers and C++ Programming Copyright 2015 Pearson Education, Ltd. All rights reserved. Overview 1.1 Computer Systems 1.2 Programming and Problem Solving 1.3 Introduction to C++ 1.4 Testing

Evaluation of Support Vector Machine Kernels for Detecting Network Anomalies

Evaluation of Support Vector Machine Kernels for Detecting Network Anomalies Prerna Batta, Maninder Singh, Zhida Li, Qingye Ding, and Ljiljana Trajković Communication Networks Laboratory http://www.ensc.sfu.ca/~ljilja/cnl/

Fuzzy Rule Selection by Data Mining Criteria and Genetic Algorithms

Fuzzy Rule Selection by Data Mining Criteria and Genetic Algorithms Hisao Ishibuchi Dept. of Industrial Engineering Osaka Prefecture University 1-1 Gakuen-cho, Sakai, Osaka 599-8531, JAPAN E-mail: hisaoi@ie.osakafu-u.ac.jp

A Note on Least-norm Solution of Global WireWarping

A Note on Least-norm Solution of Global WireWarping Charlie C. L. Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong E-mail: cwang@mae.cuhk.edu.hk Abstract

Bayesian Network Structure Learning from Attribute Uncertain Data

Bayesian Network Structure Learning from Attribute Uncertain Data Wenting Song 1,2, Jeffrey Xu Yu 3, Hong Cheng 3, Hongyan Liu 4, Jun He 1,2,*, and Xiaoyong Du 1,2 1 Key Labs of Data Engineering and Knowledge Engineering, Ministry

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA

Creating Exact Bezier Representations of CST Shapes David D. Marshall California Polytechnic State University, San Luis Obispo, CA 93407-035, USA The paper presents a method of expressing CST shapes pioneered by

1 Graph Sparsification

CME 305: Discrete Mathematics and Algorithms 1 Graph Sparsification In this section we discuss the approximation of a graph G(V, E) by a sparse graph H(V, F) on the same vertex set. In particular, we consider

An Estimation of Distribution Algorithm for solving the Knapsack problem

Vol.4,No.5, 214 Published online: May 25, 214 DOI: 1.7321/jscse.v4.5.1 An Estimation of Distribution Algorithm for solving the Knapsack problem 1 Ricardo Pérez, 2 S. Jöns, 3 Arturo Hernández, 4 Carlos A. Ochoa *1,

Investigating methods for improving Bagged k-nn classifiers

Investigating methods for improving Bagged k-NN classifiers Fuad M. Alkoot Telecommunication & Navigation Institute, P.A.A.E.T. P.O.Box 4575, Alsalmia, 22046 Kuwait Abstract- We experiment with bagging kNN classifiers

BASED ON ITERATIVE ERROR-CORRECTION

A COMPARISON OF CRYPTANALYTIC PRINCIPLES BASED ON ITERATIVE ERROR-CORRECTION Miodrag J. Mihaljević and Jovan Dj. Golić Institute of Applied Mathematics and Electronics. Belgrade School of Electrical Engineering. University

SD vs. SD + One of the most important uses of sample statistics is to estimate the corresponding population parameters.

SD vs. SD + One of the most important uses of sample statistics is to estimate the corresponding population parameters. The mean of a representative sample is a good estimate of the mean of the population that

Arithmetic Sequences

Arithmetic Sequences COMMON CORE Learning Standards HSF-IF.A. HSF-BF.A.1a HSF-BF.A. HSF-LE.A. Essential Question How can you use an arithmetic sequence to describe a pattern? An arithmetic sequence is an ordered

Accuracy Improvement in Camera Calibration

Accuracy Improvement in Camera Calibration Fajie Li, Qi Zang and Reinhard Klette CITR, Computer Science Department The University of Auckland Tamaki Campus, Auckland, New Zealand fli006, qza001@ec.auckland.ac.nz r.klette@auckland.ac.nz

Polynomial Functions and Models. Learning Objectives. Polynomials. $P(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, $a_n \neq 0$

Polynomial Functions and Models 1 Learning Objectives 1. Identify polynomial functions and their degree 2. Graph polynomial functions using transformations 3. Identify the real zeros of a polynomial function and their multiplicity

Ontology-based Decision Support System with Analytic Hierarchy Process for Tour Package Selection

2017 Asia-Pacific Engineering and Technology Conference (APETC 2017) ISBN: 978-1-60595-443-1 Ontology-based Decision Support System with Analytic Hierarchy Process for Tour Package Selection Tien-Wen Sung, Chia-Jung Lee,

How do we evaluate algorithms?

F2 Reading reference: chapter 2 + slides Algorithm complexity Big O and big Ω To calculate running time Analysis of recursive Algorithms Next time: Literature: slides mostly The first Algorithm design methods:

Mining from Quantitative Data with Linguistic Minimum Supports and Confidences

Mining from Quantitative Data with Linguistic Minimum Supports and Confidences Tzung-Pei Hong, Ming-Jer Chiang and Shyue-Liang Wang Department of Electrical Engineering National University of Kaohsiung Kaohsiung, 8, Taiwan, R.O.C.

Mathematical Stat I: solutions of homework 1

Mathematical Stat I: solutions of homework Name: Student Id N:. Suppose we turn over cards simultaneously from two well shuffled decks of ordinary playing cards. We say we obtain an exact match on a particular

On Computing the Fuzzy Weighted Average Using the KM Algorithms

On Computing the Fuzzy Weighted Average Using the KM Algorithms Feilong Liu and Jerry M Mendel Signal and Image Processing Institute, Department of Electrical Engineering University of Southern California, 3740 McClintock

Designing a learning system

CS 75 Intro to Machine Learning Lecture Designing a learning system Milos Hauskrecht milos@pitt.edu 539 Sennott Square, -5 people.cs.pitt.edu/~milos/courses/cs75/ Administrivia No homework assignment this week Please

Research on Identification Model of Financial Fraud of Listed Company Based on Data Mining Technology

208 2nd International Conference on Systems, Computing, and Applications (SYSTCA 208) Research on Identification Model of Financial Fraud of Listed Company Based on Data Mining Technology Jiaqi Hu, Xiao Chen School of Business,

IMP: Superposer Integrated Morphometrics Package Superposition Tool

IMP: Superposer Integrated Morphometrics Package Superposition Tool Programming by: David Lieber ( 03) Canisius College 200 Main St. Buffalo, NY 4208 Concept by: H. David Sheets, Dept. of Physics, Canisius College

Performance Comparisons of PSO based Clustering

Performance Comparisons of PSO based Clustering Suresh Chandra Satapathy, 2 Gunanidhi Pradhan, 3 Sabyasachi Pattnaik, 4 JVR Murthy, 5 PVGD Prasad Reddy Anil Neerukonda Institute of Technology and Sciences, Sangivalas, Vishakapatnam

On Infinite Groups that are Isomorphic to its Proper Infinite Subgroup. Jaymar Talledo Balihon. Abstract

On Infinite Groups that are Isomorphic to its Proper Infinite Subgroup Jaymar Talledo Balihon Abstract Two groups are isomorphic if there exists an isomorphism between them Lagrange Theorem states that the order

Intrusion Detection Method Using Protocol Classification and Rough Set Based Support Vector Machine

Computer and Information Science Intrusion Detection Method Using Protocol Classification and Rough Set Based Support Vector Machine Xuyi Ren College of Computer Science, Nanjing University of Post & Telecommunications Nanjing

BAYESIAN WITH FULL CONDITIONAL POSTERIOR DISTRIBUTION APPROACH FOR SOLUTION OF COMPLEX MODELS. Pudji Ismartini

Proceeding of International Conference On Research, Implementation And Education Of Mathematics And Sciences 014, Yogyakarta State University, 18-0 May 014 BAYESIAN WITH FULL CONDITIONAL POSTERIOR DISTRIBUTION APPROACH FOR

Fuzzy Linear Regression Analysis

12th IFAC Conference on Programmable Devices and Embedded Systems The International Federation of Automatic Control September 25-27, 2013. Fuzzy Linear Regression Analysis Jana Nowaková Miroslav Pokorný VŠB-Technical

Solving Fuzzy Assignment Problem Using Fourier Elimination Method

Global Journal of Pure and Applied Mathematics. ISSN 0973-768 Volume 3, Number 2 (207), pp. 453-462 Research India Publications http://www.ripublication.com Solving Fuzzy Assignment Problem Using Fourier Elimination

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8)

CIS 121 Data Structures and Algorithms with Java Fall 017 Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8) Learning Goals Review Big-Oh and learn big/small omega/theta notations Practice solving

Neural Networks A Model of Boolean Functions

Neural Networks A Model of Boolean Functions Bernd Steinbach, Roman Kohut Freiberg University of Mining and Technology Institute of Computer Science D-09596 Freiberg, Germany e-mails: steinb@informatik.tu-freiberg.de

Effect of control points distribution on the orthorectification accuracy of an Ikonos II image through rational polynomial functions

Effect of control points distribution on the orthorectification accuracy of an Ikonos II image through rational polynomial functions Marcela do Valle Machado 1, Mauro Homem Antunes 1 and Paula Debiasi 1 1 Federal Rural

arxiv: v2 [cs.ds] 24 Mar 2018

Similar Elements and Metric Labeling on Complete Graphs arXiv:1803.08037v2 [cs.ds] 24 Mar 2018 Pedro F. Felzenszwalb Brown University Providence, RI, USA pff@brown.edu March 8, 018 We consider a problem that involves

EM375 STATISTICS AND MEASUREMENT UNCERTAINTY LEAST SQUARES LINEAR REGRESSION ANALYSIS

EM375 STATISTICS AND MEASUREMENT UNCERTAINTY LEAST SQUARES LINEAR REGRESSION ANALYSIS In this unit of the course we investigate fitting a straight line to measured (x, y) data pairs. The equation we want to fit

Revisiting the performance of mixtures of software reliability growth models

Revisiting the performance of mixtures of software reliability growth models Peter A. Keiller 1, Charles J. Kim 1, John Trimble 1, and Marlon Mejias 2 1 Department of Systems and Computer Science 2 Department of

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS About This Note This application brief is intended to explain and demonstrate the use of the special functions that are built into the PACE1750AE processor. These powerful

Bayesian approach to reliability modelling for a probability of failure on demand parameter

Bayesian approach to reliability modelling for a probability of failure on demand parameter BÖRCSÖK J., SCHAEFER S. Department of Computer Architecture and System Programming University Kassel, Wilhelmshöher Allee

Bezier curves. Figure 2 shows cubic Bezier curves for various control points. In a Bezier curve, only

Edited: Yeh-Liang Hsu (998--; recommended: Yeh-Liang Hsu (--9; last updated: Yeh-Liang Hsu (9--7. Note: This is the course material for ME55 Geometric modeling and computer graphics, Yuan Ze University. art of

THIN LAYER ORIENTED MAGNETOSTATIC CALCULATION MODULE FOR ELMER FEM, BASED ON THE METHOD OF THE MOMENTS. Roman Szewczyk

THIN LAYER ORIENTED MAGNETOSTATIC CALCULATION MODULE FOR ELMER FEM, BASED ON THE METHOD OF THE MOMENTS Roman Szewczyk Institute of Metrology and Biomedical Engineering, Warsaw University of Technology E-mail:

6.854J / J Advanced Algorithms Fall 2008

MIT OpenCourseWare http://ocw.mit.edu 6.854J / 18.415J Advanced Algorithms Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 18.415/6.854 Advanced Algorithms

Lecture 1: Introduction and Strassen s Algorithm

5-750: Graduate Algorithms January 7, 08 Lecture : Introduction and Strassen's Algorithm Lecturer: Gary Miller Scribe: Robert Parker Introduction Machine models In this class, we will primarily use the Random Access

South Slave Divisional Education Council. Math 10C

South Slave Divisional Education Council Math 10C Curriculum Package February 2012 12 Strand: Measurement General Outcome: Develop spatial sense and proportional reasoning It is expected that students will: 1. Solve