Unsupervised Discretization Using Kernel Density Estimation

Marenglen Biba, Floriana Esposito, Stefano Ferilli, Nicola Di Mauro, Teresa M.A. Basile
Department of Computer Science, University of Bari
Via Orabona 4, 70125 Bari, Italy

Abstract

Discretization, defined as a set of cuts over domains of attributes, represents an important preprocessing task for numeric data analysis. Some Machine Learning algorithms require a discrete feature space, but in real-world applications continuous attributes must be handled. To deal with this problem many supervised discretization methods have been proposed, but little has been done to synthesize unsupervised discretization methods to be used in domains where no class information is available. Furthermore, existing methods such as equal-width or equal-frequency binning are not well-principled, raising therefore the need for more sophisticated methods for the unsupervised discretization of continuous features. This paper presents a novel unsupervised discretization method that uses non-parametric density estimators to automatically adapt sub-interval dimensions to the data. The proposed algorithm searches for the next two sub-intervals to produce, evaluating the best cut-point on the basis of the density induced in the sub-intervals by the current cut and the density given by a kernel density estimator for each sub-interval. It uses cross-validated log-likelihood to select the maximal number of intervals. The new proposed method is compared to equal-width and equal-frequency discretization methods through experiments on well-known benchmarking data.

1 Introduction

Data format is an important issue in Machine Learning (ML) because different types of data make a relevant difference in learning tasks. While there can be infinitely many values for a continuous attribute, the number of discrete values is often small or finite. When learning, e.g., classification trees/rules, the data type has an important impact on the decision tree induction. As reported in [Dougherty et al., 1995], discretization makes learning more accurate and faster.
In general, the decision trees and rules learned using discrete features are more compact and more accurate than those induced using continuous ones. In addition to the advantages of discrete values over continuous ones, many learning algorithms can only handle discrete attributes, thus good discretization methods are a key issue for them since they can significantly affect the learning outcome.

There are different axes by which discretization methods can be classified, according to the different directions followed by the implementation of discretization techniques due to different needs: global vs. local, splitting (top-down) vs. merging (bottom-up), direct vs. incremental and supervised vs. unsupervised.

Local methods, as exemplified by C4.5, discretize in a localized region of the instance space (i.e., a subset of instances). On the other side, global methods use the entire instance space [Chmielevski and Grzymala-Busse, 1994]. Splitting methods start with an empty list of cut-points and, while splitting the intervals in a top-down fashion, progressively produce the cut-points that make up the discretization. On the contrary, merging methods start with all the possible cut-points and, at each step of the discretization refinement, eliminate cut-points by merging intervals. Direct methods divide the initial interval into n sub-intervals simultaneously (e.g., equal-width and equal-frequency), thus they need as a further input from the user the number of intervals to produce. Incremental methods [Cerquides and Mantaras, 1997] start with a simple discretization step and progressively improve the discretization, hence needing an additional criterion to stop the process. Supervised discretization considers class information while unsupervised discretization does not. Equal-width and equal-frequency binning are simple techniques that perform unsupervised discretization without exploiting any class information.
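The two binning schemes can be illustrated with a minimal sketch. The function names, the tie handling and the sample data below are illustrative assumptions, not part of the paper; the sketch only shows how the cut-points of each scheme could be derived.

```python
def equal_width_cuts(values, k):
    """Cut points splitting [min, max] into k bins of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Cut points giving each bin roughly len(values) / k instances."""
    ordered = sorted(values)
    n = len(ordered)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of instances")
    cuts = []
    for i in range(1, k):
        idx = i * n // k  # index of the first instance of the next bin
        cut = (ordered[idx - 1] + ordered[idx]) / 2  # midpoint of the neighbours
        if not cuts or cut > cuts[-1]:  # skip repeated cuts caused by ties
            cuts.append(cut)
    return cuts


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical sample with one outlier
print(equal_width_cuts(data, 3))      # [34.0, 67.0]
print(equal_frequency_cuts(data, 3))  # [3.5, 6.5]
```

On this sample the equal-width cuts leave nine of the ten instances in the first bin, while the equal-frequency cuts ignore how far apart the values are.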
In these methods, continuous intervals are split into sub-intervals and it is up to the user to specify the width (range of values to include in a sub-interval) or the frequency (number of instances in each sub-interval). These simple methods may not lead to good results when the continuous values do not follow a uniform distribution. Additionally, since outliers are not handled, they can produce results with low accuracy in the presence of skewed data. Usually, to deal with these problems, class information has been used in supervised methods, but when no such information is available the only option is to exploit unsupervised methods. While there exist many supervised methods in the literature, not much work has been done on synthesizing unsupervised methods. This could be due to the fact that discretization has been commonly associated with the classification task. Therefore, work on unsupervised methods is strongly motivated in those learning tasks where no class information is available. In particular, in many domains, learning algorithms deal only with discrete values. Among these learning settings, in many cases no class information can be exploited and unsupervised discretization methods such as simple binning are used.

The work presented in this paper proposes a top-down, global, direct and unsupervised method for discretization. It exploits density estimation methods to select the cut-points during the discretization process. The number of cut-points is computed by cross-validating the log-likelihood. We consider as candidate cut-points those that fall between two instances of the attribute to be discretized. The space of all the possible cut-points to evaluate could grow for large datasets that have continuous attributes with many instances with different values among them. For this reason we developed and implemented an efficient algorithm of complexity N log(N), where N is the number of instances.

The paper is organized as follows. In Section 2 we describe non-parametric density estimators, a special case of which is the kernel density estimator. In Section 3 we present the discretization algorithm, while in Section 4 we report experiments carried out on classical datasets of the UCI repository. Section 5 concludes the paper and outlines future work.

2 Non-parametric density estimation

Since data may be available under various distributions, it is not always straightforward to construct density functions from some given data. In parametric density estimation, an important assumption is made: available data has a density function that belongs to a known family of distributions, such as the normal (Gaussian) distribution, having its own parameters for mean and variance. What a parametric method does is find the values of these parameters that best fit the data.
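As an illustration of the parametric case, the following sketch fits a Gaussian by maximum likelihood. The choice of maximum-likelihood fitting, the function names and the sample values are assumptions made for the example; the text does not prescribe a fitting procedure.

```python
import math
import statistics


def fit_gaussian(values):
    """Maximum-likelihood estimates of the mean and variance of a Gaussian."""
    mu = statistics.fmean(values)
    var = statistics.pvariance(values, mu)  # population variance is the ML estimate
    return mu, var


def gaussian_pdf(x, mu, var):
    """Density of the fitted Gaussian at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
mu, var = fit_gaussian(sample)
print(mu, var)  # 5.0 4.0
```

Once the two parameters are fixed, the whole density is determined, which is exactly the restriction that non-parametric estimators avoid.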
However, data may be complex and assumptions about the distributions that are forced upon the data may lead to models that do not fit the data well. In these cases, where making assumptions is difficult, non-parametric density functions are preferred.

Simple binning (histograms) is one of the most well-known non-parametric density methods. It consists in assigning the same value of the density function f to every instance that falls in the interval $[C - h/2,\ C + h/2)$, where C is the origin of the bin and h is the binwidth. The value of such a function is defined as follows (symbol # stands for "number of", and n is the total number of instances):

$$ f = \frac{1}{nh}\,\#\{\text{instances that fall in } [C - h/2,\ C + h/2)\} $$

Once the origin C of a bin is fixed, for every instance that falls in the interval centered in C and of width h, a block of size 1 by the bin width is placed over the interval (Figure 1). Here, it is important to note that, if one wants to get the density value in x, every other point in the same bin contributes equally to the density in x, no matter how close or far away from x these points are.

Figure 1. Simple binning places a block in every sub-interval for every instance x that falls in it.

This is rather restricting because it does not give a real mirror of the data. In principle, points closer to x should be weighted more than other points that are far from it. The first step in doing this is eliminating the dependence on bin origins fixed a priori and placing the bin origins centered at every point x. Thus the following pseudo-formula:

$$ \frac{1}{n \cdot \text{binwidth}}\,\#\{\text{instances that fall in a bin containing } x\} $$

should be transformed into the following one:

$$ \frac{1}{n \cdot \text{binwidth}}\,\#\{\text{instances that fall in a bin around } x\} $$

The subtle but important difference in constructing the binning density with the second formula is that the bin is placed around x, so the density is computed not in a bin containing x and depending on the origin C, but in a bin whose center is x itself. Centering the bin on x then makes it possible to assign different weights to the other points in the same bin, in terms of their impact upon the density in x, depending on their distance from x.
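The difference between a bin containing x and a bin around x can be made concrete with a small sketch. The function names and the data are hypothetical, and the normalization by the number of instances n is an assumption taken from the usual histogram density.

```python
import math


def histogram_density(x, values, origin, h):
    """Simple binning: count the instances in the fixed bin that contains x.

    Bins are the intervals [origin + k*h - h/2, origin + k*h + h/2) for integer k.
    """
    def bin_index(v):
        return math.floor((v - origin) / h + 0.5)

    target = bin_index(x)
    count = sum(1 for v in values if bin_index(v) == target)
    return count / (len(values) * h)


def centered_bin_density(x, values, h):
    """Count the instances in the bin [x - h/2, x + h/2) centred on x itself."""
    count = sum(1 for v in values if x - h / 2 <= v < x + h / 2)
    return count / (len(values) * h)


values = [1.4, 1.45, 1.6, 2.4]  # hypothetical instances
print(histogram_density(1.6, values, origin=0.0, h=1.0))  # 0.5
print(centered_bin_density(1.6, values, h=1.0))           # 0.75
```

With fixed origins the point 1.6 shares its bin [1.5, 2.5) with the distant value 2.4 but not with its close neighbours 1.4 and 1.45; centering the bin on 1.6 reverses this.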
If we consider intervals of half-width h centered on x, then the density function in x is given by the formula:

$$ f = \frac{1}{2hn}\,\#\{\text{instances that fall in } [x - h,\ x + h]\} $$

In this case, when constructing the density function, a box of width 2h is placed for every point that falls in the interval centered in x. These boxes (the dashed ones in Figure 2) are then added up, yielding the density function of Figure 2. This provides a more accurate view of what the density of the data is, called the box kernel density estimate. However, the weights of the points that fall in the same bin as x have not been changed yet.

Figure 2. Placing a box for every instance in the interval around x and adding them up.

In order to do this, the kernel density function is introduced:

$$ p(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) $$

where K is a weighting function. What this function does is provide a smart way of estimating the density in x, by counting the frequency of other points $X_i$ in the same bin as x and weighting them differently depending on their distance from x. Contributions to the density value in x from points $X_i$ vary, since those that are closer to x are weighted more than points that are further away. This property is fulfilled by many functions, which are called kernel functions. A kernel function K is usually a probability density function that integrates to 1 and takes positive values in its domain. What is important for the density estimation does not reside in the kernel function itself (Gaussian, Epanechnikov or quadratic could be used) but in the bandwidth selection [Silverman, 1986]. We will motivate our choice for the bandwidth (the value h in the case of kernel functions) in the next section, where we introduce the problem of cutting intervals based on the density induced by the cut and the density given by the above kernel density estimation.

3 Where and what to cut

The aim of discretization is always to produce sub-intervals whose induced density over the instances best fits the available data. The first problem to be solved is where to cut. While most supervised top-down discretization methods cut exactly at points of the main interval that represent instances of the data, we decided to cut in the middle points between instance values. The advantage is that this cutting strategy avoids the need to decide whether the point at which the cut is performed is to be included in the left or in the right sub-interval.

The second question is which (sub-)interval should be cut/split next among those produced at a given step of the discretization process.
Such a choice must be driven by the objective of capturing the significant changes of density in different separated bins. Our proposal is to evaluate all the possible cut-points in all the sub-intervals, assigning to each of them a score whose meaning is as follows. Given a single interval to split, any of its cut-points produces two bins and thus induces upon the initial interval two densities, computed using the simple binning density estimation formula. Such a formula, as shown in the previous section, assigns the same density value of the function f to every instance in the bin and ignores the distance from x of the other instances of the bin when computing the density in x. Every sub-interval produced has an averaged binned density (the binned density in each point) that is different from the density estimated with the kernel function. The smaller this difference is, the better the sub-interval fits the data, i.e. the better this binning is, and hence there is no reason to split it. On the contrary, the idea underlying our discretization algorithm is that, when splitting, one must search for the next two worst sub-intervals to produce, where "worst" means that the density shown by each of the sub-intervals is much different from what it would be if the distances among points in the intervals and a weighting function were considered. The identified worst sub-intervals are just those to be split to produce other intervals, because they do not fit the data well. In this way intervals whose density differs much from the real data situation are eliminated and replaced by other sub-intervals. In order to achieve the density computed by the kernel density function we should reproduce a splitting of the main interval such as that in Figure 2.

An obvious question that arises is: when is a given sub-interval not to be cut anymore? Indeed, searching for the worst sub-intervals, there are always good candidates to be split. This is true, but on the other hand at each step of the algorithm we can split only one sub-interval into two others.
Thus, if there is more than one sub-interval to be split (this is the case after the first split), the scoring function of the cut-points allows us to choose the sub-interval to split.

3.1 The scoring function for the cut-points

At each step of the discretization process, we must choose from different sub-intervals to split. In every sub-interval we identify as candidate cut-points all the middle points between the instances. For each of the candidate cut-points T we compute a score as follows:

$$ \mathrm{Score}(T) = \sum_{i=1}^{k} \bigl(p(x_i) - f(x_i)\bigr) + \sum_{i=k+1}^{n} \bigl(p(x_i) - f(x_i)\bigr) $$

where i = 1, ..., k refers to the instances that fall into the left sub-interval and i = k+1, ..., n to the instances that fall into the right bin. The density functions p and f are respectively the kernel density function and the simple binning density function. These functions are computed as follows:

$$ f(x_i) = \frac{m}{wN} $$

where m is the number of instances that fall in the (left or right) bin, w is the binwidth and N is the number of instances in the interval that is being split. The kernel density estimator is given by the formula:

$$ p(x_i) = \frac{1}{hN}\sum_{j=1}^{N} K\!\left(\frac{x_i - X_j}{h}\right) $$

where h is the bandwidth and K is a kernel function.

In this framework for discretization, it still remains to be clarified how the bandwidth of the kernel density estimator is chosen. Although there are several ways to do it, as reported in [Silverman, 1986], in this context we are not interested in the density computed by a classic kernel density estimator that considers globally the entire set of available instances. Classic kernel density estimation takes N as the total number of instances in the initial interval and chooses h as the smoothing parameter. The choice of h is not easy and various techniques have been investigated to find an optimal h. Our proposal, in this context, is to adapt the classic kernel density estimator by taking h equal to the binwidth w, for the following reason. As can be seen from the formula of $p(x_i)$, instances that are more distant than h from $x_i$ contribute with weight equal to zero to the density of $x_i$. Hence, if a sub-interval (bin) under consideration has binwidth h, only the instances that fall in it will contribute, depending on their distance from $x_i$, to the density in $x_i$. As we are interested in knowing how the current binned density (induced by the candidate cut-point and computed by f with binwidth w) differs from the density in the same bin but computed weighting the contributions of $X_j$ to the density in $x_i$ on the basis of the distance $x_i - X_j$, it is useless to consider, for the function p, a bandwidth greater than w.
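The scoring step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Epanechnikov kernel is chosen because it vanishes beyond distance h, the kernel sum is restricted to the instances of the same bin, the evaluation is the naive quadratic one, and all function names and data are hypothetical.

```python
def epanechnikov(u):
    """Kernel with compact support: zero weight when |u| > 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def candidate_cutpoints(values):
    """Middle points between consecutive distinct instance values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def score(cut, values, lo, hi):
    """Score(T): sum over both bins of p(x_i) - f(x_i), with bandwidth h = w."""
    n_total = len(values)  # N: instances in the interval being split
    total = 0.0
    bins = (
        ([v for v in values if v < cut], cut - lo),  # left bin and its width
        ([v for v in values if v > cut], hi - cut),  # right bin and its width
    )
    for members, w in bins:
        f = len(members) / (w * n_total)  # binned density m / (w N)
        for x in members:
            # kernel density restricted to the bin (an assumption of this sketch)
            p = sum(epanechnikov((x - v) / w) for v in members) / (w * n_total)
            total += p - f
    return total


values = [10, 11, 12, 13, 14, 30]  # hypothetical instances of one attribute
cuts = candidate_cutpoints(values)
print(cuts)  # [10.5, 11.5, 12.5, 13.5, 22.0]
best = max(cuts, key=lambda t: score(t, values, min(values), max(values)))
print(best)
```

The cut-point with the greatest score is the one whose two bins differ most from the kernel estimate in the signed sense of the formula above.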
3.2 The discretization algorithm

Once a scoring function has been synthesized, we explain how the discretization algorithm works. Figure 3 shows the algorithm in pseudo language. It starts with an empty list of cut-points (which can be implemented as a priority queue in order to maintain, at each step, the cut-points ordered by their value according to the scoring function) and another priority queue that contains the sub-intervals generated thus far. Let us see it through an example. Suppose the initial interval to be discretized is the one in Figure 4 (frequencies of the instances are not shown).

    Discretize(Interval)
    Begin
      PotentialCutpoints = ComputeCutPoints(Interval);
      PriorityQueueIntervals.Add(Interval);
      While stopping criterion is not met do
        If PriorityQueueCPs is empty
          Foreach cutpoint CP in PotentialCutpoints do
            scoreCP = ComputeScoringFunction(CP, Interval);
            PriorityQueueCPs.Add(CP, scoreCP);
          End for
        Else
          BestCP = PriorityQueueCPs.GetBest();
          CurrentInterval = PriorityQueueIntervals.GetBest();
          NewIntervals = Split(CurrentInterval, BestCP);
          LeftInterval = NewIntervals.GetLeftInterval();
          RightInterval = NewIntervals.GetRightInterval();
          PotentialLeftCPs = ComputeCutPoints(LeftInterval);
          PotentialRightCPs = ComputeCutPoints(RightInterval);
          Foreach cutpoint CP in PotentialLeftCPs
            scoreCP = ComputeScoringFunction(CP, LeftInterval);
            PriorityQueueCPs.Add(CP, scoreCP);
            PriorityQueueIntervals.Add(LeftInterval, scoreCP);
          End for
          // the same foreach cycle for PotentialRightCPs
      End while
    End

Figure 3. The discretization algorithm in pseudo language.

Figure 4. The first cut.

The candidate cut-points are placed in the middle of adjacent instances: 12.5, 17.5, 22.5, 27.5; the sub-intervals produced by cut-point 12.5 are [10, 12.5] and [12.5, 30], and similarly for all the other cut-points. Now, suppose that, computing the scoring function for each cut-point, the greatest value (indicating the cut-point that produces the next two worst sub-intervals) is reached by the cut-point 17.5.
Then the sub-intervals are [10, 17.5] and [17.5, 30], and the list of candidate cut-points becomes <12.5, 16.25, 18.75, 22.5, 27.5>. Suppose the scoring function evaluates as follows: Score(12.5) = 40, Score(16.25) = 22, Score(18.75) = 11, Score(22.5) = 51, Score(27.5) = 28. The algorithm selects 22.5 as the best cut-point and splits the corresponding interval as shown in Figure 5.

Figure 5. The second cut.

This second cut produces two new sub-intervals and hence the current discretization is made up of three sub-intervals: [10, 17.5], [17.5, 22.5], [22.5, 30], with candidate cut-points <12.5, 18.75, 23.75, 27.5>. Suppose the values of the scoring function are as follows: Score(12.5) = 40, Score(18.75) = 20, Score(23.75) = 35, Score(27.5) = 48. The best cut-point 27.5 suggests the third cut and the discretization becomes [10, 17.5], [17.5, 22.5], [22.5, 27.5], [27.5, 30]. Thus, the algorithm refines those sub-intervals that show the worst fit to the data.

One remark is in order: in some cases it might happen that a split is performed even if one of the two sub-intervals it produces (which could be the left or the right one) shows such a good fit, compared to the other sub-intervals, that it is not split in the future. This is not strange, since the scoring function evaluates the overall fit of the two sub-intervals. This is the case of the first cut in the present example: the cut-point 17.5 has been chosen, where the left sub-interval [10, 17.5] shows good fit to the data in terms of density while the right one [17.5, 30] shows bad fit. In this case the interval [10, 17.5] will not be cut before the interval [17.5, 30] and perhaps will remain untouched till the end of the discretization algorithm. The algorithm will stop cutting when the stopping criterion (the maximal number of cut-points, computed by a procedure explained in the next paragraph) is met.

3.3 Stopping criteria and complexity

The definition of a stopping criterion is fundamental, to prevent the algorithm from continuing to cut until each bin contains a single instance. Even without reaching such an extreme situation, the risk of overfitting the model is real, because, as usual in the literature, we use log-likelihood to evaluate the density estimators, the simple binning and the kernel density estimate.
As a solution, instead of requiring a specific number of intervals (which could be too rigid and not based on valid assumptions), we propose the use of cross-validation to provide an unbiased estimation of how the model fits the real distribution. For the experiments performed, 10-fold cross-validation was used. For each fold the algorithm computes the stopping criterion as follows. Supposing there are N candidate cut-points, for each of them the cross-validated log-likelihood is computed. In order to optimize performance, at each step a structure maintains the sub-intervals in the current discretization and the corresponding splitting values, so that only the new values for the interval to be split have to be computed at each step. Thus the algorithm that computes the log-likelihood for the N cut-points is performed 10 times overall. The number of cut-points that shows the maximum value of the averaged log-likelihood on the test folds is chosen as the best. The log-likelihood on the test data is given by the following formula, where the sum runs over the bins j:

$$ \text{Log-likelihood} = \sum_{j} n_{j\text{-test}} \,\log\frac{n_{j\text{-train}}}{wN} $$

where $n_{j\text{-train}}$ is the number of training instances in bin j, $n_{j\text{-test}}$ is the number of test instances that fall in bin j, N is the total number of instances and w is the width of bin j.

As regards the complexity of the kernel density estimator, from the formula of p it can be deduced that the complexity of evaluating the kernel density in N points is $N^2$. For univariate data, the complexity problem has been solved by the algorithms proposed in [Greengard and Strain, 1991] and [Yang et al., 2003], which compute the kernel density estimate in O(N+N) instead of $O(N^2)$. In our context we deal only with univariate data because only single continuous attributes have to be processed, and thus for N instances the theoretical complexity of the algorithm is O(N log N).
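The test log-likelihood of a given set of cut-points can be sketched as follows. Several points are assumptions of the sketch rather than statements of the paper: N is taken as the number of training instances so that the binned density integrates to one, a test instance falling in a bin with no training instance yields minus infinity, the folds are formed by simple striding, and the cut-points are kept fixed across folds. All names and data are hypothetical.

```python
import math
from bisect import bisect_right


def bin_counts(values, cuts):
    """Number of values per bin, for sorted cut points."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1
    return counts


def heldout_log_likelihood(train, test, lo, hi, cuts):
    """Sum over bins of n_test * log(n_train / (w * N))."""
    edges = [lo] + list(cuts) + [hi]
    n_train = bin_counts(train, cuts)
    n_test = bin_counts(test, cuts)
    n_total = len(train)  # assumption: N counts the training instances
    total = 0.0
    for j, (a, b) in enumerate(zip(edges, edges[1:])):
        if n_test[j] == 0:
            continue  # empty test bins contribute nothing
        if n_train[j] == 0:
            return float("-inf")  # test data in a bin with zero estimated density
        total += n_test[j] * math.log(n_train[j] / ((b - a) * n_total))
    return total


def cross_validated_ll(values, lo, hi, cuts, folds=10):
    """Average held-out log-likelihood over strided folds."""
    scores = []
    for i in range(folds):
        test = values[i::folds]
        train = [v for j, v in enumerate(values) if j % folds != i]
        scores.append(heldout_log_likelihood(train, test, lo, hi, cuts))
    return sum(scores) / folds


train = [1, 2, 3, 4, 6, 8]
test = [2.5, 7]
print(heldout_log_likelihood(train, test, 0, 10, [5.0]))
```

Comparing this average for discretizations with an increasing number of cut-points gives the kind of curve from which a maximum could be picked as the stopping point.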
4 Experiments

In order to assess the validity and performance of the proposed discretization algorithm, we have performed experiments on several datasets taken from the UCI repository and classically used in the literature to evaluate discretization algorithms. Specifically, the datasets used are: autos, bupa, wine, ionosphere, ecoli, sonar, glass, heart, hepatitis, arrhythmia, anneal, cylinder, and auto-mpg. These datasets contain a large set of numeric attributes of various types, from which 200 continuous attributes were extracted at random and used to test the discretization algorithm.

In order to evaluate the discretization carried out by the proposed algorithm with respect to other algorithms in the literature, we compared it to three other methods: equal-width with a fixed number of bins (we use 10 for the experiments), equal-frequency with a fixed number of bins (we use 10 for the experiments), and equal-width cross-validated for the number of bins. The comparison was made on the log-likelihood on the test data using a 10-fold cross-validation methodology. The results on the test folds were compared through a paired t-test as regards cross-validated log-likelihood. Table 1 presents the results of the t-test based on cross-validated log-likelihood with a risk level α. It shows the number of continuous attributes whose discretization through our method was significantly better, equal or significantly worse compared to the other methods.

    Compared method              Our method significantly   Equal   Our method significantly
                                 more accurate                      less accurate
    EqualWidth 10 bins           71 (35.5%)                 126     3 (1.5%)
    EqualFreq 10 bins            79 (39.5%)                 119     2 (1.0%)
    EqualWidth Cross-Validated   54 (27.0%)                 136     10 (5.0%)

Table 1. Results of the paired t-test based on cross-validated log-likelihood on 10 folds.

It is clear that, even if in the majority of cases the new algorithm shows no difference in performance with respect to the others, there is an outstanding percentage of cases (at least 27%) in which it behaves better, while the opposite holds only in very rare cases.

Among the datasets there are many cases of continuous attributes whose interval of values contains many occurrences of the same value. This characteristic had an impact on the results of the equal-frequency method, which often, in such cases, was not able to produce a valid model that could fit the data. This is natural, since this method creates the bins based on the number of instances that fall in them. For example, if the total number of instances is 200 and the bins to generate are 10, then the number of instances that must fall in a bin is 20. Thus, if among the instances there is one value that has 30 occurrences, then the equal-frequency method is not able to build a good model because it cannot compute the density of the bin that contains only the occurrences of that single value. This would be even more problematic in the case of cross-validation, which is the reason why no comparison with the Equal Frequency Cross-Validation method was carried out.

An important note can be made concerning (very) discontinuous data, on which our method performs better than the others. This is due to the ability of the proposed algorithm to catch the changes in density in separated bins. Thus very high densities in the intervals (for example a large number of instances in a small region) are isolated in bins different from those which host low densities. Although it is not straightforward to handle very discontinuous distributions, the method we have proposed achieves good results when trying to produce bins that can fit this kind of distribution.

5 Conclusions and future work

Discretization represents an important preprocessing task for numeric data analysis. So far many supervised discretization methods have been proposed but little has been done to synthesize unsupervised methods.
This paper presents a novel unsupervised discretization method that exploits a kernel density estimator for choosing the intervals to be split and cross-validated log-likelihood to select the maximal number of intervals. The new proposed method is compared to equal-width and equal-frequency discretization methods through experiments on well-known benchmarking data. Preliminary results are promising and show that kernel density estimation methods are good for developing sophisticated discretization methods. Further work and experiments are needed to fine-tune the discretization method to deal with those cases where the other methods show better accuracy.

As a future application we plan to use the proposed discretization algorithm in a learning task that requires discretization and where class information is not always available. One such context could be Inductive Logic Programming, where objects whose class is not known are often described by continuous attributes. This investigation will aim at assessing the quality of the learning task and how this is affected by the discretization of the continuous attributes.

References

[Cerquides and Mantaras, 1997] Cerquides, J. and Mantaras, R.L. Proposal and empirical comparison of a parallelizable distance-based discretization method. In KDD97: Third International Conference on Knowledge Discovery and Data Mining, 1997.

[Chmielevski and Grzymala-Busse, 1994] Chmielevski, M.R. and Grzymala-Busse, J.W. Global discretization of continuous attributes as preprocessing for machine learning. In Third International Workshop on Rough Sets and Soft Computing, 1994.

[Dougherty et al., 1995] Dougherty, J., Kohavi, R., and Sahami, M. Supervised and unsupervised discretization of continuous features. In Proc. Twelfth International Conference on Machine Learning, Los Altos, CA: Morgan Kaufmann, 1995.

[Greengard and Strain, 1991] Greengard, L. and Strain, J. The fast Gauss transform. SIAM Journal on Scientific and Statistical Computing, 12(1), 1991.

[Silverman, 1986] Silverman, B.W. Density estimation for statistics and data analysis. Chapman and Hall, London, 1986.

[Yang et al., 2003] Yang, C., Duraiswami, R., and Gumerov, N. Improved fast Gauss transform. Tech. Rep. CS-TR-4495, Dept. of Computer Science, University of Maryland, College Park, 2003.


More information

Descriptive Statistics Summary Lists

Descriptive Statistics Summary Lists Chapter 209 Descriptive Statistics Summary Lists Itroductio This procedure is used to summarize cotiuous data. Large volumes of such data may be easily summarized i statistical lists of meas, couts, stadard

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8)

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8) CIS 11 Data Structures ad Algorithms with Java Fall 017 Big-Oh Notatio Tuesday, September 5 (Make-up Friday, September 8) Learig Goals Review Big-Oh ad lear big/small omega/theta otatios Practice solvig

More information

Lecture 1: Introduction and Strassen s Algorithm

Lecture 1: Introduction and Strassen s Algorithm 5-750: Graduate Algorithms Jauary 7, 08 Lecture : Itroductio ad Strasse s Algorithm Lecturer: Gary Miller Scribe: Robert Parker Itroductio Machie models I this class, we will primarily use the Radom Access

More information

Big-O Analysis. Asymptotics

Big-O Analysis. Asymptotics Big-O Aalysis 1 Defiitio: Suppose that f() ad g() are oegative fuctios of. The we say that f() is O(g()) provided that there are costats C > 0 ad N > 0 such that for all > N, f() Cg(). Big-O expresses

More information

Kernel Smoothing Function and Choosing Bandwidth for Non-Parametric Regression Methods 1

Kernel Smoothing Function and Choosing Bandwidth for Non-Parametric Regression Methods 1 Ozea Joural of Applied Scieces (), 009 Ozea Joural of Applied Scieces (), 009 ISSN 943-49 009 Ozea Publicatio Kerel Smoothig Fuctio ad Choosig Badwidth for No-Parametric Regressio Methods Murat Kayri ad

More information

Evaluation scheme for Tracking in AMI

Evaluation scheme for Tracking in AMI A M I C o m m u i c a t i o A U G M E N T E D M U L T I - P A R T Y I N T E R A C T I O N http://www.amiproject.org/ Evaluatio scheme for Trackig i AMI S. Schreiber a D. Gatica-Perez b AMI WP4 Trackig:

More information

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov Sortig i Liear Time Data Structures ad Algorithms Adrei Bulatov Algorithms Sortig i Liear Time 7-2 Compariso Sorts The oly test that all the algorithms we have cosidered so far is compariso The oly iformatio

More information

Improved Random Graph Isomorphism

Improved Random Graph Isomorphism Improved Radom Graph Isomorphism Tomek Czajka Gopal Paduraga Abstract Caoical labelig of a graph cosists of assigig a uique label to each vertex such that the labels are ivariat uder isomorphism. Such

More information

Algorithms for Disk Covering Problems with the Most Points

Algorithms for Disk Covering Problems with the Most Points Algorithms for Disk Coverig Problems with the Most Poits Bi Xiao Departmet of Computig Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog csbxiao@comp.polyu.edu.hk Qigfeg Zhuge, Yi He, Zili Shao, Edwi

More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19 CIS Data Structures ad Algorithms with Java Sprig 09 Stacks, Queues, ad Heaps Moday, February 8 / Tuesday, February 9 Stacks ad Queues Recall the stack ad queue ADTs (abstract data types from lecture.

More information

Counting the Number of Minimum Roman Dominating Functions of a Graph

Counting the Number of Minimum Roman Dominating Functions of a Graph Coutig the Number of Miimum Roma Domiatig Fuctios of a Graph SHI ZHENG ad KOH KHEE MENG, Natioal Uiversity of Sigapore We provide two algorithms coutig the umber of miimum Roma domiatig fuctios of a graph

More information

Bayesian approach to reliability modelling for a probability of failure on demand parameter

Bayesian approach to reliability modelling for a probability of failure on demand parameter Bayesia approach to reliability modellig for a probability of failure o demad parameter BÖRCSÖK J., SCHAEFER S. Departmet of Computer Architecture ad System Programmig Uiversity Kassel, Wilhelmshöher Allee

More information

How do we evaluate algorithms?

How do we evaluate algorithms? F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:

More information

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein 068.670 Subliear Time Algorithms November, 0 Lecture 6 Lecturer: Roitt Rubifeld Scribes: Che Ziv, Eliav Buchik, Ophir Arie, Joatha Gradstei Lesso overview. Usig the oracle reductio framework for approximatig

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem A Improved Shuffled Frog-Leapig Algorithm for Kapsack Problem Zhoufag Li, Ya Zhou, ad Peg Cheg School of Iformatio Sciece ad Egieerig Hea Uiversity of Techology ZhegZhou, Chia lzhf1978@126.com Abstract.

More information

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Pseudocode ( 1.1) High-level descriptio of a algorithm More structured

More information

Data Structures and Algorithms. Analysis of Algorithms

Data Structures and Algorithms. Analysis of Algorithms Data Structures ad Algorithms Aalysis of Algorithms Outlie Ruig time Pseudo-code Big-oh otatio Big-theta otatio Big-omega otatio Asymptotic algorithm aalysis Aalysis of Algorithms Iput Algorithm Output

More information

Numerical Methods Lecture 6 - Curve Fitting Techniques

Numerical Methods Lecture 6 - Curve Fitting Techniques Numerical Methods Lecture 6 - Curve Fittig Techiques Topics motivatio iterpolatio liear regressio higher order polyomial form expoetial form Curve fittig - motivatio For root fidig, we used a give fuctio

More information

CIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13

CIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13 CIS Data Structures ad Algorithms with Java Sprig 08 Stacks ad Queues Moday, February / Tuesday, February Learig Goals Durig this lab, you will: Review stacks ad queues. Lear amortized ruig time aalysis

More information

Designing a learning system

Designing a learning system CS 75 Itro to Machie Learig Lecture Desigig a learig system Milos Hauskrecht milos@pitt.edu 539 Seott Square, -5 people.cs.pitt.edu/~milos/courses/cs75/ Admiistrivia No homework assigmet this week Please

More information

Chapter 3 Classification of FFT Processor Algorithms

Chapter 3 Classification of FFT Processor Algorithms Chapter Classificatio of FFT Processor Algorithms The computatioal complexity of the Discrete Fourier trasform (DFT) is very high. It requires () 2 complex multiplicatios ad () complex additios [5]. As

More information

Sectio 4, a prototype project of settig field weight with AHP method is developed ad the experimetal results are aalyzed. Fially, we coclude our work

Sectio 4, a prototype project of settig field weight with AHP method is developed ad the experimetal results are aalyzed. Fially, we coclude our work 200 2d Iteratioal Coferece o Iformatio ad Multimedia Techology (ICIMT 200) IPCSIT vol. 42 (202) (202) IACSIT Press, Sigapore DOI: 0.7763/IPCSIT.202.V42.0 Idex Weight Decisio Based o AHP for Iformatio Retrieval

More information

On (K t e)-saturated Graphs

On (K t e)-saturated Graphs Noame mauscript No. (will be iserted by the editor O (K t e-saturated Graphs Jessica Fuller Roald J. Gould the date of receipt ad acceptace should be iserted later Abstract Give a graph H, we say a graph

More information

CS 683: Advanced Design and Analysis of Algorithms

CS 683: Advanced Design and Analysis of Algorithms CS 683: Advaced Desig ad Aalysis of Algorithms Lecture 6, February 1, 2008 Lecturer: Joh Hopcroft Scribes: Shaomei Wu, Etha Feldma February 7, 2008 1 Threshold for k CNF Satisfiability I the previous lecture,

More information

CHAPTER IV: GRAPH THEORY. Section 1: Introduction to Graphs

CHAPTER IV: GRAPH THEORY. Section 1: Introduction to Graphs CHAPTER IV: GRAPH THEORY Sectio : Itroductio to Graphs Sice this class is called Number-Theoretic ad Discrete Structures, it would be a crime to oly focus o umber theory regardless how woderful those topics

More information

Graphs. Minimum Spanning Trees. Slides by Rose Hoberman (CMU)

Graphs. Minimum Spanning Trees. Slides by Rose Hoberman (CMU) Graphs Miimum Spaig Trees Slides by Rose Hoberma (CMU) Problem: Layig Telephoe Wire Cetral office 2 Wirig: Naïve Approach Cetral office Expesive! 3 Wirig: Better Approach Cetral office Miimize the total

More information

The Closest Line to a Data Set in the Plane. David Gurney Southeastern Louisiana University Hammond, Louisiana

The Closest Line to a Data Set in the Plane. David Gurney Southeastern Louisiana University Hammond, Louisiana The Closest Lie to a Data Set i the Plae David Gurey Southeaster Louisiaa Uiversity Hammod, Louisiaa ABSTRACT This paper looks at three differet measures of distace betwee a lie ad a data set i the plae:

More information

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation Improvemet of the Orthogoal Code Covolutio Capabilities Usig FPGA Implemetatio Naima Kaabouch, Member, IEEE, Apara Dhirde, Member, IEEE, Saleh Faruque, Member, IEEE Departmet of Electrical Egieerig, Uiversity

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information

1.2 Binomial Coefficients and Subsets

1.2 Binomial Coefficients and Subsets 1.2. BINOMIAL COEFFICIENTS AND SUBSETS 13 1.2 Biomial Coefficiets ad Subsets 1.2-1 The loop below is part of a program to determie the umber of triagles formed by poits i the plae. for i =1 to for j =

More information

Optimal Mapped Mesh on the Circle

Optimal Mapped Mesh on the Circle Koferece ANSYS 009 Optimal Mapped Mesh o the Circle doc. Ig. Jaroslav Štigler, Ph.D. Bro Uiversity of Techology, aculty of Mechaical gieerig, ergy Istitut, Abstract: This paper brigs out some ideas ad

More information

Analysis of Documents Clustering Using Sampled Agglomerative Technique

Analysis of Documents Clustering Using Sampled Agglomerative Technique Aalysis of Documets Clusterig Usig Sampled Agglomerative Techique Omar H. Karam, Ahmed M. Hamad, ad Sheri M. Moussa Abstract I this paper a clusterig algorithm for documets is proposed that adapts a samplig-based

More information

SD vs. SD + One of the most important uses of sample statistics is to estimate the corresponding population parameters.

SD vs. SD + One of the most important uses of sample statistics is to estimate the corresponding population parameters. SD vs. SD + Oe of the most importat uses of sample statistics is to estimate the correspodig populatio parameters. The mea of a represetative sample is a good estimate of the mea of the populatio that

More information

One advantage that SONAR has over any other music-sequencing product I ve worked

One advantage that SONAR has over any other music-sequencing product I ve worked *gajedra* D:/Thomso_Learig_Projects/Garrigus_163132/z_productio/z_3B2_3D_files/Garrigus_163132_ch17.3d, 14/11/08/16:26:39, 16:26, page: 647 17 CAL 101 Oe advatage that SONAR has over ay other music-sequecig

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 6 Defiig Fuctios Pytho Programmig, 2/e 1 Objectives To uderstad why programmers divide programs up ito sets of cooperatig fuctios. To be able to

More information

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve Advaces i Computer, Sigals ad Systems (2018) 2: 19-25 Clausius Scietific Press, Caada Aalysis of Server Resource Cosumptio of Meteorological Satellite Applicatio System Based o Cotour Curve Xiagag Zhao

More information

Software Fault Prediction of Unlabeled Program Modules

Software Fault Prediction of Unlabeled Program Modules Software Fault Predictio of Ulabeled Program Modules C. Catal, U. Sevim, ad B. Diri, Member, IAENG Abstract Software metrics ad fault data belogig to a previous software versio are used to build the software

More information

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure

More information

On Infinite Groups that are Isomorphic to its Proper Infinite Subgroup. Jaymar Talledo Balihon. Abstract

On Infinite Groups that are Isomorphic to its Proper Infinite Subgroup. Jaymar Talledo Balihon. Abstract O Ifiite Groups that are Isomorphic to its Proper Ifiite Subgroup Jaymar Talledo Baliho Abstract Two groups are isomorphic if there exists a isomorphism betwee them Lagrage Theorem states that the order

More information

EM375 STATISTICS AND MEASUREMENT UNCERTAINTY LEAST SQUARES LINEAR REGRESSION ANALYSIS

EM375 STATISTICS AND MEASUREMENT UNCERTAINTY LEAST SQUARES LINEAR REGRESSION ANALYSIS EM375 STATISTICS AND MEASUREMENT UNCERTAINTY LEAST SQUARES LINEAR REGRESSION ANALYSIS I this uit of the course we ivestigate fittig a straight lie to measured (x, y) data pairs. The equatio we wat to fit

More information

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig

More information

Criterion in selecting the clustering algorithm in Radial Basis Functional Link Nets

Criterion in selecting the clustering algorithm in Radial Basis Functional Link Nets WSEAS TRANSACTIONS o SYSTEMS Ag Sau Loog, Og Hog Choo, Low Heg Chi Criterio i selectig the clusterig algorithm i Radial Basis Fuctioal Lik Nets ANG SAU LOONG 1, ONG HONG CHOON 2 & LOW HENG CHIN 3 Departmet

More information

arxiv: v2 [cs.ds] 24 Mar 2018

arxiv: v2 [cs.ds] 24 Mar 2018 Similar Elemets ad Metric Labelig o Complete Graphs arxiv:1803.08037v [cs.ds] 4 Mar 018 Pedro F. Felzeszwalb Brow Uiversity Providece, RI, USA pff@brow.edu March 8, 018 We cosider a problem that ivolves

More information

Euclidean Distance Based Feature Selection for Fault Detection Prediction Model in Semiconductor Manufacturing Process

Euclidean Distance Based Feature Selection for Fault Detection Prediction Model in Semiconductor Manufacturing Process Vol.133 (Iformatio Techology ad Computer Sciece 016), pp.85-89 http://dx.doi.org/10.1457/astl.016. Euclidea Distace Based Feature Selectio for Fault Detectio Predictio Model i Semicoductor Maufacturig

More information

Dynamic Programming and Curve Fitting Based Road Boundary Detection

Dynamic Programming and Curve Fitting Based Road Boundary Detection Dyamic Programmig ad Curve Fittig Based Road Boudary Detectio SHYAM PRASAD ADHIKARI, HYONGSUK KIM, Divisio of Electroics ad Iformatio Egieerig Chobuk Natioal Uiversity 664-4 Ga Deokji-Dog Jeoju-City Jeobuk

More information

A Study on the Performance of Cholesky-Factorization using MPI

A Study on the Performance of Cholesky-Factorization using MPI A Study o the Performace of Cholesky-Factorizatio usig MPI Ha S. Kim Scott B. Bade Departmet of Computer Sciece ad Egieerig Uiversity of Califoria Sa Diego {hskim, bade}@cs.ucsd.edu Abstract Cholesky-factorizatio

More information

Cubic Polynomial Curves with a Shape Parameter

Cubic Polynomial Curves with a Shape Parameter roceedigs of the th WSEAS Iteratioal Coferece o Robotics Cotrol ad Maufacturig Techology Hagzhou Chia April -8 00 (pp5-70) Cubic olyomial Curves with a Shape arameter MO GUOLIANG ZHAO YANAN Iformatio ad

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

Journal of Chemical and Pharmaceutical Research, 2013, 5(12): Research Article

Journal of Chemical and Pharmaceutical Research, 2013, 5(12): Research Article Available olie www.jocpr.com Joural of Chemical ad Pharmaceutical Research, 2013, 5(12):745-749 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 K-meas algorithm i the optimal iitial cetroids based

More information

Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c

Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c Advaces i Egieerig Research (AER), volume 131 3rd Aual Iteratioal Coferece o Electroics, Electrical Egieerig ad Iformatio Sciece (EEEIS 2017) Pruig ad Summarizig the Discovered Time Series Associatio Rules

More information

Accuracy Improvement in Camera Calibration

Accuracy Improvement in Camera Calibration Accuracy Improvemet i Camera Calibratio FaJie L Qi Zag ad Reihard Klette CITR, Computer Sciece Departmet The Uiversity of Aucklad Tamaki Campus, Aucklad, New Zealad fli006, qza001@ec.aucklad.ac.z r.klette@aucklad.ac.z

More information

Performance Comparisons of PSO based Clustering

Performance Comparisons of PSO based Clustering Performace Comparisos of PSO based Clusterig Suresh Chadra Satapathy, 2 Guaidhi Pradha, 3 Sabyasachi Pattai, 4 JVR Murthy, 5 PVGD Prasad Reddy Ail Neeruoda Istitute of Techology ad Scieces, Sagivalas,Vishaapatam

More information

Chapter 10. Defining Classes. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 10. Defining Classes. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 10 Defiig Classes Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 10.1 Structures 10.2 Classes 10.3 Abstract Data Types 10.4 Itroductio to Iheritace Copyright 2015 Pearso Educatio,

More information

New HSL Distance Based Colour Clustering Algorithm

New HSL Distance Based Colour Clustering Algorithm The 4th Midwest Artificial Itelligece ad Cogitive Scieces Coferece (MAICS 03 pp 85-9 New Albay Idiaa USA April 3-4 03 New HSL Distace Based Colour Clusterig Algorithm Vasile Patrascu Departemet of Iformatics

More information

Image based Cats and Possums Identification for Intelligent Trapping Systems

Image based Cats and Possums Identification for Intelligent Trapping Systems Volume 159 No, February 017 Image based Cats ad Possums Idetificatio for Itelliget Trappig Systems T. A. S. Achala Perera School of Egieerig Aucklad Uiversity of Techology New Zealad Joh Collis School

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

Octahedral Graph Scaling

Octahedral Graph Scaling Octahedral Graph Scalig Peter Russell Jauary 1, 2015 Abstract There is presetly o strog iterpretatio for the otio of -vertex graph scalig. This paper presets a ew defiitio for the term i the cotext of

More information

( n+1 2 ) , position=(7+1)/2 =4,(median is observation #4) Median=10lb

( n+1 2 ) , position=(7+1)/2 =4,(median is observation #4) Median=10lb Chapter 3 Descriptive Measures Measures of Ceter (Cetral Tedecy) These measures will tell us where is the ceter of our data or where most typical value of a data set lies Mode the value that occurs most

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists

More information

HADOOP: A NEW APPROACH FOR DOCUMENT CLUSTERING

HADOOP: A NEW APPROACH FOR DOCUMENT CLUSTERING Y.K. Patil* Iteratioal Joural of Advaced Research i ISSN: 2278-6244 IT ad Egieerig Impact Factor: 4.54 HADOOP: A NEW APPROACH FOR DOCUMENT CLUSTERING Prof. V.S. Nadedkar** Abstract: Documet clusterig is

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

Second-Order Domain Decomposition Method for Three-Dimensional Hyperbolic Problems

Second-Order Domain Decomposition Method for Three-Dimensional Hyperbolic Problems Iteratioal Mathematical Forum, Vol. 8, 013, o. 7, 311-317 Secod-Order Domai Decompositio Method for Three-Dimesioal Hyperbolic Problems Youbae Ju Departmet of Applied Mathematics Kumoh Natioal Istitute

More information

Lecture 28: Data Link Layer

Lecture 28: Data Link Layer Automatic Repeat Request (ARQ) 2. Go ack N ARQ Although the Stop ad Wait ARQ is very simple, you ca easily show that it has very the low efficiecy. The low efficiecy comes from the fact that the trasmittig

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpeCourseWare http://ocw.mit.edu 6.854J / 18.415J Advaced Algorithms Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 18.415/6.854 Advaced Algorithms

More information

UNIT 4 Section 8 Estimating Population Parameters using Confidence Intervals

UNIT 4 Section 8 Estimating Population Parameters using Confidence Intervals UNIT 4 Sectio 8 Estimatig Populatio Parameters usig Cofidece Itervals To make ifereces about a populatio that caot be surveyed etirely, sample statistics ca be take from a SRS of the populatio ad used

More information

Big-O Analysis. Asymptotics

Big-O Analysis. Asymptotics Big-O Aalysis 1 Defiitio: Suppose that f() ad g() are oegative fuctios of. The we say that f() is O(g()) provided that there are costats C > 0 ad N > 0 such that for all > N, f() Cg(). Big-O expresses

More information

condition w i B i S maximum u i

condition w i B i S maximum u i ecture 10 Dyamic Programmig 10.1 Kapsack Problem November 1, 2004 ecturer: Kamal Jai Notes: Tobias Holgers We are give a set of items U = {a 1, a 2,..., a }. Each item has a weight w i Z + ad a utility

More information

Recursion. Recursion. Mathematical induction: example. Recursion. The sum of the first n odd numbers is n 2 : Informal proof: Principle:

Recursion. Recursion. Mathematical induction: example. Recursion. The sum of the first n odd numbers is n 2 : Informal proof: Principle: Recursio Recursio Jordi Cortadella Departmet of Computer Sciece Priciple: Reduce a complex problem ito a simpler istace of the same problem Recursio Itroductio to Programmig Dept. CS, UPC 2 Mathematical

More information

Intro to Scientific Computing: Solutions

Intro to Scientific Computing: Solutions Itro to Scietific Computig: Solutios Dr. David M. Goulet. How may steps does it take to separate 3 objects ito groups of 4? We start with 5 objects ad apply 3 steps of the algorithm to reduce the pile

More information

n Some thoughts on software development n The idea of a calculator n Using a grammar n Expression evaluation n Program organization n Analysis

n Some thoughts on software development n The idea of a calculator n Using a grammar n Expression evaluation n Program organization n Analysis Overview Chapter 6 Writig a Program Bjare Stroustrup Some thoughts o software developmet The idea of a calculator Usig a grammar Expressio evaluatio Program orgaizatio www.stroustrup.com/programmig 3 Buildig

More information

APPLICATION NOTE. Automated Gain Flattening. 1. Experimental Setup. Scope and Overview

APPLICATION NOTE. Automated Gain Flattening. 1. Experimental Setup. Scope and Overview APPLICATION NOTE Automated Gai Flatteig Scope ad Overview A flat optical power spectrum is essetial for optical telecommuicatio sigals. This stems from a eed to balace the chael powers across large distaces.

More information

MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fitting)

MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fitting) MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fittig) I this chapter, we will eamie some methods of aalysis ad data processig; data obtaied as a result of a give

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence _9.qxd // : AM Page Chapter 9 Sequeces, Series, ad Probability 9. Sequeces ad Series What you should lear Use sequece otatio to write the terms of sequeces. Use factorial otatio. Use summatio otatio to

More information