Predicting the Density of Algae Communities using Local Regression Trees

Luís Torgo
LIACC Machine Learning Group / FEP, University of Porto
R. Campo Alegre, 823
email: ltorgo@ncc.up.pt

ABSTRACT: This paper describes the application of local regression trees to an environmental regression task. This task was part of the 3rd International ERUDIT Competition. The technique described in this paper was announced as one of the three runner-up methods by the jury of the competition. We briefly describe RT, a system that is able to generate local regression trees. We emphasise the multi-strategy features of RT, which we claim to be one of the causes of the obtained performance. We describe the pre-processing steps that were taken in order to apply RT to the competition data, highlighting the weighting schema that was used to combine the predictions of the best RT variants. We conclude by reinforcing the idea that the combination of features of different data analysis techniques can be useful to obtain higher predictive accuracy.

KEYWORDS: Regression trees, local modelling, hybrid methods, local regression trees.

INTRODUCTION

This paper describes an application of local regression trees (Torgo, 1997a, 1997b, 1999) to an environmental data analysis task. This application was carried out in the context of the 3rd International Competition organised by ERUDIT in conjunction with the new Computational Intelligence and Learning Cluster. This cluster is a cooperation between four EC-funded Networks of Excellence: ERUDIT, EvoNet, MLnet and NEuroNet. This paper describes the use of the system RT (Torgo, 1999) in the context of this competition. RT is a computer program that is able to obtain local regression trees. In this paper we briefly describe the main ideas behind local regression trees, and then focus on the steps taken to obtain a solution to this data analysis task.

The data analysis task concerns the environmental problem of determining the state of rivers and streams by monitoring and analysing certain measurable chemical concentrations, with the goal of inferring the biological state of the river, namely the density of algae communities. This study is motivated by the increasing concern with the impact human activities are having on the environment. Identifying the key chemical control variables that influence the biological processes associated with these algae has become a crucial task in order to reduce the impact of human activities. The data used in this 3rd ERUDIT international competition comes from such a study. Water quality samples were collected from various European rivers during one year. These samples were analysed for various chemical substances including: nitrogen in the form of nitrates, nitrites and ammonia, phosphate, pH, oxygen and chloride. At the same time, algae samples were collected to determine the distributions of the algae populations. The dynamics of algae communities is strongly influenced by the external chemical environment. Determining which chemical factors most influence these dynamics is important knowledge that can be used to control these populations. At the same time, there is an economic factor further motivating this analysis. In effect, the chemical analysis is cheap and easily automated. On the contrary, the biological part involves microscopic examination, requires trained manpower and is therefore both expensive and slow. The competition task consisted of predicting the frequency distribution of seven different algae on the basis of eight measured concentrations of chemical substances plus some additional information characterising the environment from which the sample was taken (season, river size and flow velocity). RT is a regression analysis tool that can be seen as a kind of multi-strategy data analysis system. We will see that this characteristic is a consequence of the theory behind local regression trees. In effect, these models integrate regression trees (e.g. Breiman et al., 1984) with local modelling (e.g. Cleveland and Loader, 1995). The integration schema used in

RT allows several variants to be tried out on a given data set. Due to this flexibility, RT can cope with problems having different characteristics, thanks to its ability to emulate techniques with different approximation biases. In this competition we have taken advantage of this facet of RT. We have treated the prediction of each of the seven algae frequencies as a different regression task. For each of the seven regression problems we have carried out a selection process with the aim of estimating which RT variant provided the best accuracy. The following section provides a brief description of local regression trees. We describe the main components integrating these hybrid regression models and the method that was used to integrate them within RT. We then focus on the technical details concerning the application of RT to the task of predicting the density of algae communities.

LOCAL REGRESSION TREES

Local Regression Trees (Torgo, 1997a, 1997b, 1999) explore the possibility of improving the accuracy of regression trees by using smoother models at the tree leaves. These models can be regarded as a hybrid approach to multivariate regression, integrating different solutions to this data analysis problem. Local regression (e.g. Cleveland and Loader, 1995) is a non-parametric statistical methodology that provides smooth modelling by not assuming any particular global form of the unknown regression function. On the contrary, these models fit a functional form within the neighbourhood of the query points. These models are known to provide highly accurate predictions over a wide range of problems due to the absence of a pre-defined functional form. However, local regression techniques are also known for their computational cost, low interpretability and storage requirements. Regression trees (e.g. Breiman et al., 1984), on the other hand, obtain models through a recursive partitioning algorithm that ensures high computational efficiency. Moreover, the resulting models are usually considered highly interpretable.

By integrating regression trees with local modelling, we not only improve the accuracy of the trees, but also increase the computational efficiency and comprehensibility of local models. In this section we provide a brief description of local regression trees. We start by describing both standard regression trees and local modelling. We then address the issue of how these two methodologies are integrated in our RT regression tool.

STANDARD REGRESSION TREES

A regression tree can be seen as a kind of additive model (Hastie & Tibshirani, 1990) of the form

    m(x) = Σ_{i=1..l} k_i × I(x ∈ D_i)    (1)

where the k_i are constants; I(.) is an indicator function returning 1 if its argument is true and 0 otherwise; and the D_i are disjoint partitions of the training data D such that ∪_{i=1..l} D_i = D and ∩_{i=1..l} D_i = ∅.

Models of this type are sometimes called piecewise constant regression models, as they partition the predictor space¹ χ into a set of mutually exclusive regions and fit a constant value within each region. An important aspect of tree-based regression models is that they provide a propositional logic representation of these regions in the form of a tree. Each path from the root of the tree to a leaf corresponds to such a region. Each inner node² of the tree is a logical test on a predictor variable³. In the particular case of binary trees there are two possible outcomes of the test, true or false. This means that associated with each partition D_i we have a path P_i consisting of a conjunction of logical tests on the predictor variables. This symbolic representation of the regression surface is an important issue when one wants to have a better understanding of the problem under consideration. For instance, consider the following small example of a regression tree obtained by applying RT to the competition data to obtain a model for one of the algae:

    P1: Ch7 ≤ 43.8                               → k1 = 39.8 (± 5.9)
    P2: Ch7 > 43.8 ∧ Ch3 ≤ 7.16                  → k2 = 50.1 (± 12.2)
    P3: Ch7 > 43.8 ∧ Ch3 > 7.16 ∧ Ch6 ≤ 51.1     → k3 = 12.9
    P4: Ch7 > 43.8 ∧ Ch3 > 7.16 ∧ Ch6 > 51.1     → k4 = 3.85 (± 1.4)

    Figure 1: An Example of a Regression Tree (shown here as its four root-to-leaf paths).

As there are four distinct paths from the root node to the leaves, this tree divides the input space into four different regions. The conjunction of the tests in each path can be regarded as a logical description of such regions, as shown above. This tree can be used to make predictions of the density of this alga for future water samples. Moreover, it provides information on which variables most influence this density.

Regression trees are constructed using a recursive partitioning (RP) algorithm. This algorithm builds a tree by recursively splitting the training sample into smaller subsets. We give below a high-level description of the algorithm. The RP algorithm receives as input a set of n data points, D_t = {⟨x_i, y_i⟩}_{i=1..n}, and if certain termination criteria are not met it generates a test node t, whose branches are obtained by applying the same algorithm to two subsets of the input data points. These subsets consist of the cases that logically entail the test s* in the node, D_tL = {⟨x_i, y_i⟩ ∈ D_t : x_i → s*}, and the remaining cases, D_tR = {⟨x_i, y_i⟩ ∈ D_t : x_i ↛ s*}. At each node the best split test is chosen according to some local criterion, which means that this is a greedy hill-climbing algorithm.

The Recursive Partitioning Algorithm.

    Input:  A set of n data points, {⟨x_i, y_i⟩}, i = 1, ..., n
    Output: A regression tree
    IF termination criterion THEN
        Create Leaf Node and assign it a Constant Value
        Return Leaf Node
    ELSE
        Find Best Splitting Test s*
        Create Node t with s*
        Left_branch(t)  = RecursivePartitioningAlgorithm({⟨x_i, y_i⟩ : x_i → s*})
        Right_branch(t) = RecursivePartitioningAlgorithm({⟨x_i, y_i⟩ : x_i ↛ s*})
        Return Node t
    ENDIF

The algorithm has three main components:
- A way to select a split test (the splitting rule).
- A rule to determine when a tree node is terminal (termination criterion).
- A rule for assigning a value to each terminal node.

¹ The multidimensional space formed by all input (or predictor) variables of a multivariate regression problem.
² All nodes except the leaves.
³ Although work exists on multivariate tests (e.g. Breiman et al., 1984; Murthy et al., 1994; Brodley & Utgoff, 1995; Gama, 1997).
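As an illustration of the pseudocode above, the sketch below builds a piecewise constant tree for a single numeric predictor under the MSE criterion: leaves predict the mean of their cases, and the best split is found by exhaustive search over cut-points. This is a didactic sketch, not the RT implementation; the `min_leaf` termination threshold is our own arbitrary choice.

```python
# Minimal recursive-partitioning regression tree (illustrative sketch).
def build_tree(data, min_leaf=3):
    # data: list of (x, y) pairs, x a single numeric predictor.
    ys = [y for _, y in data]
    mean = sum(ys) / len(ys)
    # Termination criterion: too few cases to split further.
    if len(data) < 2 * min_leaf:
        return {"leaf": True, "value": mean}
    best = None  # (sse, threshold)
    for cut in sorted(set(x for x, _ in data))[1:]:
        left = [y for x, y in data if x < cut]
        right = [y for x, y in data if x >= cut]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, cut)
    if best is None:
        return {"leaf": True, "value": mean}
    cut = best[1]
    return {"leaf": False, "cut": cut,
            "left": build_tree([(x, y) for x, y in data if x < cut], min_leaf),
            "right": build_tree([(x, y) for x, y in data if x >= cut], min_leaf)}

def predict(tree, x):
    # Follow the root-to-leaf path selected by the tests.
    while not tree["leaf"]:
        tree = tree["left"] if x < tree["cut"] else tree["right"]
    return tree["value"]

data = [(v, 10.0 if v < 5 else 40.0) for v in range(10)]
tree = build_tree(data)
print(predict(tree, 2.0), predict(tree, 8.0))  # 10.0 40.0
```

On this toy data the first split lands between the two constant regimes, so the tree reproduces the piecewise constant surface exactly.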

The answer to these three problems is related to the error criterion that is selected to guide the growth of the tree. The most common choice is the minimisation of the mean squared error (MSE). Using this criterion, the constant that should be used in the leaves of the tree (terminal nodes) is the average goal variable value of the cases that fall in each leaf. Moreover, this criterion leads to the following rule for determining the best split of each test node (Torgo, 1999): the best split s* is the split that maximises the expression

    S_L² / n_tL + S_R² / n_tR    (2)

where S_L = Σ_{⟨x_i, y_i⟩ ∈ D_tL} y_i and S_R = Σ_{⟨x_i, y_i⟩ ∈ D_tR} y_i.

This rule leads to fast incremental updating algorithms that allow efficient search for the best test of each node (Torgo, 1999). Finally, the termination criterion is related to the issue of reliable error estimation. Methodologies like regression trees that have a rich hypothesis (model) language incur the danger of overfitting the training data. This in turn leads to unnecessarily large models that usually have poor generalisation performance (i.e. lower predictive accuracy on new samples of the same problem). Overfitting avoidance in regression trees is usually achieved through a process of pruning unreliable branches of an overly large tree (e.g. Breiman et al., 1984). Many pruning techniques exist (see Torgo, 1999 for an overview), most of them relying on reliable estimation of the predictive error of the trees (Torgo, 1998). Regression trees achieve competitive predictive accuracy in a wide range of applications. Moreover, their computational efficiency allows dealing with problems with hundreds of variables and hundreds of thousands of cases. Still, regression trees provide a highly non-smooth function approximation with strong discontinuities⁴.

LOCAL MODELLING

Local modelling⁵ (Fan, 1995) is a data analytic methodology whose basic idea consists of obtaining the prediction for a data point x by fitting a parametric function in the neighbourhood of x. This means that these methods are locally parametric, as opposed to, for instance, least squares linear regression.
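The incremental updating idea behind expression (2) can be sketched as follows: after sorting the cases by the candidate variable, moving one case at a time from the right to the left partition updates S_L, n_tL, S_R and n_tR in constant time, so every cut-point is scored in a single pass. This is our own illustration, not RT's actual code.

```python
# One-pass search for the cut maximising S_L^2/n_L + S_R^2/n_R (expression (2)).
def best_split(points):
    # points: list of (x, y) pairs; returns (best_score, threshold).
    pts = sorted(points)
    s_left, n_left = 0.0, 0
    s_right, n_right = sum(y for _, y in pts), len(pts)
    best_score, best_cut = float("-inf"), None
    for i in range(len(pts) - 1):
        x, y = pts[i]
        # Incremental update: shift one case from the right to the left partition.
        s_left += y; n_left += 1
        s_right -= y; n_right -= 1
        if x == pts[i + 1][0]:
            continue  # cannot cut between equal predictor values
        score = s_left ** 2 / n_left + s_right ** 2 / n_right
        if score > best_score:
            best_score = score
            best_cut = (x + pts[i + 1][0]) / 2
    return best_score, best_cut

pts = [(1, 10.0), (2, 10.0), (3, 10.0), (6, 40.0), (7, 40.0), (8, 40.0)]
print(best_split(pts))  # (5100.0, 4.5): the cut separating the two regimes
```

Each candidate cut is scored in O(1) after the sort, giving O(n log n) per variable instead of O(n²).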
Moreover, these methods do not produce a visible model of the data. Instead they make predictions based on local models generated on a query point basis. According to Cleveland and Loader (1995), local regression traces back to the 19th century; these authors provide a historical survey of the work done since then. The modern work on local modelling starts in the 1950s with the kernel methods introduced within the probability density estimation setting (Rosenblatt, 1956; Parzen, 1962) and within the regression setting (Nadaraya, 1964; Watson, 1964). Local polynomial regression (Stone, 1977; Cleveland, 1979; Katkovnik, 1979) is a generalisation of this early work on kernel regression. In effect, kernel regression amounts to fitting a polynomial of degree zero (a constant) in a neighbourhood. Summarising, we can state the general goal of local regression as trying to fit a polynomial of degree p around a query point (or test case) q using the training data in its neighbourhood. This includes the various available settings like kernel regression (p = 0), local linear regression (p = 1), etc.⁶ Most studies of local modelling address the univariate case. Still, the framework is applicable to the multivariate case and has been used with success in some domains (Atkeson et al., 1997; Moore et al., 1997). However, several authors warn of the danger of applying these methods in higher input space dimensions⁷ (Härdle, 1990; Hastie & Tibshirani, 1990). The problem is that with a high number of variables the training cases are so sparse that the notion of local neighbourhood can hardly be seen as local. Another drawback of local modelling is the complete lack of interpretability of the models. No interpretable model of the training data is obtained. With some simulation work one may obtain a graphical picture of the approximation provided by local regression models, but this is only possible with a low number of input variables.

⁴ This latter characteristic can be seen as advantageous or disadvantageous, depending on the application.
⁵ Also known as non-parametric smoothing and local regression.
⁶ A further generalisation of this set-up consists of using polynomial mixing (Cleveland & Loader, 1995), where p can take non-integer values.
⁷ The so-called curse of dimensionality (Bellman, 1961).
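To make the p = 0 versus p = 1 distinction concrete, the sketch below (ours, not taken from any of the cited systems) computes a Nadaraya-Watson kernel prediction (degree-zero fit) and a locally weighted linear prediction at a query point, both using a Gaussian weighting function with bandwidth h.

```python
import math

def gaussian_w(d, h):
    # Kernel weight for a case at distance d from the query, bandwidth h.
    return math.exp(-(d / h) ** 2)

def kernel_pred(q, data, h=1.0):
    # p = 0: prediction is the kernel-weighted average of the targets.
    w = [gaussian_w(abs(x - q), h) for x, _ in data]
    return sum(wi * y for wi, (_, y) in zip(w, data)) / sum(w)

def local_linear_pred(q, data, h=1.0):
    # p = 1: weighted least squares line fitted around q, evaluated at q.
    w = [gaussian_w(abs(x - q), h) for x, _ in data]
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, data)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, data)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, data))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, data))
    slope = sxy / sxx if sxx else 0.0
    return my + slope * (q - mx)

data = [(float(i), 2.0 * i + 1.0) for i in range(10)]  # exactly linear data
print(kernel_pred(4.5, data), local_linear_pred(4.5, data))
```

On exactly linear data the local linear fit recovers the true value (here 10.0 at q = 4.5); the degree-zero kernel fit only matches it because this query point sits symmetrically among the cases, and would be biased near the boundaries.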

In spite of being considered a non-parametric regression technique, local modelling does have several parameters that must be tuned in order to obtain good predictive results. One of the most important is the notion of neighbourhood. Given a query point q, we need to decide which training cases will be used to fit a local polynomial around it. This involves defining a distance metric over the multidimensional space defined by the input variables. With this metric we can specify a distance function that allows finding the nearest training cases of any query point. Still, many open issues remain. Namely, the weighting of the variables within the distance calculation can be crucial in domains with less relevant variables. Moreover, we need to specify how many training cases will enter the local fit (usually known as the bandwidth selection problem). Even after having a bandwidth size specification, we need to weigh the contributions of the training cases within the bandwidth: nearer points should contribute more to the local fit. This is usually accomplished through a weighting function that takes the distance to the query point into account (known as the kernel function). The correct tuning of all these modelling parameters can be crucial for a successful use of local modelling. Local modelling provides smooth function approximation with wide applicability through correct tuning of the distance function parameters. Still, it is a computationally intensive method that does not produce a comprehensible model of the data. Moreover, these methods have difficulties in dealing with domains with strong discontinuities in the function being approximated.

INTEGRATING REGRESSION TREES WITH LOCAL MODELLING

In this section we describe how we propose to overcome some of the limitations of both regression trees and local modelling through an integration of the two methods, resulting in what we refer to as local regression trees. The main goals of this integration can be stated as follows:
- Improve the smoothness of standard regression trees, leading to superior predictive accuracy.
- Improve local modelling in the following aspects: computational efficiency; generation of models that are comprehensible to human users; and capability to deal with domains with strong discontinuities.

In our study of local regression trees we have considered the integration of the following local modelling techniques:
- Kernel models (Watson, 1964; Nadaraya, 1964), which amount to fitting a polynomial of degree zero (i.e. a constant) in the local neighbourhood.
- Local linear polynomials (Stone, 1977; Cleveland, 1979; Katkovnik, 1979), where we fit a linear polynomial within the neighbourhood.
- Semi-parametric or partial linear models (Spiegelman, 1976; Härdle, 1990), consisting of a standard least squares linear model fitted on all data plus a kernel model component on the residuals of the linear polynomial, acting as a local correction of the polynomial's global linearity assumption.

The main decisions concerning the integration of local models with regression trees are how and when to perform it. Three main alternatives exist:
- Assume the use of local models in the leaves during all tree induction (i.e. growth and pruning phases).
- Grow a standard regression tree and use local models only during the pruning stage.
- Grow and prune a standard regression tree; use local models only in prediction tasks.

The first of these alternatives is the most consistent from a theoretical point of view, as the choice of the best split depends on the models at the leaves. This is the approach followed in RETIS (Karalic, 1992), which integrates global least squares linear polynomials in the tree leaves. However, obtaining such models is a computationally demanding task: for each trial split, the left and right child models need to be obtained and their error calculated. Even with a model as simple as the average, the evaluation of all candidate splits is too heavy without an efficient incremental algorithm. This means that if more complex models are to be used, like linear polynomials, this task is practically unfeasible for large domains⁸. The experiments described by Karalic (1992) used data sets with a few hundred cases. The author does not provide any results concerning the computational penalty of using linear models instead of averages. Still, we claim that

⁸ Particularly with a large number of continuous variables.

when using more complex models like kernel regression this approach is not feasible if a reasonable computation time is to be achieved. The second alternative is to introduce the more complex models only during the pruning stage. The tree is grown (i.e. the splits are chosen) assuming averages in the leaves; only during the pruning stage do we consider that the leaves will contain more complex models, which entails obtaining them for each node of the grown tree. Notice that this is much more efficient than the first alternative, which involved obtaining the models for each trial split considered during tree growth. This is the approach followed in M5 (Quinlan, 1992). This system also uses global least squares linear polynomials in the tree leaves, but these models are only added during the pruning stage. We have access to a version of M5⁹ and have confirmed that this is a computationally feasible solution even for large problems. In our integration of regression trees with local modelling we have followed the third alternative, in which the learning process is separated from prediction tasks. We generate the regression trees using the standard methodology described previously. When we want to use the learned tree to make predictions for a set of unseen test cases, we can choose which model should be used in the tree leaves, including complex models like kernels or local linear polynomials. Using this approach we only have to fit as many models as there are leaves in the final pruned tree. The main advantage of this approach is its computational efficiency, but it also allows trying several alternative models without having to re-learn the tree. The initial tree can thus be regarded as a rough approximation of the regression surface which is comprehensible to the human user. On top of this rough surface we may fit smoother models for each data partition generated in the leaves of the regression tree, so as to increase predictive accuracy.
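The third alternative can be sketched as follows: the tree (here a hypothetical one-split tree that stores its training cases in the leaves) routes the query to a leaf, and only then is a model, chosen at prediction time, fitted on that leaf's cases. Swapping `predict_kernel` for `predict_constant` changes the leaf model without re-learning the tree.

```python
import math

# Sketch of prediction-time local models in the leaves of a fixed tree.
# The tree below is hypothetical: one split at x < 5, with the training
# cases kept in each leaf instead of a single pre-computed constant.
tree = {
    "cut": 5.0,
    "left":  {"cases": [(1.0, 12.0), (2.0, 10.0), (4.0, 6.0)]},
    "right": {"cases": [(6.0, 40.0), (7.0, 42.0), (9.0, 44.0)]},
}

def leaf_for(tree, q):
    return tree["left"] if q < tree["cut"] else tree["right"]

def predict_constant(tree, q):
    # Standard regression tree: leaf average.
    ys = [y for _, y in leaf_for(tree, q)["cases"]]
    return sum(ys) / len(ys)

def predict_kernel(tree, q, h=2.0):
    # Local regression tree: kernel model fitted on the leaf's cases only.
    cases = leaf_for(tree, q)["cases"]
    w = [math.exp(-((x - q) / h) ** 2) for x, _ in cases]
    return sum(wi * y for wi, (_, y) in zip(w, cases)) / sum(w)

print(predict_constant(tree, 2.0))  # plain leaf average
print(predict_kernel(tree, 2.0))    # smoother, weighted toward nearby cases
```

Only as many local fits are needed as there are queried leaves, which is the computational advantage the text describes.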
In Torgo (1999) a large experimental evaluation of local regression trees over a wide range of problems was carried out. This large set of experimental comparisons confirmed that local regression trees are significantly more accurate than standard regression trees. However, these new regression models are computationally more demanding and less comprehensible than standard regression trees. We have also observed that local regression trees can overcome some of the limitations of local modelling techniques, particularly their lack of comprehensibility and rather high processing time: these experiments have shown that local regression trees are significantly faster than local modelling techniques, and through the integration within a tree-based structure we obtain a comprehensible insight into the approximation of these models. Finally, we have also observed that the modelling bias resulting from combining local models and partition-based approaches improves the accuracy of local models in domains where there are strong discontinuities in the regression surface.

PREDICTING THE DENSITY OF ALGAE COMMUNITIES

In this section we describe the steps followed in the application of RT in the context of the 3rd International ERUDIT Competition. As mentioned, this competition consisted of a data analysis task concerning the environmental problem of determining the state of rivers and streams by monitoring and analysing certain measurable chemical concentrations. The goal was to obtain a model allowing predictions of the frequency distribution of seven algae. With this purpose the organisers delivered two data files to each competitor. The first contained the training data, consisting of 200 samples described by 11 input variables plus the observed algae frequency distributions. This training file was presented as a matrix with 200 rows and 18 (11+7) columns. Of the 11 input variables (labelled A, ..., K), three were categorical, namely the season, the river size and the flow velocity. All remaining input variables were numeric. Several river samples contained unknown input variable values (signalled by XXXX). The competitors were also given a second file containing the test data, consisting of another 140 river samples. For these, only the values of the 11 input variables were given, and there were also unknown variable values among them. Thus the test data set corresponded to a matrix of 140 rows and 11 columns of values. The goal of the competition was to build a model based on the training data that could provide predictions of the frequency distributions of the 7 algae (labelled a, ..., g) for the 140 test samples, whose true values were only known to the organisers. The competitors were to present their results as a matrix of predictions, and the proposed solutions were evaluated by the sum of the total squared errors between their predictions and the true values.

⁹ Version 5.1
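The unknown values signalled by XXXX have to be dealt with before modelling. A minimal sketch of the kind of scheme we adopted (described in the next section: fill each unknown with the median of that variable among the most similar complete cases, similarity measured by Euclidean distance) could look as follows; this is an illustration on toy data, not the actual pre-processing code.

```python
import math
from statistics import median

def impute(rows, k=10):
    # rows: list of lists of floats, with None marking an unknown value.
    # Each unknown is replaced by the median of that column among the
    # k most similar complete rows (Euclidean distance on known columns).
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        def dist(c):
            return math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(r, c) if a is not None))
        nearest = sorted(complete, key=dist)[:k]
        filled.append([v if v is not None else median(c[j] for c in nearest)
                       for j, v in enumerate(r)])
    return filled

rows = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [10.0, 20.0], [1.0, None]]
print(impute(rows, k=3))  # last row's None becomes the neighbours' median
```

Using the median rather than the mean makes the filled value robust to an outlying neighbour such as the fourth row above.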

PROBLEM SOLUTION USING RT

This section briefly describes the steps taken to produce a matrix of predictions based on models obtained with RT. The first step was to divide the given training data into seven different training files, one for each of the 7 algae. This means that we dealt with this problem as seven different regression tasks. For each of these seven training sets all input variable values were the same; only the target variable changed. The second step of our proposed solution consisted of filling in the unknown variable values. For each training case with an unknown variable value we searched for the 10 most similar cases, similarity being assessed using a Euclidean distance metric. The median value of the variable in these 10 cases was used to fill in the unknown value. The methodology used in RT to integrate local models with regression trees enables this system to emulate a large number of regression techniques through simple parameter settings. In effect, as the predictions of the local models used in the leaves are obtained completely independently of the tree growth phase, we can very easily try different variants of local regression. Moreover, we can even use these models on all training data, which corresponds to the original local models. This latter alternative is easily accomplished by pruning the initial tree until a single leaf is reached. This means that RT can emulate several regression techniques: standard regression trees, local regression trees and local modelling techniques. Moreover, several of the implemented local modelling techniques can be used to emulate other regression models. For instance, a local linear polynomial can behave like a standard least squares linear regression model through appropriate parameter tuning¹⁰. In summary, RT's hybrid character can be used to obtain the following types of models:
- Standard least squares (LS) regression trees.
- Least absolute deviation (LAD) regression trees.
- Regression trees (LS or LAD) with least squares linear polynomials in the leaves.
- Regression trees (LS or LAD) with kernel models in the leaves.
- Regression trees (LS or LAD) with k nearest neighbours in the leaves.
- Regression trees (LS or LAD) with least squares local linear polynomials in the leaves.
- Regression trees (LS or LAD) with least squares partial linear polynomials (models that integrate both parametric and non-parametric components) in the leaves.
- Least squares linear polynomials (with or without backward elimination to simplify the models).
- Kernel models.
- K nearest neighbours.
- Local linear polynomials.
- Partial linear polynomials.

Having so many variants in RT, the next step of our analysis was to find out which were the most promising techniques for each of the seven regression tasks. With this purpose we carried out the following experiment. We selected a large set of candidate variants of RT and, for each of the seven training sets, estimated the predictive accuracy (measured by the Mean Squared Error) of each variant. The MSE was estimated using 10-fold Cross Validation: each training set was randomly divided into 10 approximately equally sized folds; for each fold, a model was built with the remaining nine and its prediction error calculated on that fold; this process was repeated for the 10 different folds and the prediction errors averaged. The whole 10-fold Cross Validation process was repeated 10 times with 10 different random permutations of each training set, and the final score of each variant was obtained by averaging over these 10 repetitions. For each of the seven algae the most promising variants of RT were collected together with their estimated prediction error. This model selection stage immediately revealed that the seven regression tasks posed quite different challenges: there was a large diversity in which techniques were estimated as most promising, depending on the alga under consideration.
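The estimation protocol just described (10-fold Cross Validation repeated over 10 random permutations) can be sketched generically as below; `fit` stands for any candidate variant's training procedure and is a placeholder, not RT's API.

```python
import random

def cv_mse(data, fit, n_folds=10, n_reps=10, seed=0):
    # Repeated k-fold cross validation estimate of the MSE of `fit`.
    # `fit(train)` must return a function mapping x to a prediction.
    rng = random.Random(seed)
    rep_scores = []
    for _ in range(n_reps):
        d = list(data)
        rng.shuffle(d)                      # a fresh random permutation
        folds = [d[i::n_folds] for i in range(n_folds)]
        errs = []
        for i in range(n_folds):
            test = folds[i]
            train = [p for j, f in enumerate(folds) if j != i for p in f]
            model = fit(train)
            errs.append(sum((model(x) - y) ** 2 for x, y in test) / len(test))
        rep_scores.append(sum(errs) / n_folds)
    return sum(rep_scores) / n_reps         # average over the repetitions

# Example "variant": a model that always predicts the training mean.
def mean_fit(train):
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

data = [(float(i), float(i % 2)) for i in range(50)]  # y is 0/1 noise
print(cv_mse(data, mean_fit))  # roughly the variance of y (0.25)
```

Scoring every candidate variant with the same protocol makes their estimated MSEs directly comparable, which is what the per-alga ranking relies on.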
For each of the seven algae training sets we used the respective most promising RT variants to obtain models. These models were then applied to the given 140 test cases and their predictions collected. The final predictions that

¹⁰ In this example it is enough to use an infinite bandwidth (meaning that all training points contribute to the model) and a uniform weighting function (which gives equal weight to all training points).

were submitted as our solution were obtained by a weighted average of these predictions. To illustrate this weighting schema, imagine that for alga a the most promising RT variants and their estimated prediction errors were:

    M_a, Err_Ma ;  M_b, Err_Mb ;  M_c, Err_Mc ;  M_d, Err_Md

The final prediction for a test case x is a weighted average of the individual predictions, with each model weighted in inverse proportion to its estimated error:

    y = ( M_a(x)/Err_Ma + M_b(x)/Err_Mb + M_c(x)/Err_Mc + M_d(x)/Err_Md ) / ( 1/Err_Ma + 1/Err_Mb + 1/Err_Mc + 1/Err_Md )

where x is a test case, y is the prediction for case x, and M_a(x) is the prediction of model M_a for case x.

DISCUSSION

The solution obtained through the process described above was declared by the jury of the ERUDIT competition to be one of the three runners-up. After the announcement of the results, the organisation provided the true values of the test samples, which enabled us to analyse in more detail the performance of our proposed data analysis technique. Our solution achieved an overall MSE of 87.020. At the time of writing the scores of the other competitors are not known, so it is hard to interpret this number in isolation. Still, we have collected other statistics that provide further insight into the quality of the solution. We calculated the mean absolute deviation (MAD) of the predictions of RT¹¹. Figure 2 shows, for each of the seven algae (a to g), the minimum MAD, the maximum MAD, and an interval (represented by a box) of MAD ± Standard Error.

    Figure 2: The Scores of our Solution in terms of Absolute Error.

From this figure it is clear that the best results are obtained for predicting the frequency of algae c, d and g, while the worst predictions occur with algae a and f. Still, the overall Mean Absolute Error is relatively low¹².

¹¹ This statistic is more meaningful as it is on the same scale as the goal variable.
¹² The values of the algae distributions range from 0 to 100 (although the maximum value in the training samples was 89.8 for alga a).
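Under our reading of the weighting schema above (weights inversely proportional to the estimated error), the combination of the variants' predictions for one test case can be sketched as follows; the numbers are hypothetical, not taken from the competition data.

```python
def combine(preds_errs):
    # preds_errs: list of (prediction, estimated_error) pairs for one test
    # case. Weights are inversely proportional to the estimated error, so
    # variants with a lower estimated MSE contribute more to the result.
    num = sum(p / e for p, e in preds_errs)
    den = sum(1.0 / e for _, e in preds_errs)
    return num / den

# Four hypothetical variants and their estimated errors for one test case:
preds = [(40.0, 2.0), (44.0, 4.0), (39.0, 8.0), (50.0, 16.0)]
print(combine(preds))  # 41.6: pulled toward the low-error variants
```

Note how the result stays close to the predictions of the two variants with the smallest estimated errors, while the high-error variants only nudge it.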

With the true goal variable values we were able to confirm that the ranking of RT variants obtained through repeated 10-fold Cross Validation was correct most of the time. We also carried out some initial experiments to try to identify the main causes of the good performance of our solution. These experiments clearly show that the availability of different variants was highly beneficial, because worse results are obtained when applying the same variant to all seven regression tasks. As an example, while our multi-strategy weighted solution achieved an overall MSE of 87.020 on all seven tasks, using the single overall best RT variant (Torgo, 1999), which consists of local regression trees with partial linear models in the leaves, led to a higher MSE. Moreover, the weighting schema also brings some accuracy gains when compared to the alternative of always selecting the prediction of the best estimated variant for each task¹³. We think that using more sophisticated model combination methodologies like boosting (Freund, 1995) or bagging (Breiman, 1996) could provide even better accuracy.

CONCLUSIONS

We have described the application of a regression analysis tool named RT to the environmental data used in the 3rd International ERUDIT Competition. RT implements a regression methodology called local regression trees. This technique can be regarded as a combination of two existing methodologies: regression trees and local modelling. We have described the integration schema used in RT to obtain local regression trees, and shown how this schema is important in obtaining a tool that is able to approximate quite different regression surfaces. The results obtained by RT in the competition provide further confidence in the correctness of our integration schema. The methodology used to obtain a solution for the competition was based on the assumption that using different variants of RT on each of the seven regression tasks would provide better results than always using the same regression methodology. Our experiments and the results of the competition confirmed this hypothesis. Moreover, averaging over the best variants also improved the accuracy of the resulting predictions. This application provides further evidence of the advantages of systems incorporating multiple data analysis techniques, and of the accuracy gains from combining the predictions of different models.

REFERENCES

Atkeson, C.G., Moore, A.W., Schaal, S., 1997, Locally Weighted Learning, Artificial Intelligence Review, 11, special issue on lazy learning, Aha, D. (ed.).
Bellman, R., 1961, Adaptive Control Processes. Princeton University Press.
Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J., 1984, Classification and Regression Trees, Wadsworth Int. Group, Belmont, California, USA.
Breiman, L., 1996, Bagging Predictors, Machine Learning, 24. Kluwer Academic Publishers.
Brodley, C., Utgoff, P., 1995, Multivariate decision trees, Machine Learning, 19. Kluwer Academic Publishers.
Cleveland, W., 1979, Robust locally weighted regression and smoothing scatterplots, Journal of the American Statistical Association, 74.
Cleveland, W., Loader, C., 1995, Smoothing by Local Regression: Principles and Methods (with discussion), in Computational Statistics.
Fan, J., 1995, Local Modelling, in Encyclopedia of Statistical Science.
Freund, Y., 1995, Boosting a weak learning algorithm by majority, Information and Computation, 121 (2).
Gama, J., 1997, Probabilistic linear tree, in Proceedings of the 14th International Conference on Machine Learning (ICML-97), Fisher, D. (ed.). Morgan Kaufmann.
Härdle, W., 1990, Applied Nonparametric Regression. Cambridge University Press.

¹³ While the weighted average obtained an MSE of 87.020, using only the predictions of the best estimated variant per task gives an MSE of 89.296.

Hastie, T., Tibshirani, R., 1990, Generalized Additive Models. Chapman & Hall.
Karalic, A., 1992, Employing Linear Regression in Regression Tree Leaves, in Proceedings of ECAI-92. Wiley & Sons.
Katkovnik, V., 1979, Linear and nonlinear methods of nonparametric regression analysis, Soviet Automatic Control, 5.
Moore, A., Schneider, J., Deng, K., 1997, Efficient Locally Weighted Polynomial Regression Predictions, in Proceedings of the 14th International Conference on Machine Learning (ICML-97), Fisher, D. (ed.). Morgan Kaufmann Publishers.
Murthy, S., Kasif, S., Salzberg, S., 1994, A system for induction of oblique decision trees, Journal of Artificial Intelligence Research, 2.
Nadaraya, E.A., 1964, On estimating regression, Theory of Probability and its Applications, 9.
Parzen, E., 1962, On estimation of a probability density function and mode, Annals of Mathematical Statistics, 33.
Quinlan, J., 1992, Learning with Continuous Classes, in Proceedings of the 5th Australian Joint Conference on Artificial Intelligence. Singapore: World Scientific.
Rosenblatt, M., 1956, Remarks on some nonparametric estimates of a density function, Annals of Mathematical Statistics, 27.
Spiegelman, C., 1976, Two techniques for estimating treatment effects in the presence of hidden variables: adaptive regression and a solution to Reiersol's problem. Ph.D. Thesis, Dept. of Mathematics, Northwestern University.
Stone, C.J., 1977, Consistent nonparametric regression, The Annals of Statistics, 5.
Torgo, L., 1997a, Functional models for Regression Tree Leaves, in Proceedings of the International Conference on Machine Learning (ICML-97), Fisher, D. (ed.). Morgan Kaufmann Publishers.
Torgo, L., 1997b, Kernel Regression Trees, in Proceedings of the poster papers of the European Conference on Machine Learning (ECML-97), Internal Report of Faculty of Informatics and Statistics, University of Economics, Prague.
Torgo, L., 1998, A Comparative Study of Reliable Error Estimators for Pruning Regression Trees, in Proceedings of the Iberoamerican Conference on Artificial Intelligence, Coelho, H. (ed.), Springer-Verlag.
Torgo, L., 1999, Inductive Learning of Tree-based Regression Models, Ph.D. Thesis, Department of Computer Science, Faculty of Sciences, University of Porto.
Watson, G.S., 1964, Smooth Regression Analysis, Sankhya: The Indian Journal of Statistics, Series A, 26.


More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

Report on On-line Graph Coloring

Report on On-line Graph Coloring 2003 Fall Semester Comp 670K Onlne Algorthm Report on LO Yuet Me (00086365) cndylo@ust.hk Abstract Onlne algorthm deals wth data that has no future nformaton. Lots of examples demonstrate that onlne algorthm

More information

Multiblock method for database generation in finite element programs

Multiblock method for database generation in finite element programs Proc. of the 9th WSEAS Int. Conf. on Mathematcal Methods and Computatonal Technques n Electrcal Engneerng, Arcachon, October 13-15, 2007 53 Multblock method for database generaton n fnte element programs

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Additive Groves of Regression Trees

Additive Groves of Regression Trees Addtve Groves of Regresson Trees Dara Sorokna, Rch Caruana, and Mrek Redewald Department of Computer Scence, Cornell Unversty, Ithaca, NY, USA {dara,caruana,mrek}@cs.cornell.edu Abstract. We present a

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information