REG^2: A Regional Regression Framework for Geo-Referenced Datasets


Oner Ulvi Celepcikay
University of Houston, Department of Computer Science, Houston, TX
onerulvi@cs.uh.edu

Christoph F. Eick
University of Houston, Department of Computer Science, Houston, TX
ceick@cs.uh.edu

ABSTRACT

Traditional regression analysis derives global relationships between variables and neglects spatial variations in those variables. Hence, it lacks the ability to systematically discover regional relationships and to build better models that use this regional knowledge to obtain higher prediction accuracies. Since most relationships in spatial datasets are regional, there is a great need for regional regression methods that derive regional regression functions reflecting the different spatial characteristics of different regions. This paper proposes a novel regional regression framework that first discovers interesting regions exhibiting strong regional relationships between the dependent and the independent variables, and then builds a prediction model with a regional regression function associated with each region. Interesting regions are identified by running a representative-based clustering algorithm that maximizes an externally plugged-in fitness function. In this work, we propose two fitness functions: an R-squared-based fitness function and an AIC-based fitness function that handles overfitting better. We evaluate our framework in two case studies: (1) identifying causes of arsenic contamination in Texas water wells, and (2) determining spatially varying effects of house properties on house prices in the Boston Housing dataset. We demonstrate that our framework effectively identifies interesting regions and builds better prediction systems that rely on regional models.

Categories and Subject Descriptors: H.2.8 [Database Applications]: Spatial databases and GIS, Data Mining

General Terms: Algorithms, Design, Experimentation

Keywords: Spatial Data Mining, Regression Analysis, Regional Knowledge Discovery, Regional Regression, Clustering.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM GIS '09, November 4-6, 2009, Seattle, WA, USA. (c) 2009 ACM.

1. INTRODUCTION

Regression analysis has been extensively used in many scientific fields to discover relationships and dependencies among variables, and many variations of regression analysis have been proposed in the literature [2]. In spatial data mining, regression analysis can be used to discover spatial relationships and to build models for prediction. In geo-referenced datasets, most relationships exist at the regional level, not the global level. Since such spatial or regional relationships are often not explicitly represented in geo-referenced datasets, one major task in spatial data mining is to develop techniques and methodologies that automate the extraction of interesting and useful regional patterns, and to build better models that reflect the spatially varying characteristics of geo-referenced data. Moreover, capturing regional variation leads to a deeper understanding of important relationships that are embedded in a spatial dataset. For example, a global linear regression analysis of housing prices in a city derives coefficients that measure each attribute's contribution to the price of a house. However, coefficients tend to vary from one region to another; for example, the attribute have_pool might have a coefficient of 9,000 in a city-wide regression analysis, which indicates that having a pool adds $9,000 to the house value, when in reality, depending on the neighborhood, a pool adds between $5,000 and $50,000 to the house price, and its contribution differs regionally; e.g., it is much lower for houses in some underdeveloped parts of the city.
We claim that understanding these regional variations will not only lead to more accurate prediction models, but will also provide regional background knowledge concerning which attributes have a significant impact on house prices in which regions. Therefore, it is desirable to develop methods that capture this spatial variation and extract regional knowledge from spatial datasets. This type of regional knowledge is crucial for domain experts who seek a deeper understanding of regional variations in relationships among variables in geo-referenced datasets. Some local regression methods have been proposed in the literature, but most use pre-defined region boundaries such as zip codes, county limits, or grid structures based on spatial coordinate systems. However, in spatial data mining, regional patterns often do not coincide with predefined geographic boundaries, and important spatial relationships might not be discovered due to dilution. For example, proximity to a river might significantly add to the value of a house; but if census blocks are used as regions, this pattern will not be detected, especially if the pre-defined regions contain many houses that are not very close to the river. In general, such relationships can only be discovered by methods that seek out regions on their own that reflect the underlying

structure of the data, e.g. regions that contain houses with similar relationships between dependent and independent variables. This paper proposes a Regional Regression framework called REG^2 (pronounced REG-squared) that focuses on discovering regional regression functions that are associated with contiguous areas in the subspace of the spatial attributes, which we call regions. Figure 1 shows an example of discovered regions along with their regional regression functions and region representatives, where the regional regression function for region 10 is given as an example. First, interesting regions are discovered by running a representative-based clustering algorithm that maximizes an externally plugged-in fitness function; next, regional knowledge is extracted from the obtained subspaces (regions). We developed two fitness functions: an R-squared-based fitness function (RsqFitness) and an AIC-based (Akaike's Information Criterion) fitness function (AICFitness). The two fitness functions are used to guide the search for regions with strong regional linear relationships between the response variable and the independent variables.

2. RELATED WORK

2.1 Regression Analysis and AIC

Regression analysis has been extensively used in many scientific fields to discover linear relationships and dependencies among variables, and many variations of regression analysis exist in the literature [2]. OLS is the best known of all regression techniques; it provides a global model of the variable of interest that needs to be understood or predicted. R² is a measure of the extent to which the total variation of the dependent variable is explained by the model. In general, increasing the number of independent variables involved in the regression will lead to a higher R² value. R² alone is not a good measure of the goodness of fit of a model, since it only deals with the bias of the regression model, ignores the complexity of a model and its associated variance, and is therefore prone to overfitting.
Consequently, several information criteria have been developed that take the complexity of the employed model into consideration, with Akaike's Information Criterion (AIC) [1] being one of the most popular information criteria to measure the goodness of fit of an estimated statistical model. AIC provides a balance between bias and variance, and is estimated using the following formula:

AIC = −2 log L(θ̂) + 2k   (1)

where L(θ̂) is the maximized likelihood function, θ̂ is the maximum likelihood estimate of the parameter vector θ under the model, and k is the number of independent parameters of the model. Assuming that the errors are normally distributed, the AIC formula becomes:

AIC = 2k + n[ln(2π·SSE/n) + 1]   (2)

where SSE is the Residual Sum of Squares (called RSS in some of the literature) and n is the number of objects.

Figure 1. Discovered Regions and Regression Functions

The main contributions of the paper include:
1. A regional regression framework that employs representative-based clustering to discover interesting regions and their associated regional regression functions, without using any pre-defined region boundaries.
2. An AIC-based and an Rsq-based fitness function to guide the search for regions with highly accurate regression functions.
3. A correlation-based methodology to evaluate the performance of the employed fitness functions and to select fitness function parameters automatically.
4. An experimental evaluation of the framework in case studies that center on identifying causes of arsenic contamination in Texas water wells and on determining spatially varying effects of house properties on house prices in the Boston Housing dataset.

The remainder of the paper is organized as follows: In Section 2, we discuss related work. Section 3 provides a detailed discussion of our region discovery framework, the AIC-based fitness function, and the R-squared-based fitness function. Section 4 presents the experimental evaluation, and Section 5 concludes the paper.
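To make the Gaussian-error AIC of Eq. (2) and the small-sample AICu variant of McQuarrie and Tsai concrete, both criteria can be computed directly from a model's SSE, sample size n, and parameter count k. The sketch below is our own illustration (the function names are not from the paper):

```python
import math

def aic_gaussian(sse: float, n: int, k: int) -> float:
    """AIC for normally distributed errors: 2k + n[ln(2*pi*SSE/n) + 1]."""
    return 2 * k + n * (math.log(2 * math.pi * sse / n) + 1)

def aic_u(sse: float, n: int, k: int) -> float:
    """Small-sample variant AICu of McQuarrie and Tsai:
    ln(SSE/(n-k)) + (n+k)/(n-k-2)."""
    return math.log(sse / (n - k)) + (n + k) / (n - k - 2)

# A simpler model with a slightly worse fit can still win on AIC,
# because the 2k term penalizes extra parameters:
simple = aic_gaussian(sse=110.0, n=100, k=3)     # worse fit, fewer parameters
complex_ = aic_gaussian(sse=100.0, n=100, k=12)  # better fit, more parameters
assert simple < complex_  # the simpler model has the lower (preferred) AIC
```

Because lower AIC is better, a criterion like this trades a small increase in residual error for a meaningful reduction in model complexity.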
McQuarrie and Tsai [9] define a special variation, AICu, for small samples:

AICu = ln(SSE/(n−k)) + (n+k)/(n−k−2)   (3)

Increasing the number of free parameters to be estimated improves the goodness of fit, regardless of the number of free parameters in the data-generating process. Hence AIC not only rewards goodness of fit, but also includes a penalty that is an increasing function of the number of estimated parameters. This penalty discourages overfitting. The preferred model is the one with the lowest AIC value. In summary, the AIC methodology attempts to find the model that best explains the data with a minimum of free parameters.

2.2 Regression Trees

Regression trees are another local statistical prediction model; they recursively partition the data into small partitions and then fit a simple model to each of these partitions. The early Classification and Regression Tree (CART) algorithm [4] selects the split variable and split value that minimize the weighted sum of the variances of the target values in the two subsets. The selection of the first attribute to split on in regression trees dramatically affects the resulting regions, which causes a lack of flexibility. Since data is split greedily using a top-down approach, regions in regression trees are rectangular. Regression trees also aim to find local statistics, but our approach is more flexible, since it employs an externally plugged-in fitness function to be maximized rather than evaluating the variance of splitting on a single attribute as

regression trees employ, and it also performs a wider, non-greedy search. Moreover, the regions discovered by our approach can be convex polygons, which represent Voronoi cells, whereas regression trees are limited to discovering rectangular regions, since they discover regions by recursively splitting trees into 2 sub-trees in a top-down fashion. Our approach, on the other hand, searches for the optimal set of regions iteratively by modifying region representatives, maximizing an external, plug-in fitness function.

2.3 Geographically Weighted Regression

Geographically weighted regression (GWR) [5] is an instance-based, local spatial statistical technique used to analyze spatial non-stationarity. GWR generates maps used in exploring and interpreting spatial non-stationarity. Instead of calibrating a single regression equation, GWR generates a separate regression equation for a set of observation points that are usually determined using a grid structure. Each equation is calibrated using a different weighting of the observations contained in the dataset, based on the proximity of the observations to the observation points. Each GWR equation may be expressed as:

y_i = β_0(u_i, v_i) + Σ_k β_k(u_i, v_i)·x_ik + ε_i   (4)

where (u_i, v_i) denotes the coordinates of the i-th point and β_k(u_i, v_i) is a realization of the continuous function β_k(u, v) at point i [2]. Using OLS, the parameters for a linear regression model can be obtained by solving:

β̂ = (XᵀX)⁻¹ XᵀY   (5)

Similarly, the parameter estimates for GWR may be solved using a weighting scheme:

β̂(u_i, v_i) = (Xᵀ W(u_i, v_i) X)⁻¹ Xᵀ W(u_i, v_i) y   (6)

The weight assigned to each observation is based on a distance-decay function (w_ij) centered on observation i. Recently, GWR has been used in many research works investigating a variety of topical areas, including climatology [5], urban poverty [8], environmental justice [20], and the ecological inference problem [7]. GWR is similar to our approach, especially in aiming to capture the spatial variance of attributes over space, which in GWR is called spatial non-stationarity.
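The weighted solve in Eq. (6) is a small linear-algebra exercise. The sketch below is our own illustration, not GWR as specified in [5]; in particular, the Gaussian kernel and fixed bandwidth are illustrative assumptions, since real GWR implementations differ in kernel choice and bandwidth selection:

```python
import numpy as np

def gwr_beta(X, y, coords, point, bandwidth=1.0):
    """Local coefficient estimate at one observation point (Eq. 6):
    beta(u,v) = (X'WX)^{-1} X'Wy, with Gaussian distance-decay weights."""
    d2 = np.sum((coords - point) ** 2, axis=1)   # squared distances to the point
    w = np.exp(-d2 / (2 * bandwidth ** 2))       # distance-decay weights
    XtW = X.T * w                                # X'W without forming diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Sanity check: if the data follow a single global line, the locally
# weighted estimate recovers the same coefficients at any point.
x1 = np.array([0., 1., 2., 3., 4., 5.])
X = np.column_stack([np.ones(6), x1])            # intercept + one predictor
coords = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]], float)
beta = gwr_beta(X, 3.0 + 2.0 * x1, coords, np.array([1.0, 0.5]))
```

When the relationship truly varies over space, the estimates at different observation points diverge, which is exactly the non-stationarity GWR maps.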
Our approach, on the other hand, does not use any observation points or predefined grid structures; instead, it discovers polygon-shaped regions. In general, GWR assumes autocorrelation and will have problems with spatial datasets where patterns change sharply. For example, if we have two neighboring regions that are characterized by very different regression functions, GWR will have significant errors near the boundary of the two regions, because it uses multiple regression functions of nearby observation points and not a single region-specific regression function, as our approach does. In general, in our approach a regression function is assigned to each region where coefficient estimates are similar, reflecting the existence of a similar pattern of dependency between the dependent and independent variables in a particular region.

2.4 Cluster-Wise Regression (CLR)

The cluster-wise regression technique incorporates cluster analysis into OLS regression analysis. The simplest cluster-wise regression is the 2-cluster linear regression, which was introduced by Spath [25, 26, 27] and was further developed by other researchers. In summary, instead of using the classical homogeneity or separation criterion, cluster-wise regression is based on the accuracy of a linear regression model associated with each cluster, using the sum of squared prediction errors over the objects in each cluster. This technique has many applications, as it is well suited to market segmentation and product pricing [2, 6]. The mathematical formulation of CLR is:

Min_{z,b} Σ_{i=1}^{n} Σ_{k=1}^{K} z_ik·(y_i − Σ_{j=1}^{m} b_jk·x_ij − b_0k)²   (7)

subject to Σ_{k=1}^{K} z_ik = 1 for i = 1, …, n, and z_ik ∈ {0, 1} for i = 1, …, n and k = 1, …, K,

where x_ij and y_i are, respectively, the value of variable j and the value of the dependent variable for observation i; b_jk is the j-th regression coefficient for cluster k; and z_ik is a binary variable that equals 1 if and only if observation i belongs to cluster k. n is the number of observations, m the number of independent variables considered, and K the number of clusters.
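The objective in Eq. (7) is commonly attacked by alternating between assigning each observation to the cluster whose regression line fits it best and refitting each cluster's line. The single-predictor sketch below is our own minimal illustration of that idea, not Spath's algorithm; like all such alternating schemes, it only reduces the objective greedily and is sensitive to initialization:

```python
def fit_line(pts):
    """Simple OLS fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    b1 = sum((x - mx) * (y - my) for x, y in pts) / sxx if sxx else 0.0
    return my - b1 * mx, b1

def clusterwise_regression(pts, K=2, iters=25):
    """Alternate the cluster assignment (z_ik) and per-cluster OLS refits,
    greedily reducing the CLR objective of Eq. (7)."""
    pts = sorted(pts)
    groups = [pts[i::K] for i in range(K)]   # crude deterministic init
    for _ in range(iters):
        lines = [fit_line(g) for g in groups]
        new_groups = [[] for _ in range(K)]
        for x, y in pts:
            best = min(range(K),
                       key=lambda c: (y - lines[c][0] - lines[c][1] * x) ** 2)
            new_groups[best].append((x, y))
        if any(not g for g in new_groups) or new_groups == groups:
            break
        groups = new_groups
    return [fit_line(g) for g in groups]
```

On data generated from two parallel lines, e.g. y = x and y = x + 10, the procedure separates the two clusters and recovers both regression lines.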
The main objective of CLR is to minimize the sum of squared prediction errors for each object, using the equation of the cluster that the object belongs to. Our approach is similar, but our framework tries to minimize the AIC of each region rather than only the prediction error. Moreover, our approach searches for the optimal number of regions rather than assuming that the correct number of regions is known in advance, and it supports arbitrary, plug-in fitness functions, which provide flexibility and extensibility. More importantly, as demonstrated by Brusco et al. in [6], CLR has tremendous potential for overfitting, since minimizing the sum of the error sums of squares for the within-cluster regression models makes no effort to distinguish the error explained by the within-cluster regression models from the error explained by the clustering process.

3. METHODOLOGY

We now introduce the components of our regional regression framework, shown in Figure 2. The framework discovers interesting regions by running a representative-based clustering algorithm that maximizes an externally plugged-in fitness function. Region representatives and regional regression functions are associated with regions to support prediction.

Figure 2. Regional Regression (REG^2) Framework
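The clustering step in Figure 2 forms regions by assigning every object to its closest region representative in the spatial subspace. A greatly simplified sketch of that assignment step (function and variable names are our own illustration):

```python
def assign_to_representatives(objects, reps):
    """Form regions by assigning each object (x, y, value) to its closest
    representative in the spatial subspace, as in representative-based
    clustering; the resulting regions are Voronoi cells of the reps."""
    regions = [[] for _ in reps]
    for o in objects:
        d = [(o[0] - r[0]) ** 2 + (o[1] - r[1]) ** 2 for r in reps]
        regions[d.index(min(d))].append(o)
    return regions
```

The search over representative sets (adding, deleting, and moving representatives) is what the clustering algorithm optimizes; the assignment itself stays this simple.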

3.1 Region Discovery Framework

We employ the region discovery framework that was proposed in [3, 4]. The objective of region discovery is to find interesting places in spatial datasets: regions occupying contiguous areas in the spatial subspace. In this work, we extend this framework to extract regional regression functions. The framework incorporates domain knowledge into domain-specific plug-in fitness functions that are maximized by the clustering algorithm, and it employs a reward-based evaluation scheme to evaluate the quality of the discovered regions. Given a set of regions R = {r_1, …, r_k} with respect to a spatial dataset O = {o_1, …, o_n}, the fitness of R is defined as the sum of the rewards obtained from each region r_j (j = 1, …, k):

q(R) = Σ_{j=1}^{k} ψ(r_j) = Σ_{j=1}^{k} i(r_j)·size(r_j)^β   (8)

where i(r_j) is the interestingness of the region r_j, a quantity based on domain interest reflecting the degree to which the region is newsworthy. Fitness functions are the core components of the framework, as they capture a domain expert's notion of interestingness. The framework seeks a set of regions R such that the sum of rewards over all of its constituent regions is maximized. In general, the parameter β controls how much of a premium is put on region size. The size(r_j)^β component in q(R), with β > 1, increases the value of the fitness nonlinearly with respect to the number of objects in the region r_j. ψ(r_j) is the reward of the region; the region reward is proportional to its interestingness, but given two regions with the same interestingness, the larger region receives the higher reward, reflecting a preference for larger regions. Rewarding region size non-linearly encourages merging neighboring regions whose coefficient estimates are similar, reflecting the existence of a similar pattern of spatial variance of attributes within each region.

3.2 CLEVER Algorithm

We employ the CLEVER [3] clustering algorithm to find interesting regions in the experimental evaluation.
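The reward computation in Eq. (8) is simple to sketch; the snippet below (our own illustration, with the interestingness measure passed in as a plug-in function) also shows how β > 1 nudges the search toward merging two equally interesting regions into one:

```python
def fitness(regions, interestingness, beta):
    """q(R) = sum of i(r) * size(r)^beta over all regions (Eq. 8)."""
    return sum(interestingness(r) * len(r) ** beta for r in regions)

# With constant interestingness i(r) = 1 and beta > 1, one merged region
# of 8 objects outscores two separate regions of 4 objects each, so the
# clustering is rewarded for merging similar neighboring regions:
i_const = lambda r: 1.0
merged = fitness([list(range(8))], i_const, beta=1.05)
split = fitness([list(range(4)), list(range(4))], i_const, beta=1.05)
assert merged > split
```

With β = 1 the two configurations would tie; the premium on size is entirely controlled by β.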
CLEVER is a representative-based clustering algorithm that forms clusters by assigning objects to the closest cluster representative and seeks the optimal set of representatives with respect to q(R). The algorithm starts with a randomly created set of representatives and employs randomized hill climbing, sampling s neighbors of the current clustering solution as long as new clustering solutions improve the fitness value. To battle premature convergence, the algorithm employs re-sampling: if none of the s neighbors improves the fitness value, then t more solutions are sampled before the algorithm terminates. After CLEVER terminates, the region representatives and their associated regression functions are returned.

3.3 R-squared Fitness Function (RsqFitness)

The natural question in assessing an estimated model is: How well does it fit the data? In our framework, we need a fitness function to be optimized by the clustering algorithm that reflects the main characteristic of good regression models: high accuracy. In regression analysis, one of the most popular evaluation measures is R-squared (R²). Our approach uses the following R²-based interestingness measure:

Definition 1: (Rsq-based Interestingness i_Rsq(r))

i_Rsq(r) = 1 − SSE/SST   if n ≥ MinRegSize
i_Rsq(r) = 0             if n < MinRegSize   (9)

R² is mathematically equal to 1 − SSE/SST, where SSE is the Sum of Squared Residuals (Errors) and SST is the Total Sum of Squares; they are defined as:

SSE = Σ_{i=1}^{n} (y_i − ŷ_i)²  and  SST = Σ_{i=1}^{n} (y_i − ȳ)²

The RsqFitness function then becomes:

q_Rsq(R) = Σ_{j=1}^{k} i_Rsq(r_j)·size(r_j)^β   (10)

MinRegSize is a controlling parameter to battle the tendency toward very small regions with maximal variance in regression analysis, so that we do not end up with regions containing only a few objects that have very high R² values. The experiments conducted using the Rsq-based fitness function suggest that using R² alone frequently leads to lower prediction accuracies on unseen examples due to overfitting. We need a better model selection criterion to balance the tradeoff between bias and variance.
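Definition 1 can be sketched directly from the SSE and SST formulas above; the function below is our own illustration, with an arbitrary illustrative default for the MinRegSize threshold:

```python
def rsq_interestingness(y_true, y_pred, min_reg_size=10):
    """i_Rsq(r) = 1 - SSE/SST if the region has at least MinRegSize
    objects, and 0 otherwise (Definition 1). min_reg_size = 10 is an
    illustrative choice, not a value prescribed by the framework."""
    n = len(y_true)
    if n < min_reg_size:
        return 0.0                       # tiny regions earn no reward
    y_bar = sum(y_true) / n
    sse = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    sst = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - sse / sst
```

A perfect regional fit yields interestingness 1, a fit no better than predicting the regional mean yields 0, and undersized regions are zeroed out regardless of fit.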
Therefore, we developed another fitness function that is based on an information criterion, namely AIC.

3.4 AIC-based Fitness Function (AICFitness)

Akaike's Information Criterion (AIC) is one of the most commonly used measures of the goodness of an estimated model. We prefer AIC because it takes model complexity into consideration; moreover, we believe AIC takes the number of observations into consideration more effectively. There are many variations of AIC, including AICu, which is used for small datasets and which we believe is a good fit for small regions. Since a lower AIC value indicates a better model and our framework tries to maximize the fitness function, we use 1/AIC as the interestingness measure. We use AICu, as defined in equation (3), if the number of objects is less than a predefined threshold th. Our work employs the interestingness measure in Definition 2 to assess the strength of the relationships between the dependent variable and the independent variables in a region r (also see Section 2.1 for further explanation):

Definition 2: (AIC-based Interestingness i_AIC(r))

i_AIC(r) = 1 / (2k + n[ln(2π·SSE/n) + 1])          if n ≥ th
i_AIC(r) = 1 / (ln(SSE/(n−k)) + (n+k)/(n−k−2))     if n < th   (11)

The AICFitness function then becomes:

q_AIC(R) = Σ_{j=1}^{k} i_AIC(r_j)·size(r_j)^β   (12)

Our fitness function repeatedly applies regression analysis during the search for the optimal set of regions, minimizing the AIC value in each region. Having an externally plugged-in AIC-based fitness function enables the clustering algorithm to probe for the

optimal partitioning and encourages the merging of two regions that exhibit structural similarities.

3.5 The Prediction Schema

The value of the dependent variable for an object is determined as follows: first, the model finds the closest region representative using a 1-nearest-neighbor query; then, using the regression function associated with that region, it predicts the value of the dependent variable. The pseudo-code of this schema is illustrated in Figure 3. β_0r is the regression intercept for region r, and β_jr is the regression slope coefficient for independent variable j in region r. SSE_TE is the Residual Sum of Squares on the test set and SSE_TR is the Residual Sum of Squares on the training set.

for each object in new_set {
  - find the closest region representative;
  - retrieve the regression function coefficients associated with the region (β_0r and β_jr for all j);
  - estimate the predicted value of the dependent variable (ŷ) using these coefficients;
  - using the observed value of the dependent variable (y) and the predicted value (ŷ), estimate SSE_TE(REG^2);
  - do the same using the global model: SSE_TE(GL)
}
- output the regional and global RSS (SSE) for new_set

Figure 3. The pseudo-code for the prediction schema

4. EXPERIMENTAL EVALUATION

4.1 Objectives and Design of the Experiments

One important objective of the experimental evaluation is to assess the benefits of employing representative-based clustering to discover regional regression functions. Another important objective is to compare the performance of the fitness functions, AICFitness and RsqFitness, with respect to prediction accuracy. In Section 4.2, a correlation-based methodology for fitness function parameter selection is introduced. We also assess how the regions found using our approach compare with regions that are selected randomly. In order to evaluate the true performance of our methodologies and to make fair comparisons, our experimental benchmark is composed of four steps. First, we apply OLS regression to the global data.
Then we use our framework to discover regions and regional regression functions, using both the Rsq-based fitness function and the AIC-based fitness function. We compare the R² values of the newly discovered regions against the global R² and, more importantly, the Residual Sum of Squares (SSE) of the new results against the global SSE, to determine the total accuracy improvement. One could argue that any method that divides data into subregions increases the R² value and reduces the SSE value, since smaller numbers of objects are involved. This is true to some extent; therefore, to show that the regions discovered by our framework are significantly better than randomly selected regions, we also run experiments in which regions are "discovered" randomly, without using any fitness function. This step is repeated five times, and the average of the SSE is taken, to be fair to the random region discovery. So each dataset was evaluated using the following four benchmarks for each parameter setting: 1) Global Regression, 2) Random Regions, 3) Rsq Fitness Function, 4) AIC-based Fitness Function. We used 5-fold cross-validation to determine prediction accuracy in the experiments. The SSE values presented in tables and figures represent the values from all 5 folds.

4.2 Parameter Selection Methodology

Utilizing AIC addresses overfitting within a region, but we still must deal with global overfitting, which can be attributed to having too many regions in the regional regression model. The fitness function parameter β is used to control the number of regions to be discovered and thus the overall model complexity. The challenge is to find a value of β that strikes the right balance between underfitting and overfitting for a given dataset. By choosing larger β values, model complexity can be penalized, if this is desirable for the particular application. This raises the question of how we can determine good values for β and other fitness function parameters, which is the subject of the remainder of this section.
In order to find the best parameter combinations, we developed a parameter selection methodology that is based on giving preference to fitness functions that demonstrate a high negative correlation between region reward and region prediction error on unseen data. N-fold cross-validation is used to assess the prediction error on training and test data. Below, we describe in detail how this correlation (and others involving training accuracy) is computed. Let:
- ψ be the region reward function
- CRV be the set of (training-set, test-set) pairs to be used in n-fold cross-validation
- O(r) be the set of objects belonging to region r
- SSE_TE(r) be the test-set sum of squared errors of r
- SSE_TR(r) be the training-set sum of squared errors of r
- |r| be the number of training-set objects belonging to r

FOR EACH (training-set, test-set) pair in CRV DO
  1. Compute regions R using the training set;
  2. FOR EACH region r ∈ R DO
     a. determine the number of objects of the test set that belong to r;
     b. determine the SSE for the training-set and test-set objects in r;
     c. store (ψ(r), SSE_TE(r)·|r|) and (TS-OBJECTS(r), SSE_TR(r));
  3. Compute the correlations between the stored entries in the table.

Basically, we determine for each test set of each fold which objects belong to which region; then we determine the prediction error for those objects using SSE and compute the correlation² between regional rewards and regional prediction errors, which is expected to be negative, since more highly rewarded regions are expected to have lower errors. Table 1 and Table 2 report the average correlations between regional rewards and regional prediction errors for different values of β. Some normalization has to be performed to cope with size discrepancies between the training-set objects belonging to a region (which determine the rewards) and the test-set objects belonging to a region (which determine the error on unseen examples).

² Additionally, we compute the correlations between regional training accuracy, regional testing accuracy, and region rewards.
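The correlation used by this selection methodology is the standard Pearson coefficient, computed here between per-region rewards and per-region prediction errors; a minimal sketch (the function name and sample values are our own illustration):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient. In the parameter selection
    methodology, xs are per-region rewards and ys per-region prediction
    errors; a strongly negative value is the desired outcome."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Highly rewarded regions should show low error, i.e. correlation near -1:
rewards = [9.0, 7.5, 6.0, 4.0, 2.0]
errors = [1.1, 2.3, 3.0, 4.2, 5.9]
assert pearson(rewards, errors) < -0.9
```

A β setting whose reward function correlates weakly (or positively) with test error is a sign that the fitness function is rewarding regions that do not generalize.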

Table 1. Reward & Prediction Error Correlations in Arsenic (Arsenic dataset; Corr(Reward, SSE_TR) and Corr(Reward, SSE_TE) of the AICFitness function, per β value)

Table 2. Reward & Prediction Error Correlations in Housing (Boston Housing data; Corr(Reward, SSE_TR) and Corr(Reward, SSE_TE) of the AICFitness function, per β value)

The correlation estimates suggest that there is a negative correlation between the regional reward assigned to a region and its prediction error, which is an indication of the capability of our framework to identify highly correlated regions that capture and minimize the spatial variation among the attributes. Table 3 lists the experiments that will be discussed in the remainder of the paper, whose parameters were selected using the previously described parameter selection methodology. The datasets used in the experiments are both real datasets and are explained next.

Table 3. Final set of experiments and parameters used

Exp#   Beta (β)   Fitness Function   Dataset
1      1.0        RsqFitness         Arsenic
2      1.03       RsqFitness         Arsenic
3      1.05       RsqFitness         Arsenic
4      1.1        RsqFitness         Arsenic
5      1.0        AICFitness         Arsenic
6      1.03       AICFitness         Arsenic
7      1.05       AICFitness         Arsenic
8      1.1        AICFitness         Arsenic
9      1.03       Both               Boston Housing
10     1.05       Both               Boston Housing
11     1.1        Both               Boston Housing
12     1.7        Both               Boston Housing

4.3 A Real-World Case Study: Texas Water Wells Arsenic Project

Arsenic is a deadly poison, and long-term exposure to even very low arsenic concentrations can cause cancer [28]. Therefore, it is extremely crucial to understand the factors that cause high arsenic concentrations to occur. In particular, we are interested in identifying other attributes that contribute significantly to the variance of the arsenic concentration. The datasets used in the experiments were created from the Texas Water Department Ground Water Database [28], which samples Texas water wells regularly. The datasets were generated by cleaning out duplicate, missing, and inconsistent variables, and by aggregating the arsenic amount when multiple samples exist. Our dataset has 3 spatial and 10 non-spatial attributes.
Longitude, Latitude, and Aquifer ID are the spatial attributes, and Arsenic (As), Molybdenum (Mo), Vanadium (V), Boron (B), Fluoride (F), Silica (SiO2), Chloride (Cl), and Sulfate (SO4) are 8 of the non-spatial attributes, all of which are chemical concentrations. The other 2 non-spatial attributes are Total Dissolved Solids (TDS) and Well Depth (WD). The dataset has 1,653 objects.

4.4 Boston Housing Data, Corrected

We also evaluated our framework using the corrected version of the Boston Housing Data, which contains 506 census tracts of Boston from the 1970 census. We use the corrected version since it includes additional spatial information, namely Longitude and Latitude. The dataset was taken from the StatLib library [27] maintained at Carnegie Mellon University. We used MEDV, the median value of homes in USD 1000's, as the target (dependent) variable, and the following as independent variables: CRIM (crime rate), NOX (nitric oxides concentration), RM (average number of rooms), AGE (proportion of owner-occupied units built prior to 1940), RAD (index of accessibility to radial highways), TAX (property tax rate), PTRATIO (pupil-teacher ratio), B (Black population proportion), and LSTAT (percentage of lower-status population). We omitted some less important attributes from this regression.

4.5 Arsenic Dataset Results

4.5.1 SSE Improvements

The total Residual Sum of Squares (SSE) for the global model, the Rsq fitness model, the AIC fitness model, and the model with randomly discovered regions is shown in Figure 4. These SSE values are estimated from the training data. As shown in the figure, the models with the Rsq fitness function and the AIC-based fitness function both reduce the SSE significantly. For example, where the global SSE is 76,134, the RsqFitness model reduces it to 31,578 and the AICFitness model reduces it to 26,974. The SSE produced by the model using randomly discovered regions is 52,176, which is still an improvement over the global model, as expected, since fewer objects are involved; but both of our models outperform the random model as well.
In order to illustrate the SSE improvement better, the improvements in percentages are shown in Table 4.

Figure 4. SSE values of the four models (Residual Sum of Squares versus β for the Global, Random, R-sq Fitness, and AIC-Fitness models)

Table 4. SSE Improvements over the Global Model

Beta (β)   Random   R-sq Fitness   AIC-Fitness
1.0        31%      59%            65%
1.03       22%      64%            68%
1.05       26%      57%            69%
1.1        32%      37%            69%

The results show that using randomly selected regions improves the accuracy somewhat, which is expected due to the involvement of fewer objects. But both models generated by our framework reduce the prediction error significantly: RsqFitness reduces the error by percentages in the high 50s, and the AICFitness model reduces it by almost 70%, producing the best results. The comparison of these three models is sketched in Figure 5.

Figure 5. SSE improvements over Global Regression (improvement in SSE versus β for the Random, R-sq Fitness, and AIC-Fitness models)

4.5.2 R² and AIC Value Improvements

The discovered regions illustrate great improvements in R² values. The R² value was 0.682 for the global data, which means that 68.2% of the arsenic variation can be explained by the other 7 chemical variables for the Texas-wide data. The R² values for the top 10-ranked regions in Experiment 2 are given in Table 5. As can be seen from the table, our framework discovers regions that show high correlation; in these 10 regions, on average 95% of the arsenic variation can be explained using the other chemical variables. For example, the R² value increased from 68% to 98.1% in Region 34, with 95 water wells, which indicates that in this region there exist stronger correlations between arsenic and the chemicals.

Table 5. R² value improvements using REG² (Region, R² Value, Size; top-10 regions of Experiment 2. Texas-wide: 68.2% with 1,655 wells; Region 34: 98.1% with 95 wells; another region: 96.34% with 20 wells; the regional R² values average about 95%.)

Table 6. R² value improvements using Random Regions (Region, R² Value, Size; top-10 random regions of Experiment 2. Texas-wide: 68.2% with 1,655 wells; e.g., one region: 73.70% with 52 wells.)

The R² improvements with the random region model were also estimated for each experiment.
There is some improvement, as expected, since the regions are much smaller than the global dataset, but the improvements are much smaller than those provided by the RsqFitness model. The R^2 values for the top 10-ranked regions using the random-region model in experiment 2 are given in Table 6 as an example; they are not provided for the other experiments due to limited space. The high R^2 values of the regions in Table 5 indicate that our framework successfully discovers regions, along with their regional regression functions, which represent a better model than linear regression applied to the global data. The average R^2 value of the regions in Table 6 is about 0.80, which means on average only around 80% of the arsenic variation can be explained by the other chemicals. Moreover, the high R^2 values of Region 37 and Region 19 dominate this average; otherwise, the improvement observed in all other regions can be considered insignificant. We now provide the AIC value improvements in Table 7.

Table 7. AIC value improvements in Arsenic

  Region      AIC Value   Size
  Texas          —          —
  Region —       —          —
  Region —       —          —
  Region —       —          —
  Region —       —          —
  Region —       —          —
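A minimal sketch of an AIC-based fitness of the kind plugged into the clustering search: it assumes the standard Gaussian-likelihood AIC for least squares, n*ln(RSS/n) + 2k (the exact formula used by the framework is not restated in this section), and takes 1/AIC as the value to maximize, which presumes positive AIC values as in this example. The RSS numbers are illustrative; only the sizes (655 Texas wells, a 95-well region) echo the data above.

```python
import math

def aic_gaussian(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, constants dropped:
    n * ln(RSS / n) + 2k.  rss: residual sum of squares, n: number of
    observations, k: number of fitted parameters."""
    return n * math.log(rss / n) + 2 * k

def aic_fitness(rss, n, k):
    """Fitness to maximize: reciprocal of AIC (lower AIC -> higher fitness)."""
    return 1.0 / aic_gaussian(rss, n, k)

# Illustrative numbers only: a global fit over the 655 Texas wells vs. a
# much tighter fit on a 95-well region, both with 7 predictors + intercept.
aic_global = aic_gaussian(rss=70_000.0, n=655, k=8)
aic_region = aic_gaussian(rss=900.0, n=95, k=8)
print(f"AIC global = {aic_global:.1f}, AIC region = {aic_region:.1f}")
print(f"fitness global = {aic_fitness(70_000.0, 655, 8):.5f}, "
      f"fitness region = {aic_fitness(900.0, 95, 8):.5f}")
```

The 2k term is what lets this fitness penalize overfitting: shrinking RSS by adding parameters only pays off if the likelihood gain outweighs the parameter penalty.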

Again, since a lower AIC value indicates a better model, 1/AIC was the fitness value to be maximized in the fitness function. The AIC values in the discovered regions show great improvement compared to the global AIC. Since the AICFitness model provides the best accuracy in terms of the Residual Sum of Squares, as shown in Table 4, these improvements also indicate that the regions capture the spatial variation and minimize the variation within each region (cluster).

4.6 Boston Housing Dataset Results

4.6.1 SSE Improvements

The SSE values of the Boston Housing data for the 4 models described previously are shown in Figure 6. As shown in the figure, both the model with the R-sq fitness function and the model with the AIC-based fitness function reduce SSE significantly. For example, in an experiment with a beta value of 1.0, the global SSE is 10,444; the model with R-sq fitness reduces it to 2,056, and the model with AIC fitness reduces it to 1,961. The SSE produced by the model using randomly discovered regions is 4,651 for these experiments, and more than 7,000 in 2 of the experiments. The SSE improvements in percentages are shown in Table 8.

Figure 6. SSE values of the 4 models for the Boston Housing data (Residual Sum of Squares vs. beta for the Global, Random, R-sq Fitness, and AIC-Fitness models)

Table 8. SSE_TR improvements in Boston Housing Data

  Beta    Random Regions   R-sq Fitness   AIC-Fitness
  1.0          60%              81%            82%
  1.03         58%              77%            80%
  1.1          31%              70%            81%
  1.7          27%              55%            83%

The SSE_TR improvements in the Boston Housing dataset experiments are much better than in the Arsenic dataset, since this dataset has more correlation and more spatial variance, so the search for regions that minimize AIC or maximize the R-sq value yields better regions and, as a result, better prediction accuracy. As far as SSE_TE is concerned, the Boston Housing data also performs better than the Arsenic dataset. Due to space limitations not all results are provided, but Table 9 exemplifies the SSE_TE(REG^2) and SSE_TE(GL) calculations described in the prediction schema in Section 3.5.
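The SSE_TE(REG^2) versus SSE_TE(GL) comparison can be sketched as follows. All models and data here are toy values, and routing a test point to the region of its nearest representative is an assumption; the actual prediction schema is the one defined in Section 3.5.

```python
import math

# Hypothetical per-region linear models discovered on training data:
# representative location -> (slope, intercept).
regions = {
    (0.0, 0.0): (2.0, 1.0),
    (10.0, 10.0): (-1.5, 40.0),
}
global_model = (0.3, 15.0)  # one pooled fit over all training data (toy values)

def predict_regional(loc, x):
    """Route a test point to the region of its nearest representative,
    then apply that region's regression function."""
    rep = min(regions, key=lambda r: math.dist(r, loc))
    slope, intercept = regions[rep]
    return slope * x + intercept

def predict_global(x):
    slope, intercept = global_model
    return slope * x + intercept

# Test set: (location, predictor value, observed response) triples,
# generated here from the regional models themselves.
test = [
    ((1.0, 0.5), 3.0, 2.0 * 3.0 + 1.0),
    ((9.0, 9.5), 4.0, -1.5 * 4.0 + 40.0),
]

sse_te_reg = sum((predict_regional(loc, x) - y) ** 2 for loc, x, y in test)
sse_te_gl = sum((predict_global(x) - y) ** 2 for loc, x, y in test)
improvement = 1.0 - sse_te_reg / sse_te_gl
print(f"SSE_TE(REG^2) = {sse_te_reg:.1f}, SSE_TE(GL) = {sse_te_gl:.1f}, "
      f"improvement = {improvement:.0%}")
```

Because each toy test point was generated by its region's own model, the regional predictor achieves zero test error here while the single global line does not; the real datasets show smaller but still substantial reductions.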
As shown in the table, our framework discovers regions and their regional regression coefficients that provide better prediction than the global model. Better prediction is achieved in many regions, and the percentage of such regions is also given in Table 9. Some regions with very high prediction error reduce the overall prediction-accuracy improvement, but there is still a significant 27% reduction in testing error, which leaves room for improvement in future work.

Table 9. SSE_TE improvements in Boston Housing Data

  Beta   SSE_TE(GL)   SSE_TE(REG^2)   SSE Improvement   % of regions predicting better
  1.1      17,182        12,566             27%                  72%
  1.7      20,028        14,799             26%                  65%

4.7 Discussion of Regional Characteristics

The global and regional regression results show that the relationship of the arsenic concentration with the other chemical concentrations varies spatially and is not constant over space, which provides a motivation for regional knowledge discovery. In other words, there are significant differences in arsenic concentrations in water wells across various regions of Texas. Some of these differences are found to be due to the varying impact of the independent variables on the arsenic concentration. To exemplify this, we compare the global regression model with the regression function of a particular region. The results of the global regression and of the regional regression for Region 10 are shown in Tables 10 and 11, respectively, followed by a discussion of these results.

Table 10. Regression Result for Global Data

  As       Coef.   Std.Err.   t     P>|t|
  Mo         —        —       —       —
  V          —        —       —       —
  B          —        —       —       —
  Fl         —        —       —       —
  SiO2       —        —       —       —
  Cl         —        —       —       —
  SO4        —        —       —       —
  const      —        —       —       —

  R-squared Value: 68%    Adjusted R-squared Value: 68%

The global OLS regression result suggests that Molybdenum, Vanadium, Boron, and Silica increase the arsenic concentration, while Sulfate and Fluoride decrease it Texas-wide. Moreover, Chloride (Cl) and Sulfate (SO4) are not significant as global predictors of the arsenic concentration, but they are significant in some regions, for example Region 10. Conversely, Boron, Fluoride, and Silica are globally significant and highly correlated with arsenic, but this is not the case in Region 10.
This information is crucial to domain experts who seek to determine the controlling factors of arsenic pollution, as it can help reveal hidden regional patterns and special characteristics of a region. For example, in this region a high arsenic level is highly correlated with high Sulfate and Chloride levels, which is an indication of external factors that play a role in this region, such as a nearby chemical plant or toxic waste. Our framework is able to successfully detect such hidden regional associations between attributes. The global and regional regression results show that the relationship of the arsenic concentration with the other chemical concentrations varies spatially and is not constant over space. In addition, there are unexplained differences that are not accounted for by our independent variables, which might be due to external factors such as toxic waste or the proximity of a chemical plant.

Table 11. Regression Result for Region 10

  As       Coef.   Std.Err.   t     P>|t|
  Mo         —        —       —       —
  V          —        —       —       —
  B          —        —       —       —
  Fl         —        —       —       —
  SiO2       —        —       —       —
  Cl         —        —       —       —
  SO4        —        —       —       —
  const      —        —       —       —

  R-squared Value: 95.03%    Adjusted R-squared Value: 93.73%

4.8 Implementation Platform and Efficiency

The components of the framework described in this paper were developed using Cougar^2 [9], an open-source, Java-based data mining and machine learning framework developed by our research group [11]. All experiments were performed on a machine with a 1.79 GHz processor and 2 GB of memory. The parameter β is the most important factor in determining the run time. The run times of the experiments with respect to the β values used are shown in Figure 7.

Figure 7. Run times vs. β values (run time in milliseconds for Arsenic RsqFitness, Arsenic AICFitness, Boston Housing RsqFitness, and Boston Housing AICFitness)

For the arsenic dataset, the β value of 1.05 takes longer than 1.1 or 1.5, but it provides better results in terms of SSE improvement and a better correlation between testing- and training-set error, so the extra time is spent exploring better solutions.
The Boston Housing dataset experiments take less time mainly for two reasons: 1) the dataset has fewer objects; 2) more importantly, as the results in Section 4 suggest, this dataset has more correlation and more spatial variance, so the search for regions that minimize AIC or maximize the R-sq value takes less time than for the Arsenic data. We observed that more than 80% of the computational resources are spent on determining regional fitness values when discovering regions. Even though our framework repeatedly applies regression analysis to each explored region combination until no further improvement occurs, it is still efficient compared to approaches in which regression is applied and models are compared that many times using other statistical tools.

5. CONCLUSION

This paper proposes a novel regression framework for spatial datasets that focuses on discovering regional regression functions associated with contiguous areas in the subspace of the spatial attributes, which we call regions. Unlike other research that employs local or global regression, our approach emphasizes regional patterns and provides a comprehensive methodology to help discover them. The proposed framework discovers regions by employing representative-based clustering algorithms that maximize AIC-based or R-sq-based fitness functions; next, regional knowledge (regression functions, in our case) is extracted from the obtained regions. In general, different discovered regions capture different relationships between the dependent and independent variables. This regional knowledge is crucial for domain experts in understanding the underlying structure of the data. We also developed a generic correlation-based methodology to evaluate and select fitness functions, and the parameters of fitness functions, for a given dataset. This capability is critical for dealing with overfitting in regional regression and for more accurate prediction.
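As a rough sketch of the search summarized above — representative-based clustering driven by a plug-in fitness function — the following toy implementation greedily swaps cluster representatives as long as the fitness improves. It is a simplified stand-in, not the framework's actual algorithm: the real system uses the R-sq- and AIC-based fitness functions and additional parameters such as β.

```python
import math

def discover_regions(points, fitness, k):
    """Greedy representative-based clustering with a plug-in fitness function.

    points:  list of (x, y) locations
    fitness: maps a clustering (list of index lists) to a score to maximize
    """
    reps = list(range(k))  # start from the first k points as representatives

    def assign(reps):
        """Assign every point to its nearest representative."""
        clusters = [[] for _ in reps]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: math.dist(p, points[reps[j]]))
            clusters[nearest].append(i)
        return clusters

    best = fitness(assign(reps))
    improved = True
    while improved:                      # hill-climb over representative swaps
        improved = False
        for slot in range(k):
            for cand in range(len(points)):
                if cand in reps:
                    continue
                trial = reps.copy()
                trial[slot] = cand
                score = fitness(assign(trial))
                if score > best:         # keep only strictly improving swaps
                    reps, best, improved = trial, score, True
    return assign(reps), best

# Toy data: two well-separated spatial blobs of 10 points each.
pts = ([(j * 0.1, 0.0) for j in range(10)]
       + [(10.0 + j * 0.1, 10.0) for j in range(10)])

def compactness(clusters):
    """Toy stand-in for the R-sq/AIC fitness: reward spatially tight clusters."""
    score = 0.0
    for idxs in clusters:
        if not idxs:
            continue
        cx = sum(pts[i][0] for i in idxs) / len(idxs)
        cy = sum(pts[i][1] for i in idxs) / len(idxs)
        score -= sum(math.dist(pts[i], (cx, cy)) for i in idxs)
    return score

clusters, score = discover_regions(pts, compactness, k=2)
print([len(c) for c in clusters])  # -> [10, 10]: the two blobs are recovered
```

The key design point mirrored here is that the search never inspects the fitness function's internals: any region-level score (compactness, regional R-sq, 1/AIC) can be plugged in unchanged.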
Compared to competing approaches, our approach does not assume that regions are given a priori, and it provides sophisticated clustering techniques to find good regions; moreover, we provide a methodology for dealing with overfitting and for selecting fitness functions that are suitable for the given data. We claim that none of the competing approaches addresses both issues in its proposed framework. Our framework was tested and evaluated in a real-world case study that analyzes regional correlation patterns among arsenic and other chemical concentrations in Texas water wells. The framework was also tested on the corrected version of the Boston Housing dataset. We demonstrated that our framework can effectively and efficiently identify highly correlated regions, along with regional regression functions that capture the spatial variation of attributes better than global models, and that it builds better models for prediction. We plan to develop other versions of the AICFitness function in which the fitness function is scaled to penalize bad regions more harshly. We also plan to investigate other information criteria, such as ICOMP [3] and BIC [23], and regional regression approaches that use validation sets to get a better handle on overfitting.

6. REFERENCES

[1] Akaike, H. Information theory and an extension of the maximum likelihood principle. In B.N. Petrov and F. Csaki (eds.), Second International Symposium on Information Theory. Budapest: Akademiai Kiado, 267-281, 1973.
[2] Fox, J. (1997). Applied Regression Analysis. Wiley Series in Probability and Statistics.
[3] Bozdogan, H. ICOMP: a new model-selection criterion. In Classification and Related Methods of Data Analysis, H. H. Bock (Ed.), Elsevier Science Publishers, Amsterdam, 1988.
[4] Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.
[5] Brunsdon, C., McClatchey, J. and Unwin, D. (2001). Spatial variations in the average rainfall-altitude relationship in Great Britain: an approach using geographically weighted regression. International Journal of Climatology, 21.
[6] Brusco, M. J., Cradit, J. D., Steinley, D., & Fox, G. L. (2008). Cautionary remarks on the use of clusterwise regression. Multivariate Behavioral Research, 43(1).
[7] Calvo, C. and Escolar, M. (2003). The local voter: a geographically weighted approach to ecological inference. American Journal of Political Science, 47.
[8] Choo, J., Jiamthapthaksin, R., Chen, C.-S., Celepcikay, O.U., Giusti, C., Eick, C.F.: MOSAIC: A proximity graph approach for agglomerative clustering. In Proc. 9th Int'l Conference on Data Warehousing & Knowledge Discovery (DaWaK 2007), Germany, 2007.
[9] Cougar^2 Framework.
[10] Cressie, N.: Statistics for Spatial Data (Revised Edition). New York: Wiley, 1993.
[11] Data Mining and Machine Learning Group, University of Houston.
[12] Ding, W., Jiamthapthaksin, R., Parmar, R., Jiang, D., Stepinski, T., and Eick, C.F. Towards Region Discovery in Spatial Datasets. In Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Osaka, Japan, May 2008.
[13] Eick, C.F., Parmar, R., Ding, W., Stepinski, T., Nicot, J.P.: Finding Regional Co-location Patterns for Sets of Continuous Variables in Spatial Datasets. In Proc. 16th ACM SIGSPATIAL International Conference on Advances in GIS (ACM-GIS), Irvine, California, November 2008.
[14] Eick, C.F., Vaezian, B., Jiang, D., Wang, J.: Discovery of interesting regions in spatial data sets using supervised clustering. In Proc. 10th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), Berlin, Germany, 2006.
[15] Fotheringham, A.S., Brunsdon, C. & Charlton, M.: Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. John Wiley, 2002.
[16] Aurifeille, J.M. A bio-mimetic clusterwise regression algorithm for consumer segmentation. In Advances in Computational Management, pages 45-63, 1998.
[17] Johnson, R.A.: Applied Multivariate Analysis. Englewood Cliffs, N.J.: Prentice Hall, 1992.
[18] Longley, P. A. and Tobon, C. (2004). Spatial dependence and heterogeneity in patterns of hardship: an intra-urban analysis. Annals of the Association of American Geographers, 94.
[19] McQuarrie, A. D., and Tsai, C. (1998). Regression and Time Series Model Selection. World Scientific Publishing Co. Pte. Ltd., River Edge, NJ.
[20] Mennis, J. and Jordan, L. (2005). The distribution of environmental equity: exploring spatial nonstationarity in multivariate models of air toxic releases. Annals of the Association of American Geographers, 95.
[21] Brusco, M.J., Cradit, J.D., and Stahl, S. A simulated annealing heuristic for a bicriterion partitioning problem in market segmentation. Journal of Marketing Research, 39:99-109, 2002.
[22] Openshaw, S. "Geographical data mining: key design issues". GeoComputation 99: Proceedings Fourth International Conference on GeoComputation, Mary Washington College, Fredericksburg, Virginia, USA, July 1999.
[23] Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
[24] Spath, H. Clusterwise linear regression. Computing, 22(4), 367-373, 1979.
[25] Spath, H. Correction to Algorithm 39: Clusterwise Linear Regression. Computing, 26, 275, 1981.
[26] Spath, H. Algorithm 48: A Fast Algorithm for Clusterwise Linear Regression. Computing, 29, 175-181, 1982.
[27] STATLIB Probability and Statistics Library.
[28] Texas Water Development Board.
[29] Wooldridge, J.: Econometric Analysis of Cross-Section and Panel Data. MIT Press, 2002, pp. 30, 279.


More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007 Syntheszer 1.0 A Varyng Coeffcent Meta Meta-Analytc nalytc Tool Employng Mcrosoft Excel 007.38.17.5 User s Gude Z. Krzan 009 Table of Contents 1. Introducton and Acknowledgments 3. Operatonal Functons

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments Comparson of Heurstcs for Schedulng Independent Tasks on Heterogeneous Dstrbuted Envronments Hesam Izakan¹, Ath Abraham², Senor Member, IEEE, Václav Snášel³ ¹ Islamc Azad Unversty, Ramsar Branch, Ramsar,

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

EVALUATION OF THE PERFORMANCES OF ARTIFICIAL BEE COLONY AND INVASIVE WEED OPTIMIZATION ALGORITHMS ON THE MODIFIED BENCHMARK FUNCTIONS

EVALUATION OF THE PERFORMANCES OF ARTIFICIAL BEE COLONY AND INVASIVE WEED OPTIMIZATION ALGORITHMS ON THE MODIFIED BENCHMARK FUNCTIONS Academc Research Internatonal ISS-L: 3-9553, ISS: 3-9944 Vol., o. 3, May 0 EVALUATIO OF THE PERFORMACES OF ARTIFICIAL BEE COLOY AD IVASIVE WEED OPTIMIZATIO ALGORITHMS O THE MODIFIED BECHMARK FUCTIOS Dlay

More information

Data Mining For Multi-Criteria Energy Predictions

Data Mining For Multi-Criteria Energy Predictions Data Mnng For Mult-Crtera Energy Predctons Kashf Gll and Denns Moon Abstract We present a data mnng technque for mult-crtera predctons of wnd energy. A mult-crtera (MC) evolutonary computng method has

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Fuzzy Logic Based RS Image Classification Using Maximum Likelihood and Mahalanobis Distance Classifiers

Fuzzy Logic Based RS Image Classification Using Maximum Likelihood and Mahalanobis Distance Classifiers Research Artcle Internatonal Journal of Current Engneerng and Technology ISSN 77-46 3 INPRESSCO. All Rghts Reserved. Avalable at http://npressco.com/category/jcet Fuzzy Logc Based RS Image Usng Maxmum

More information

Understanding K-Means Non-hierarchical Clustering

Understanding K-Means Non-hierarchical Clustering SUNY Albany - Techncal Report 0- Understandng K-Means Non-herarchcal Clusterng Ian Davdson State Unversty of New York, 1400 Washngton Ave., Albany, 105. DAVIDSON@CS.ALBANY.EDU Abstract The K-means algorthm

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

A CLASS OF TRANSFORMED EFFICIENT RATIO ESTIMATORS OF FINITE POPULATION MEAN. Department of Statistics, Islamia College, Peshawar, Pakistan 2

A CLASS OF TRANSFORMED EFFICIENT RATIO ESTIMATORS OF FINITE POPULATION MEAN. Department of Statistics, Islamia College, Peshawar, Pakistan 2 Pa. J. Statst. 5 Vol. 3(4), 353-36 A CLASS OF TRANSFORMED EFFICIENT RATIO ESTIMATORS OF FINITE POPULATION MEAN Sajjad Ahmad Khan, Hameed Al, Sadaf Manzoor and Alamgr Department of Statstcs, Islama College,

More information

Constructing Minimum Connected Dominating Set: Algorithmic approach

Constructing Minimum Connected Dominating Set: Algorithmic approach Constructng Mnmum Connected Domnatng Set: Algorthmc approach G.N. Puroht and Usha Sharma Centre for Mathematcal Scences, Banasthal Unversty, Rajasthan 304022 usha.sharma94@yahoo.com Abstract: Connected

More information

Self-tuning Histograms: Building Histograms Without Looking at Data

Self-tuning Histograms: Building Histograms Without Looking at Data Self-tunng Hstograms: Buldng Hstograms Wthout Lookng at Data Ashraf Aboulnaga Computer Scences Department Unversty of Wsconsn - Madson ashraf@cs.wsc.edu Surajt Chaudhur Mcrosoft Research surajtc@mcrosoft.com

More information

STING : A Statistical Information Grid Approach to Spatial Data Mining

STING : A Statistical Information Grid Approach to Spatial Data Mining STING : A Statstcal Informaton Grd Approach to Spatal Data Mnng We Wang, Jong Yang, and Rchard Muntz Department of Computer Scence Unversty of Calforna, Los Angeles {wewang, jyang, muntz}@cs.ucla.edu February

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Feature Selection as an Improving Step for Decision Tree Construction

Feature Selection as an Improving Step for Decision Tree Construction 2009 Internatonal Conference on Machne Learnng and Computng IPCSIT vol.3 (2011) (2011) IACSIT Press, Sngapore Feature Selecton as an Improvng Step for Decson Tree Constructon Mahd Esmael 1, Fazekas Gabor

More information

Petri Net Based Software Dependability Engineering

Petri Net Based Software Dependability Engineering Proc. RELECTRONIC 95, Budapest, pp. 181-186; October 1995 Petr Net Based Software Dependablty Engneerng Monka Hener Brandenburg Unversty of Technology Cottbus Computer Scence Insttute Postbox 101344 D-03013

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

Analysis of Malaysian Wind Direction Data Using ORIANA

Analysis of Malaysian Wind Direction Data Using ORIANA Modern Appled Scence March, 29 Analyss of Malaysan Wnd Drecton Data Usng ORIANA St Fatmah Hassan (Correspondng author) Centre for Foundaton Studes n Scence Unversty of Malaya, 63 Kuala Lumpur, Malaysa

More information

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information