A Generalized Methodology for Data Analysis

Plamen Angelov, Fellow, IEEE, Xiaowei Gu, and Jose Principe, Fellow, IEEE

Abstract: Based on a critical analysis of data analytics and its foundations, we propose a functional approach to estimate data ensemble properties, which is based entirely on the empirical observations of discrete data samples and the relative proximity of these points in the data space and hence named empirical data analysis (EDA). The ensemble functions include the non-parametric square centrality (a measure of closeness used in graph theory) and typicality (an empirically derived quantity which resembles probability). A distinctive feature of the proposed new functional approach to data analysis is that it does not assume randomness or determinism of the empirically observed data, nor independence. The typicality is derived from the discrete data directly, in contrast to the traditional approach where a continuous probability density function (pdf) is assumed a priori. The typicality is expressed in a closed analytical form that can be calculated recursively and, thus, is computationally very efficient. The proposed non-parametric estimators of the ensemble properties of the data can also be interpreted as a discrete form of the information potential (known from the information theoretic learning theory as well as the Parzen windows). Therefore, EDA is very suitable for the current move to a data-rich environment where the understanding of the underlying phenomena behind the available vast amounts of data is often not clear. We also present an extension of EDA for inference. The areas of applications of the new methodology of the EDA are wide because it concerns the very foundation of data analysis. Preliminary tests show its good performance in comparison to traditional techniques.

Index Terms: data mining and analysis, machine learning, pattern recognition, probability, statistics.

Manuscript received July 06. This work was partially supported by The Royal Society grant IE439/04 Novel Machine Learning Paradigms to address Big Data Streams. Plamen P. Angelov and Xiaowei Gu are with School of Computing and Communications, Lancaster University, Lancaster, LA1 4WA (e-mail: {p.angelov, x.gu3}@lancaster.ac.uk). José C. Príncipe is with Computational NeuroEngineering Laboratory, Department of Electrical and Computer Engineering, University of Florida, USA (e-mail: principe@cnel.ufl.edu).

I. INTRODUCTION

Currently, there is a growing demand in Machine Learning, Pattern Recognition, Statistics, Data Mining and a number of related disciplines broadly called Data Science, for new concepts and methods that are centered on the actual data, the evidence collected from the real world, rather than on theoretical prior assumptions which need to be further confirmed with the experimental data (e.g. the Gaussian assumption). The core of the statistical approach is the definition of a random variable, i.e. a functional measure from the space of events to the real line, which defines the probability law [] [4]. The probability density function (pdf) is, by definition, the derivative of the cumulative distribution function (cdf). It is well known that differentiation can create numerical problems in both practical and theoretical aspects and is a challenge for functions which are not analytically defined or are complex. In reality, we usually do not have independent and identically distributed (iid) events, but we do have correlated, interdependent (albeit in a complex and often unknown manner) data from different experiments, which complicates the procedure. The appeal of the traditional statistical approach is its solid mathematical foundation and the ability to provide guarantees of performance when data is plentiful ($N \to \infty$) and created from the same distribution that was hypothesized in the probability law. The actual data is usually discrete (or discretized), which in traditional probability theory and statistics is modeled as a realization of the random variable, but one does not know a priori its distribution.
If the prior data generation hypothesis is verified, good results can be expected; otherwise this opens the door for many failures. Even in the case that the hypothesized measure meets the realizations, one has to address the difference of working with realizations and random variables, which brings the issue of choosing estimators of the statistical quantities necessary for data analysis. This is not a trivial problem, and is seldom discussed in data analysis. The simple determination of the probability law (the measure of the random variable) that explains the collected data is a hard problem as studied in density estimation [] [3]. Moreover, if we are interested in statistical inference, for instance, similarity between two random variables using mutual information, the problem gets even harder because different estimators may provide different results [5]. The reason is that very likely the functional properties of the chosen estimator do not preserve all the properties embodied in the statistical quantity. Therefore, they behave differently in the finite (and even in the infinite) sample case.

An alternative approach is to proceed from the realizations to the random variables, which is the reverse direction of the statistical approach. The literature has several excellent examples of this approach in the area of measures of association. For instance, Pearson's correlation coefficient is perfectly well defined in realizations, as well as in random variables. Likewise, Spearman's $\rho$ [6] and Kendall's $\tau$ [7] are other examples of measures of association well defined in both the realizations and the random variables. However, the problem with this approach is that the statistical properties of the

measures in the random variables are not directly known, and may not be easily obtained. A good example of the latter is the generalized measure of association, which is well defined in the realizations, but not all of the properties are known in the random variables [8]. Therefore, there are advantages and disadvantages in each approach, but from a practical point of view, the non-parametric approach is very appealing because we can go beyond the framework of statistical reasoning to define new operators and still cross-validate the solutions with the available data using non-parametric hypothesis tests. A good example is least squares versus regression. One can always apply least squares to any data type, deterministic or stochastic. If the data is stochastic the solution is called regression, but the result will be the same, because the autocorrelation function is a property of the data, independent of its type. The difference shows up only in the interpretation of the solution; most importantly, the statistical significance of the result can only be assessed using regression. A more recent alternative is to approximate the distributions using non-parametric, data-centered functions, such as particle filters [9], entropy-based information-theoretic learning [5], etc.

On the other hand, partially trying to address the same problems, in 1965 L. Zadeh introduced fuzzy sets theory [0], which completely departed from objective observations and moved (similarly to the belief-based theory [8] introduced a bit later) to the subjectivist definition of uncertainty. A later strand of fuzzy set theory (the data-driven approach developed mainly in the 1990s) attempted to define the membership functions based on experimental data. It stands in between probabilistic and fuzzy representations []; however, this approach requires an assumption on the type of membership function. An important challenge is the posterior distribution approximation. Approximate inference can be done employing maximum a posteriori criteria, which requires complex optimization schemes involving, for example, the expectation maximization algorithm [] [3].
In this paper, we present a systematic methodology of non-parametric estimators recently introduced in [] [5] for discrete sets using ensemble statistical properties of the data derived entirely from the experimental discrete observations and extend them to continuous spaces. These include the cumulative proximity ($q$), centrality ($C$), square centrality ($q^{-1}$), standardized eccentricity ($\varepsilon$), density ($D$) as well as typicality ($\tau$), which can be extended to continuous spaces, resembling the information potential obtained from Parzen windows [] [4] in Information Theoretic Learning (ITL) [5]. Its discrete version sums up to 1 while its continuous version integrates to 1 and is always positive; however, its values are always less than 1, unlike the pdf values that can be greater than 1. Additionally, the typicality is only defined for feasible values of the independent variable while the pdf can extend to infeasible values, e.g. negative height, distance, weight, absolute temperature, etc. unless specifically constrained [] [5]. We further consider discrete local ($\tau$) and global ($\tau^G$) versions. Then, we introduce an automatic procedure for identifying the local modes/maxima of $\tau^G$ as well as a procedure for reducing the amount of the local maxima/modes and extend the non-parametric estimators to the continuous domain by introducing the continuous global density, $D^{CG}$, and typicality, $\tau^{CG}$, which further involves an integral for normalization. Furthermore, we demonstrate that the continuous global typicality does integrate to 1 exactly as the traditional pdf (while being free from the restrictions the latter has). This is a new and significant result which makes continuous global typicality an alternative to the pdf. This strengthens the ability of the empirical data analysis (EDA) framework for objectively investigating the unknown data pattern behind the data and opens up the framework for inference. The methodology is exemplified with a Naïve EDA classifier based on $\tau^{CG}$.

II. THEORETICAL BASIS - DISCRETE SETS

In this section, we start by presenting EDA foundations in discrete sets []-[5] for completeness and further clarity.
Firstly, let us consider a real metric space $R^K$ and assume a particular data set or stream $\{x\}_N = \{x_1, x_2, ..., x_N\} \in R^K$ with $x_i = [x_{i,1}, x_{i,2}, ..., x_{i,K}]^T$; $i = 1, 2, ..., N$, where subscripts denote data samples (for a set) or the time instances when they arrive (for a stream). Within the data set/stream, some data samples may repeat more than once, namely, $\exists\, x_i = x_j$, $i \neq j$. The set of sorted unique data samples, denoted by $\{u\}_{L_N} = \{u_1, u_2, ..., u_{L_N}\}$ (where $\{u\}_{L_N} \subseteq \{x\}_N$, $L_N \leq N$), and the numbers of occurrence, denoted by $\{f\}_{L_N} = \{f_1, f_2, ..., f_{L_N}\}$, can be determined automatically based on the data. With $\{u\}_{L_N}$ and $\{f\}_{L_N}$, the primary data set/stream can be reconstructed. In the remainder of this paper, all the derivations are conducted at the $N$-th time instance except when specifically declared otherwise.

The most obvious choice of $R^K$ is the Euclidean space with the Euclidean distance, but we can also extend EDA definitions to Hilbert spaces and Reproducing Kernel Hilbert spaces. We can, moreover, consider different types of distances within these spaces motivated by the purposes of the analysis that exploit information available from the source that generated the samples or definitions that are appropriate for data analysis.

Within EDA, we introduce: a) cumulative proximity, $q$ [] [5]; b) square centrality, $q^{-1}$; c) eccentricity, $\xi$ [] [5]; d) standardized eccentricity, $\varepsilon$ [] [5]; e) discrete local density, $D$ [] [5]; f) discrete local typicality, $\tau$ [4], [5]; g) discrete global typicality, $\tau^G$ [4], [5]; h) continuous local density, $D^C$; i) continuous global density, $D^{CG}$; and j) continuous global typicality, $\tau^{CG}$.
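As a small illustration of the unique-sample representation described above, the following sketch splits a data set into its sorted unique samples and their numbers of occurrence and then rebuilds the data set (up to ordering). The function names and the toy samples are hypothetical and chosen only for this example.

```python
from collections import Counter


def unique_with_frequencies(samples):
    """Return the sorted unique samples and how often each one occurs."""
    counts = Counter(tuple(x) for x in samples)
    unique = sorted(counts)
    freqs = [counts[u] for u in unique]
    return unique, freqs


def reconstruct(unique, freqs):
    """Rebuild the data set (ignoring the original order) from {u} and {f}."""
    return [u for u, f in zip(unique, freqs) for _ in range(f)]


# Toy two-dimensional data (hypothetical values); one sample repeats.
data = [(1.0, 2.0), (0.5, 1.5), (1.0, 2.0), (3.0, 0.0)]
unique, freqs = unique_with_frequencies(data)
```

The frequencies sum to the number of samples, which is what makes the reconstruction possible.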

The discrete global typicality, $\tau^G$, addresses the global properties of the data and will be introduced in the next section. For inference, the continuous local density ($D^C$), global density ($D^{CG}$) and the continuous global typicality ($\tau^{CG}$) will be described in detail in section IV.

A. Cumulative Proximity and Square Centrality

For every point $x_i \in \{x\}_N$; $i = 1, 2, ..., N$, one may want to quantify how close or similar this point is to all other data points from $\{x\}_N$. In graph theory, centrality is used to indicate the most important vertices within a graph. A measure of centrality [6], [7] is defined from the sum of distances from a point to all other points:

$$c_N(x_i) = \frac{1}{\sum_{j=1}^{N} d(x_i, x_j)}; \quad i = 1, 2, ..., N \tag{1}$$

where $d(x_i, x_j)$ is the distance/similarity between $x_i$ and $x_j$, which can be, but is not limited to, Euclidean, Mahalanobis, cosine, etc. Its importance comes from the fact that it provides centrality information about each data sample in a scalar or vector form. We previously defined [] [5] the cumulative proximity $q$ as

$$q_N(x_i) = \sum_{j=1}^{N} d^2(x_i, x_j); \quad i = 1, 2, ..., N \tag{2}$$

which can be seen as inverse centrality with a square distance. Cumulative proximity [] [5] is a very important association measure derived empirically from the observed data without making any prior assumptions about their generation model and plays a fundamental role in deriving other EDA quantities. The complexity of computing the cumulative proximities of all samples in $\{x\}_N$ is $O(N^2)$. As a result, the computational complexity of other EDA quantities for $\{x\}_N$, which can be derived directly from cumulative proximity, is $O(N^2)$ as well. For many types of distance/similarity, e.g. Euclidean distance, Mahalanobis distance, cosine similarity, etc., with which the cumulative proximity can be calculated recursively [4], the complexity of calculating the cumulative proximities of all the samples in $\{x\}_N$ is reduced to $O(N)$.

In a very similar manner, we can consider square centrality as the inverse of the cumulative proximity, defined as follows:

$$q_N^{-1}(x_i) = \frac{1}{\sum_{j=1}^{N} d^2(x_i, x_j)}; \quad i = 1, 2, ..., N \tag{3}$$

B. Eccentricity

The eccentricity, $\xi$, defined as a normalized cumulative proximity, is another very important association measure derived empirically from the observed data without making any prior assumptions about their generation model [] [5]. It quantifies data samples away from the mode and is useful to represent distribution tails and anomalies/outliers.
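The quantities of subsection A, equations (1)-(3), can be sketched directly from their definitions. This is a minimal illustration assuming the Euclidean distance and the straightforward pairwise computation (quadratic in the number of samples), not the recursive form; the function names and the toy data are hypothetical, and at least two distinct samples are assumed so that no denominator is zero.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two samples given as tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def centrality(data, dist=euclidean):
    """Equation (1): inverse of the summed distances to all samples."""
    return [1.0 / sum(dist(xi, xj) for xj in data) for xi in data]


def cumulative_proximity(data, dist=euclidean):
    """Equation (2): sum of squared distances to all samples."""
    return [sum(dist(xi, xj) ** 2 for xj in data) for xi in data]


def square_centrality(data, dist=euclidean):
    """Equation (3): inverse of the cumulative proximity."""
    return [1.0 / q for q in cumulative_proximity(data, dist)]


# Toy one-dimensional data with one distant sample (hypothetical values).
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
q = cumulative_proximity(points)
```

The distant sample has the largest cumulative proximity and therefore the smallest square centrality.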
The eccentricity is derived by normalizing $q$ and taking into account all possible data samples. It plays an important role in anomaly detection [4], [5] as well as for the estimation of the typicality, as will be detailed below. The eccentricity ($\xi_N$) of a particular data sample $x_i$ in the set $\{x\}_N$ is calculated as follows [] [5]:

$$\xi_N(x_i) = \frac{2 q_N(x_i)}{\sum_{j=1}^{N} q_N(x_j)} = \frac{2\sum_{j=1}^{N} d^2(x_i, x_j)}{\sum_{h=1}^{N}\sum_{j=1}^{N} d^2(x_h, x_j)}; \quad i = 1, 2, ..., N \tag{4}$$

where the coefficient 2 is included to normalize eccentricity between 0 and 1, i.e.:

$$0 \leq \xi_N(x_i) \leq 1 \tag{5}$$

Here, we also introduce standardized eccentricity, $\varepsilon$, which does not decrease as fast as eccentricity with the increase of the amount of data, and is calculated as follows:

$$\varepsilon_N(x_i) = N \xi_N(x_i) = \frac{2 N q_N(x_i)}{\sum_{j=1}^{N} q_N(x_j)}; \quad i = 1, 2, ..., N \tag{6}$$

Based on the expression of the standardized eccentricity (namely, equation (6)) one can see that the data samples which are far away from the majority tend to have higher standardized eccentricity values compared with others. Thus, the standardized eccentricity can serve as an effective measure of the tail of the data distribution without the need of clustering the data in advance.

Combining the standardized eccentricity with the well-known Chebyshev inequality [8], which describes the probability that a certain data sample is more than $n\sigma$ ($\sigma$ denotes the standard deviation) distance away from the mean, we get the EDA version of the Chebyshev inequality as follows [], [4]:

$$P\left(\varepsilon_N(x_i) \leq n^2 + 1\right) \geq 1 - \frac{1}{n^2} \tag{7}$$

The Chebyshev inequality expressed by the standardized eccentricity provides a more elegant form for anomaly detection. For example, if $\varepsilon_N(x_i) > 10$, $x_i$ has exceeded the $3\sigma$ limit and can be categorized as an anomaly.

C. Discrete Local Density

Discrete local density is defined as the inverse of standardized eccentricity and plays an important role in data analysis using EDA ($i = 1, 2, ..., N$):

$$D_N(x_i) = \varepsilon_N^{-1}(x_i) = \frac{\sum_{j=1}^{N} q_N(x_j)}{2 N q_N(x_i)} = \frac{\sum_{j=1}^{N}\sum_{l=1}^{N} d^2(x_j, x_l)}{2 N \sum_{l=1}^{N} d^2(x_i, x_l)} \tag{8}$$

For example, if the Euclidean distance is used, the density can be expressed as ($i = 1, 2, ..., N$):

$$D_N(x_i) = \frac{1}{1 + \dfrac{\|x_i - \mu_N\|^2}{X_N - \|\mu_N\|^2}} \tag{9}$$

where $\mu_N$ is the mean of $\{x\}_N$; $X_N$ is the mean of $\{x^T x\}_N$; $\mu_N$ and $X_N$ can be updated recursively using [9]:

$$\mu_k = \frac{k-1}{k}\mu_{k-1} + \frac{1}{k}x_k; \quad \mu_1 = x_1; \quad k = 1, 2, ..., N$$

$$X_k = \frac{k-1}{k}X_{k-1} + \frac{1}{k}x_k^T x_k; \quad X_1 = x_1^T x_1$$

As we can see from equation (9), the discrete local density itself can be viewed as a univariate Cauchy function, while there is no assumption or any pre-defined parameter involved in the derivation besides the definition of the distance function (Euclidean distance used here).

D. Discrete Local Typicality

Discrete local typicality was firstly introduced in [3], and called unimodal typicality. In this paper, it is redefined as the normalized local density ($i = 1, 2, ..., N$):

$$\tau_N(x_i) = \frac{D_N(x_i)}{\sum_{j=1}^{N} D_N(x_j)} = \frac{q_N^{-1}(x_i)}{\sum_{j=1}^{N} q_N^{-1}(x_j)} \tag{10}$$

The discrete local typicality resembles the traditional unimodal probability mass function (pmf), but it is automatically defined in the data support, unlike the pmf, which may have non-zero values for infeasible values of the random variable unless specifically constrained. The discrete local density resembles membership functions of a fuzzy set, having a value of 1 for $x = \mu_N$, while the discrete local typicality resembles the pmf with the sum of values being equal to 1 and values for both $D$ and $\tau$ being from the interval [0, 1].

As an example, the square centrality, standardized eccentricity, discrete local density and typicality of a real climate dataset (wind chill and wind gust) measured in Manchester, UK for the period [0] are presented in Supplementary Fig. 1. In these examples, Euclidean distance is used.

III. THEORETICAL BASIS: DISCRETE GLOBAL TYPICALITY

In this section, we will consider the more realistic case when data distributions are multimodal. Traditionally, this requires identifying local peaks/modes by clustering, expectation maximization, optimization, etc. [] [3], [] [3]. Within EDA, the discrete global typicality ($\tau^G$) is derived automatically from the data with no user input and can quantify multimodality. It is based on the local cumulative proximity, square centrality, eccentricity and standardized eccentricity. The only requirements to define the discrete global typicality are the raw data and the type of distance metric (which can be any).

A. Discrete Global Typicality

Expressions (9)-(10) provide definitions of local operators that are very appropriate to quantify the peak point of unimodal discrete functions. Moreover, if the peak coincides with the global mean ($\mu_N$), then the value of the local density is equal to 1: $D_N(x = \mu_N) = 1$. A similar property of having a maximum, though its value is $< 1$, is also valid for the traditional probability by definition and according to the central limit theorem [] [3]. In reality, data distributions are usually multimodal [] [4], therefore the local description should be improved. In order to address this issue, the traditional probability theory often involves a mixture of unimodal distributions, which requires estimation of the number of modes, and this is not easy [4]. Within the EDA framework, we provide the discrete global typicality, $\tau^G$, directly from the dataset, which provides multimodal distributions automatically without the need of user decisions and only requires a threshold for robustness against outliers.

The discrete global typicality of a unique data sample is expressed as a combination of the normalized discrete local density weighted by the corresponding frequency of occurrence of this unique data sample ($i = 1, 2, ..., L_N$; $L_N > 1$):

$$\tau^G_N(u_i) = \frac{f_i D_{L_N}(u_i)}{\sum_{j=1}^{L_N} f_j D_{L_N}(u_j)} = \frac{f_i\, q^{-1}_{L_N}(u_i)}{\sum_{j=1}^{L_N} f_j\, q^{-1}_{L_N}(u_j)} \tag{11}$$

where $q^{-1}_{L_N}(u_i)$ and $D_{L_N}(u_i)$ are the square centrality and the discrete local density of a particular data sample $u_i$, calculated from $\{u\}_{L_N}$ only. This expression is very fundamental because, in fact, it combines information about repeated data values and the scattering across the data space, and resembles the well-known membership functions of fuzzy sets. We further explain this link in a publication that is currently under review [5].

Fig. 1. Histogram and discrete global typicality of the real climate data [0] using Euclidean distance: (a) histogram; (b) discrete global typicality.
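The chain of definitions from standardized eccentricity to the discrete global typicality can be sketched as follows. This is a minimal illustration under stated assumptions: squared Euclidean distance, the direct pairwise computation rather than the recursive update, and at least two distinct samples. All function names and the toy values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two samples given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cumulative_proximity(data):
    """Equation (2), evaluated for every sample."""
    return [sum(sq_dist(xi, xj) for xj in data) for xi in data]


def standardized_eccentricity(data):
    """Equation (6): 2 * N * q(x_i) / sum_j q(x_j)."""
    q = cumulative_proximity(data)
    n, total = len(data), sum(q)
    return [2.0 * n * qi / total for qi in q]


def local_density(data):
    """Equation (8): inverse of the standardized eccentricity."""
    return [1.0 / e for e in standardized_eccentricity(data)]


def local_typicality(data):
    """Equation (10): local density normalized to sum to 1."""
    d = local_density(data)
    total = sum(d)
    return [di / total for di in d]


def global_typicality(unique, freqs):
    """Equation (11): frequency-weighted density over the unique samples."""
    d = local_density(unique)
    weighted = [f * di for f, di in zip(freqs, d)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Two unique values observed 20 and 30 times (hypothetical values).
tau_g = global_typicality([(10.0,), (20.0,)], [20, 30])
```

With only two unique samples both have the same cumulative proximity, so the global typicality reduces to the relative frequencies 0.4 and 0.6.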

One can easily appreciate from Fig. 1 the differences between $\tau^G$ and the histogram with a quantization step equal to 5 for both dimensions. Note that the histogram requires the selection of one parameter (the quantization step) per dimension, while none is needed for the discrete global typicality. For a large number of dimensions, this can be a big problem. The size of the grid/axis is a user-specified parameter. The histogram takes only values from a finite set $\{0; 1/N; 2/N; ...; 1\}$, while $\tau^G$ can take any real value.

The discrete global typicality has the following properties: i) it sums up to 1; ii) its value is within [0, 1]; iii) it provides a closed analytic form, equation (11); iv) there is no requirement for prior assumptions or any user- or problem-specific thresholds and parameters; v) it is free from some peculiarities of traditional probability theory (its value never gets $> 1$ and is never non-zero positive for infeasible values [4], [5]); vi) it can be recursively calculated for various types of metrics.

When all the data samples in the dataset have different values ($f_i = 1$; $\forall i$), and the histogram quantization step parameter is not properly set, the histogram is unable to show any useful information, while the discrete global typicality can still show the mutual distribution information of the dataset, see Fig. 2(a) and (b). This is a major advantage of the discrete global typicality because it is parameter free. Here the figures are based on the unique data samples of the same climate dataset. As we can see, the data samples which are closer to the mean of the dataset will have a higher value of global typicality and vice versa.

Fig. 2. Histogram and discrete global typicality for the unique data samples: (a) histogram with very small quantization; (b) discrete global typicality.

It is also interesting to notice that for equally distant data, the discrete global typicality is exactly the same as the frequentistic form of probability. Then equation (11) reduces to $\tau^G_N(u_i) = f_i / \sum_{j=1}^{L_N} f_j$. Supplementary Fig. 2 shows a simple example of the discrete global typicality and pmf of an artificial climate dataset $\{x\}_{50}$ with only data of wind chill, which has two unique data samples, $\{u\}_2 = \{u_1; u_2\}$ (°C), with $\{f\}_2 = \{20; 30\}$. Obviously, $q_2(u_1) = q_2(u_2) = d^2(u_1, u_2) > 0$, and $\tau^G(u_1) = 0.4$; $\tau^G(u_2) = 0.6$.
Indeed, if we observe a wind chill of $u_1$ 20 times and of $u_2$ 30 times, the likelihood for a wind chill of $u_1$ will be 40% and for a wind chill of $u_2$ will be 60%, respectively. The discrete global typicality of the outcomes of repeatedly throwing a die is presented in Supplementary Fig. 3 as an additional illustrative example. In this experiment, for "1" we can use $[1; 0; 0; 0; 0; 0]^T$, for "2" we can use $[0; 1; 0; 0; 0; 0]^T$, etc. Let the outcome of the throws be $\{f\}_6 = \{f_1; f_2; ...; f_6\}$; the values of the discrete global typicality of the six outcomes are equal to their corresponding frequencies, see Supplementary Fig. 3.

B. Identifying Local Modes of Discrete Global Typicality

In this sub-section, an automatic procedure for identifying all local maxima of the discrete global typicality, defined in the previous sub-section, will be described. It results in the formation of data clouds (samples associated with the local maxima) [9], [6]. Data clouds are free shape, while clusters are usually hyper-spherical or hyper-ellipsoidal. This data partitioning resembles Voronoi tessellation [7]. Data clouds are also used in the AnYa type neuro-fuzzy predictive models, classifiers and controllers [9], [6].

The illustrative figures in this section are based on the same climate dataset [0] that was used earlier in Fig. 1, which has two features/attributes: wind chill (°C) and wind gust (mph). In all cases, the Euclidean distance is used, though the principle is valid for any metric. The proposed algorithm can be summarized as follows:

Step 1: Identifying the global maximum of the discrete global typicality

For every unique data sample of the dataset, its discrete global typicality, $\tau^G_N(u_i)$ ($i = 1, 2, ..., L_N$), can be calculated using equation (11). The data sample with the highest $\tau^G_N$ is selected as the reference data sample in the ranked collection $\{u^*\}$:

$$u^*_1 = \arg\max_{i = 1, 2, ..., L_N} \tau^G_N(u_i) \tag{12}$$

where $u^*_1$ is the data sample with the highest value of discrete global typicality (in fact, the global maximum), and we set $u_m \leftarrow u^*_1$. In case there is more than one maximum, we can start with any one of them.

Step 2: Ranking the discrete global typicality

Then, we find the unique data sample that is nearest to $u_m$, denoted by $u^*_2$, and put it into $\{u^*\}$; meanwhile, we remove it from $\{u\}_{L_N}$. $u^*_2$ is then set to be the new reference, $u_m \leftarrow u^*_2$. The ranking operation continues by finding the next data sample which is closest to $u_m$, putting it into $\{u^*\}$, removing it from $\{u\}_{L_N}$ and setting it as the new reference. By applying the ranking operation until $\{u\}_{L_N}$ becomes empty, we finally get the ranked unique data samples, denoted as $\{u^*\} = \{u^*_1, u^*_2, ..., u^*_{L_N}\}$, and their corresponding ranked discrete global typicality collection: $\{\tau^G_N(u^*)\} = \{\tau^G_N(u^*_1), \tau^G_N(u^*_2), ..., \tau^G_N(u^*_{L_N})\}$.

Step 3: Identifying all local maxima

The ranked discrete global typicality is filtered using equation (13) to detect all local maxima of $\tau^G_N$:

$$\text{IF } \left(\tau^G_N(u^*_j) > \tau^G_N(u^*_{j-1})\right) \text{ AND } \left(\tau^G_N(u^*_j) > \tau^G_N(u^*_{j+1})\right) \text{ THEN } \left(u^*_j \text{ is a local maximum of } \tau^G_N\right) \tag{13}$$

We denote the set of the local maxima of $\tau^G_N$ (which can be used as a basis for forming data clouds and, further, AnYa type fuzzy rule-based models [9], [6]) as $\{u^{**}\}_P = \{u^{**}_1, u^{**}_2, ..., u^{**}_P\}$; $P$ is the number of the identified local maxima and $P \leq L_N$. The ranked discrete global typicality is depicted in Fig. 3(a); the corresponding local maxima are depicted in Fig. 3(b).

Fig. 3. Identifying local maxima of the discrete global typicality: (a) ranked discrete global typicality; (b) local maxima/peaks/modes.

Step 4: Forming data clouds

Each local maximum, $u^{**}_j$, is then set as a prototype of a data cloud. All other data points are assigned to the nearest prototype (local maximum), forming data clouds using equation (14):

$$\text{winning label} = \arg\min_{j = 1, 2, ..., P} d(x_i, u^{**}_j) \tag{14}$$

Data clouds can be used to form AnYa models [9], [6]. After all the data samples within $\{x\}_N$ are assigned to the data clouds, the center (mean) $\mu_j$, the standard deviation $\sigma_j$ and the support $S_j$ ($j = 1, 2, ..., P$) per cloud can be calculated.

Step 5: Selecting the main local maxima of the discrete global typicality

We then calculate $\tau^G_N$ at the data cloud centers, denoted by $\{\mu\}_P$, using equation (11) with the corresponding supports as their frequencies. Then, the less prominent local maxima are taken out using the operation described next.
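Steps 1 to 4 can be sketched as follows. This is only an illustration under assumptions that are not stated in the text: the squared Euclidean distance is used, ties are broken by list order, and the first and last elements of the ranked sequence are compared only with their single neighbour. The function names and the toy typicality values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two samples given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def rank_by_nearest_neighbour(unique, tau):
    """Steps 1-2: start at the global maximum, then chain nearest samples."""
    remaining = list(range(len(unique)))
    current = max(remaining, key=lambda i: tau[i])  # equation (12)
    order = [current]
    remaining.remove(current)
    while remaining:
        ref = current
        current = min(remaining, key=lambda i: sq_dist(unique[i], unique[ref]))
        order.append(current)
        remaining.remove(current)
    return order


def local_maxima(order, tau):
    """Step 3: indices whose typicality exceeds both ranked neighbours."""
    peaks = []
    for pos, idx in enumerate(order):
        left = tau[order[pos - 1]] if pos > 0 else float("-inf")
        right = tau[order[pos + 1]] if pos < len(order) - 1 else float("-inf")
        if tau[idx] > left and tau[idx] > right:
            peaks.append(idx)
    return peaks


def assign_to_clouds(samples, prototypes):
    """Step 4, equation (14): label of the nearest prototype per sample."""
    return [min(range(len(prototypes)),
                key=lambda j: sq_dist(x, prototypes[j])) for x in samples]


# Toy unique samples with made-up typicality values (hypothetical).
unique = [(0.0,), (1.0,), (2.5,), (10.0,), (11.0,), (12.0,)]
tau = [0.10, 0.30, 0.10, 0.10, 0.25, 0.15]
order = rank_by_nearest_neighbour(unique, tau)
peaks = local_maxima(order, tau)
labels = assign_to_clouds(unique, [unique[i] for i in peaks])
```

In this toy case two prototypes are found, one per group of samples, and every sample is labelled with the group it lies in.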
For each center $\mu_i$, we check the condition ($i, j = 1, 2, ..., P$; $i \neq j$):

$$\text{IF } \left(\exists j:\ \tau^G_N(\mu_j) > \tau^G_N(\mu_i)\right) \text{ AND } \left(d(\mu_i, \mu_j) \leq 2\sigma_i\right) \text{ THEN } \left(\mu_i \in \{\mu\}_R\right) \tag{15}$$

This condition means that if there is another center with higher $\tau^G_N$ located within the $2\sigma$ area of $\mu_i$, this new, more prominent center replaces the existing one. This condition guarantees that the influence areas of neighboring data clouds will not overlap significantly (it is well known that, according to the Chebyshev inequality, for an arbitrary distribution the majority of the data samples (>75%) lie within $2\sigma$ distance from the mean [] [3]).

By finding all the centers satisfying the above condition and assigning them to $\{\mu\}_R$, we get the filtered data cloud centers, denoted by $\{\mu\}_{P_1}$, by excluding $\{\mu\}_R$ from $\{\mu\}_P$, where $P_1$ is the number of remaining centers. After that, we set $\{\mu\}_P \leftarrow \{\mu\}_{P_1}$, $P \leftarrow P_1$ and repeat Steps 4-5 until the data cloud centers do not change any more. Finally, we get the composed result, re-named as $\{\mu^o\}$, and use $\{\mu^o\}$ as the prototypes to

form data clouds using equation (14).

Fig. 4. Final filtering result (the centers of the data clouds are shown in black; the data samples from different data clouds are plotted with different colors).

The final data cloud centers for each selection round are presented in the Supplementary Video, which can also be downloaded from: _Vdeo.ppt?dl=0. The final result is presented in Fig. 4. Compared with Fig. 3(b), in the final round there are only two main modes left, broadly corresponding to the two main seasons in Northern England, and all the details are filtered out.

Even if $f_i = 1$, $\forall i$, the discrete global typicality can still be extracted successfully from the data samples, despite the fact that the result may not be exactly the same because of the changing data structure; see Supplementary Fig. 4, which uses the same real climate dataset as Fig. 4. The summary of the automatic mode identification algorithm is as follows.

Automatic mode identification algorithm:
i. Calculate $\tau^G_N(u_i)$, $i = 1, 2, ..., L_N$ using equation (11);
ii. Find the unique data sample $u^*_1$ with the global maximum of $\tau^G_N$ using equation (12);
iii. Send $u^*_1$ and $\tau^G_N(u^*_1)$ into $\{u^*\}$ and $\{\tau^G_N(u^*)\}$ and delete $u^*_1$ from $\{u\}_{L_N}$;
iv. $u_m \leftarrow u^*_1$;
v. While $\{u\}_{L_N} \neq \emptyset$
   Find the unique data sample(s) which is/are nearest to $u_m$;
   Send the data sample(s) and the corresponding $\tau^G_N$ into $\{u^*\}$ and $\{\tau^G_N(u^*)\}$;
   Delete the data sample(s) from $\{u\}_{L_N}$;
   Set the latest element in $\{u^*\}$ as $u_m$;
vi. End While
vii. Filter $\{\tau^G_N(u^*)\}$ using equation (13) and obtain $\{u^{**}\}_P$ as centers of data clouds;
viii. While the centers $\{\mu\}_P$ are not fixed
   Use $\{\mu\}_P$ and form the data clouds from $\{x\}_N$ using equation (14);
   Obtain the new centers $\{\mu\}_P$, standard deviations $\{\sigma\}_P$ and supports $\{S\}_P$ of the data clouds;
   Calculate $\tau^G_N(\mu_i)$, $i = 1, 2, ..., P$ using equation (11);
   Find $\{\mu\}_R$ satisfying equation (15);
   Exclude $\{\mu\}_R$ from $\{\mu\}_P$ to obtain $\{\mu\}_{P_1}$;
   $\{\mu\}_P \leftarrow \{\mu\}_{P_1}$; $P \leftarrow P_1$;
ix. End While
x. $\{\mu^o\} \leftarrow \{\mu\}_P$;
xi. Build the data clouds with $\{\mu^o\}$ using equation (14);

C. Properties of EDA Operators

Having introduced the basic EDA operators, we will now outline their properties.
a) They are entirely based on the empirically observed experimental data and their mutual distribution in the data space;
b) They do not require any user- or problem-specific thresholds and parameters to be pre-specified;
c) They do not require any model of data generation (random or deterministic), only the type of distance metric used (however, it can be any);
d) The individual data samples (observations) do not need to be independent or identically distributed (iid); on the contrary, their mutual dependence is taken into account directly through the mutual distance between the data samples;
e) The method does not require an infinite number of observations and can work with just a few exemplars;
f) Within EDA, we can still consider cross validation and non-parametric statistical tests based on the realizations of experimentally observed data, similarly to the significance tests utilized on the random variable assumed in the traditional probability theory and statistics.

As a conclusion, EDA can be seen as an advanced data analysis framework which can work efficiently with any feasible data and any type of distance or similarity metric.

IV. THEORETICAL BASIS - CONTINUOUS DENSITY AND TYPICALITY

Up to this point, all EDA definitions are useful to describe data sets or data streams made up of a discrete number of observations. However, they cannot be used for inference because they are only defined on points where samples occur (discrete spaces). In this section, we define the continuous local and global density and global typicality which can be

used for inference on the continuous domain of the variable.

Fig. 5. The process of extracting distribution from data in EDA.

At this stage, we depart from the entirely data-based and assumptions-free approach we used so far; however, this is done after we identified the local modes, formed data clouds around these focal points and obtained the support of these data clouds. Therefore, the extension to the continuous domain is inherently local (per data cloud). We assume that the local mode considered as the mean and the support considered as frequency, plus the deviation of the empirical data, do provide the triplet of parameters. We do recognize that these triplets are conditional on the specific data samples observed and associated with the particular data cloud, but this will be updated when new data is available. Now, having this triplet of parameters we, firstly, define the continuous local density, $D^C_{N,i}$, as:

$$D^C_{N,i}(x) = \frac{\sum_{j=1}^{S_{N,i}} q_{N,i}(x_j)}{2 S_{N,i}\, q_{N,i}(x)}$$

Like equation (9), for the case of Euclidean distance, the continuous local density is simplified to a continuous Cauchy type function over any feasible value of the variable $x$, with the parameters $\mu$ and $X$ extracted from the available data samples as described earlier:

$$D^C_{N,i}(x) = \frac{1}{1 + \dfrac{\|x - \mu_{N,i}\|^2}{X_{N,i} - \|\mu_{N,i}\|^2}}; \quad i = 1, 2, ..., C_N \tag{16}$$

where $\mu_{N,i}$ and $X_{N,i}$ are the mean and the average value of scalar products of the data samples within the $i$-th data cloud; $C_N$ is the number of data clouds; the subscript $N$ means the local densities are derived from each of the $N$ observed data samples. It is obvious that with more data samples observed, the parameters will change and have to be updated regularly. Note that equation (16) is defined based on Euclidean distance. The expression of the continuous local density varies with the type of distance used. Nonetheless, in general, the continuous local density of the data can be expressed in the same form as the discrete local density but in the continuous space. The continuous local density is defined on the continuous space for each local maximum per data cloud.

Furthermore, we introduce the continuous global density as a weighted sum of the local density of each data cloud, with weights being the support (number of data samples) of the respective data cloud. Finally, we introduce the continuous global typicality based on $D^{CG}_N$. The continuous global density and typicality play a similar role to the mixture of pdfs. However, the questions of how many distributions are in the mixture, which are their parameters and what type of distributions they are (see Fig. 5) are all answered from the data directly, free from any user- or problem-specific pre-defined parameters, prior assumptions, knowledge or pre-processing techniques like the cases of clustering, EM, etc.

A. Continuous Global Density

Continuous global density is a mixture that arises simply from the metric of the space used to measure sample distance and the density of samples that exist in the space. However, it works for all types of distance/similarity metric. As we can see from equation (16), the local density is Cauchy type when the Euclidean distance is employed; therefore, the simplest of the procedures is to define the continuous global density as a mixture of Cauchy distributions. The continuous global density enables inference of new samples anywhere in the space. For any $x$ and any type of distance used, we define the continuous global density in a general form very much like the mixture distributions, as a weighted combination of continuous local densities:

$$D^{CG}_N(x) = \frac{1}{N}\sum_{i=1}^{C_N} S_{N,i}\, D^C_{N,i}(x) \tag{17}$$

where $D^C_{N,i}(x)$ is the local density of $x$ in the $i$-th data cloud; $C_N$ is the number of data clouds at the $N$-th time instance; $S_{N,i}$ is the support (number of members) of the $i$-th data cloud based on the available experimental/actual data. For normalization, we impose the condition $\sum_{i=1}^{C_N} S_{N,i} = N$.

The continuous global density is defined non-parametrically from the modes of the data. At the modes and near the peaks it is a very good approximation of $D$, but it will deviate progressively from it in trough regions. As an example, the global density for the same climate dataset used before [0] is presented in Fig. 6(a).

Fig. 6. Continuous global density and global typicality of the real climate dataset [0] using Euclidean type distance: (a) continuous global density; (b) continuous global typicality.
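The continuous local and global densities can be sketched for the Euclidean case as follows. This is an illustration under assumptions: each data cloud is summarized by its mean, its mean scalar product and its support; the global density is taken as the support-weighted average of the Cauchy type local densities; and each cloud has non-zero spread. The function names and the toy clouds are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cloud_parameters(samples):
    """Return (mean, mean scalar product X, support) of one data cloud."""
    s = len(samples)
    dim = len(samples[0])
    mean = tuple(sum(x[k] for x in samples) / s for k in range(dim))
    mean_sq = sum(sum(v * v for v in x) for x in samples) / s
    return mean, mean_sq, s


def continuous_local_density(x, mean, mean_sq):
    """Cauchy type local density; equals 1 at the cloud mean."""
    variance = mean_sq - sum(m * m for m in mean)  # X - ||mu||^2
    return 1.0 / (1.0 + sq_dist(x, mean) / variance)


def continuous_global_density(x, clouds):
    """Support-weighted average of the local densities of all clouds."""
    n = sum(support for _, _, support in clouds)
    return sum(support * continuous_local_density(x, mean, mean_sq)
               for mean, mean_sq, support in clouds) / n


# Two toy one-dimensional data clouds (hypothetical values).
clouds = [cloud_parameters([(0.0,), (2.0,)]),
          cloud_parameters([(9.0,), (11.0,)])]
at_mode = continuous_global_density((1.0,), clouds)
in_trough = continuous_global_density((5.5,), clouds)
```

Because the density is defined for any point, it can be evaluated between the clouds as well, where it is much lower than at either mode.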


Compared with the discrete local density introduced in section II, which is discrete and unimodal by definition, the continuous global density $D^{CG}_{K}(x)$ is more effective at detecting the natural multimodal data structure and abnormal data samples, because only the data samples that are close to the larger data clouds, which can be viewed as the main modes of the data patterns, can have higher values of continuous global density. This feature is clearly depicted by the value of $D^{CG}_{K}(x)$ for those data samples located in the space between the two main modes in the figures below, while for the local density, see Supplementary Fig. (c), it is exactly the opposite case.

B. Continuous Global Typicality

Having introduced the continuous global density, we can also define the continuous global typicality. It is also defined as a normalized form of the density (similarly to the weighted typicality), but with the use of an integral instead of a sum. As stated in section II, the weighted typicality is discrete and sums to 1. The continuous global typicality is expressed as follows:

$$\tau^{CG}_{K}(x) = \frac{D^{CG}_{K}(x)}{\int_{x} D^{CG}_{K}(x)\,dx} = \frac{\sum_{i=1}^{C} S_{K,i}\, D^{CL}_{K,i}(x)}{\int_{x} \sum_{i=1}^{C} S_{K,i}\, D^{CL}_{K,i}(x)\,dx} \tag{18}$$

It is important to notice that equation (18) is general and valid for any type of distance/similarity metric. For a general multivariate case, it is important to normalize the mixture of continuous local densities to make $\tau^{CG}_{K}(x)$ integrate to 1. By finding the integral of the continuous global density within the metric space and dividing by this integral, one can always guarantee a unit integral, regardless of the type of distance/similarity metric used. As we said before, we consider the well-known expression of the multivariate Cauchy distribution [21]-[23] to transform the continuous local density without loss of generality:

$$f(x) = \frac{\Gamma\left(\frac{1+N}{2}\right)}{\pi^{\frac{1+N}{2}}\, \sigma_0^{N} \left(1 + \dfrac{(x-\mu)^T (x-\mu)}{\sigma_0^2}\right)^{\frac{1+N}{2}}} \tag{19}$$

where $x = [x_1, x_2, \dots, x_N]^T$; $\mu = E[x]$; π is the well-known mathematical constant and Γ(·) is the gamma function; $\sigma_0$ is a scalar parameter. This guarantees that:

$$\int_{x_1}\int_{x_2}\cdots\int_{x_N} f(x_1, x_2, \dots, x_N)\, dx_1\, dx_2 \cdots dx_N = 1 \tag{20}$$

Based on (17)-(19), we introduce the normalized continuous local density as follows:

$$\bar{D}^{CL}_{K,i}(x) = \frac{\Gamma\left(\frac{1+N}{2}\right)}{\pi^{\frac{1+N}{2}}\, \sigma_{K,i}^{N}} \left(D^{CL}_{K,i}(x)\right)^{\frac{1+N}{2}} \tag{21}$$

Here, $\sigma_{K,i}^2 = X_{K,i} - \mu_{K,i}^T \mu_{K,i}$ for the Euclidean distance. We can, finally, get the expression of the continuous global typicality in terms of the normalized continuous local density as:

$$\tau^{CG}_{K}(x) = \frac{1}{K}\sum_{i=1}^{C} S_{K,i}\, \bar{D}^{CL}_{K,i}(x) \tag{22}$$

For the Euclidean distance, equation (22) becomes

$$\tau^{CG}_{K}(x) = \frac{1}{K}\sum_{i=1}^{C} S_{K,i}\, \frac{\Gamma\left(\frac{1+N}{2}\right)}{\pi^{\frac{1+N}{2}}\, \sigma_{K,i}^{N} \left(1 + \dfrac{(x-\mu_{K,i})^T (x-\mu_{K,i})}{\sigma_{K,i}^2}\right)^{\frac{1+N}{2}}} \tag{23}$$

The continuous global typicality of the real climate dataset with Euclidean distance is presented in Fig. 6(b). The comparisons between the continuous global typicality (the modes are extracted by the approach introduced in section III), the discrete global typicality, the histogram and the traditional pdf are presented, for visual clarity, in Fig. 7 using the same real climate dataset [20]. As shown in Fig. 7, compared with the traditional pdf using a Gaussian model, the global typicality derived directly from the dataset, without any prior assumption about the number of local modes or the type of distribution, represents the two modes in the data pattern very well. It gives results very close to what a histogram would give and significantly better than what a single unimodal distribution would provide.

Fig. 7. Comparison between the continuous global typicality, discrete global typicality, histogram and traditional pdf: (a) wind chill (°C); (b) wind gust (mph).

In summary, the proposed continuous global typicality has the following properties, many of which it shares with the discrete global typicality introduced in section III:


i) integrates to 1;
ii) provides a closed analytic form;
iii) requires no prior assumptions and no user- or problem-specific thresholds or parameters; these are derived entirely from the data;
iv) can be recursively calculated for various types of metrics.

V. APPLICATIONS

A. Examples

In this subsection, we give several examples of the continuous global typicality of different datasets, extracted by the proposed automatic mode identification algorithm. The continuous global typicality of the Seeds dataset [28], the Combined Cycle Power Plant dataset [29] and the Wine Quality dataset [30] with Euclidean distance is presented in Fig. 8. As the dimensionality of the original datasets is > 2, for a better visualization we use principal component analysis (PCA) [31] to reduce the dimensionality and use the first 2 principal components in the figures as the x-axis and y-axis. Supplementary Fig. 5 (a) and (b) present the continuous global typicality derived from the first 1/3 and the first 2/3 of the Wine Quality dataset. Supplementary Fig. 5 (c) depicts the continuous global typicality derived after scrambling the order of the data samples. The continuous global typicality of the 2-dimensional benchmark datasets A, S and S [32] is also presented in Supplementary Fig. 6.

If we want more details from the continuous global typicality, we can also stop the automatic mode identification algorithm described in section III early, i.e. before the final iteration, and build the continuous global typicality based on the more detailed data partitioning results. The Supplementary Video referred to in section III.B also depicts the evolution of the global continuous typicality based on the results of different iterations of the proposed mode identification algorithm.

B. Inference Primer

Assume there are 3 arbitrary non-integer values of wind chill (°C), x = {-7.5; .5; 4.7}, which do not exist in the dataset. We can quickly obtain the corresponding continuous global typicality using equation (18); the inferences made are presented in Fig. 9. Here we only consider the two main modes. This means that a wind chill of -7.5 °C is less likely while a wind chill of .5 °C is more likely.
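To make this kind of point inference concrete, the sketch below evaluates the Euclidean form of the continuous global typicality (a support-weighted mixture of multivariate Cauchy densities, as in equation (23)) at arbitrary query values. It is an illustration under stated assumptions rather than the authors' code: the function name `global_typicality` is hypothetical, and the two "wind chill" data clouds use invented parameters, not values taken from the climate dataset.

```python
import math
from typing import Sequence, Tuple

# One data cloud: (mean vector mu, sigma^2 = X - mu^T mu, support S).
CloudParams = Tuple[Sequence[float], float, int]


def global_typicality(x: Sequence[float], clouds: Sequence[CloudParams]) -> float:
    """Support-weighted mixture of N-dimensional Cauchy densities."""
    n = len(x)
    K = sum(S for _, _, S in clouds)
    # Normalising constant Gamma((1+N)/2) / pi^((1+N)/2) of the Cauchy pdf.
    norm = math.gamma((1 + n) / 2) / math.pi ** ((1 + n) / 2)
    total = 0.0
    for mu, sigma2, S in clouds:
        dist2 = sum((a - m) ** 2 for a, m in zip(x, mu))
        kernel = (1.0 + dist2 / sigma2) ** (-(1 + n) / 2)
        total += S * norm * kernel / sigma2 ** (n / 2)  # sigma^N = (sigma^2)^(N/2)
    return total / K


# Hypothetical 1-D "wind chill" clouds: (mean, sigma^2, support).
wind_chill_clouds = [([2.0], 4.0, 60), ([12.0], 9.0, 40)]

# Arbitrary query values that need not appear in the data.
for value in (-7.5, 2.5, 11.0):
    print(value, global_typicality([value], wind_chill_clouds))
```

Because each component is a proper density and the supports sum to K, the mixture integrates to 1 by construction, so no numerical normalisation is needed in the Euclidean case.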
Fig. 8. Continuous global typicality of (a) the Seeds dataset [28], (b) the Combined Cycle Power Plant dataset [29] and (c) the Wine Quality dataset [30].

In addition, if we want to know the continuous global typicality of all the values larger than t, we can integrate as follows:

$$T(x > t) = \int_{t}^{\infty} \tau^{CG}_{K}(x)\,dx \tag{24}$$

For example, when the Euclidean distance is used, and considering only one-dimensional data for a simpler derivation, equation (24) can be re-written as:

$$T(x > t) = \frac{1}{K}\sum_{i=1}^{C} S_{K,i} \int_{t}^{\infty} \frac{dx}{\pi \sigma_{K,i}\left(1 + \dfrac{(x-\mu_{K,i})^2}{\sigma_{K,i}^2}\right)} = \frac{1}{K}\sum_{i=1}^{C} S_{K,i}\left(\frac{1}{2} - \frac{1}{\pi}\arctan\frac{t-\mu_{K,i}}{\sigma_{K,i}}\right) \tag{25}$$

Let us continue the example in Fig. 9. If we want to know the global continuous typicality of all the data samples above 0 °C, which is the green area of this figure, we can calculate the value using equation (25). The result means that the likelihood of a value being equal to or greater than 0 °C is 4.47%. One can see that the continuous global typicality can serve as a form of probability.

Fig. 9. Continuous global typicality of wind chill data and simple inferences.

C. Naïve EDA Classifier

In this sub-section, we borrow the concept of naïve Bayes classifiers [1]-[3] and propose a new version of the naïve EDA classifier. In contrast with the original naïve EDA classifier proposed in [15], which relies for inference on the discrete global typicality and linear interpolation and/or extrapolation, the naïve EDA classifier in this paper uses the continuous global typicality instead, which is based on the local modes of the discrete global typicality identified by an automatic procedure as described in section III.B. This procedure is more effective in reflecting the ensemble features of the distribution of the data samples of different classes in the data space. As the proposed approach accommodates various types of distance/similarity metrics, one can use the current knowledge in the area to choose the desired distance measure for a reasonable approximation that simplifies the processing. Moreover, one can easily change to other distance measures and compare the results obtained by the classifier with different types of measures. For consistency, in the following numerical examples we use the Euclidean distance.

Let us assume H classes at the K-th time instance, where some classes may have many data clouds. The continuous global typicality per class can be defined as (j = 1, 2, ..., H):

$$\tau^{CG}_{K,j}(x) = \frac{\sum_{i=1}^{W_j} S_{K,j,i}\, D^{CL}_{K,j,i}(x)}{\int_{x} \sum_{i=1}^{W_j} S_{K,j,i}\, D^{CL}_{K,j,i}(x)\,dx} \tag{26}$$

where $W_j$ is the number of data clouds sharing the same j-th class label, $\sum_{j=1}^{H} W_j = C$; $S_{K,j,i}$ is the support of the i-th data cloud having the j-th class label; $D^{CL}_{K,j,i}(x)$ is the corresponding continuous local density. For any unlabeled data sample x, its label is decided by the following expression:

$$\text{label}(x) = \arg\max_{j = 1, 2, \dots, H} \tau^{CG}_{K,j}(x) \tag{27}$$

The plots (wind chill and wind gust) of the continuous global typicality with Euclidean type of distance of the real climate dataset are given in Supplementary Fig. 7.

TABLE I. CLASSIFICATION PERFORMANCE, 3 PRINCIPAL COMPONENTS CONSIDERED. Overall accuracy of the naïve EDA classifier, the SVM classifier and the naïve Bayes classifier on the Banknote, Pima, Climate, Pendigit, Madelon, Optdigit, Occupancy detection testing set 1 and Occupancy detection testing set 2 datasets.

The performance of the proposed naïve EDA classifier is further tested on the following problems: i) Banknote Authentication dataset [33]; ii) Pima dataset [34]; iii) Climate dataset [20]; iv) Pen-Based Handwritten Digits Recognition dataset [35]; v) Madelon dataset [36]; vi) Optical Handwritten Digits Recognition dataset [37]; vii) Occupancy Detection dataset [38]. The proposed naïve EDA classifier is compared with an SVM classifier with a Gaussian radial basis function and a naïve Bayes classifier in terms of their performance. The details of the datasets used in the classification are given in Supplementary Section B. In the experiments, PCA [31] is applied as a pre-processing step to reduce the dimensionality and balance the variances of the datasets. It has to be stressed that PCA is not a part of the proposed method and is not necessary for simpler problems. For the Banknote Authentication, Pima and Climate datasets, we randomly select 70% of the data for training and use the rest for validation.
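The one-dimensional tail integral of equation (25) and the decision rule of equation (27) are simple enough to sketch together. The following is an illustrative sketch only, restricted to one-dimensional data and the Euclidean distance; the function names and the cloud parameters in the usage lines are hypothetical, and the data clouds per class are assumed to have been identified beforehand.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple

# One 1-D data cloud: (mean mu, scale sigma, support S).
Cloud1D = Tuple[float, float, int]


def tail_typicality(t: float, clouds: Sequence[Cloud1D]) -> float:
    """Typicality mass of all values larger than t (closed form of eq. 25)."""
    K = sum(S for _, _, S in clouds)
    return sum(
        S * (0.5 - math.atan((t - mu) / sigma) / math.pi) for mu, sigma, S in clouds
    ) / K


def class_typicality(x: float, clouds: Sequence[Cloud1D]) -> float:
    """Per-class continuous typicality: Cauchy mixture over one class's clouds."""
    K = sum(S for _, _, S in clouds)  # support of this class only
    return sum(
        S / (math.pi * sigma * (1.0 + ((x - mu) / sigma) ** 2))
        for mu, sigma, S in clouds
    ) / K


def classify(x: float, clouds_per_class: Dict[Hashable, Sequence[Cloud1D]]) -> Hashable:
    """Assign the label whose class typicality is largest at x (eq. 27)."""
    return max(clouds_per_class, key=lambda label: class_typicality(x, clouds_per_class[label]))


# Hypothetical clouds for two classes; a class may own several clouds.
model = {
    "cold": [(-3.0, 2.0, 30), (1.0, 1.5, 20)],
    "mild": [(12.0, 3.0, 50)],
}
print(tail_typicality(10.0, model["mild"]))
print(classify(0.5, model))
```

Each class typicality is normalised by the support of its own clouds, so the comparison in `classify` is between densities that each integrate to 1, mirroring the per-class normalisation in equation (26).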
The performance is evaluated after 10-fold cross-validation. For the Pen-Based Digits, Madelon, Optical Digits and Occupancy Detection datasets, we train the classifiers with the training sets and conduct the validation with the testing/validation sets. The overall performance of the 3 classifiers is tabulated in Table I, where we consider the first 3 principal components for classification. Considering the first 5 principal components, the overall results obtained by the classifiers are tabulated in Table II. As shown in Tables I and II, the proposed naïve EDA classifier outperforms the SVM classifier and the naïve Bayes classifier in the majority of the numerical examples. In addition, it is worth noting that the classification conducted by the naïve EDA classifier is totally free from unrealistic assumptions, restrictions or prior knowledge.

TABLE II. CLASSIFICATION PERFORMANCE, 5 PRINCIPAL COMPONENTS CONSIDERED. Overall accuracy of the naïve EDA classifier, the SVM classifier and the naïve Bayes classifier on the Pima, Climate, Pendigit, Madelon, Optdigit, Occupancy detection testing set 1 and Occupancy detection testing set 2 datasets.

VI. CONCLUSION AND FUTURE DIRECTION

In this paper, we propose a new systematic approach to derive ensemble properties of data without any prior assumptions about data sources, amount of data and user- or problem-specific parameters. The EDA (Empirical Data Analytics) framework considers only the relative position of data in a metric space and extracts from the raw experimental discrete observations a series of measures of their ensemble properties, such as the cumulative proximity ($q$), centrality ($C$), square centrality ($q^{-1}$), standardized eccentricity ($\varepsilon$), density ($D$) as well as typicality ($\tau$). The local and global versions of the typicality are both considered, originally in discrete form and then in continuous form, approximating the actual data-driven discrete estimators by a mixture of local functions. It was demonstrated that, when the distance metric used is Euclidean, the density (both in its discrete form, which exactly describes the actual data, and in its continuous form, which approximates the density over the entire data space) takes the form of a Cauchy function. Importantly, however, this is not an assumption made a priori, but is driven and parameterized by the data and the selected metric.

Furthermore, we propose an autonomous algorithm for identifying all local modes/maxima of the global discrete typicality, as well as for filtering out the main local maxima based on the closeness of each local maximum. Finally, we present a number of numerical examples aiming to verify the methodology and demonstrate its advantages. We introduce a new type of classifier, which we call naïve EDA, for investigating the unknown data pattern behind the large amount of data in a data-rich environment.

In conclusion, the proposed EDA framework and methodology provide an efficient alternative that is entirely based on the experimental data and the evidence. It touches the very foundations of data mining and analysis and, thus, has a wide area of applications, especially in the era of big data and data streams, where handcrafting offline methods and making detailed assumptions is often not an option. Nonetheless, we have to admit that the bottlenecks of the proposed methodology are the lack of theoretical confidence levels for the analysis and of a theoretical notion of reliability and generalization, which are inherited limitations of nonparametric approaches. In this paper, we only provide the preliminary algorithms and results on data partitioning, analysis, inference and classification.
As future work, we will focus on developing more advanced algorithms within the EDA framework for various applications in different areas, including, but not limited to, high frequency trading data processing, foreign currency trading problems, handwritten digits recognition, remote sensing, etc.

REFERENCES

[1] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Berlin: Springer, 2009.
[2] C. M. Bishop, Pattern Recognition. New York: Springer, 2006.
[3] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. Chichester, West Sussex, UK: Wiley-Interscience, 2000.
[4] T. Bayes, "An essay towards solving a problem in the doctrine of chances," Philos. Trans. R. Soc., vol. 53, p. 370, 1763.
[5] J. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives. Springer, 2010.
[6] C. Spearman, "The proof and measurement of association between two things," Am. J. Psychol., vol. 15, pp. 72-101, 1904.
[7] M. G. Kendall, "A new measure of rank correlation," Biometrika, vol. 30, no. 1, pp. 81-93, 1938.
[8] L. A. Goodman and W. H. Kruskal, "Measures of association for cross classifications," J. Am. Stat. Assoc., vol. 49, no. 268, 1954.
[9] P. Del Moral, "Nonlinear filtering: interacting particle resolution," Comptes Rendus l'Académie des Sci. - Ser. I - Math., vol. 325, no. 6, 1997.
[10] L. A. Zadeh, "Fuzzy sets," Inf. Control, vol. 8, no. 3, pp. 338-353, 1965.
[11] M. Chen and D. A. Linkens, "Rule-base self-generation and simplification for data-driven fuzzy models," Fuzzy Sets Syst., vol. 142, 2004.
[12] P. P. Angelov, "Anomaly detection based on eccentricity analysis," in 2014 IEEE Symposium Series in Computational Intelligence, IEEE Symposium on Evolving and Autonomous Learning Systems, EALS, SSCI 2014, 2014, pp. 1-8.
[13] P. Angelov, "Outside the box: an alternative data analytics framework," J. Autom. Mob. Robot. Intell. Syst., vol. 8, 2014.
[14] P. Angelov, X. Gu, and D. Kangin, "Empirical data analytics," Int. J. Intell. Syst., DOI 10.1002/int.21899, 2017.
[15] P. P. Angelov, X. Gu, J. Principe, and D. Kangin, "Empirical data analysis - a new tool for data analytics," in IEEE International Conference on Systems, Man, and Cybernetics, 2016.
[16] G. Sabidussi, "The centrality index of a graph," Psychometrika, vol. 31, no. 4, pp. 581-603, 1966.
[17] L. C. Freeman, "Centrality in social networks conceptual clarification," Soc. Networks, vol. 1, no. 3, pp. 215-239, 1979.
[18] J. G. Saw, M. C. K. Yang, and T. S. E. C. Mo, "Chebyshev inequality with estimated mean and variance," Am. Stat., vol. 38, no. 2, pp. 130-132, 1984.
[19] P. Angelov, Autonomous Learning Systems: From Data Streams to Knowledge in Real Time. John Wiley & Sons, Ltd., 2012.
[20] Climate Dataset in Manchester.
[21] S. Nadarajah and S. Kotz, "Probability integrals of the multivariate t distribution," Can. Appl. Math. Q., vol. 13, 2005.
[22] C. Lee, "Fast simulated annealing with a multivariate Cauchy distribution and the configuration's initial temperature," J. Korean Phys. Soc., vol. 66, no. 10, 2015.
[23] S. Y. Shatskikh, "Multivariate Cauchy distributions as locally Gaussian distributions," J. Math. Sci., vol. 78, no. 1, pp. 102-108, 1996.
[24] A. Corduneanu and C. M. Bishop, "Variational Bayesian model selection for mixture distributions," Proc. Eighth Int. Conf. Artif. Intell. Stat., pp. 27-34, 2001.
[25] P. P. Angelov and X. Gu, "Empirical fuzzy sets," under review, 2017.
[26] P. Angelov and R. Yager, "A new type of simplified fuzzy rule-based system," Int. J. Gen. Syst., vol. 41, 2012.
[27] A. Okabe, B. Boots, K. Sugihara, and S. N. Chiu, Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, 2nd ed. Chichester, England: John Wiley & Sons, 1999.
[28] Seeds Dataset.
[29] Combined Cycle Power Plant Dataset.
[30] Wine Quality Dataset.
[31] I. Jolliffe, Principal Component Analysis. John Wiley & Sons, Ltd., 2002.
[32] Clustering datasets.
[33] Banknote Authentication Dataset.
[34] Pima Indians Diabetes Dataset.
[35] Pen-Based Recognition of Handwritten Digits Dataset.
[36] Madelon Dataset.
[37] Optical Recognition of Handwritten Digits Dataset.
[38] Occupancy Detection Dataset.

Plamen P. Angelov (F'16, SM'04, M'99) is a Chair Professor in Intelligent Systems with the School of Computing and Communications, Lancaster University, UK. He obtained his PhD (1993) and his DSc (2015) from the Bulgarian Academy of Science. He is the Vice President of the International Neural Networks Society and a member of the Board of Governors of the Systems, Man and Cybernetics Society of the IEEE, and a Distinguished Lecturer of IEEE. He is Editor-in-Chief of the Evolving Systems journal (Springer) and Associate Editor of IEEE Transactions on Fuzzy Systems as well as of IEEE Transactions on Cybernetics and several other journals. He received various awards and is internationally recognized for pioneering results in on-line and evolving methodologies and algorithms for knowledge extraction in the form of human-intelligible fuzzy rule-based systems and autonomous machine learning. He holds a wide portfolio of research projects and leads the Data Science group at Lancaster.

Xiaowei Gu received the B.E. and M.E. degrees from the Hangzhou Dianzi University, Hangzhou, China. He is currently pursuing the Ph.D. degree in computer science with Lancaster University, UK.

José C. Príncipe (F'00) is a Distinguished Professor of Electrical and Computer Engineering at the University of Florida. He is also the Eckis Professor and Founding Director of the Computational NeuroEngineering Laboratory (CNEL), University of Florida. His primary research interests are advanced signal processing with information theoretic criteria (entropy and mutual information), adaptive models in the reproducing kernel Hilbert spaces (RKHS) and the application of these advanced algorithms in Brain Machine Interfaces (BMI). Dr. Príncipe is a Fellow of the IEEE, ABME and AIBME. He is the past Editor-in-Chief of the IEEE Transactions on Biomedical Engineering, past Chair of the Technical Committee on Neural Networks of the IEEE Signal Processing Society and past President of the International Neural Network Society. He received the IEEE EMBS Career Award and the IEEE Neural Network Pioneer Award.
