Multi-Agent Decision Tree Learning from Distributed Autonomous Data Sources
D. Caragea, A. Silvescu, and V. Honavar

Iowa State University, Computer Science Department, Artificial Intelligence Research Group, Ames, IA

INTRODUCTION

Recent advances in computing, communications, and digital storage technologies, together with the development of high-throughput data acquisition technologies, have made it possible to gather and store large volumes of data in digital form. For example, advances in high-throughput sequencing and other data acquisition technologies have resulted in gigabytes of DNA, protein sequence data, and gene expression data being gathered at steadily increasing rates in the biological sciences. Organizations have begun to capture and store a variety of data about various aspects of their operations (e.g., products, customers, and transactions). Complex distributed systems (e.g., computer systems, communication networks, power systems) are equipped with sensors and measurement devices that gather and store a variety of data for use in monitoring, controlling, and improving the operation of such systems. These developments have resulted in unprecedented opportunities for large-scale data-driven knowledge acquisition with the potential for fundamental
gains in scientific understanding (e.g., characterization of macromolecular structure-function relationships in biology) in many data-rich domains. In such applications, the data sources of interest are typically physically distributed and are often autonomous. Given the large size of these data sets, gathering all of the data in a centralized location is generally neither desirable nor feasible because of bandwidth and storage requirements. In such domains, there is a need for knowledge acquisition systems that can perform the necessary analysis of data at locations where the data and the computational resources are available and transmit the results of analysis (knowledge acquired from the data) to locations where they are needed. In other domains, the ability of autonomous organizations to share raw data may be limited for a variety of reasons (e.g., privacy considerations). In such cases, there is a need for knowledge acquisition systems that can learn from statistical summaries of data (e.g., counts of instances that match certain criteria) that are made available as needed by the distributed data sources in the absence of access to raw data. Thus, distributed learning systems have to integrate many heterogeneous, relatively autonomous components (e.g., data repositories, servers, user-supplied data transformations and learning algorithms) and provide support for data analysis where the data and computational resources are available. Agent-oriented software engineering (Jennings and Wooldridge, 2001; Honavar et al., 1998) offers an attractive approach to implementing modular and extensible distributed computing systems. For the purpose of this discussion, an intelligent agent is an encapsulated information processing system that is situated in some environment and is capable of flexible, autonomous action within the constraints of the environment so as to achieve its design objectives.

The distributed learning scenarios described above call for the design of distributed intelligent learning agents with provable convergence properties with respect to the batch scenario (i.e., when all of the data is available in a central location).

Against this background, this chapter presents an approach to the design of distributed intelligent learning agents. We precisely formulate a class of distributed learning problems, discuss the common types of data fragmentation that result from the distributed nature of the data (vertical fragmentation and horizontal fragmentation), and present a general strategy for transforming a large class of traditional batch learning algorithms into distributed learning algorithms. We then demonstrate an application of this strategy to devise intelligent learning agents for decision tree induction (using a variety of commonly used splitting criteria) from horizontally or vertically fragmented distributed data, and show that the algorithms underlying the devised agents are provably exact in that the decision tree constructed from distributed data is identical to that obtained by the corresponding algorithm in the batch setting. We also provide an analysis of the time and communication complexity of the algorithms underlying the proposed agents. The distributed decision tree induction agents described in this chapter have been implemented as part of INDUS, an agent-based system for data-driven knowledge acquisition from heterogeneous, distributed, and autonomous data sources. INDUS is being applied to several knowledge discovery problems in molecular biology and network-based intrusion detection. We conclude this chapter with a discussion of related research and a brief outline of future research directions.

DISTRIBUTED LEARNING

The problem of learning from distributed data sets can be summarized as follows: the data sets $D_1, D_2, \ldots, D_n$ are distributed across multiple autonomous sites $1, 2, \ldots, n$, and the learner's task is to acquire useful knowledge $h$ from this data. For instance, such knowledge might take the form of a decision tree or a set of rules for pattern classification. In such a setting, learning can be
accomplished by an agent $A$ that visits the different sites to gather the information needed to generate a suitable model (e.g., a decision tree) from the data (serial distributed learning, Figure 1).

[Figure 1: Serial distributed learning. Agent $A$ collects information $I_1, I_2, \ldots, I_n$ from data sets $D_1, D_2, \ldots, D_n$ and produces hypothesis $h$.]

Alternatively, the different sites can transmit the information necessary for constructing the decision tree to the learning agent $A$ situated at a central location (parallel distributed learning, Figure 2).

[Figure 2: Parallel distributed learning. Sites holding $D_1, D_2, \ldots, D_n$ send information $I_1, I_2, \ldots, I_n$ to agent $A$, which produces hypothesis $h$.]

We assume that it is not feasible to transmit raw data between sites. Consequently, the learner has to rely on information $I_1, I_2, \ldots, I_n$ (e.g., statistical summaries such as counts of data tuples that satisfy particular criteria) extracted from the sites. Our approach to learning from distributed data sets involves identifying the information requirements of existing learning algorithms, and
designing efficient means of providing the necessary information to the learner while avoiding the need to transmit large quantities of data (Caragea et al., 2000).

Exact Distributed Learning

We say that a distributed learning algorithm $L_d$ (e.g., for decision tree induction from distributed data sets), embedded into an agent $A$, is exact with respect to the hypothesis inferred by a batch learning algorithm $L$ (e.g., for decision tree induction from a centralized data set) if the hypothesis produced by $L_d$ using distributed data sets $D_1, D_2, \ldots, D_n$ stored at sites $1, 2, \ldots, n$ (respectively) is the same as that obtained by $L$ from the complete data set $D$ obtained by appropriately combining the data sets $D_1, D_2, \ldots, D_n$. Similarly, we can define exact distributed learning with respect to other criteria of interest (e.g., expected accuracy of the learned hypothesis). More generally, it might be useful to consider approximate distributed learning in similar settings. However, the discussion that follows is focused on exact distributed learning.

Horizontal and Vertical Data Fragmentation

In many applications, the data set consists of a set of tuples, where each tuple stores the values of relevant attributes. The distributed nature of such a data set can lead to at least two common types of data fragmentation: horizontal fragmentation, wherein subsets of data tuples are stored at different sites; and vertical fragmentation, wherein subtuples of data tuples are stored at different sites. Assume that a data set $D$ is distributed among the sites $1, 2, \ldots, n$, containing data set fragments $D_1, D_2, \ldots, D_n$. We assume that the individual data sets $D_1, D_2, \ldots, D_n$ collectively contain enough information to generate the complete data set $D$. In many applications, it might be the case that the
individual data sets are autonomously owned and maintained. Consequently, access to the raw data may be limited, and only summaries of the data (e.g., the number of instances that match some criteria of interest) may be made available to the learner. Even in cases where access to raw data is not limited, the large size of the data sets makes it infeasible to assemble the complete data set $D$ at a central location.

Horizontal Fragmentation

In the case of horizontal fragmentation, the data is distributed in such a manner that each site contains a set of data tuples. The union of all these sets constitutes the complete data set. If the individual data sets (horizontal fragments) are denoted by $D_1, D_2, \ldots, D_n$, and the corresponding complete data set by $D$, then Horizontally Distributed Data (HDD) has the following property: $D_1 \cup D_2 \cup \cdots \cup D_n = D$, where $\cup$ denotes set union. Hence, in this case, a distributed learning algorithm $L_d$ is exact with respect to the hypothesis inferred by a learning algorithm $L$ if $L_d(D_1, D_2, \ldots, D_n) = L(D_1 \cup D_2 \cup \cdots \cup D_n)$. The challenge is to achieve this guarantee without providing $L_d$ with simultaneous access to $D_1, D_2, \ldots, D_n$.

Vertical Fragmentation

In this case, each data tuple is fragmented into several subtuples, each of which shares a unique key or index. Thus, different sites store vertical fragments of the data set. Each vertical fragment corresponds to a subset of the attributes that describe the complete data set. It is possible for some attributes to be shared (duplicated) across more than one vertical fragment, leading to overlap between the corresponding fragments. Let $A_1, A_2, \ldots, A_n$ denote the sets of attributes whose values are stored at sites $1, 2, \ldots, n$, respectively, and let $A$ denote the set of attributes that are used to
describe the data tuples of the complete data set. Then, in the case of Vertically Distributed Data (VDD), we have $A_1 \cup A_2 \cup \cdots \cup A_n = A$. Let $D_1, D_2, \ldots, D_n$ denote the fragments of the data set stored at sites $1, 2, \ldots, n$, respectively, and let $D$ denote the complete data set. Let the $i$th tuple in a data fragment $D_j$ be denoted as $t^i_{D_j}$, let $t^i_{D_j}.\mathit{index}$ denote the unique index associated with tuple $t^i_{D_j}$, and let $\bowtie$ denote the join operation. Then the following properties hold for VDD: $D_1 \bowtie D_2 \bowtie \cdots \bowtie D_n = D$ and, for all $D_j, D_k$, $t^i_{D_j}.\mathit{index} = t^i_{D_k}.\mathit{index}$. Thus, the subtuples from the vertical data fragments stored at different sites can be put together using their unique index to form the corresponding data tuples of the complete data set. It is possible to envision scenarios in which a vertically fragmented data set might lack unique indices. In such a case, it might be necessary to use combinations of attribute values to infer associations among tuples (Bhatnagar and Srinivasan, 1997). In what follows, we assume the existence of unique indices in vertically fragmented distributed data sets. In the case of vertically fragmented data, a distributed learning algorithm $L_d$ is exact with respect to the hypothesis inferred by a learning algorithm $L$ if $L_d(D_1, D_2, \ldots, D_n) = L(D_1 \bowtie D_2 \bowtie \cdots \bowtie D_n)$. The challenge is to guarantee this without providing $L_d$ with simultaneous access to $D_1, D_2, \ldots, D_n$.

Transforming Batch Learning Algorithms into Exact Distributed Learning Algorithms

Our general strategy for transforming a batch learning algorithm (e.g., a traditional decision tree induction algorithm) into an exact distributed learning algorithm involves identifying the information requirements of the algorithm and designing efficient means for providing the needed information to the learning agent while avoiding the need to transmit large amounts of data. Thus,
we decompose the distributed learning task into distributed information extraction and hypothesis generation components. The feasibility of this approach depends on the information requirements of the batch algorithm $L$ under consideration and on the (time, memory, and communication) complexity of the corresponding distributed information extraction operations. In this approach to distributed learning, only the information extraction component has to cope with the distributed nature of the data in order to guarantee provably exact learning in the distributed setting, in the sense discussed above. Suppose we decompose a batch learning algorithm $L$ in terms of an information extraction operator $I$ that extracts the necessary information from a data set and a hypothesis generation operator $H$ that uses the extracted information to produce the output of the learning algorithm $L$. That is, $L(D) = H(I(D))$. Suppose we define a distributed information extraction operator $I_d$ that generates from each data set $D_i$ the corresponding information $I_i = I_d(D_i)$, and an operator $C$ that combines this information to produce $I(D)$, so that the information extracted from the distributed data sets is the same as that used by $L$ to infer a hypothesis from the complete data set $D$. That is, $C(I_d(D_1), I_d(D_2), \ldots, I_d(D_n)) = I(D)$. Thus, we can guarantee that $L_d(D_1, D_2, \ldots, D_n) = H(C[I_d(D_1), I_d(D_2), \ldots, I_d(D_n)])$ will be exact with respect to $L(D) = H(I(D))$.

AGENT BASED DECISION TREE INDUCTION FROM DISTRIBUTED DATA

Decision tree algorithms (Quinlan, 1986; Breiman et al., 1984; Buja and Lee, 2001) represent a widely used family of machine learning algorithms for building pattern classifiers from labeled training data. They can also be used to learn associations among different attributes of the data. Some of their advantages over other machine learning techniques include their ability to: select, from all attributes used to describe the data, a subset of attributes that are relevant for classification;
identify complex predictive relations among attributes; and produce classifiers that can be translated in a straightforward manner into rules that are easily understood by humans. A variety of decision tree algorithms have been proposed in the literature. However, most of them recursively select, in a greedy fashion, the attribute that is used to partition the data set under consideration into subsets until each leaf node in the tree has uniform class membership. The ID3 (Iterative Dichotomizer 3) algorithm proposed by Quinlan (Quinlan, 1986) and its more recent variants represent a widely used family of decision tree learning algorithms. The ID3 algorithm searches, in a greedy fashion, for attributes that yield the maximum amount of information for determining the class membership of instances in a training set $S$ of labeled instances. The result is a decision tree that correctly assigns each instance in $S$ to its respective class. The construction of the decision tree is accomplished by recursively partitioning $S$ into subsets based on values of the chosen attribute until each resulting subset has instances that belong to exactly one of the $M$ classes. The attribute selected at each stage of the construction of the decision tree maximizes the estimated expected information gained from knowing the value of the attribute in question. Different algorithms for decision tree induction differ from each other in terms of the criterion that is used to evaluate the splits that correspond to tests on different candidate attributes. The choice of the attribute at each node of the decision tree greedily maximizes (or minimizes) the chosen splitting criterion. Often, decision tree algorithms also include a pruning phase to alleviate the problem of overfitting the training data. For the sake of simplicity of exposition, we limit our discussion to decision tree construction without pruning. However, it is relatively straightforward to modify the proposed algorithms to incorporate a variety of pruning methods.
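
The greedy, recursive construction just described can be illustrated with a minimal batch ID3 sketch in Python. It assumes nominal attributes, examples given as dicts, and entropy-based information gain; the function names are hypothetical, and returning the majority class when no attribute is left is an added assumption (the text above only stops at pure subsets). As in this chapter, no pruning is performed.

```python
import math
from collections import Counter

def entropy(labels):
    """Estimated entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy of S minus the weighted entropy of the subsets S_v induced by attr."""
    n = len(labels)
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Greedy ID3 without pruning.

    Returns a class label for a leaf, or (attribute, {value: subtree})
    for an internal node.
    """
    if len(set(labels)) == 1:  # uniform class membership: make a leaf
        return labels[0]
    if not attrs:  # assumption: majority class when no test is left
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with maximum information gain.
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for v in sorted(set(r[best] for r in rows)):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        branches[v] = id3([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches)

# Hypothetical toy training set S
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["yes", "no", "yes", "no"]
tree = id3(rows, labels, ["outlook", "windy"])
print(tree)  # ('windy', {'no': 'yes', 'yes': 'no'})
```

Here `windy` separates the classes perfectly (gain 1 bit) while `outlook` yields no gain, so the greedy step selects `windy` at the root and both children are pure leaves.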

Splitting Criteria

Some of the popular splitting criteria are based on entropy (Quinlan, 1986), which is used by Quinlan's ID3 algorithm and its variants, and on the Gini index (Breiman et al., 1984), which is used by Breiman's CART algorithm, among others. More recently, additional splitting criteria that are useful for exploratory data analysis have been proposed (Buja and Lee, 2001). Consider a set of instances $S$ that is partitioned into $M$ disjoint subsets (classes) $C_1, C_2, \ldots, C_M$ such that $S = \bigcup_{i=1}^{M} C_i$ and $C_i \cap C_j = \emptyset$ for $i \neq j$. The estimated probability that a randomly chosen instance $s \in S$ belongs to the class $C_i$ is $p_i = |C_i| / |S|$, where $|X|$ denotes the cardinality of the set $X$. The estimated entropy of a set $S$ measures the expected information needed to identify the class membership of instances in $S$, and is defined as follows:

$$\mathrm{entropy}(S) = -\sum_{i=1}^{M} \frac{|C_i|}{|S|} \log_2 \frac{|C_i|}{|S|}$$

The estimated Gini index for the set $S$ containing examples from $M$ classes is defined as follows:

$$\mathrm{gini}(S) = 1 - \sum_{i=1}^{M} \left( \frac{|C_i|}{|S|} \right)^2$$

Given some impurity measure (the entropy, the Gini index, or any other measure that can be defined based on the probabilities $p_i$), we can define the estimated information gain for an attribute $a$, relative to a collection of instances $S$, as follows:

$$\mathrm{IGain}(S, a) = I(S) - \sum_{v \in \mathrm{Values}(a)} \frac{|S_v|}{|S|} I(S_v)$$

where $\mathrm{Values}(a)$ is the set of all possible values for attribute $a$, $S_v$ is the subset of $S$ for which attribute $a$ has value $v$, and $I(S)$ can be $\mathrm{entropy}(S)$, $\mathrm{gini}(S)$, or any other suitable measure.
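
Because every quantity in these definitions is a ratio of class counts, each criterion can be evaluated from a table of counts alone. The sketch below (function names hypothetical) computes entropy, the Gini index, and, as an assumed extra example, the misclassification rate in its common form $1 - \max_i p_i$, and plugs any of them into $\mathrm{IGain}(S, a)$. The count table for the attribute is made up for illustration.

```python
import math

def entropy(counts):
    """entropy(S) = -sum_i p_i log2 p_i, with p_i = |C_i| / |S| taken from class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gini(counts):
    """gini(S) = 1 - sum_i p_i^2."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def misclassification(counts):
    """Misclassification rate 1 - max_i p_i (common form, assumed here)."""
    return 1.0 - max(counts) / sum(counts)

def igain(table, impurity):
    """IGain(S, a) = I(S) - sum_v |S_v|/|S| * I(S_v), computed from counts only.

    table maps each value v of attribute a to the class counts of S_v,
    listed in the same class order for every value.
    """
    per_value = [cv for cv in table.values() if sum(cv) > 0]
    total = [sum(col) for col in zip(*per_value)]  # class counts of S
    n = sum(total)
    return impurity(total) - sum(sum(cv) / n * impurity(cv) for cv in per_value)

# Hypothetical counts for attribute "outlook", classes ordered as [yes, no]
outlook = {"sunny": [2, 3], "overcast": [4, 0], "rain": [3, 2]}
for name, fn in [("entropy", entropy), ("gini", gini),
                 ("misclassification", misclassification)]:
    print(name, round(igain(outlook, fn), 3))
```

For these counts the entropy-based gain is about 0.247 and the Gini-based gain about 0.116. The point of the design is that `igain` never sees individual instances: swapping the impurity function changes the criterion but not the information the learner must obtain.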

It follows that the information requirements of decision tree learning algorithms are the same for both of these splitting criteria: in both cases, we need the relative frequencies computed from the relevant instances. In fact, additional splitting criteria that correspond to other impurity measures can be used instead, provided that these measures can be computed from the statistics that can be obtained from the data sets. Examples of such splitting criteria include misclassification rate, one-sided purity, and one-sided extremes (Buja and Lee, 2001). This turns out to be quite useful in practice, since different criteria often provide different insights about the data. Furthermore, as we show below, the information necessary for decision tree construction can be efficiently obtained from distributed data sets. This results in provably exact algorithms for decision tree induction from horizontally or vertically fragmented distributed data sets.

Distributed Information Extraction

Assume that, given a partially constructed decision tree, we want to choose the next best attribute for splitting. Let $a_j(\pi)$ denote the attribute at the $j$th node along a path $\pi$ starting from the attribute $a_1(\pi)$ that corresponds to the root of the decision tree, leading up to the node in question $a_l(\pi)$ at depth $l$. Let $v(a_j(\pi))$ denote the value of the attribute $a_j(\pi)$ corresponding to the $j$th node along the path $\pi$. For adding a node below $a_l(\pi)$, the set of examples being considered satisfies the following constraints on values of attributes:

$$L(\pi) = [a_1(\pi) = v(a_1(\pi))] \wedge [a_2(\pi) = v(a_2(\pi))] \wedge \cdots \wedge [a_l(\pi) = v(a_l(\pi))]$$

where $[a_j(\pi) = v(a_j(\pi))]$ denotes the fact that the value of the $j$th attribute along the path $\pi$ is $v(a_j(\pi))$. It follows from the preceding discussion that the information required for constructing decision trees consists of the counts of examples that satisfy specified constraints on the values of particular
attributes. These counts have to be obtained once for each node that is added to the tree, starting with the root node. If we can devise distributed information extraction operators for obtaining the necessary counts from distributed data sets, we can obtain exact distributed decision tree learning algorithms. In that case, the decision tree constructed from a given data set in the distributed setting is exactly the same as that obtained in the batch setting when the same splitting criterion is used in both cases.

Horizontally Distributed Data

When the data is horizontally distributed, examples corresponding to a particular value of a particular attribute are scattered across different locations. In order to identify the best split of a particular node in a partially constructed tree, all the sites are visited and the counts corresponding to candidate splits of that node are accumulated. The learner uses these counts to find the attribute that yields the best split to further partition the set of examples at that node. Thus, given $L(\pi)$, in order to split the node corresponding to $a_l(\pi) = v(a_l(\pi))$, the information extraction component has to obtain the counts of examples that belong to each class for each possible value of each candidate attribute. Let $|D|$ be the total number of examples in the distributed data set; $|A|$, the number of attributes; $V$, the maximum number of possible values per attribute; $n$, the number of sites; $M$, the number of classes; and $\mathrm{size}(T)$, the number of nodes in the decision tree $T$. For each node in the decision tree $T$, the information extraction component has to scan the data at each site to calculate the corresponding counts. We have $\sum_{i=1}^{n} |D_i| = |D|$. Therefore, in the case of serial distributed learning, the time complexity of the resulting algorithm is $O(|D| \cdot |A| \cdot \mathrm{size}(T))$. This can be further improved in
the case of parallel distributed learning, since each site can perform information extraction in parallel. For each node in the decision tree $T$, each site has to transmit the counts based on its local data. These counts form a matrix of size $M \cdot |A| \cdot V$. Hence, the communication complexity (the total amount of information that is transmitted between sites) is given by $O(n \cdot M \cdot |A| \cdot V \cdot \mathrm{size}(T))$. It is worth noting that some of the bounds presented here can be further improved so that they depend on the height of the tree instead of the number of nodes in the tree, by taking advantage of the sort of techniques that are introduced in (Shafer et al., 1996; Gehrke et al., 1999).

Vertically Distributed Data

In vertically distributed data sets, we assume that each example has a unique index associated with it. Subtuples of an example are distributed across different sites. However, correspondence between the subtuples of a tuple can be established using the unique index. As before, given $L(\pi)$, in order to split the node corresponding to $a_l(\pi) = v(a_l(\pi))$, the information extraction component has to obtain the counts of examples that belong to each class for each possible value of each candidate attribute. Since each site has only a subset of the attributes, the set of indices corresponding to the examples that match the constraint $L(\pi)$ has to be transmitted to the sites. Using this information, each site can compute the relevant counts that correspond to the attributes that are stored at the site. The hypothesis generation component uses the counts from all the sites to select the attribute to further split the node corresponding to $a_l(\pi) = v(a_l(\pi))$. For each node in the decision tree $T$, each site has to compute the relevant counts of examples that satisfy $L(\pi)$ for the attributes stored at that site. The number of subtuples stored at each site is $|D|$, and the number of attributes at each site is bounded by the total number of attributes $|A|$. In the case of serial distributed learning, the time complexity is given by $O(n \cdot |D| \cdot |A| \cdot \mathrm{size}(T))$. This can be further improved in the case of parallel
distributed learning, since the various sites can perform information extraction in parallel. For each node in the tree $T$, we need to transmit to each site the set of indices for the examples that satisfy the corresponding constraint $L(\pi)$ and get back the relevant counts for the attributes that are stored at that site. The number of indices is bounded by $|D|$ and the number of counts is bounded by $M \cdot |A| \cdot V$. Hence, the communication complexity is given by $O(n \cdot (|D| + M \cdot |A| \cdot V) \cdot \mathrm{size}(T))$. Again, it is possible to further improve some of these bounds so that they depend on the height of the tree instead of the number of nodes in the tree, using techniques similar to those introduced in (Shafer et al., 1996; Gehrke et al., 1999).

Distributed versus Centralized Learning

Our approach to learning decision trees from distributed data, based on a decomposition of the learning task into a distributed information extraction component and a hypothesis generation component, provides an effective way to deal with scenarios in which the sites provide only statistical summaries of the data on demand and prohibit access to raw data. Even when it is possible to access the raw data, the distributed algorithm compares favorably with the corresponding centralized algorithm (which needs access to the entire data set) whenever its communication cost is less than the cost of collecting all of the data in a central location. It follows from the preceding analysis that, in the case of horizontally fragmented data, the distributed algorithm has an advantage when $n \cdot M \cdot V \cdot \mathrm{size}(T) < |D|$, since the cost of shipping the data is given by its actual size, $|D| \cdot |A|$. In the case of vertically fragmented data, the corresponding condition is $n \cdot (|D| + M \cdot |A| \cdot V) \cdot \mathrm{size}(T) < |D| \cdot |A|$, which reduces approximately to $n \cdot \mathrm{size}(T) < |A|$ when $|D|$ dominates $M \cdot |A| \cdot V$, since the cost of shipping the data is given by its actual size, which has a lower bound of $|D| \cdot |A|$. These conditions are often met in the case of large, high-dimensional data sets.
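
The count extraction described above for horizontally and vertically distributed data can be sketched as follows. This is a minimal illustration, not the INDUS implementation: it assumes nominal attributes, tuples represented as Python dicts, a class attribute named `class`, and, for vertical fragments, that every subtuple also carries the class label (the chapter does not say where labels are stored). All function names and the toy data are hypothetical. For horizontal fragments the combination operator $C$ simply adds the per-site count tables; for vertical fragments the learner ships the current index set to each site and merges the per-attribute counts that come back.

```python
from collections import defaultdict

LABEL = "class"  # hypothetical name of the class attribute

def local_counts(rows, attrs, constraint):
    """Counts {(attribute, value, class): n} over rows satisfying L(pi).

    constraint is a dict {attribute: value} encoding the path constraint L(pi).
    Used at each horizontal site and, for comparison, on the complete data set.
    """
    counts = defaultdict(int)
    for r in rows:
        if all(r[a] == v for a, v in constraint.items()):
            for a in attrs:
                counts[(a, r[a], r[LABEL])] += 1
    return dict(counts)

def combine(tables):
    """Combination operator C for horizontal fragments: counts simply add."""
    total = defaultdict(int)
    for table in tables:
        for key, c in table.items():
            total[key] += c
    return dict(total)

def matching_indices(fragment, index_set, attr, value):
    """Run at the site storing attr: keep indices whose subtuple has attr == value."""
    return {i for i in index_set if fragment[i][attr] == value}

def vertical_counts(fragment, index_set):
    """Counts at one vertical fragment ({index: subtuple}) for the indices sent.

    Assumes every subtuple also carries the class label.
    """
    counts = defaultdict(int)
    for i in index_set:
        sub = fragment[i]
        for a, v in sub.items():
            if a != LABEL:
                counts[(a, v, sub[LABEL])] += 1
    return dict(counts)

# Hypothetical complete data set D
D = [
    {"id": 0, "outlook": "sunny", "windy": "no", "class": "yes"},
    {"id": 1, "outlook": "sunny", "windy": "yes", "class": "no"},
    {"id": 2, "outlook": "rain", "windy": "no", "class": "yes"},
    {"id": 3, "outlook": "rain", "windy": "yes", "class": "no"},
    {"id": 4, "outlook": "overcast", "windy": "no", "class": "yes"},
]
sites = [D[:2], D[2:]]  # horizontal fragments D_1, D_2
frag_a = {r["id"]: {"outlook": r["outlook"], LABEL: r[LABEL]} for r in D}
frag_b = {r["id"]: {"windy": r["windy"], LABEL: r[LABEL]} for r in D}

# Horizontal: node with constraint windy = no
hdd = combine([local_counts(s, ["outlook"], {"windy": "no"}) for s in sites])
print(hdd == local_counts(D, ["outlook"], {"windy": "no"}))  # True

# Vertical: root node (all indices), then the node windy = no
root = set(frag_a)
merged = {**vertical_counts(frag_a, root), **vertical_counts(frag_b, root)}
print(merged == local_counts(D, ["outlook", "windy"], {}))  # True
node = matching_indices(frag_b, root, "windy", "no")
print(vertical_counts(frag_a, node) == local_counts(D, ["outlook"], {"windy": "no"}))  # True
```

In both cases the combined counts equal those computed from the complete data set, which is the condition $C(I_d(D_1), \ldots, I_d(D_n)) = I(D)$ that makes the distributed tree identical to the batch tree. The dict merge used for vertical fragments is safe when an attribute is duplicated across sites, because duplicated attributes yield identical counts.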

SUMMARY AND DISCUSSION

Efficient learning algorithms with provable performance guarantees for knowledge acquisition from distributed data sets constitute a key element of any attempt to translate recent advances in our ability to gather and store large volumes of data into an ability to use the data effectively to advance our understanding of the respective domains (e.g., biological sciences, atmospheric sciences) and to build decision support tools. In this chapter, we have precisely formulated a class of distributed learning problems and presented a general strategy for transforming a class of traditional machine learning algorithms into distributed learning algorithms. We have demonstrated the application of this strategy to devise intelligent agents for decision tree induction (using a variety of splitting criteria) from distributed data. The resulting agents are based on algorithms that are provably exact, in that the decision tree constructed from distributed data is identical to that obtained by the corresponding algorithm when it is used in the batch setting. This ensures that the entire body of theoretical (e.g., sample complexity, error bounds) and empirical results obtained in the batch setting carries over to the distributed setting. The proposed distributed decision tree induction agents have been implemented as part of INDUS, an agent-based system for data-driven knowledge acquisition from heterogeneous, distributed, and autonomous data sources. In the proposed approach to learning from distributed data, the hypothesis generation component can be viewed as the control part of the learning process, which deploys the information extraction component as needed. The boundary that defines the division of labor between the distributed information extraction and hypothesis generation components depends on the hypothesis class used for learning, the batch learning algorithm, and the particular decomposition used. At one extreme,
if no information extraction is performed, the hypothesis generation component needs to access the raw data. An example of this scenario is provided by distributed instance-based learning of k-nearest neighbor classifiers from a horizontally fragmented data set. Here, the data set fragments are simply stored at the different sites. Classification of a new instance is performed by the hypothesis generation component, which computes the k nearest neighbors of the instance to be classified (based on some specified distance metric) by visiting the different sites. The class assigned to the instance is the majority class among its k nearest neighbors. At the other extreme, if the information extraction component does most of the work, the task of the hypothesis generation component becomes trivial. For example, consider the Find-S algorithm for learning purely conjunctive concepts, which, starting with the conjunction of all literals, successively eliminates the literals that lead to misclassification of positive examples (Mitchell, 1997). A straightforward adaptation of this algorithm results in provably exact distributed learning of conjunctive concepts from horizontally fragmented data sets. In this context, it is interesting to explore the optimal division of labor between the information extraction and hypothesis generation components for different learning problems under different conditions. The distributed learning problem has begun to receive considerable attention in recent years. However, many of the algorithms proposed in the literature (Davies and Edwards, 1999; Domingos, 1997; Prodromidis et al., 2000) do not guarantee generalization accuracies that are provably close to those obtainable in the centralized setting. Typically, they deal only with horizontally fragmented data. Furthermore, several of them are motivated by the desire to scale up batch learning algorithms to work with large data sets by partitioning the data and parallelizing the algorithm.
In this case, the algorithm typically starts with the entire data set in a central location; the data set is then distributed
across multiple processors to take advantage of parallel processing. In contrast, in the distributed scenario discussed in this chapter, the algorithm may be prohibited from accessing the raw data; even when it is possible to access the raw data, it may be infeasible to gather all of the data at a central location (because of the bandwidth and storage costs involved). An algorithm based on the Fourier expansion of Boolean functions (Kargupta et al., 1999) deals with vertically distributed data sets. However, in its present form, it is computationally very expensive and offers no proven guarantees of performance relative to the batch algorithm. Furthermore, since a given set of coefficients can correspond to multiple decision trees, it does not yield a unique decision tree from a given data set. In contrast, the algorithms proposed in this chapter guarantee provably exact learning from horizontally or vertically fragmented distributed data sets. The algorithm proposed in (Bhatnagar and Srinivasan, 1997) is closely related to our algorithm for learning decision trees from vertically fragmented data using entropy or information gain as the splitting criterion. It provides a mechanism for obtaining counts from implicit tuples in the absence of a unique index for each tuple in the data set, by simulating the effect of the join operation on the sites without enumerating the tuples. In contrast, our algorithms assume the existence of a unique index, but are more general in other respects (ability to deal with both horizontal and vertical fragmentation, incorporation of multiple splitting criteria). In the absence of unique indices, our approach can be modified using a technique similar to that used in (Bhatnagar and Srinivasan, 1997).

Work in progress is aimed at the elucidation of the necessary and sufficient conditions that guarantee the existence of exact or approximate distributed learning algorithms in terms of the properties of data and hypothesis representations as well as information extraction and learning operators; characterization of information requirements for distributed learning under various assumptions; investigation of the optimal division of labor between the information gathering and
hypothesis generation components of the algorithm under different assumptions; design of new classes of theoretically well-founded algorithms for distributed learning; integration of distributed learning algorithms with databases using new database operators for distributed information extraction (e.g., the operators for obtaining counts proposed in (Graefe et al., 1998)); addressing the issues that arise in dealing with large databases when the processing has to be done under significant memory or processing constraints (Gehrke et al., 1999); integration of machine learning with visualization for exploratory data analysis; incorporation of domain, and possibly application-specific, ontologies to bridge syntactic and semantic mismatches across distributed data sets; and application of the resulting techniques to large-scale data-driven knowledge discovery tasks in applications such as computational biology and intrusion detection.

REFERENCES

Bhatnagar, R., Srinivasan, S. (1997). Pattern discovery in distributed databases. In Proceedings of the AAAI 1997 Conference, Providence, RI.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984). Classification and regression trees. Wadsworth, Pacific Grove, CA.
Buja, A., Lee, Y.S. (2001). Data mining criteria for tree-based regression and classification. In Proceedings of the KDD 2001 Conference, San Francisco, CA.
Caragea, D., Silvescu, A., Honavar, V. (2000). Toward a theoretical framework for analysis and synthesis of distributed and parallel knowledge discovery. In Proceedings of the KDD 2000 Workshop on DPKD, Boston, MA.
Caragea, D., Silvescu, A., Honavar, V. (2001). Analysis and synthesis of agents that learn from distributed dynamic data sources. In: Emergent Neural Computational Architectures Based on Neuroscience 2001. Springer.
Davies, W., Edwards, P. (1999). Dagger: a new approach to combining multiple models learned from disjoint subsets. In Proceedings of ICML 1999, Bled, Slovenia.
Domingos, P. (1997). Knowledge acquisition from examples via multiple models. In Proceedings of ICML 1997, Nashville, TN.
Gehrke, J., Ganti, V., Ramakrishnan, R., Loh, W.Y. (1999). BOAT: optimistic decision tree construction. In Proceedings of the SIGMOD 1999 Conference, Philadelphia, Pennsylvania.
Graefe, G., Fayyad, U., Chaudhuri, S. (1998). On the efficient gathering of sufficient statistics for classification from large SQL databases. In Proceedings of the KDD 1998 Conference, Menlo Park, CA.
Honavar, V., Miller, L., Wong, J.S. (1998). Distributed knowledge networks. In Proceedings of the IEEE Conference on IT, Syracuse, NY.
Jennings, N., Wooldridge, M. (2001). Agent-oriented software engineering. In Bradshaw, J. (Ed.), Handbook of agent technology. AAAI/MIT Press.
Kargupta, H., Park, B.H., Hershberger, D., Johnson, E. (1999). Collective data mining: A new perspective toward distributed data mining. In Kargupta, H., Chan, P. (Eds.), Advances in distributed and parallel knowledge discovery. MIT/AAAI Press.
Mitchell, T.M. (1997). Machine learning. McGraw Hill.
Prodromidis, A.L., Chan, P., Stolfo, S.J. (2000). Meta-learning in distributed data mining systems: Issues and approaches. In Kargupta, H., Chan, P. (Eds.), Advances in distributed and parallel knowledge discovery. MIT/AAAI Press.
Quinlan, R. (1986). Induction of decision trees. Machine Learning, 1.
Shafer, J.C., Agrawal, R., Mehta, M. (1996). SPRINT: a scalable parallel classifier for data mining. In Proceedings of the 22nd International Conference on VLDB, Mumbai (Bombay), India. Morgan Kaufmann.


More information

New Fuzzy Color Clustering Algorithm Based on hsl Similarity

New Fuzzy Color Clustering Algorithm Based on hsl Similarity IFSA-EUSFLAT 009 New Fuzzy Color Clusterig Algorithm Based o hsl Similarity Vasile Ptracu Departmet of Iformatics Techology Tarom Compay Bucharest Romaia Email: patrascu.v@gmail.com Abstract I this paper

More information

Assignment Problems with fuzzy costs using Ones Assignment Method

Assignment Problems with fuzzy costs using Ones Assignment Method IOSR Joural of Mathematics (IOSR-JM) e-issn: 8-8, p-issn: 9-6. Volume, Issue Ver. V (Sep. - Oct.06), PP 8-89 www.iosrjourals.org Assigmet Problems with fuzzy costs usig Oes Assigmet Method S.Vimala, S.Krisha

More information

Harris Corner Detection Algorithm at Sub-pixel Level and Its Application Yuanfeng Han a, Peijiang Chen b * and Tian Meng c

Harris Corner Detection Algorithm at Sub-pixel Level and Its Application Yuanfeng Han a, Peijiang Chen b * and Tian Meng c Iteratioal Coferece o Computatioal Sciece ad Egieerig (ICCSE 015) Harris Corer Detectio Algorithm at Sub-pixel Level ad Its Applicatio Yuafeg Ha a, Peijiag Che b * ad Tia Meg c School of Automobile, Liyi

More information

1&1 Next Level Hosting

1&1 Next Level Hosting 1&1 Next Level Hostig Performace Level: Performace that grows with your requiremets Copyright 1&1 Iteret SE 2017 1ad1.com 2 1&1 NEXT LEVEL HOSTING 3 Fast page loadig ad short respose times play importat

More information

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov Sortig i Liear Time Data Structures ad Algorithms Adrei Bulatov Algorithms Sortig i Liear Time 7-2 Compariso Sorts The oly test that all the algorithms we have cosidered so far is compariso The oly iformatio

More information